Question:

Avro schema evolution tests and questions

巴照

2023-03-14

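Using the Avro schemas and test code defined below, I have a few questions about Avro schema evolution and about how to store Avro data written with the first version of a schema and later retrieve it with the second version. In my example, Person.avsc is the first version and PersonWithMiddleName.avsc is the second version, in which we added the middleName attribute.

1. Is there a way in Java to store the Avro schema together with the binary-encoded data as a byte array? We want to store Avro objects in DynamoDB, keeping the Avro data as a blob with the schema stored right next to it (just as when writing to a file). For reference, see my test output below (the binary content did not copy over, so the line only shows "The Person is now serialized to a byte array: JoeCool") and compare what is stored when the Person is serialized to a byte array with what is stored when it is written to the person.avro file during the test. As you can see, the schema appears to be written out only with the file, not with the byte array.
2. Is the AvroTypeException I hit during the test really expected, as my comment in the test's catch block claims? In that case I have serialized the Person object to JSON and tried to deserialize it as a PersonWithMiddleName.

Java test code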

import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import org.apache.avro.AvroTypeException;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.Encoder;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.io.JsonDecoder;
import org.apache.avro.specific.SpecificDatumReader;
import org.apache.avro.specific.SpecificDatumWriter;
import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.api.Test;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class SchemaEvolutionTest {
  Logger log = LoggerFactory.getLogger(this.getClass());

  @Test
  public void createAndReadPerson() {
    // Create the Person using the Person schema
    var person = new Person();
    person.setFirstName("Joe");
    person.setLastName("Cool");
    log.info("Person has been created: {}", person);
    SpecificDatumWriter<Person> personSpecificDatumWriter =
        new SpecificDatumWriter<Person>(Person.class);
    DataFileWriter<Person> dataFileWriter = new DataFileWriter<Person>(personSpecificDatumWriter);
    try {
      dataFileWriter.create(person.getSchema(), new File("person.avro"));
      dataFileWriter.append(person);
      dataFileWriter.close();
    } catch (IOException e) {
      Assertions.fail();
    }
    log.info("Person has been written to an Avro file");

    // ******************************************************************************************************

    // Next, read as Person from the Avro file using the Person schema
    DatumReader<Person> personDatumReader =
        new SpecificDatumReader<Person>(Person.getClassSchema());
    var personAvroFile = new File("person.avro");

    DataFileReader<Person> personDataFileReader = null;
    try {
      personDataFileReader = new DataFileReader<Person>(personAvroFile, personDatumReader);
    } catch (IOException e1) {
      Assertions.fail();
    }
    Person personReadFromFile = null;
    while (personDataFileReader.hasNext()) {
      // Reuse object by passing it to next(). This saves us from
      // allocating and garbage collecting many objects for files with
      // many items.
      try {
        personReadFromFile = personDataFileReader.next(person);
      } catch (IOException e) {
        Assertions.fail();
      }
    }
    log.info("Person read from the file: {}", personReadFromFile.toString());

    // ******************************************************************************************************

    // Read the Person from the Person file as PersonWithMiddleName using only the
    // PersonWithMiddleName schema
    DatumReader<PersonWithMiddleName> personWithMiddleNameDatumReader =
        new SpecificDatumReader<PersonWithMiddleName>(PersonWithMiddleName.getClassSchema());
    DataFileReader<PersonWithMiddleName> personWithMiddleNameDataFileReader = null;
    try {
      personWithMiddleNameDataFileReader =
          new DataFileReader<PersonWithMiddleName>(personAvroFile, personWithMiddleNameDatumReader);
    } catch (IOException e1) {
      Assertions.fail();
    }
    PersonWithMiddleName personWithMiddleName = null;
    while (personWithMiddleNameDataFileReader.hasNext()) {
      // Reuse object by passing it to next(). This saves us from
      // allocating and garbage collecting many objects for files with
      // many items.
      try {
        personWithMiddleName = personWithMiddleNameDataFileReader.next(personWithMiddleName);
      } catch (IOException e) {
        Assertions.fail();
      }
    }
    log.info(
        "Now a PersonWithMiddleName has been read from the file that was written as a Person: {}",
        personWithMiddleName.toString());

    // ******************************************************************************************************

    // Serialize the Person to a byte array
    byte[] personByteArray = new byte[0];
    ByteArrayOutputStream personByteArrayOutputStream = new ByteArrayOutputStream();
    Encoder encoder = null;
    try {
      encoder = EncoderFactory.get().binaryEncoder(personByteArrayOutputStream, null);
      personSpecificDatumWriter.write(person, encoder);
      encoder.flush();
      personByteArray = personByteArrayOutputStream.toByteArray();
    } catch (IOException e) {
      log.error("Serialization error:" + e.getMessage());
    }
    log.info("The Person is now serialized to a byte array: {}", new String(personByteArray));

    // ******************************************************************************************************

    // Deserialize the Person byte array into a Person object
    BinaryDecoder binaryDecoder = null;
    Person decodedPerson = null;
    try {
      binaryDecoder = DecoderFactory.get().binaryDecoder(personByteArray, null);
      decodedPerson = personDatumReader.read(null, binaryDecoder);
    } catch (IOException e) {
      log.error("Deserialization error:" + e.getMessage());
    }
    log.info("Decoded Person from byte array {}", decodedPerson.toString());

    // ******************************************************************************************************

    // Deserialize the Person byte array into a PersonWithMiddleName object
    PersonWithMiddleName decodedPersonWithMiddleName = null;
    try {
      binaryDecoder = DecoderFactory.get().binaryDecoder(personByteArray, null);
      decodedPersonWithMiddleName = personWithMiddleNameDatumReader.read(null, binaryDecoder);
    } catch (IOException e) {
      log.error("Deserialization error:" + e.getMessage());
    }
    log.info(
        "Decoded PersonWithMiddleName from byte array {}", decodedPersonWithMiddleName.toString());

    // ******************************************************************************************************

    // Serialize the Person to JSON
    byte[] jsonByteArray = new byte[0];
    personByteArrayOutputStream = new ByteArrayOutputStream();
    Encoder jsonEncoder = null;
    try {
      jsonEncoder =
          EncoderFactory.get().jsonEncoder(Person.getClassSchema(), personByteArrayOutputStream);
      personSpecificDatumWriter.write(person, jsonEncoder);
      jsonEncoder.flush();
      jsonByteArray = personByteArrayOutputStream.toByteArray();
    } catch (IOException e) {
      log.error("Serialization error:" + e.getMessage());
    }
    log.info("The Person is now serialized to JSON: {}", new String(jsonByteArray));

    // ******************************************************************************************************

    // Deserialize the Person JSON into a Person object
    JsonDecoder jsonDecoder = null;
    try {
      jsonDecoder =
          DecoderFactory.get().jsonDecoder(Person.getClassSchema(), new String(jsonByteArray));
      decodedPerson = personDatumReader.read(null, jsonDecoder);
    } catch (IOException e) {
      log.error("Deserialization error:" + e.getMessage());
    }
    log.info("Decoded Person from JSON: {}", decodedPerson.toString());

    // ******************************************************************************************************

    // Deserialize the Person JSON into a PersonWithMiddleName object
    try {
      jsonDecoder =
          DecoderFactory.get()
              .jsonDecoder(PersonWithMiddleName.getClassSchema(), new String(jsonByteArray));
      decodedPersonWithMiddleName = personWithMiddleNameDatumReader.read(null, jsonDecoder);
    } catch (AvroTypeException ae) {
      // Do nothing. We expect this since JSON didn't serialize anything out.
      log.error(
          "An AvroTypeException occurred trying to deserialize Person JSON back into a PersonWithMiddleName. Here's the exception: {}",ae.getMessage());
    } catch (Exception e) {
      log.error("Deserialization error:" + e.getMessage());
    }

  }
}

Person.avsc

{
    "type": "record",
    "namespace": "org.acme.avro_testing",
    "name": "Person",
    "fields": [
        {
            "name": "firstName",
            "type": ["null", "string"],
            "default": null
        },
        {
            "name": "lastName",
            "type": ["null", "string"],
            "default": null
        }
    ]
}

PersonWithMiddleName.avsc

{
    "type": "record",
    "namespace": "org.acme.avro_testing",
    "name": "PersonWithMiddleName",
    "fields": [
        {
            "name": "firstName",
            "type": ["null", "string"],
            "default": null
        },
        {
            "name": "middleName",
            "type": ["null", "string"],
            "default": null
        },
        {
            "name": "lastName",
            "type": ["null", "string"],
            "default": null
        }
    ]
}

Test output

Person has been created: {"firstName": "Joe", "lastName": "Cool"}
Person has been written to an Avro file
Person read from the file: {"firstName": "Joe", "lastName": "Cool"}
Now a PersonWithMiddleName has been read from the file that was written as a Person: {"firstName": "Joe", "middleName": null, "lastName": "Cool"}
The Person is now serialized to a byte array: JoeCool
Decoded Person from byte array {"firstName": "Joe", "lastName": "Cool"}
Decoded PersonWithMiddleName from byte array {"firstName": "Joe", "middleName": null, "lastName": "Cool"}
The Person is now serialized to JSON: {"firstName":{"string":"Joe"},"lastName":{"string":"Cool"}}
Decoded Person from JSON: {"firstName": "Joe", "lastName": "Cool"}
An AvroTypeException occurred trying to deserialize Person JSON back into a PersonWithMiddleName. Here's the exception: Expected field name not found: middleName

person.avro

Objavro.schema�{"type":"record","name":"Person","namespace":"org.acme.avro_testing","fields":[{"name":"firstName","type":["null","string"],"default":null},{"name":"lastName","type":["null","string"],"default":null}]}
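
For comparison with the file contents above: the "JoeCool" line in the test output is consistent with Avro's raw binary encoding, which writes only field values and no schema. The sketch below hand-encodes the same record using only the JDK, following the encoding rules in the Avro specification (zigzag variable-length integers, a union branch index before each value, a length prefix before each string). It is an illustration, not Avro library code, and `AvroBinaryLayoutSketch` and its methods are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class AvroBinaryLayoutSketch {

  // Avro writes int/long values as zigzag-encoded variable-length integers.
  static void writeLong(ByteArrayOutputStream out, long value) {
    long zigzag = (value << 1) ^ (value >> 63);
    while ((zigzag & ~0x7FL) != 0) {
      out.write((int) ((zigzag & 0x7F) | 0x80));
      zigzag >>>= 7;
    }
    out.write((int) zigzag);
  }

  // A ["null", "string"] union holding a string:
  // branch index 1, then the byte length, then the UTF-8 bytes.
  static void writeNullableString(ByteArrayOutputStream out, String value) {
    byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
    writeLong(out, 1);
    writeLong(out, utf8.length);
    out.write(utf8, 0, utf8.length);
  }

  // A record is just its fields in schema order; no field names, no schema.
  static byte[] encodePerson(String firstName, String lastName) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    writeNullableString(out, firstName);
    writeNullableString(out, lastName);
    return out.toByteArray();
  }

  static String toHex(byte[] bytes) {
    StringBuilder sb = new StringBuilder();
    for (byte b : bytes) {
      if (sb.length() > 0) sb.append(' ');
      sb.append(String.format("%02x", b & 0xff));
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // prints: 02 06 4a 6f 65 02 08 43 6f 6f 6c
    System.out.println(toHex(encodePerson("Joe", "Cool")));
  }
}
```

The eleven bytes are four small control bytes plus the letters of "Joe" and "Cool", which is why the log line shows only "JoeCool" once the non-printable bytes are dropped. A reader of such a blob has to obtain the writer's schema from somewhere else.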

壤驷安和

2023-03-14

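For question one, I'm not a Java expert, but in Python, instead of writing to an actual file, there is the concept of a file-like object: it has the same interface as a file but simply writes to a byte buffer. For example, instead of doing this: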

file = open(file_name, "wb")
# use avro library to write to file
file.close()

you can do this:

from io import BytesIO
bytes_interface = BytesIO()
# use bytes_interface the same way you would the previous "file" object
byte_output = bytes_interface.getvalue()

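So in the end, byte_output will hold the bytes that would normally be written to the file, but now it is just a byte buffer that you can store anywhere. Does Java have such a concept? Alternatively, if you absolutely must go through writing an actual temporary file, I assume Java has some way to read the file contents back into a byte buffer.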
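
For what it's worth, the JDK's `java.io.ByteArrayOutputStream` appears to fill the role of `BytesIO`. As far as I know, Avro's `DataFileWriter` has a `create(Schema, OutputStream)` overload alongside the `create(Schema, File)` one used in the test, and `DataFileStream` can read the result back from an `InputStream`, so the container format with its schema header could go straight into a byte array. The sketch below uses only the JDK to show the stream swap itself; `writePayload` is a hypothetical stand-in for whatever writes to the stream, not an Avro call.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

class InMemoryStreamSketch {

  // Hypothetical stand-in for a serializer: it only sees an OutputStream,
  // so it cannot tell whether the bytes land in a file or in memory.
  static void writePayload(OutputStream out) throws IOException {
    out.write("schema-header|".getBytes(StandardCharsets.UTF_8));
    out.write("record-data".getBytes(StandardCharsets.UTF_8));
  }

  // Counterpart of Python's BytesIO: collect everything in a byte array.
  static byte[] writeToMemory() {
    try {
      ByteArrayOutputStream buffer = new ByteArrayOutputStream();
      writePayload(buffer);
      return buffer.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // The fallback from the answer: write a temporary file, then read it back.
  static byte[] writeToTempFile() {
    try {
      Path tmp = Files.createTempFile("payload", ".bin");
      try (OutputStream out = Files.newOutputStream(tmp)) {
        writePayload(out);
        out.flush();
        return Files.readAllBytes(tmp);
      } finally {
        Files.deleteIfExists(tmp);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    // Both routes produce the same bytes; prints: true
    System.out.println(Arrays.equals(writeToMemory(), writeToTempFile()));
  }
}
```

The resulting byte array can then be stored as a single binary attribute. Whether to keep the schema inside the blob or in a separate attribute next to it is a design choice the question leaves open.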

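For question two, I think you are running into the same issue described in this Jira ticket: https://issues.apache.org/jira/browse/AVRO-2890. Currently, the JSON decoder needs the schema the data was written with, and it cannot perform any kind of schema evolution when given a schema different from the writer's.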
