
Apache Avro Java Manual

朱天逸
2023-12-01

Defining a schema

    Avro schemas are defined in JSON. A schema is composed of primitive types (null, boolean, int, long, float, double, bytes, and string) and complex types (record, enum, array, map, union, and fixed).
{"namespace": "example.avro",
 "type": "record",
 "name": "User",
 "fields": [
     {"name": "name", "type": "string"},
     {"name": "favorite_number",  "type": ["int", "null"]},
     {"name": "favorite_color", "type": ["string", "null"]}
 ]
}
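    This schema defines a record representing a user. At a minimum, a record definition must include its type ("type": "record"), a name ("name": "User"), and fields. We can also define a namespace ("namespace": "example.avro"), which is combined with the name attribute to form the schema's full name (example.avro.User).
    Fields are defined as an array of objects, each of which specifies a name and a type.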
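To see how the namespace and name combine, the schema above can be parsed and inspected with the core Avro library. This is a minimal sketch assuming `org.apache.avro:avro` is on the classpath; the class name `SchemaTour` and its helper method are hypothetical, and the JSON is simply the User schema inlined as a string.

```java
import org.apache.avro.Schema;

// Hypothetical demo class: parses the User schema and prints its parts.
// Assumes the org.apache.avro:avro library is on the classpath.
public class SchemaTour {
    static final String USER_SCHEMA_JSON =
        "{\"namespace\": \"example.avro\","
        + " \"type\": \"record\","
        + " \"name\": \"User\","
        + " \"fields\": ["
        + "   {\"name\": \"name\", \"type\": \"string\"},"
        + "   {\"name\": \"favorite_number\", \"type\": [\"int\", \"null\"]},"
        + "   {\"name\": \"favorite_color\", \"type\": [\"string\", \"null\"]}"
        + " ]}";

    static Schema parseUserSchema() {
        return new Schema.Parser().parse(USER_SCHEMA_JSON);
    }

    public static void main(String[] args) {
        Schema schema = parseUserSchema();
        // namespace + "." + name gives the full name: example.avro.User
        System.out.println(schema.getFullName());
        for (Schema.Field field : schema.getFields()) {
            // Each field has a name and a type; union types print as a JSON array
            System.out.println(field.name() + " -> " + field.schema());
        }
    }
}
```

Both nullable fields are unions, which is how Avro expresses "this value may be absent".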

Serializing and deserializing with code generation

Compiling the schema

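    Code generation lets us automatically create classes based on a previously defined schema. Once the relevant classes have been generated, there is no need to use the schema directly in our programs. The classes are generated with avro-tools, which writes the Java sources to the destination directory, in a package derived from the schema's namespace: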
    java -jar /path/to/avro-tools-1.7.3.jar compile schema <schema file> <destination>

Creating Users

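    Once the code has been generated, users can be created with the no-argument constructor plus setters, with the all-fields constructor, or with a builder: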
User user1 = new User();
user1.setName("Alyssa");
user1.setFavoriteNumber(256);
// Leave favorite color null
// Alternate constructor
User user2 = new User("Ben", 7, "red");
// Construct via builder
User user3 = User.newBuilder()
             .setName("Charlie")
             .setFavoriteColor("blue")
             .setFavoriteNumber(null)
             .build();

Serializing

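import java.io.File;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.specific.SpecificDatumWriter;

// Serialize user1, user2 and user3 to disk
// (the enclosing method must handle IOException)
File file = new File("users.avro");
DatumWriter<User> userDatumWriter = new SpecificDatumWriter<User>(User.class);
// try-with-resources flushes and closes the writer even if append() fails
try (DataFileWriter<User> dataFileWriter = new DataFileWriter<User>(userDatumWriter)) {
    // Write the schema into the file header, then append the records
    dataFileWriter.create(user1.getSchema(), file);
    dataFileWriter.append(user1);
    dataFileWriter.append(user2);
    dataFileWriter.append(user3);
}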

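    A DatumWriter converts Java objects into an in-memory serialized format. SpecificDatumWriter is used with generated classes and extracts the schema from the specified generated type. DataFileWriter writes the serialized records, together with the schema, to the file given in the dataFileWriter.create call.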
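To make the "serialized format" concrete, the sketch below hand-encodes one User record following the binary encoding rules of the Avro specification: int and long values as zig-zag variable-length integers, strings as a length followed by UTF-8 bytes, a union as the index of the chosen branch followed by its value, and record fields in schema order. It is an illustration written with only the JDK, not Avro's API; the class and method names are hypothetical. Per the specification, Avro's own binary encoder should produce the same bytes for this schema; inside a data file, DataFileWriter adds a header (containing the schema) and block framing around such records.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical, JDK-only sketch of the Avro binary layout of a User record.
// Not Avro's API: it mirrors the encoding rules from the Avro specification.
public class UserBinarySketch {

    // Zig-zag maps signed values to unsigned so small magnitudes stay short,
    // then emits 7 bits per byte, setting the high bit when more bytes follow.
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // Strings: length as a zig-zag long, then the raw UTF-8 bytes
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    // Fields in schema order: name, favorite_number ["int","null"],
    // favorite_color ["string","null"]
    static byte[] encodeUser(String name, Integer favoriteNumber, String favoriteColor) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, name);
        if (favoriteNumber != null) {
            writeLong(out, 0);              // union branch 0 = "int"
            writeLong(out, favoriteNumber); // ints use the same zig-zag varint form
        } else {
            writeLong(out, 1);              // union branch 1 = "null", no payload
        }
        if (favoriteColor != null) {
            writeLong(out, 0);              // union branch 0 = "string"
            writeString(out, favoriteColor);
        } else {
            writeLong(out, 1);              // union branch 1 = "null"
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = encodeUser("Alyssa", 256, null);
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02x ", b));
        }
        // prints "0c 41 6c 79 73 73 61 00 80 04 02"
        System.out.println(hex.toString().trim());
    }
}
```

Note that no field names appear in the bytes: a reader can only interpret them with the writer's schema, which is why DataFileWriter stores the schema in the file.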

Deserializing

import org.apache.avro.file.DataFileReader;
import org.apache.avro.io.DatumReader;
import org.apache.avro.specific.SpecificDatumReader;

// Deserialize Users from disk (reuses `file` from the previous snippet)
DatumReader<User> userDatumReader = new SpecificDatumReader<User>(User.class);
try (DataFileReader<User> dataFileReader = new DataFileReader<User>(file, userDatumReader)) {
    User user = null;
    while (dataFileReader.hasNext()) {
        // Reuse user object by passing it to next(). This saves us from
        // allocating and garbage collecting many objects for files with
        // many items.
        user = dataFileReader.next(user);
        System.out.println(user);
    }
}
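    SpecificDatumReader converts in-memory serialized items into instances of the generated class, in this case User. DataFileReader reads the file from disk. Passing the user object to next() lets the reader reuse it rather than allocating a new object for every record.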
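Because DataFileWriter stores the schema in the file header, a reader does not strictly need to be given the schema up front. The sketch below writes two users and reads them back with a GenericDatumReader constructed without any schema, so the writer's schema comes from the file. It uses the generic API so that it runs without a generated User class; it assumes `org.apache.avro:avro` is on the classpath, and the class and method names are hypothetical. It also shows the practical consequence of object reuse: values should be copied out before the next call to next().

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

// Hypothetical demo: the schema travels inside the Avro data file.
public class SchemaInFileDemo {
    // The User schema, inlined so the sketch is self-contained
    static final Schema SCHEMA = new Schema.Parser().parse(
        "{\"namespace\":\"example.avro\",\"type\":\"record\",\"name\":\"User\","
        + "\"fields\":[{\"name\":\"name\",\"type\":\"string\"},"
        + "{\"name\":\"favorite_number\",\"type\":[\"int\",\"null\"]},"
        + "{\"name\":\"favorite_color\",\"type\":[\"string\",\"null\"]}]}");

    // Writes one record per name; the nullable fields are left null
    static void writeUsers(File file, String... names) throws IOException {
        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<GenericRecord>(new GenericDatumWriter<GenericRecord>(SCHEMA))) {
            writer.create(SCHEMA, file); // the schema goes into the file header
            for (String name : names) {
                GenericRecord user = new GenericData.Record(SCHEMA);
                user.put("name", name);
                writer.append(user);
            }
        }
    }

    // A GenericDatumReader built without a schema takes the writer's schema from the file
    static Schema schemaOf(File file) throws IOException {
        try (DataFileReader<GenericRecord> reader =
                 new DataFileReader<GenericRecord>(file, new GenericDatumReader<GenericRecord>())) {
            return reader.getSchema();
        }
    }

    static List<String> readNames(File file) throws IOException {
        List<String> names = new ArrayList<String>();
        try (DataFileReader<GenericRecord> reader =
                 new DataFileReader<GenericRecord>(file, new GenericDatumReader<GenericRecord>())) {
            GenericRecord user = null;
            while (reader.hasNext()) {
                user = reader.next(user); // the reader may refill the same instance
                // Copy the value out now; a reused record may be overwritten later
                names.add(user.get("name").toString());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("users", ".avro");
        file.deleteOnExit();
        writeUsers(file, "Alyssa", "Ben");
        System.out.println(schemaOf(file).getFullName());
        System.out.println(readNames(file));
    }
}
```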

Serializing and deserializing without code generation

Creating users

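import java.io.File;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

// Parse the User schema defined above, saved as user.avsc
Schema schema = new Schema.Parser().parse(new File("user.avsc"));
GenericRecord user1 = new GenericData.Record(schema);
user1.put("name", "Alyssa");
user1.put("favorite_number", 256);
// Leave favorite color null
GenericRecord user2 = new GenericData.Record(schema);
user2.put("name", "Ben");
user2.put("favorite_number", 7);
user2.put("favorite_color", "red");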

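    Since we are not using code generation, we use GenericRecord to represent users. GenericRecord uses the schema to verify that only valid fields are set: if we try to set a field that does not exist, for example user1.put("favorite_animal", "cat"), an AvroRuntimeException is thrown at run time.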
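The check described above applies to field names. As a sketch (assuming `org.apache.avro:avro` on the classpath; class and method names are hypothetical), the code below shows put() rejecting an unknown field, and shows that put() does not check value types: a wrong type is only detected when the record is checked against the schema, here with GenericData.validate().

```java
import org.apache.avro.AvroRuntimeException;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

// Hypothetical demo of what GenericRecord does and does not validate.
public class GenericValidationDemo {
    static final Schema SCHEMA = new Schema.Parser().parse(
        "{\"namespace\":\"example.avro\",\"type\":\"record\",\"name\":\"User\","
        + "\"fields\":[{\"name\":\"name\",\"type\":\"string\"},"
        + "{\"name\":\"favorite_number\",\"type\":[\"int\",\"null\"]},"
        + "{\"name\":\"favorite_color\",\"type\":[\"string\",\"null\"]}]}");

    // put() looks the field up by name and throws if the schema has no such field
    static boolean rejectsUnknownField() {
        GenericRecord user = new GenericData.Record(SCHEMA);
        try {
            user.put("favorite_animal", "cat");
            return false;
        } catch (AvroRuntimeException e) {
            System.out.println("rejected: " + e.getMessage());
            return true;
        }
    }

    // put() stores any value; validate() checks each field against its schema
    static boolean isValid(Object favoriteNumber) {
        GenericRecord user = new GenericData.Record(SCHEMA);
        user.put("name", "Alyssa");
        user.put("favorite_number", favoriteNumber);
        return GenericData.get().validate(SCHEMA, user);
    }

    public static void main(String[] args) {
        System.out.println(rejectsUnknownField());
        System.out.println(isValid(256));   // an Integer matches the "int" branch
        System.out.println(isValid(null));  // null matches the "null" branch
        System.out.println(isValid("256")); // a String matches neither branch
    }
}
```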

Serializing

import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.io.DatumWriter;

// Serialize user1 and user2 to disk
File file = new File("users.avro");
DatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<GenericRecord>(schema);
try (DataFileWriter<GenericRecord> dataFileWriter = new DataFileWriter<GenericRecord>(datumWriter)) {
    dataFileWriter.create(schema, file);
    dataFileWriter.append(user1);
    dataFileWriter.append(user2);
}
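When records are sent over a network or stored in another system instead of an Avro data file, the same DatumWriter can write to a raw binary encoder in memory. The sketch below (assuming `org.apache.avro:avro` on the classpath; class and method names are hypothetical) round-trips one GenericRecord through a byte array. Unlike a data file, these bytes carry no schema, so the reader has to be given the writer's schema explicitly.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

// Hypothetical demo: in-memory binary round trip without a container file.
public class InMemoryRoundTrip {
    static final Schema SCHEMA = new Schema.Parser().parse(
        "{\"namespace\":\"example.avro\",\"type\":\"record\",\"name\":\"User\","
        + "\"fields\":[{\"name\":\"name\",\"type\":\"string\"},"
        + "{\"name\":\"favorite_number\",\"type\":[\"int\",\"null\"]},"
        + "{\"name\":\"favorite_color\",\"type\":[\"string\",\"null\"]}]}");

    static GenericRecord user(String name, Integer number, String color) {
        GenericRecord user = new GenericData.Record(SCHEMA);
        user.put("name", name);
        user.put("favorite_number", number);
        user.put("favorite_color", color);
        return user;
    }

    // Raw binary encoding of a single datum: no header and no embedded schema
    static byte[] encode(GenericRecord user) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(SCHEMA).write(user, encoder);
        encoder.flush(); // binaryEncoder() buffers, so flush before taking the bytes
        return out.toByteArray();
    }

    // The reader must be told the writer's schema, since the bytes do not carry it
    static GenericRecord decode(byte[] bytes) throws IOException {
        BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(bytes, null);
        return new GenericDatumReader<GenericRecord>(SCHEMA).read(null, decoder);
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = encode(user("Alyssa", 256, null));
        System.out.println(bytes.length + " bytes");
        GenericRecord back = decode(bytes);
        // Strings are decoded as org.apache.avro.util.Utf8, so compare via toString()
        System.out.println(back.get("name").toString().equals("Alyssa"));
    }
}
```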

Deserializing

import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.io.DatumReader;

// Deserialize users from disk
DatumReader<GenericRecord> datumReader = new GenericDatumReader<GenericRecord>(schema);
try (DataFileReader<GenericRecord> dataFileReader = new DataFileReader<GenericRecord>(file, datumReader)) {
    GenericRecord user = null;
    while (dataFileReader.hasNext()) {
        // Reuse user object by passing it to next(). This saves us from
        // allocating and garbage collecting many objects for files with
        // many items.
        user = dataFileReader.next(user);
        System.out.println(user);
    }
}
