Question:

Does avro-python3 not support schema evolution?

马嘉勋
2023-03-14

I am trying to recreate a schema evolution case (backward compatibility) with avro-python3.

I have two schemas:

import avro.schema
from avro.datafile import DataFileReader, DataFileWriter
from avro.io import DatumReader, DatumWriter

schema_v1 = avro.schema.Parse("""
{
     "type": "record",
     "namespace": "com.example",
     "name": "CustomerV1",
     "fields": [
       { "name": "first_name", "type": "string", "doc": "First Name of Customer" },
       { "name": "last_name", "type": "string", "doc": "Last Name of Customer" },
       { "name": "age", "type": "int", "doc": "Age at the time of registration" },
       { "name": "height", "type": "float", "doc": "Height at the time of registration in cm" },
       { "name": "weight", "type": "float", "doc": "Weight at the time of registration in kg" },
       { "name": "automated_email", "type": "boolean", "default": true, "doc": "Field indicating if the user is enrolled in marketing emails" }
     ]
}
""")

schema_v2 = avro.schema.Parse("""
{
     "type": "record",
     "namespace": "com.example",
     "name": "CustomerV2",
     "fields": [
       { "name": "first_name", "type": "string", "doc": "First Name of Customer" },
       { "name": "last_name", "type": "string", "doc": "Last Name of Customer" },
       { "name": "age", "type": "int", "doc": "Age at the time of registration" },
       { "name": "height", "type": "float", "doc": "Height at the time of registration in cm" },
       { "name": "weight", "type": "float", "doc": "Weight at the time of registration in kg" },
       { "name": "phone_number", "type": ["null", "string"], "default": null, "doc": "optional phone number"},
       { "name": "email", "type": "string", "default": "missing@example.com", "doc": "email address"}
     ]
}
""")

The second schema does not have the automated_email field, but it has two additional fields: phone_number and email.
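To make the difference concrete, here is a minimal stdlib-only sketch that compares the two field lists the way backward compatibility is usually reasoned about: fields that exist only in the writer schema are skipped by the reader, and fields that exist only in the reader schema need a default. The field lists are copied from the schemas above with "doc" omitted; field_diff is a hypothetical helper written for illustration, not part of avro-python3.

```python
# Field lists of schema_v1 (writer) and schema_v2 (reader), "doc" omitted.
V1_FIELDS = [
    {"name": "first_name", "type": "string"},
    {"name": "last_name", "type": "string"},
    {"name": "age", "type": "int"},
    {"name": "height", "type": "float"},
    {"name": "weight", "type": "float"},
    {"name": "automated_email", "type": "boolean", "default": True},
]
V2_FIELDS = [
    {"name": "first_name", "type": "string"},
    {"name": "last_name", "type": "string"},
    {"name": "age", "type": "int"},
    {"name": "height", "type": "float"},
    {"name": "weight", "type": "float"},
    {"name": "phone_number", "type": ["null", "string"], "default": None},
    {"name": "email", "type": "string", "default": "missing@example.com"},
]


def field_diff(writer_fields, reader_fields):
    """Hypothetical helper: report how the reader's fields differ from the writer's."""
    writer_names = {f["name"] for f in writer_fields}
    reader_names = {f["name"] for f in reader_fields}
    added = [f for f in reader_fields if f["name"] not in writer_names]
    return {
        # Only in the writer schema: the reader simply skips these.
        "removed": sorted(writer_names - reader_names),
        # Only in the reader schema: each one must carry a default.
        "added": sorted(f["name"] for f in added),
        # Membership test, not truthiness: a default of None (JSON null) is valid.
        "added_without_default": sorted(f["name"] for f in added if "default" not in f),
    }


print(field_diff(V1_FIELDS, V2_FIELDS))
# {'removed': ['automated_email'], 'added': ['email', 'phone_number'], 'added_without_default': []}
```

Both added fields have defaults, so at the field level schema_v2 should be able to read data written with schema_v1.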

According to the Avro schema evolution rules, if I write an Avro record with schema_v1:

writer = DataFileWriter(open("customer_v1.avro", "wb"), DatumWriter(), schema_v1)
writer.append({
    "first_name": "John",
    "last_name": "Doe",
    "age" : 34, 
    "height": 178.0,
    "weight": 75.0,
    "automated_email": True
})
writer.close()

…I should be able to read it with schema_v2, provided that the fields missing from the written data have default values:

reader = DataFileReader(open("customer_v1.avro", "rb"), DatumReader(reader_schema=schema_v2))

for field in reader:
    print(field)

reader.close()

But I get the following error:

SchemaResolutionException: Schemas do not match.

I know this works in Java; it is an example from a video course. Is there a way to make it work in Python?

2 answers

仲孙兴平
2023-03-14

If you change the name of the second schema from CustomerV2 to CustomerV1, it works with avro-python3 version 1.10.0.
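A likely explanation for why the rename helps: when resolving two record schemas, avro-python3's DatumReader appears to compare the records' full names (namespace plus name), so com.example.CustomerV1 and com.example.CustomerV2 are rejected before any field is examined. The sketch below imitates that check on plain dicts; full_name and record_names_match are hypothetical helpers for illustration, and the Avro aliases mechanism is deliberately ignored.

```python
def full_name(schema):
    """Namespace-qualified record name, e.g. 'com.example.CustomerV1'."""
    name = schema["name"]
    if "." in name:  # the name is already fully qualified
        return name
    namespace = schema.get("namespace")
    return f"{namespace}.{name}" if namespace else name


def record_names_match(writer_schema, reader_schema):
    """Simplified sketch of a full-name check between writer and reader records."""
    return full_name(writer_schema) == full_name(reader_schema)


writer = {"type": "record", "namespace": "com.example", "name": "CustomerV1", "fields": []}
reader_v2 = {"type": "record", "namespace": "com.example", "name": "CustomerV2", "fields": []}
# Same reader schema, but renamed to match the writer, as suggested in this answer.
reader_renamed = dict(reader_v2, name="CustomerV1")

print(record_names_match(writer, reader_v2))       # False, i.e. "Schemas do not match."
print(record_names_match(writer, reader_renamed))  # True
```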

艾骏喆
2023-03-14

fastavro, another Python implementation, handles this case without problems.

Writing with the first schema:

s1 = {
    "type": "record",
    "namespace": "com.example",
    "name": "CustomerV1",
    "fields": [
        {"name": "first_name", "type": "string", "doc": "First Name of Customer"},
        {"name": "last_name", "type": "string", "doc": "Last Name of Customer"},
        {"name": "age", "type": "int", "doc": "Age at the time of registration"},
        {
            "name": "height",
            "type": "float",
            "doc": "Height at the time of registration in cm",
        },
        {
            "name": "weight",
            "type": "float",
            "doc": "Weight at the time of registration in kg",
        },
        {
            "name": "automated_email",
            "type": "boolean",
            "default": True,
            "doc": "Field indicating if the user is enrolled in marketing emails",
        },
    ],
}

record = {
    "first_name": "John",
    "last_name": "Doe",
    "age": 34,
    "height": 178.0,
    "weight": 75.0,
    "automated_email": True,
}

import fastavro

with open("test.avro", "wb") as fp:
    fastavro.writer(fp, fastavro.parse_schema(s1), [record])

Reading with the second schema:

s2 = {
    "type": "record",
    "namespace": "com.example",
    "name": "CustomerV2",
    "fields": [
        {"name": "first_name", "type": "string", "doc": "First Name of Customer"},
        {"name": "last_name", "type": "string", "doc": "Last Name of Customer"},
        {"name": "age", "type": "int", "doc": "Age at the time of registration"},
        {
            "name": "height",
            "type": "float",
            "doc": "Height at the time of registration in cm",
        },
        {
            "name": "weight",
            "type": "float",
            "doc": "Weight at the time of registration in kg",
        },
        {
            "name": "phone_number",
            "type": ["null", "string"],
            "default": None,
            "doc": "optional phone number",
        },
        {
            "name": "email",
            "type": "string",
            "default": "missing@example.com",
            "doc": "email address",
        },
    ],
}

import fastavro

with open("test.avro", "rb") as fp:
    for record in fastavro.reader(fp, fastavro.parse_schema(s2)):
        print(record)

The output contains the new fields, as expected:

{'first_name': 'John', 'last_name': 'Doe', 'age': 34, 'height': 178.0, 'weight': 75.0, 'phone_number': None, 'email': 'missing@example.com'}
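This output follows the record-resolution rules of the Avro specification: fields are matched by name, reader fields missing from the writer take their default, and writer fields missing from the reader (automated_email) are dropped. The sketch below applies those rules to an already-decoded dict and reproduces the output above. It is a simplification (a real reader decodes the binary data with the writer schema and also resolves types), and resolve_record is a hypothetical helper, not a fastavro function.

```python
# Writer (s1) and reader (s2) field lists, "doc" omitted.
V1_FIELDS = [
    {"name": "first_name", "type": "string"},
    {"name": "last_name", "type": "string"},
    {"name": "age", "type": "int"},
    {"name": "height", "type": "float"},
    {"name": "weight", "type": "float"},
    {"name": "automated_email", "type": "boolean", "default": True},
]
V2_FIELDS = [
    {"name": "first_name", "type": "string"},
    {"name": "last_name", "type": "string"},
    {"name": "age", "type": "int"},
    {"name": "height", "type": "float"},
    {"name": "weight", "type": "float"},
    {"name": "phone_number", "type": ["null", "string"], "default": None},
    {"name": "email", "type": "string", "default": "missing@example.com"},
]
RECORD = {"first_name": "John", "last_name": "Doe", "age": 34,
          "height": 178.0, "weight": 75.0, "automated_email": True}


def resolve_record(datum, writer_fields, reader_fields):
    """Hypothetical helper: project a writer-side record onto the reader's fields."""
    writer_names = {f["name"] for f in writer_fields}
    result = {}
    for field in reader_fields:  # output follows the reader's field order
        name = field["name"]
        if name in writer_names:
            result[name] = datum[name]        # present in both schemas
        elif "default" in field:
            result[name] = field["default"]   # reader-only field: use its default
        else:
            raise ValueError(f"field {name!r} is not in the writer schema and has no default")
    # Writer-only fields such as automated_email are never copied, so they are dropped.
    return result


print(resolve_record(RECORD, V1_FIELDS, V2_FIELDS))
# {'first_name': 'John', 'last_name': 'Doe', 'age': 34, 'height': 178.0, 'weight': 75.0, 'phone_number': None, 'email': 'missing@example.com'}
```

If email had no default, the same call would raise ValueError, which is why fields added in a newer reader schema need defaults.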