How do I decode a byte[] of a List into a Dataset in Spark?

劳灵均

2023-03-14

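Question:

I am using spark-sql 2.3.1 and Kafka with Java 8 in my project. I am trying to convert the byte[] received from a topic into a Dataset on the Kafka consumer side.

Here are the details.

I have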

class Company{
    String companyName;
    Integer companyId;
}

for which I define the schema as

public static final StructType companySchema = new StructType()
              .add("companyName", DataTypes.StringType)
              .add("companyId", DataTypes.IntegerType);

But the message is defined as

class Message{
    private List<Company> companyList;
    private String messageId;
}

for which I tried to define the schema as

StructType messageSchema = new StructType()
            .add("companyList", DataTypes.createArrayType(companySchema , false),false)
            .add("messageId", DataTypes.StringType);

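I serialize the message and send it to the Kafka topic as byte[].

I successfully receive the message byte[] on the consumer. I am trying to convert it into a Dataset. How do I do that?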

   Dataset<Row> messagesDs = kafkaReceivedStreamDs.select(from_json(col("value").cast("string"), messageSchema ).as("messages")).select("messages.*");

  messagesDs.printSchema();

  root
         |-- companyList: array (nullable = true)
         |    |-- element: struct (containsNull = true)
         |    |    |-- companyName: string (nullable = true)
         |    |    |-- companyId: integer (nullable = true)
         |-- messageId: string (nullable = true)

Dataset<Row> comapanyListDs = messagesDs.select(explode_outer(col("companyList")));

comapanyListDs.printSchema();

root
 |-- col: struct (nullable = true)
 |    |-- companyName: string (nullable = true)
 |    |-- companyId: integer (nullable = true)



Dataset<Company> comapanyDs = comapanyListDs.as(Encoders.bean(Company.class));

This fails with the error:

Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve 'companyName' given input columns: [col];

How can I get the Dataset records?

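Answer:

When you explode, your struct column is named "col".

Since your bean class has no "col" property, the conversion fails with the error you mentioned:

Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve 'companyName' given input columns: [col];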
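For context, Encoders.bean matches Dataset columns to bean properties by name, which is why a single "col" column cannot be resolved against Company. The sketch below shows what a bean-style Company might look like. The getters, setters, no-arg constructor and Serializable are assumptions added here for illustration; the question only shows the two fields.

```java
import java.io.Serializable;

// Sketch of a JavaBean-style Company. In a real project this would be declared
// `public` in its own Company.java, because Encoders.bean expects a publicly
// accessible class; it is left package-private here so the snippet compiles
// as a single file.
class Company implements Serializable {
    private String companyName;
    private Integer companyId;

    // The bean encoder creates instances through a no-arg constructor.
    public Company() { }

    // Property names (companyName, companyId) must match the column names.
    public String getCompanyName() { return companyName; }
    public void setCompanyName(String companyName) { this.companyName = companyName; }

    public Integer getCompanyId() { return companyId; }
    public void setCompanyId(Integer companyId) { this.companyId = companyId; }
}
```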

You can add a select like the following to turn the struct fields into plain columns:

    Dataset<Row> comapanyListDs = messagesDs.select(explode_outer(col("companyList")))
        .select(col("col.companyName").as("companyName"), col("col.companyId").as("companyId"));

I have not tested the syntax, but once each row exposes the struct fields as plain columns, your next step should work.
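An equivalent way to write the same idea is to alias the exploded struct and expand it with a star. This is a minimal sketch, not tested against spark-sql 2.3.1 specifically. The class name CompanyColumns, the method name flatten and the alias "company" are made up for illustration, and dropping the null rows is a choice made here, not part of the answer above.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.explode_outer;

final class CompanyColumns {
    private CompanyColumns() { }

    /** Turns one row per message into one row per company, with plain columns. */
    static Dataset<Row> flatten(Dataset<Row> messagesDs) {
        return messagesDs
                // Alias the exploded struct instead of relying on the default name "col".
                .select(explode_outer(col("companyList")).as("company"))
                // explode_outer emits a null struct for a null or empty list; drop those rows.
                .filter(col("company").isNotNull())
                // Expand the struct fields into top-level columns: companyName, companyId.
                .select("company.*");
    }
}
```

The result has the columns companyName and companyId, so it can be passed to .as(Encoders.bean(Company.class)) as in the question, provided Company follows JavaBean conventions.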
