Question:

Hive: Kryo exception

壤驷敏学
2023-03-14

I am executing an HQL query that has a few joins, unions, and an insert overwrite operation, and it works fine if I run it just once.
If I execute the same job a second time, I hit the issue below. Can someone help me identify in which scenario this exception occurs?

Error: java.lang.RuntimeException: org.apache.hive.com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID: 107
Serialization trace:
rowSchema (org.apache.hadoop.hive.ql.exec.MapJoinOperator)
parentOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
parentOperators (org.apache.hadoop.hive.ql.exec.MapJoinOperator)
parentOperators (org.apache.hadoop.hive.ql.exec.FilterOperator)
parentOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
parentOperators (org.apache.hadoop.hive.ql.exec.UnionOperator)
childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
    at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:364)
    at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:275)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:254)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:440)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:433)
    at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:587)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:169)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID: 107
Serialization trace:
rowSchema (org.apache.hadoop.hive.ql.exec.MapJoinOperator)
parentOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
parentOperators (org.apache.hadoop.hive.ql.exec.MapJoinOperator)
parentOperators (org.apache.hadoop.hive.ql.exec.FilterOperator)
parentOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
parentOperators (org.apache.hadoop.hive.ql.exec.UnionOperator)
childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)

1 answer

吕晟睿
2023-03-14

I tried setting hive.exec.parallel=false;, and the job then ran successfully, although more slowly. My query is:

SELECT
    CASE WHEN a.did IS NOT NULL THEN a.did ELSE b.did END AS device_id,
    CASE WHEN a.did IS NOT NULL THEN a.package ELSE b.package END AS package,
    CASE WHEN a.did IS NOT NULL THEN a.channel ELSE b.channel END AS channel,
    CASE WHEN a.did IS NOT NULL THEN a.time ELSE b.time END AS time
FROM
    (SELECT
      a1.package,
      a1.did,
      MIN(a1.source) AS channel,
      MIN(a1.time) AS time
    FROM
      (SELECT * FROM thetable
        WHERE date_hour = "20160601"
          AND source_type IN ('A', 'B', 'C')
      ) a1
      JOIN
      (SELECT
        package AS package,
        did AS did,
        MIN(time) AS time
      FROM thetable
      WHERE date_hour = "20160601"
        AND source_type IN ('A', 'B', 'C')
      GROUP BY package, did
      ) min
      ON (a1.package = min.package
        AND a1.did = min.did
        AND a1.time = min.time)
    GROUP BY a1.package, a1.did
    ) a
    FULL OUTER JOIN
    (SELECT
      a1.package,
      a1.did,
      MIN(a1.source) AS channel,
      MIN(a1.time) AS time
    FROM
      (SELECT * FROM thetable
        WHERE date_hour = "20160601"
          AND source_type IN ('D')
      ) a1
      JOIN
      (SELECT
        package AS package,
        did AS did,
        MIN(time) AS time
      FROM thetable
      WHERE date_hour = "20160601"
        AND source_type IN ('D')
      GROUP BY package, did
      ) min
      ON (a1.package = min.package
        AND a1.did = min.did
        AND a1.time = min.time)
    GROUP BY a1.package, a1.did
    ) b
    ON (a.package = b.package AND a.did = b.did);
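
For reference, a minimal sketch of how the workaround described above can be applied at the session level before running the query. Only the standard hive.exec.parallel property mentioned in the answer is used; nothing else is assumed about the cluster or tables.

-- Minimal sketch: disable parallel execution of independent stages
-- for the current Hive session, then run the problematic query.
SET hive.exec.parallel=false;

-- ... run the FULL OUTER JOIN query shown above ...

-- If parallel execution is wanted for later queries in the same session,
-- it can be turned back on:
SET hive.exec.parallel=true;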