Question:

Hive on Spark: child process exited before connecting back

范成周
2023-03-14

I have a problem running simple queries with Hive on Spark. A query such as

select * from table_name

runs fine from the Hive console (it is answered by a simple fetch task and never launches a job), but when I execute

select count(*) from table_name 

the query terminates with the following error:

Query ID = ab_20160515134700_795fc14c-e89b-4172-bcc6-0cfcffadcd88
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Spark Job = d5e1856e-de67-4e2d-a914-ca1aae324b7f
Status: SENT
Failed to execute spark task, with exception 'java.lang.IllegalStateException(RPC channel is closed.)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask

Versions:

hadoop-2.7.2
apache-hive-2.0.0
spark-1.6.0-bin-hadoop2
scala: 2.11.8
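
For what it's worth, the prebuilt Spark 1.6.x binaries are compiled against Scala 2.10, not the Scala 2.11.8 listed above, so checking which Scala line the Spark build actually reports is a quick sanity test (assuming spark-submit is on the PATH):

    # prints the Spark version banner, including the Scala version it was built with
    $SPARK_HOME/bin/spark-submit --version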

I have now set spark.master in hive-site.xml, and I get:

java.util.concurrent.ExecutionException: java.lang.RuntimeException: Cancel client '8ffe7ea3-aaf4-456c-ae18-23c572a766c5'. Error: Child process exited before connecting back
    at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:37) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
    at org.apache.hive.spark.client.SparkClientImpl.<init>(SparkClientImpl.java:101) [hive-exec-2.0.0.jar:2.0.0]
    at org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:80) [hive-exec-2.0.0.jar:2.0.0]
    ...
    at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.<init>(RemoteHiveSparkClient.java:94) [hive-exec-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:63) [hive-exec-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:55) [hive-exec-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionManagerImpl.getSession(SparkSessionManagerImpl.java:114) [hive-exec-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.getSparkSession(SparkUtilities.java:131) [hive-exec-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:106) [hive-exec-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:158) [hive-exec-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:101) [hive-exec-2.0.0.jar:2.0.0]
    ...
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1584) [hive-exec-2.0.0.jar:2.0.0]
    ...
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1184) [hive-exec-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1172) [hive-exec-2.0.0.jar:2.0.0]
    ...
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184) [hive-cli-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:400) [hive-cli-2.0.0.jar:2.0.0]
    ...
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:717) [hive-cli-2.0.0.jar:2.0.0]
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:645) [hive-cli-2.0.0.jar:2.0.0]
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_77]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_77]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_77]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_77]
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221) [spark-assembly-1.6.0-hadoop2.6.0.jar:1.6.0]
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136) [spark-assembly-1.6.0-hadoop2.6.0.jar:1.6.0]
Caused by: java.lang.RuntimeException: Cancel client '8ffe7ea3-aaf4-456c-ae18-23c572a766c5'. Error: Child process exited before connecting back
    at org.apache.hive.spark.client.rpc.RpcServer.cancelClient(RpcServer.java) ~[hive-exec-2.0.0.jar:2.0.0]
    at org.apache.hive.spark.client.SparkClientImpl$3.run(SparkClientImpl.java:450) ~[hive-exec-2.0.0.jar:2.0.0]
    at java.lang.Thread.run(Thread.java:745) ~[?:1.8.0_77]
16/05/16 18:00:33 [Driver]: WARN client.SparkClientImpl: Child process exited with code 1
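
For reference, spark.master lives in hive-site.xml as an ordinary property. A minimal sketch of such an entry, where the master URL is a placeholder (the real value depends on the cluster, e.g. a standalone master URL or "yarn"):

    <!-- inside the <configuration> element of hive-site.xml -->
    <property>
      <name>spark.master</name>
      <!-- placeholder: your standalone master URL, or "yarn" -->
      <value>spark://master-host:7077</value>
    </property>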

I have since built Spark 1.6.1 and Hive 2.0.0 myself, and the error has changed to:

Exception in thread "main" java.lang.NoClassDefFoundError: scala/collection/Iterable
    at org.apache.hadoop.hive.ql.parse.spark.GenSparkProcContext.<init>(GenSparkProcContext.java:163)
    at org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.generateTaskTree(SparkCompiler.java:195)
    at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:258)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10861)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:239)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:250)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:437)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:329)
    at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1158)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1253)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1084)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1072)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.ClassNotFoundException: scala.collection.Iterable
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
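
That NoClassDefFoundError: scala/collection/Iterable means the Scala runtime jar is not on Hive's classpath when it compiles the Spark task. A quick way to investigate (the paths below are the conventional layout of these distributions and may differ on your install):

    # check whether a scala-library jar is visible where Hive loads jars
    ls "$HIVE_HOME/lib"  | grep -i scala
    ls "$SPARK_HOME/lib" | grep -i scala    # Spark 1.x keeps its assembly jar here
    # a commonly suggested fix is to copy scala-library-*.jar (and the
    # spark-assembly jar) into $HIVE_HOME/lib so the Hive CLI can load them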

1 answer

傅鸿波
2023-03-14

I hit exactly the same problem as you on Hive 2.0.0 and Spark 1.6.1. As mentioned before, this issue is discussed at https://issues.apache.org/jira/browse/HIVE-9970.

That said, for Hive:

  1. Download the Hive source package (a typical source build is sketched just below)
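
In case it helps, a standard Maven build of the Hive source tree looks like the following; this is a sketch based on Hive's general build instructions, not a command quoted from the answer:

    # build Hive from source; the binary tarball ends up under packaging/target/
    mvn clean package -DskipTests -Pdist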

For Spark:

  1. Download the Spark source package
  2. Set the correct Hadoop version in pom.xml
  3. Build Spark without Hive: ./make-distribution.sh --name "hadoop2-non-hive" --tgz "-Pyarn,hadoop-provided,hadoop-2.6,parquet-provided"
  4. The result is in dist/; configure its spark-defaults.conf (see the sketch after this list)
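
Since step 4 is terse, here is a minimal sketch of a spark-defaults.conf of the kind it refers to. The property names are standard Spark ones, but every value below is a placeholder rather than something from the original answer:

    # conf/spark-defaults.conf in the dist/ output (illustrative values)
    spark.master             spark://master-host:7077
    spark.eventLog.enabled   true
    spark.executor.memory    512m
    spark.serializer         org.apache.spark.serializer.KryoSerializer

Hive also has to be pointed at this distribution, e.g. by exporting SPARK_HOME (or setting spark.home in hive-site.xml) before starting the Hive CLI.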

Since you built Spark without bundling Hadoop (hadoop-provided), you need to add the Hadoop package jars to $SPARK_DIST_CLASSPATH. See the Spark documentation page on Hadoop-free builds; you can also read the Hive on Spark guide for reference.
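
Concretely, Spark's documentation for Hadoop-free builds suggests exporting the Hadoop classpath in conf/spark-env.sh; a minimal sketch, assuming the hadoop launcher script is on the PATH:

    # conf/spark-env.sh: let the hadoop-provided build find the Hadoop jars
    export SPARK_DIST_CLASSPATH=$(hadoop classpath)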

 类似资料:
  • 我正在研究建立一个JDBC Spark连接,以便从r/Python使用。我知道和都是可用的,但它们似乎更适合交互式分析,特别是因为它们为用户保留了集群资源。我在考虑一些更类似于Tableau ODBC Spark connection的东西--一些更轻量级的东西(据我所知),用于支持简单的随机访问。虽然这似乎是可能的,而且有一些文档,但(对我来说)JDBC驱动程序的需求是什么并不清楚。 既然Hiv

  • 我正在尝试连接Hive数据库与我的Java代码。我搜索了很多关于Hive_Client的信息,但是有很多错误和依赖,有人能帮我找到代码和库文件吗。

  • 更新:恰恰相反。实际上,我们的表非常大,就像3个TB有2000个分区。3TB/256MB实际上会达到11720,但我们的分区数量与表的物理分区数量完全相同。我只想了解任务是如何在数据量上生成的。

  • “无法加载db驱动程序类:com.microsoft.sqlserver.jdbc.sqlserverdriver” 有什么想法如何构造连接字符串吗?考虑servername=servername。

  • 我试图为我的本地配置单元服务器实例(thrift)创建一个连接和getMataData()。 以下是我正在尝试的代码: 线程“main”java.lang.noClassDefFounderror:org/apache/hadoop/hive/metaexception在org.apache.hadoop.hive.jdbc.hivedriver.connect(Hivedriver.java:1