I am running into a problem with simple queries on Spark. A query such as
select * from table_name
runs fine from the Hive console, but when I execute
select count(*) from table_name
the query terminates with the following error:
Query ID = ab_20160515134700_795fc14c-e89b-4172-bcc6-0cfcffadcd88
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Spark Job = d5e1856e-de67-4e2d-a914-ca1aae324b7f
Status: SENT
Failed to execute spark task, with exception 'java.lang.IllegalStateException(RPC channel is closed.)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask
Versions:
hadoop-2.7.2
apache-hive-2.0.0
spark-1.6.0-bin-hadoop2
scala: 2.11.8
I have since set spark.master in hive-site.xml, and now I get:
java.util.concurrent.ExecutionException: java.lang.RuntimeException: Cancel client '8ffe7ea3-aaf4-456c-ae18-23c572a766c5'. Error: Child process exited before connecting back
at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:37) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
at org.apache.hive.spark.client.SparkClientImpl.<init>(SparkClientImpl.java:101) [hive-exec-2.0.0.jar:2.0.0]
at org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:80) [hive-exec-2.0.0.jar:2.0.0]
at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.createRemoteClient(RemoteHiveSparkClient.java:...) [hive-exec-2.0.0.jar:2.0.0]
at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.<init>(RemoteHiveSparkClient.java:94) [hive-exec-2.0.0.jar:2.0.0]
at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:63) [hive-exec-2.0.0.jar:2.0.0]
at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:55) [hive-exec-2.0.0.jar:2.0.0]
at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionManagerImpl.getSession(SparkSessionManagerImpl.java:114) [hive-exec-2.0.0.jar:2.0.0]
at org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.getSparkSession(SparkUtilities.java:131) [hive-exec-2.0.0.jar:2.0.0]
at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:106) [hive-exec-2.0.0.jar:2.0.0]
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:158) [hive-exec-2.0.0.jar:2.0.0]
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:101) [hive-exec-2.0.0.jar:2.0.0]
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:...) [hive-exec-2.0.0.jar:2.0.0]
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1584) [hive-exec-2.0.0.jar:2.0.0]
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:...) [hive-exec-2.0.0.jar:2.0.0]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1184) [hive-exec-2.0.0.jar:2.0.0]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1172) [hive-exec-2.0.0.jar:2.0.0]
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:...) [hive-cli-2.0.0.jar:2.0.0]
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184) [hive-cli-2.0.0.jar:2.0.0]
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:400) [hive-cli-2.0.0.jar:2.0.0]
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:...) [hive-cli-2.0.0.jar:2.0.0]
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:717) [hive-cli-2.0.0.jar:2.0.0]
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:645) [hive-cli-2.0.0.jar:2.0.0]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_77]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_77]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_77]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_77]
at org.apache.hadoop.util.RunJar.run(RunJar.java:221) [spark-assembly-1.6.0-hadoop2.6.0.jar:1.6.0]
at org.apache.hadoop.util.RunJar.main(RunJar.java:136) [spark-assembly-1.6.0-hadoop2.6.0.jar:1.6.0]
Caused by: java.lang.RuntimeException: Cancel client '8ffe7ea3-aaf4-456c-ae18-23c572a766c5'. Error: Child process exited before connecting back
at org.apache.hive.spark.client.rpc.RpcServer.cancelClient(RpcServer.java:...) ~[hive-exec-2.0.0.jar:2.0.0]
at org.apache.hive.spark.client.SparkClientImpl$3.run(SparkClientImpl.java:450) ~[hive-exec-2.0.0.jar:2.0.0]
at java.lang.Thread.run(Thread.java:745) ~[?:1.8.0_77]
16/05/16 18:00:33 [Driver]: WARN client.SparkClientImpl: Child process exited with code 1
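For reference, a minimal sketch of what that hive-site.xml change looks like; both property names are the documented Hive on Spark settings, but the master URL here is a placeholder you would replace with your own cluster's (a standalone master or yarn-cluster):

<!-- hive-site.xml: run Hive queries on Spark instead of MapReduce -->
<property>
  <name>hive.execution.engine</name>
  <value>spark</value>
</property>
<property>
  <name>spark.master</name>
  <!-- placeholder: your Spark master URL -->
  <value>spark://master-host:7077</value>
</property>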
I have now built Spark 1.6.1 and Hive 2.0.0, and the error has changed to:
Exception in thread "main" java.lang.NoClassDefFoundError: scala/collection/Iterable
at org.apache.hadoop.hive.ql.parse.spark.GenSparkProcContext.<init>(GenSparkProcContext.java:163)
at org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.generateTaskTree(SparkCompiler.java:195)
at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:258)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10861)
at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:239)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:250)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:437)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:329)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1158)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1253)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1084)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1072)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.ClassNotFoundException: scala.collection.Iterable
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
I hit the same problem as you with Hive 2.0.0 and Spark 1.6.1. As mentioned, this issue has been discussed at https://issues.apache.org/jira/browse/HIVE-9970.
That being said, for Hive:
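As a reference point, the Hive-side step the Hive on Spark getting-started guide gives for pre-2.2 versions is to put the Spark assembly on Hive's classpath; a sketch, with the jar name left as a glob since it depends on your build:

# Hive (< 2.2) expects the Spark assembly on its classpath; link it into lib/
ln -s $SPARK_HOME/lib/spark-assembly-*.jar $HIVE_HOME/lib/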
For Spark:
Build Spark without Hive:
./make-distribution.sh --name "hadoop2-without-hive" --tgz "-Pyarn,hadoop-provided,hadoop-2.6,parquet-provided"
Deploy the build from dist/. Configure spark-defaults.conf. Since you built Spark without bundled Hadoop, you need to add the Hadoop package jars to $SPARK_DIST_CLASSPATH. See this documentation page. You can also read the Hive on Spark guide for reference.
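A minimal sketch of that classpath wiring, assuming the hadoop launcher is on the PATH (this export is the mechanism Spark's "Hadoop free build" documentation describes; the file path is the conventional one):

# conf/spark-env.sh: let a "hadoop-provided" Spark build find the Hadoop jars
export SPARK_DIST_CLASSPATH=$(hadoop classpath)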
I am looking into setting up a JDBC Spark connection for use from R/Python. I know that [...] and [...] are available, but they seem better suited to interactive analysis, particularly since they reserve cluster resources for the user. I am thinking of something more analogous to the Tableau ODBC Spark connection -- something more lightweight (as I understand it) for supporting simple random access. While this seems possible, and there is some documentation, it is not clear (to me) what the JDBC driver requirements are. Since Hiv
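On the driver-requirements question: the Spark Thrift Server speaks the HiveServer2 wire protocol, so the standard Hive JDBC driver is all a client needs. A minimal Java sketch, assuming a Thrift server on localhost:10000, a table named table_name, and the hive-jdbc (plus hadoop-common) jars on the classpath:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SparkThriftJdbc {
    public static void main(String[] args) throws Exception {
        // The only client-side requirement: the Hive JDBC driver.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // Hypothetical host/port; 10000 is the Thrift server's default.
        try (Connection conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "user", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("select count(*) from table_name")) {
            while (rs.next()) {
                System.out.println(rs.getLong(1));
            }
        }
    }
}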
I am trying to connect to a Hive database from my Java code. I have searched a lot about Hive_Client, but I keep hitting errors and dependency problems. Can someone help me with the code and the library files?
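One route that avoids hand-assembling Thrift calls is Hive's own metastore client, which can list databases and tables (though it does not run queries). A minimal sketch, assuming hive-site.xml (with hive.metastore.uris set) is on the classpath along with the hive-metastore, hive-exec, and Hadoop client jars, and that a database named "default" exists:

import java.util.List;
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;

public class HiveTableLister {
    public static void main(String[] args) throws Exception {
        // Picks up hive-site.xml (and hive.metastore.uris) from the classpath.
        HiveConf conf = new HiveConf();
        HiveMetaStoreClient client = new HiveMetaStoreClient(conf);
        try {
            List<String> tables = client.getAllTables("default"); // assumed database name
            for (String t : tables) {
                System.out.println(t);
            }
        } finally {
            client.close();
        }
    }
}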
Update: quite the opposite. Our table is actually very large, like 3 TB with 2000 partitions. 3 TB / 256 MB would actually come to 11720, but the count we get is exactly the table's number of physical partitions. I just want to understand how tasks are generated from the data volume.
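A small back-of-the-envelope sketch of the two ways the count can come out; the constants and the one-split-per-partition reasoning are illustrative assumptions, not a statement of Hive's exact split algorithm:

// Illustrative arithmetic only: a pure size/split-size estimate versus the
// one-split-per-partition outcome seen when files in separate partition
// directories are not combined into shared splits.
public class TaskCountSketch {
    public static void main(String[] args) {
        double totalBytes = 3e12;   // 3 TB, decimal units as in the question
        double splitBytes = 256e6;  // 256 MB split size
        int partitions = 2000;

        System.out.println("size-based estimate: " + Math.ceil(totalBytes / splitBytes)); // ~11719
        System.out.println("one per partition:   " + partitions); // what the question observes
    }
}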
"Could not load db driver class: com.microsoft.sqlserver.jdbc.SQLServerDriver"
Any ideas on how to construct the connection string? Assume servername = servername.
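That error usually just means the Microsoft JDBC driver jar (e.g. sqljdbc4.jar) is not on the classpath. A minimal Java sketch of the connection-string shape, using the default port 1433 and a hypothetical database name and credentials:

import java.sql.Connection;
import java.sql.DriverManager;

public class SqlServerConnect {
    public static void main(String[] args) throws Exception {
        // Fails with "could not load db driver class" if the driver jar is missing.
        Class.forName("com.microsoft.sqlserver.jdbc.SQLServerDriver");
        // Hypothetical database and credentials; 1433 is SQL Server's default port.
        String url = "jdbc:sqlserver://servername:1433;databaseName=mydb;user=sa;password=secret";
        try (Connection conn = DriverManager.getConnection(url)) {
            System.out.println("connected: " + !conn.isClosed());
        }
    }
}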
I am trying to create a connection to my local Hive server instance (Thrift) and call getMetaData(). Here is the code I am trying:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hive/metastore/api/MetaException
at org.apache.hadoop.hive.jdbc.HiveDriver.connect(HiveDriver.java:1
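That NoClassDefFoundError usually means the metastore/exec jars the old org.apache.hadoop.hive.jdbc.HiveDriver pulls in are missing from the classpath. For comparison, a minimal sketch against HiveServer2 with the newer driver, which sidesteps that problem; the host/port and "default" schema are assumptions:

import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;

public class HiveMetadata {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver"); // HiveServer2 driver
        // Hypothetical host/port; 10000 is HiveServer2's default.
        try (Connection conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default")) {
            DatabaseMetaData md = conn.getMetaData();
            System.out.println(md.getDatabaseProductName() + " " + md.getDatabaseProductVersion());
            // List tables in the assumed "default" schema.
            try (ResultSet tables = md.getTables(null, "default", "%", null)) {
                while (tables.next()) {
                    System.out.println(tables.getString("TABLE_NAME"));
                }
            }
        }
    }
}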