Question:

Trying to start the Spark Thrift server with the DataStax Cassandra connector

高朝明
2023-03-14

I have started the Spark Thrift server and connected to it with Beeline. I created a table in the Hive metastore using the Cassandra connector, and when I try to query it I get the error below.

create table meeting_details using org.apache.spark.sql.cassandra options (keyspace 'TravelData', table 'meeting_details')
select * from meeting_details

This fails with "org.apache.spark.sql.cassandra is not a valid Spark SQL Data Source":

0: jdbc:hive2://localhost:10000> select * from traveldata.employee_details;

Error: org.apache.hive.service.cli.HiveSQLException: Error running query: java.util.concurrent.ExecutionException: org.apache.spark.sql.AnalysisException: org.apache.spark.sql.cassandra is not a valid Spark SQL Data Source.
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:361)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.$anonfun$run$2(SparkExecuteStatementOperation.scala:263)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties(SparkOperation.scala:78)
    at org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties$(SparkOperation.scala:62)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withLocalProperties(SparkExecuteStatementOperation.scala:43)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:263)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:258)
    at java.base/java.security.AccessController.doPrivileged(Native Method)
    at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2.run(SparkExecuteStatementOperation.scala:272)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.util.concurrent.ExecutionException: org.apache.spark.sql.AnalysisException: org.apache.spark.sql.cassandra is not a valid Spark SQL Data Source.
    at org.sparkproject.guava.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306)
    at org.sparkproject.guava.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:293)
    at org.sparkproject.guava.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
    at org.sparkproject.guava.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135)
    at org.sparkproject.guava.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2410)
    at org.sparkproject.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2380)
    at org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
    at org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2257)
    at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4000)
    at org.sparkproject.guava.cache.LocalCache$LocalManualCache.get(LocalCache.java:4789)
    at org.apache.spark.sql.catalyst.catalog.SessionCatalog.getCachedPlan(SessionCatalog.scala:155)
    at org.apache.spark.sql.execution.datasources.FindDataSourceTable.org$apache$spark$sql$execution$datasources$FindDataSourceTable$$readDataSourceTable(DataSourceStrategy.scala:249)
    at org.apache.spark.sql.execution.datasources.FindDataSourceTable$$anonfun$apply$2.applyOrElse(DataSourceStrategy.scala:288)
    at org.apache.spark.sql.execution.datasources.FindDataSourceTable$$anonfun$apply$2.applyOrElse(DataSourceStrategy.scala:278)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDown$2(AnalysisHelper.scala:108)
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:73)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDown$1(AnalysisHelper.scala:108)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:221)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDown(AnalysisHelper.scala:106)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDown$(AnalysisHelper.scala:104)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsDown(LogicalPlan.scala:29)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDown$4(AnalysisHelper.scala:113)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:407)
    at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:243)
    at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:405)
    at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:358)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDown$1(AnalysisHelper.scala:113)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:221)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDown(AnalysisHelper.scala:106)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDown$(AnalysisHelper.scala:104)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsDown(LogicalPlan.scala:29)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDown$4(AnalysisHelper.scala:113)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:407)
    at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:243)
    at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:405)
    at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:358)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDown$1(AnalysisHelper.scala:113)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:221)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDown(AnalysisHelper.scala:106)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDown$(AnalysisHelper.scala:104)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsDown(LogicalPlan.scala:29)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators(AnalysisHelper.scala:73)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators$(AnalysisHelper.scala:72)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:29)
    at org.apache.spark.sql.execution.datasources.FindDataSourceTable.apply(DataSourceStrategy.scala:278)
    at org.apache.spark.sql.execution.datasources.FindDataSourceTable.apply(DataSourceStrategy.scala:243)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:216)
    at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
    at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
    at scala.collection.immutable.List.foldLeft(List.scala:89)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:213)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:205)
    at scala.collection.immutable.List.foreach(List.scala:392)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:205)
    at org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:196)
    at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:190)
    at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:155)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:183)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:88)
    at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:183)
    at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:174)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:228)
    at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:173)
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:73)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
    at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:143)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
    at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:143)
    at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:73)
    at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:71)
    at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:63)
    at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:98)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
    at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96)
    at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:615)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
    at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:610)
    at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:650)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:325)
    ... 16 more
Caused by: org.apache.spark.sql.AnalysisException: org.apache.spark.sql.cassandra is not a valid Spark SQL Data Source.
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:431)
    at org.apache.spark.sql.execution.datasources.FindDataSourceTable.$anonfun$readDataSourceTable$1(DataSourceStrategy.scala:261)
    at org.sparkproject.guava.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4792)
    at org.sparkproject.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
    at org.sparkproject.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
    ... 89 more (state=,code=0)
0: jdbc:hive2://localhost:10000> Closing: 0: jdbc:hive2://localhost:10000
^C%

1 Answer

长孙承嗣
2023-03-14

You need to start the Thrift server the same way you start spark-shell/pyspark/spark-submit: specify the connector package and all the other properties (see the quickstart documentation):

sbin/start-thriftserver.sh \
  --packages com.datastax.spark:spark-cassandra-connector_2.12:3.0.1 \
  --conf spark.cassandra.connection.host=127.0.0.1 \
  --conf spark.sql.extensions=com.datastax.spark.connector.CassandraSparkExtensions \
  --conf spark.sql.catalog.mycatalog=com.datastax.spark.connector.datasource.CassandraCatalog
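
Here --packages pulls the connector and its dependencies from Maven Central when the server starts, spark.cassandra.connection.host points it at the Cassandra contact point, spark.sql.extensions enables the connector's Cassandra-specific optimizations, and spark.sql.catalog.mycatalog registers a catalog named mycatalog whose namespaces map to Cassandra keyspaces. The catalog name is arbitrary; pick a connector version that matches your Spark and Scala versions.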

Then connect with Beeline:

>bin/beeline
Beeline version 2.3.7 by Apache Hive
beeline> !connect   jdbc:hive2://localhost:10000
Connecting to jdbc:hive2://localhost:10000
Enter username for jdbc:hive2://localhost:10000:
Enter password for jdbc:hive2://localhost:10000:
Connected to: Spark SQL (version 3.0.1)
Driver: Hive JDBC (version 2.3.7)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://localhost:10000> SHOW NAMESPACES FROM mycatalog;
+------------+
| namespace  |
+------------+
| test       |
| zep        |
+------------+
2 rows selected (3.072 seconds)
0: jdbc:hive2://localhost:10000> SHOW TABLES FROM mycatalog.test;
+------------+-------------+
| namespace  |  tableName  |
+------------+-------------+
| test       | jtest1      |
| test       | roadworks5  |
| test       | zep1        |
+------------+-------------+
3 rows selected (0.139 seconds)
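
With the connector on the Thrift server classpath, the tables from the original question become reachable. A minimal sketch, assuming the TravelData keyspace and meeting_details table from the question:

-- Query through the registered catalog; no CREATE TABLE needed.
-- Backticks preserve the mixed-case keyspace name.
SELECT * FROM mycatalog.`TravelData`.meeting_details;

-- Alternatively, the original CREATE TABLE should now resolve,
-- since org.apache.spark.sql.cassandra can be loaded:
CREATE TABLE meeting_details
USING org.apache.spark.sql.cassandra
OPTIONS (keyspace 'TravelData', table 'meeting_details');

SELECT * FROM meeting_details;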