Question:

Cannot load a DataFrame with Phoenix/Spark

赫连明诚
2023-03-14

I created a table named 'test' in Phoenix; I can query it from Phoenix and I can scan it in the HBase shell. I am trying to use the phoenix-spark library as shown below, but the DataFrame is not populated:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.phoenix.spark._

// Build an HBase client configuration (picks up hbase-site.xml from the classpath)
val hadoopConf: Configuration = new Configuration()
val hbConf: Configuration = HBaseConfiguration.create(hadoopConf)

// Load the Phoenix table TEST, projecting the columns foo and bar
val df = sqlContext.phoenixTableAsDataFrame("TEST", Array("foo", "bar"), conf = hbConf)
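
(For reference, a minimal way to sanity-check what that configuration actually contains; hbConf is the object from the snippet above, and the expected values are the ones described further down:)

// Standard HBase client settings; print what HBaseConfiguration actually picked up
println(hbConf.get("hbase.zookeeper.quorum"))   // should be the real quorum, not localhost
println(hbConf.get("zookeeper.znode.parent"))   // should be /hbase-unsecure in this setup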

Instead, this is what I get:

16/05/11 11:10:47 INFO MemoryStore: ensureFreeSpace(413840) called with curMem=0, maxMem=4445479895
16/05/11 11:10:47 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 404.1 KB, free 4.1 GB)
16/05/11 11:10:47 INFO MemoryStore: ensureFreeSpace(27817) called with curMem=413840, maxMem=4445479895
16/05/11 11:10:47 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 27.2 KB, free 4.1 GB)
16/05/11 11:10:47 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:39319 (size: 27.2 KB, free: 4.1 GB)
16/05/11 11:10:47 INFO SparkContext: Created broadcast 0 from newAPIHadoopRDD at PhoenixRDD.scala:41
16/05/11 11:10:47 INFO RecoverableZooKeeper: Process identifier=hconnection-0x72187492 connecting to ZooKeeper ensemble=localhost:2181
16/05/11 11:10:47 INFO ZooKeeper: Client environment:zookeeper.version=3.4.6-2950--1, built on 09/30/2015 17:44 GMT
16/05/11 11:10:47 INFO ZooKeeper: Client environment:host.name=some.server.com
16/05/11 11:10:47 INFO ZooKeeper: Client environment:java.version=1.8.0_40
16/05/11 11:10:47 INFO ZooKeeper: Client environment:java.vendor=Oracle Corporation
16/05/11 11:10:47 INFO ZooKeeper: Client environment:java.home=/usr/jdk64/jdk1.8.0_40/jre
16/05/11 11:10:47 INFO ZooKeeper: Client environment:java.class.path=/usr/hdp/2.3.2.0-2950/spark/conf/:/usr/hdp/2.3.2.0-2950/spark/lib/spark-assembly-1.4.1.2.3.2.0-2950-hadoop2.7.1.2.3.2.0-2950.jar:/usr/hdp/2.3.2.0-2950/spark/lib/datanucleus-api-jdo-3.2.6.jar:/usr/hdp/2.3.2.0-2950/spark/lib/datanucleus-core-3.2.10.jar:/usr/hdp/2.3.2.0-2950/spark/lib/datanucleus-rdbms-3.2.9.jar:/usr/hdp/current/hadoop-client/conf/:/usr/hdp/current/hadoop-client/hadoop-azure.jar:/usr/hdp/current/hadoop-client/lib/azure-storage-2.2.0.jar
16/05/11 11:10:47 INFO ZooKeeper: Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
16/05/11 11:10:47 INFO ZooKeeper: Client environment:java.io.tmpdir=/tmp
16/05/11 11:10:47 INFO ZooKeeper: Client environment:java.compiler=<NA>
16/05/11 11:10:47 INFO ZooKeeper: Client environment:os.name=Linux
16/05/11 11:10:47 INFO ZooKeeper: Client environment:os.arch=amd64
16/05/11 11:10:47 INFO ZooKeeper: Client environment:os.version=3.10.0-327.10.1.el7.x86_64
16/05/11 11:10:47 INFO ZooKeeper: Client environment:user.name=dude
16/05/11 11:10:47 INFO ZooKeeper: Client environment:user.home=/home/dude
16/05/11 11:10:47 INFO ZooKeeper: Client environment:user.dir=/home/dude
16/05/11 11:10:47 INFO ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=90000 watcher=hconnection-0x721874920x0, quorum=localhost:2181, baseZNode=/hbase
16/05/11 11:10:47 INFO ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
16/05/11 11:10:47 INFO ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session
16/05/11 11:10:47 INFO ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x25494c0cb650086, negotiated timeout = 40000
16/05/11 11:10:47 INFO Metrics: Initializing metrics system: phoenix
16/05/11 11:10:47 INFO MetricsConfig: loaded properties from hadoop-metrics2.properties
16/05/11 11:10:47 INFO MetricsSystemImpl: Scheduled snapshot period at 60 second(s).
16/05/11 11:10:47 INFO MetricsSystemImpl: phoenix metrics system started
16/05/11 11:10:48 INFO RecoverableZooKeeper: Process identifier=hconnection-0xd2eddc2 connecting to ZooKeeper ensemble=localhost:2181
16/05/11 11:10:48 INFO ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=90000 watcher=hconnection-0xd2eddc20x0, quorum=localhost:2181, baseZNode=/hbase
16/05/11 11:10:48 INFO ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
16/05/11 11:10:48 INFO ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session
16/05/11 11:10:48 INFO ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x25494c0cb650087, negotiated timeout = 40000
16/05/11 11:11:36 INFO RpcRetryingCaller: Call exception, tries=10, retries=35, started=48168 ms ago, cancelled=false, msg=
16/05/11 11:11:56 INFO RpcRetryingCaller: Call exception, tries=11, retries=35, started=68312 ms ago, cancelled=false, msg=
16/05/11 11:12:16 INFO RpcRetryingCaller: Call exception, tries=12, retries=35, started=88338 ms ago, cancelled=false, msg=
16/05/11 11:12:36 INFO RpcRetryingCaller: Call exception, tries=13, retries=35, started=108450 ms ago, cancelled=false, msg=
16/05/11 11:12:56 INFO RpcRetryingCaller: Call exception, tries=14, retries=35, started=128530 ms ago, cancelled=false, msg=
16/05/11 11:13:16 INFO RpcRetryingCaller: Call exception, tries=15, retries=35, started=148547 ms ago, cancelled=false, msg=
16/05/11 11:13:37 INFO RpcRetryingCaller: Call exception, tries=16, retries=35, started=168741 ms ago, cancelled=false, msg=
16/05/11 11:13:57 INFO RpcRetryingCaller: Call exception, tries=17, retries=35, started=188856 ms ago, cancelled=false, msg=

I have found this article, but I am already using the approach it describes and passing in the HBase configuration. What am I doing wrong?

Interestingly, my ZK quorum is not localhost but a list of two servers, yet the INFO messages show localhost. I am not sure whether that is what it is supposed to show. The hbase.zookeeper.quorum parameter is set correctly in hbase-site.xml, and it is listed when I inspect hbConf. Also, zookeeper.znode.parent is set to /hbase-unsecure, yet I see /hbase in the messages. Does Phoenix-Spark simply ignore these?!
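
(A hypothetical workaround sketch along those lines, forcing the suspicious settings explicitly instead of relying on the classpath; the host names are placeholders for the real two-server quorum:)

// Force the ZooKeeper settings on the configuration passed to phoenix-spark
hbConf.set("hbase.zookeeper.quorum", "zk1.example.com,zk2.example.com")
hbConf.set("zookeeper.znode.parent", "/hbase-unsecure")

// The phoenix-spark API also exposes a zkUrl parameter, which can carry the znode parent
val df = sqlContext.phoenixTableAsDataFrame(
  "TEST",
  Array("foo", "bar"),
  zkUrl = Some("zk1.example.com,zk2.example.com:2181:/hbase-unsecure"),
  conf = hbConf
)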

I could use the HBase API directly, but it would be nice to have Phoenix working, since it lets me load the data as a DataFrame right away.

1 Answer

东门俊智
2023-03-14

Damn it! The mistake was that the column names have to be uppercase. It would have been nice if Phoenix had told me the columns don't exist, instead of just waiting while nothing happens. I am going to file this as a bug report!
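
(The fix, as a sketch based on the snippet from the question; FOO and BAR stand in for the actual column names:)

// Phoenix normalizes unquoted identifiers to uppercase, so the columns must be
// requested as FOO and BAR, not foo and bar
val df = sqlContext.phoenixTableAsDataFrame("TEST", Array("FOO", "BAR"), conf = hbConf)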
