I'm trying to write to / read from HBase with PySpark.
Environment:
from pyspark import SparkConf, SQLContext
from pyspark.sql import SparkSession
from datetime import datetime
import json

conf = SparkConf().setAppName("RW_from_HBase")
spark = SparkSession.builder \
    .config(conf=conf) \
    .getOrCreate()
sc = spark.sparkContext
sqlc = SQLContext(sc)
data_source_format = 'org.apache.spark.sql.execution.datasources.hbase'

catalog = json.dumps({
    "table": {"namespace": "spark", "name": "test_table"},
    "rowkey": "id",
    "columns": {
        "id": {"cf": "rowkey", "col": "id", "type": "string"},
        "filename": {"cf": "content", "col": "filename", "type": "string"},
        "created_ts": {"cf": "content", "col": "created_ts", "type": "string"},
        "html": {"cf": "content", "col": "html", "type": "string"}
    }
})
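Since SHC only parses this catalog string at runtime, a quick structural sanity check before submitting can catch typos in the JSON early. This is a minimal sketch using only the standard library; the checks themselves are my own, not part of SHC:

```python
import json

def check_catalog(catalog_json):
    """Parse an SHC catalog string and verify its basic structure."""
    cat = json.loads(catalog_json)
    # Top-level keys the connector expects.
    for key in ("table", "rowkey", "columns"):
        assert key in cat, "missing key: %s" % key
    # Every rowkey field must also appear among the column definitions.
    for field in cat["rowkey"].split(":"):
        assert field in cat["columns"], "rowkey field %r not in columns" % field
    # Each column needs a column family, qualifier and type.
    for name, spec in cat["columns"].items():
        for key in ("cf", "col", "type"):
            assert key in spec, "column %r missing %r" % (name, key)
    return True
```

Calling `check_catalog(catalog)` on the catalog above should return `True`.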
# Writing into HBase
mydf.write \
    .options(catalog=catalog, newtable=5) \
    .format(data_source_format) \
    .save()

# Reading from HBase
df = sqlc.read \
    .options(catalog=catalog) \
    .format(data_source_format) \
    .load()
df.show()
My spark-submit arguments are:
--master local[*] --packages com.databricks:spark-avro_2.11:4.0.0,com.hortonworks:shc-core:1.1.1-2.1-s_2.11 --repositories http://repo.hortonworks.com/content/repositories/releases/ --queue PyCharmSpark pyspark-shell
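The trailing `pyspark-shell` suggests these arguments are passed through the `PYSPARK_SUBMIT_ARGS` environment variable (a common setup when running PySpark from an IDE such as PyCharm). Under that assumption, the equivalent environment variable would look like:

```shell
export PYSPARK_SUBMIT_ARGS="--master local[*] \
  --packages com.databricks:spark-avro_2.11:4.0.0,com.hortonworks:shc-core:1.1.1-2.1-s_2.11 \
  --repositories http://repo.hortonworks.com/content/repositories/releases/ \
  --queue PyCharmSpark pyspark-shell"
```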
Writing to HBase works fine: the data from mydf is saved into the HBase table.
Reading also appears to work, but only until an action is triggered; `load()` is lazy, and calling df.show() raises an error:
WARNING: Running spark-class from user-defined location.
http://repo.hortonworks.com/content/repositories/releases/ added as a remote repository with the name: repo-1
Ivy Default Cache set to: /home/cloudera/.ivy2/cache
The jars for the packages stored in: /home/cloudera/.ivy2/jars
:: loading settings :: url = jar:file:/opt/cloudera/parcels/SPARK2-2.3.0.cloudera2-1.cdh5.13.3.p0.316101/lib/spark2/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
com.databricks#spark-avro_2.11 added as a dependency
com.hortonworks#shc-core added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
confs: [default]
found com.databricks#spark-avro_2.11;4.0.0 in central
found org.slf4j#slf4j-api;1.7.5 in central
found org.apache.avro#avro;1.7.6 in central
found org.codehaus.jackson#jackson-core-asl;1.9.13 in central
found org.codehaus.jackson#jackson-mapper-asl;1.9.13 in central
found com.thoughtworks.paranamer#paranamer;2.3 in central
found org.xerial.snappy#snappy-java;1.0.5 in central
found org.apache.commons#commons-compress;1.4.1 in central
found org.tukaani#xz;1.0 in central
found com.hortonworks#shc-core;1.1.1-2.1-s_2.11 in repo-1
found org.apache.hbase#hbase-server;1.1.2 in central
found org.apache.hbase#hbase-protocol;1.1.2 in central
found org.apache.hbase#hbase-annotations;1.1.2 in central
found com.github.stephenc.findbugs#findbugs-annotations;1.3.9-1 in central
found log4j#log4j;1.2.17 in central
found junit#junit;4.11 in central
found org.hamcrest#hamcrest-core;1.3 in central
found com.google.protobuf#protobuf-java;2.5.0 in central
found org.apache.hbase#hbase-procedure;1.1.2 in central
found com.google.guava#guava;12.0.1 in central
found com.google.code.findbugs#jsr305;1.3.9 in central
found org.apache.hbase#hbase-client;1.1.2 in central
found commons-codec#commons-codec;1.9 in central
found commons-io#commons-io;2.4 in central
found commons-lang#commons-lang;2.6 in central
found io.netty#netty-all;4.0.23.Final in central
found org.apache.zookeeper#zookeeper;3.4.6 in central
found org.slf4j#slf4j-api;1.7.7 in central
found org.slf4j#slf4j-log4j12;1.6.1 in central
found org.apache.htrace#htrace-core;3.1.0-incubating in central
found org.jruby.jcodings#jcodings;1.0.8 in central
found org.jruby.joni#joni;2.1.2 in central
found commons-httpclient#commons-httpclient;3.1 in central
found commons-collections#commons-collections;3.2.1 in central
found com.yammer.metrics#metrics-core;2.2.0 in central
found com.sun.jersey#jersey-core;1.9 in central
found com.sun.jersey#jersey-server;1.9 in central
found commons-cli#commons-cli;1.2 in central
found org.apache.commons#commons-math;2.2 in central
found org.mortbay.jetty#jetty;6.1.26 in central
found org.mortbay.jetty#jetty-util;6.1.26 in central
found org.mortbay.jetty#jetty-sslengine;6.1.26 in central
found org.mortbay.jetty#jsp-2.1;6.1.14 in central
found org.mortbay.jetty#jsp-api-2.1;6.1.14 in central
found org.mortbay.jetty#servlet-api-2.5;6.1.14 in central
found org.codehaus.jackson#jackson-jaxrs;1.9.13 in central
found tomcat#jasper-compiler;5.5.23 in central
found org.jamon#jamon-runtime;2.3.1 in central
found com.lmax#disruptor;3.3.0 in central
found org.apache.hbase#hbase-prefix-tree;1.1.2 in central
found org.mortbay.jetty#servlet-api;2.5-20081211 in central
found tomcat#jasper-runtime;5.5.23 in central
found commons-el#commons-el;1.0 in central
found org.apache.hbase#hbase-common;1.1.2 in central
found org.apache.phoenix#phoenix-core;4.9.0-HBase-1.1 in central
found org.apache.tephra#tephra-api;0.9.0-incubating in central
found org.apache.tephra#tephra-hbase-compat-1.1;0.9.0-incubating in central
found org.apache.tephra#tephra-core;0.9.0-incubating in central
found com.google.code.gson#gson;2.2.4 in central
found com.google.guava#guava;13.0.1 in central
found com.google.inject#guice;3.0 in central
found javax.inject#javax.inject;1 in central
found aopalliance#aopalliance;1.0 in central
found org.sonatype.sisu.inject#cglib;2.2.1-v20090111 in central
found asm#asm;3.1 in central
found com.google.inject.extensions#guice-assistedinject;3.0 in central
found ch.qos.logback#logback-classic;1.0.9 in central
found ch.qos.logback#logback-core;1.0.9 in central
found org.apache.thrift#libthrift;0.9.0 in central
found org.apache.httpcomponents#httpcore;4.1.3 in central
found it.unimi.dsi#fastutil;6.5.6 in central
found org.apache.twill#twill-common;0.6.0-incubating in central
found com.google.code.findbugs#jsr305;2.0.1 in central
found org.apache.twill#twill-core;0.6.0-incubating in central
found org.apache.twill#twill-api;0.6.0-incubating in central
found org.apache.twill#twill-discovery-api;0.6.0-incubating in central
found org.apache.twill#twill-zookeeper;0.6.0-incubating in central
found org.apache.twill#twill-discovery-core;0.6.0-incubating in central
found org.ow2.asm#asm-all;5.0.2 in central
found io.dropwizard.metrics#metrics-core;3.1.0 in central
found org.antlr#antlr-runtime;3.5.2 in central
found jline#jline;2.11 in central
found sqlline#sqlline;1.2.0 in central
found joda-time#joda-time;1.6 in central
found com.github.stephenc.jcip#jcip-annotations;1.0-1 in central
found junit#junit;4.12 in central
found org.apache.httpcomponents#httpclient;4.0.1 in central
found commons-logging#commons-logging;1.2 in central
found org.iq80.snappy#snappy;0.3 in central
found commons-collections#commons-collections;3.2.2 in central
found org.apache.commons#commons-csv;1.0 in central
found org.apache.hbase#hbase-annotations;1.1.3 in central
found org.apache.hbase#hbase-protocol;1.1.3 in central
found org.apache.hadoop#hadoop-common;2.7.1 in central
found org.apache.hadoop#hadoop-annotations;2.7.1 in central
found org.apache.commons#commons-math3;3.1.1 in central
found xmlenc#xmlenc;0.52 in central
found commons-net#commons-net;3.1 in central
found javax.servlet#servlet-api;2.5 in central
found com.sun.jersey#jersey-json;1.9 in central
found org.codehaus.jettison#jettison;1.1 in central
found com.sun.xml.bind#jaxb-impl;2.2.3-1 in central
found javax.xml.bind#jaxb-api;2.2.2 in central
found javax.xml.stream#stax-api;1.0-2 in central
found javax.activation#activation;1.1 in central
found org.codehaus.jackson#jackson-xc;1.9.2 in central
found net.java.dev.jets3t#jets3t;0.9.0 in central
found org.apache.httpcomponents#httpcore;4.2.5 in central
found com.jamesmurty.utils#java-xmlbuilder;0.4 in central
found commons-configuration#commons-configuration;1.6 in central
found commons-digester#commons-digester;1.8 in central
found commons-beanutils#commons-beanutils;1.7.0 in central
found commons-beanutils#commons-beanutils-core;1.8.0 in central
found org.apache.hadoop#hadoop-auth;2.7.1 in central
found org.apache.directory.server#apacheds-kerberos-codec;2.0.0-M15 in central
found org.apache.directory.server#apacheds-i18n;2.0.0-M15 in central
found org.apache.directory.api#api-asn1-api;1.0.0-M20 in central
found org.apache.directory.api#api-util;1.0.0-M20 in central
found org.apache.curator#curator-framework;2.7.1 in central
found org.apache.curator#curator-client;2.7.1 in central
found com.jcraft#jsch;0.1.42 in central
found org.apache.curator#curator-recipes;2.7.1 in central
found org.apache.hadoop#hadoop-mapreduce-client-core;2.7.1 in central
found org.apache.hadoop#hadoop-yarn-common;2.7.1 in central
found org.apache.hadoop#hadoop-yarn-api;2.7.1 in central
found com.sun.jersey#jersey-client;1.9 in central
found com.google.inject.extensions#guice-servlet;3.0 in central
found com.sun.jersey.contribs#jersey-guice;1.9 in central
found org.slf4j#slf4j-log4j12;1.7.10 in central
found io.netty#netty;3.6.2.Final in central
found javax.servlet.jsp#jsp-api;2.1 in central
:: resolution report :: resolve 27998ms :: artifacts dl 2975ms
:: modules in use:
aopalliance#aopalliance;1.0 from central in [default]
asm#asm;3.1 from central in [default]
ch.qos.logback#logback-classic;1.0.9 from central in [default]
ch.qos.logback#logback-core;1.0.9 from central in [default]
com.databricks#spark-avro_2.11;4.0.0 from central in [default]
com.github.stephenc.findbugs#findbugs-annotations;1.3.9-1 from central in [default]
com.github.stephenc.jcip#jcip-annotations;1.0-1 from central in [default]
com.google.code.findbugs#jsr305;2.0.1 from central in [default]
com.google.code.gson#gson;2.2.4 from central in [default]
com.google.guava#guava;13.0.1 from central in [default]
com.google.inject#guice;3.0 from central in [default]
com.google.inject.extensions#guice-assistedinject;3.0 from central in [default]
com.google.inject.extensions#guice-servlet;3.0 from central in [default]
com.google.protobuf#protobuf-java;2.5.0 from central in [default]
com.hortonworks#shc-core;1.1.1-2.1-s_2.11 from repo-1 in [default]
com.jamesmurty.utils#java-xmlbuilder;0.4 from central in [default]
com.jcraft#jsch;0.1.42 from central in [default]
com.lmax#disruptor;3.3.0 from central in [default]
com.sun.jersey#jersey-client;1.9 from central in [default]
com.sun.jersey#jersey-core;1.9 from central in [default]
com.sun.jersey#jersey-json;1.9 from central in [default]
com.sun.jersey#jersey-server;1.9 from central in [default]
com.sun.jersey.contribs#jersey-guice;1.9 from central in [default]
com.sun.xml.bind#jaxb-impl;2.2.3-1 from central in [default]
com.thoughtworks.paranamer#paranamer;2.3 from central in [default]
com.yammer.metrics#metrics-core;2.2.0 from central in [default]
commons-beanutils#commons-beanutils;1.7.0 from central in [default]
commons-beanutils#commons-beanutils-core;1.8.0 from central in [default]
commons-cli#commons-cli;1.2 from central in [default]
commons-codec#commons-codec;1.9 from central in [default]
commons-collections#commons-collections;3.2.2 from central in [default]
commons-configuration#commons-configuration;1.6 from central in [default]
commons-digester#commons-digester;1.8 from central in [default]
commons-el#commons-el;1.0 from central in [default]
commons-httpclient#commons-httpclient;3.1 from central in [default]
commons-io#commons-io;2.4 from central in [default]
commons-lang#commons-lang;2.6 from central in [default]
commons-logging#commons-logging;1.2 from central in [default]
commons-net#commons-net;3.1 from central in [default]
io.dropwizard.metrics#metrics-core;3.1.0 from central in [default]
io.netty#netty;3.6.2.Final from central in [default]
io.netty#netty-all;4.0.23.Final from central in [default]
it.unimi.dsi#fastutil;6.5.6 from central in [default]
javax.activation#activation;1.1 from central in [default]
javax.inject#javax.inject;1 from central in [default]
javax.servlet#servlet-api;2.5 from central in [default]
javax.servlet.jsp#jsp-api;2.1 from central in [default]
javax.xml.bind#jaxb-api;2.2.2 from central in [default]
javax.xml.stream#stax-api;1.0-2 from central in [default]
jline#jline;2.11 from central in [default]
joda-time#joda-time;1.6 from central in [default]
junit#junit;4.12 from central in [default]
log4j#log4j;1.2.17 from central in [default]
net.java.dev.jets3t#jets3t;0.9.0 from central in [default]
org.antlr#antlr-runtime;3.5.2 from central in [default]
org.apache.avro#avro;1.7.6 from central in [default]
org.apache.commons#commons-compress;1.4.1 from central in [default]
org.apache.commons#commons-csv;1.0 from central in [default]
org.apache.commons#commons-math;2.2 from central in [default]
org.apache.commons#commons-math3;3.1.1 from central in [default]
org.apache.curator#curator-client;2.7.1 from central in [default]
org.apache.curator#curator-framework;2.7.1 from central in [default]
org.apache.curator#curator-recipes;2.7.1 from central in [default]
org.apache.directory.api#api-asn1-api;1.0.0-M20 from central in [default]
org.apache.directory.api#api-util;1.0.0-M20 from central in [default]
org.apache.directory.server#apacheds-i18n;2.0.0-M15 from central in [default]
org.apache.directory.server#apacheds-kerberos-codec;2.0.0-M15 from central in [default]
org.apache.hadoop#hadoop-annotations;2.7.1 from central in [default]
org.apache.hadoop#hadoop-auth;2.7.1 from central in [default]
org.apache.hadoop#hadoop-common;2.7.1 from central in [default]
org.apache.hadoop#hadoop-mapreduce-client-core;2.7.1 from central in [default]
org.apache.hadoop#hadoop-yarn-api;2.7.1 from central in [default]
org.apache.hadoop#hadoop-yarn-common;2.7.1 from central in [default]
org.apache.hbase#hbase-annotations;1.1.3 from central in [default]
org.apache.hbase#hbase-client;1.1.2 from central in [default]
org.apache.hbase#hbase-common;1.1.2 from central in [default]
org.apache.hbase#hbase-prefix-tree;1.1.2 from central in [default]
org.apache.hbase#hbase-procedure;1.1.2 from central in [default]
org.apache.hbase#hbase-protocol;1.1.3 from central in [default]
org.apache.hbase#hbase-server;1.1.2 from central in [default]
org.apache.htrace#htrace-core;3.1.0-incubating from central in [default]
org.apache.httpcomponents#httpclient;4.0.1 from central in [default]
org.apache.httpcomponents#httpcore;4.2.5 from central in [default]
org.apache.phoenix#phoenix-core;4.9.0-HBase-1.1 from central in [default]
org.apache.tephra#tephra-api;0.9.0-incubating from central in [default]
org.apache.tephra#tephra-core;0.9.0-incubating from central in [default]
org.apache.tephra#tephra-hbase-compat-1.1;0.9.0-incubating from central in [default]
org.apache.thrift#libthrift;0.9.0 from central in [default]
org.apache.twill#twill-api;0.6.0-incubating from central in [default]
org.apache.twill#twill-common;0.6.0-incubating from central in [default]
org.apache.twill#twill-core;0.6.0-incubating from central in [default]
org.apache.twill#twill-discovery-api;0.6.0-incubating from central in [default]
org.apache.twill#twill-discovery-core;0.6.0-incubating from central in [default]
org.apache.twill#twill-zookeeper;0.6.0-incubating from central in [default]
org.apache.zookeeper#zookeeper;3.4.6 from central in [default]
org.codehaus.jackson#jackson-core-asl;1.9.13 from central in [default]
org.codehaus.jackson#jackson-jaxrs;1.9.13 from central in [default]
org.codehaus.jackson#jackson-mapper-asl;1.9.13 from central in [default]
org.codehaus.jackson#jackson-xc;1.9.2 from central in [default]
org.codehaus.jettison#jettison;1.1 from central in [default]
org.hamcrest#hamcrest-core;1.3 from central in [default]
org.iq80.snappy#snappy;0.3 from central in [default]
org.jamon#jamon-runtime;2.3.1 from central in [default]
org.jruby.jcodings#jcodings;1.0.8 from central in [default]
org.jruby.joni#joni;2.1.2 from central in [default]
org.mortbay.jetty#jetty;6.1.26 from central in [default]
org.mortbay.jetty#jetty-sslengine;6.1.26 from central in [default]
org.mortbay.jetty#jetty-util;6.1.26 from central in [default]
org.mortbay.jetty#jsp-2.1;6.1.14 from central in [default]
org.mortbay.jetty#jsp-api-2.1;6.1.14 from central in [default]
org.mortbay.jetty#servlet-api;2.5-20081211 from central in [default]
org.mortbay.jetty#servlet-api-2.5;6.1.14 from central in [default]
org.ow2.asm#asm-all;5.0.2 from central in [default]
org.slf4j#slf4j-api;1.7.7 from central in [default]
org.slf4j#slf4j-log4j12;1.7.10 from central in [default]
org.sonatype.sisu.inject#cglib;2.2.1-v20090111 from central in [default]
org.tukaani#xz;1.0 from central in [default]
org.xerial.snappy#snappy-java;1.0.5 from central in [default]
sqlline#sqlline;1.2.0 from central in [default]
tomcat#jasper-compiler;5.5.23 from central in [default]
tomcat#jasper-runtime;5.5.23 from central in [default]
xmlenc#xmlenc;0.52 from central in [default]
:: evicted modules:
org.slf4j#slf4j-api;1.7.5 by [org.slf4j#slf4j-api;1.7.7] in [default]
org.slf4j#slf4j-api;1.6.4 by [org.slf4j#slf4j-api;1.7.7] in [default]
org.apache.hbase#hbase-protocol;1.1.2 by [org.apache.hbase#hbase-protocol;1.1.3] in [default]
org.apache.hbase#hbase-annotations;1.1.2 by [org.apache.hbase#hbase-annotations;1.1.3] in [default]
junit#junit;4.11 by [junit#junit;4.12] in [default]
com.google.guava#guava;12.0.1 by [com.google.guava#guava;13.0.1] in [default]
com.google.code.findbugs#jsr305;1.3.9 by [com.google.code.findbugs#jsr305;2.0.1] in [default]
org.slf4j#slf4j-log4j12;1.6.1 by [org.slf4j#slf4j-log4j12;1.7.10] in [default]
commons-collections#commons-collections;3.2.1 by [commons-collections#commons-collections;3.2.2] in [default]
commons-lang#commons-lang;2.5 by [commons-lang#commons-lang;2.6] in [default]
org.apache.httpcomponents#httpclient;4.1.3 by [org.apache.httpcomponents#httpclient;4.0.1] in [default]
org.apache.httpcomponents#httpcore;4.1.3 by [org.apache.httpcomponents#httpcore;4.2.5] in [default]
org.apache.zookeeper#zookeeper;3.4.5 by [org.apache.zookeeper#zookeeper;3.4.6] in [default]
org.codehaus.jackson#jackson-core-asl;1.9.2 by [org.codehaus.jackson#jackson-core-asl;1.9.13] in [default]
org.codehaus.jackson#jackson-mapper-asl;1.9.2 by [org.codehaus.jackson#jackson-mapper-asl;1.9.13] in [default]
org.apache.httpcomponents#httpcore;4.0.1 by [org.apache.httpcomponents#httpcore;4.2.5] in [default]
commons-codec#commons-codec;1.7 by [commons-codec#commons-codec;1.9] in [default]
org.codehaus.jackson#jackson-jaxrs;1.9.2 by [org.codehaus.jackson#jackson-jaxrs;1.9.13] in [default]
org.apache.httpcomponents#httpclient;4.2.5 by [org.apache.httpcomponents#httpclient;4.0.1] in [default]
org.apache.avro#avro;1.7.4 by [org.apache.avro#avro;1.7.6] in [default]
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 142 | 9 | 9 | 20 || 122 | 0 |
---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent
confs: [default]
0 artifacts copied, 122 already retrieved (0kB/387ms)
18/07/12 03:02:08 WARN util.Utils: Your hostname, quickstart.cloudera resolves to a loopback address: 127.0.0.1; using 192.168.116.128 instead (on interface eth1)
18/07/12 03:02:08 WARN util.Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
[Stage 0:> (0 + 1) / 1]18/07/12 03:04:37 ERROR executor.Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.NoSuchMethodError: org.apache.hadoop.hbase.client.Scan.setCaching(I)Lorg/apache/hadoop/hbase/client/Scan;
at org.apache.spark.sql.execution.datasources.hbase.HBaseTableScanRDD.org$apache$spark$sql$execution$datasources$hbase$HBaseTableScanRDD$$buildScan(HBaseTableScan.scala:223)
at org.apache.spark.sql.execution.datasources.hbase.HBaseTableScanRDD$$anonfun$8.apply(HBaseTableScan.scala:280)
at org.apache.spark.sql.execution.datasources.hbase.HBaseTableScanRDD$$anonfun$8.apply(HBaseTableScan.scala:279)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
at org.apache.spark.sql.execution.datasources.hbase.HBaseTableScanRDD.compute(HBaseTableScan.scala:279)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
java.lang.NoSuchMethodError: org.apache.hadoop.hbase.client.Scan.setCaching(I)Lorg/apache/hadoop/hbase/client/Scan
This problem is usually caused by a mismatch between the HBase version installed on the cluster and the version your dependencies were compiled against (or by differences between dependency repositories). shc-core 1.1.1-2.1-s_2.11 pulls in HBase 1.1.2 client jars, where `Scan.setCaching(int)` returns `Scan`; in older HBase clients the same method returns `void`, which is exactly the signature mismatch the `NoSuchMethodError` reports. Check which HBase client version actually ends up on your runtime classpath.
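For reference, the descriptor in the error message spells out the signature the connector was compiled against: `(I)` means one `int` parameter, and `Lorg/apache/hadoop/hbase/client/Scan;` means the return type is `Scan` (a client declaring `void` here would match `(I)V` instead, hence the mismatch). A toy decoder for this corner of the JVM descriptor format, purely for illustration (it handles only a few primitives and object types):

```python
def decode_descriptor(desc):
    """Decode a simple JVM method descriptor like '(I)Lpkg/Cls;'."""
    prim = {"I": "int", "V": "void", "Z": "boolean", "J": "long", "D": "double"}

    def read_type(s, i):
        if s[i] in prim:
            return prim[s[i]], i + 1
        if s[i] == "L":                       # object type: L<binary name>;
            end = s.index(";", i)
            return s[i + 1:end].replace("/", "."), end + 1
        raise ValueError("unsupported type: " + s[i:])

    assert desc[0] == "("
    i, params = 1, []
    while desc[i] != ")":
        t, i = read_type(desc, i)
        params.append(t)
    ret, _ = read_type(desc, i + 1)
    return params, ret
```

Applied to the failing method, `decode_descriptor("(I)Lorg/apache/hadoop/hbase/client/Scan;")` yields `(["int"], "org.apache.hadoop.hbase.client.Scan")`.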