import org.apache.spark.{SparkConf, SparkContext}

val sparkConf = new SparkConf().setAppName("Json Test").setMaster("local[*]")
val sc = new SparkContext(sparkConf)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
val path = "/path/log.json"
val df = sqlContext.read.json(path)
df.show()
{"IFAM":"EQR","KTM":1430006400000,"COL":21,"DATA":[{"MLrate":"30","Nrout":"0","up":NULL,"Crate":"2"},{"MLrate":"31","Nrout":"0","up":NULL,"Crate":"2"},{"MLrate":"30","Nrout":"5","up":NULL,"Crate":"2"},{"MLrate":"34","Nrout":"0","up":NULL,"Crate":"2"},{"MLrate":"33","Nrout":"0","up":NULL,"Crate":"2"},{"MLrate":"30","Nrout":"8","up":NULL,"Crate":"2"}]}
I can't make sense of this error, which occurs in the Scala IDE:
INFO SharedState: Warehouse path is 'file:/C:/users/ben53/workspace/demo/spark-warehouse/'.
Exception in thread "main" java.util.ServiceConfigurationError: org.apache.spark.sql.sources.DataSourceRegister: Provider org.apache.spark.sql.hive.orc.DefaultSource could not be instantiated
    at java.util.ServiceLoader.fail(Unknown Source)
    at java.util.ServiceLoader.access$100(Unknown Source)
    at java.util.ServiceLoader$LazyIterator.nextService(Unknown Source)
    at java.util.ServiceLoader$LazyIterator.next(Unknown Source)
    at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
    at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
    at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
    at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:575)
    at org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:86)
    at org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:86)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
    at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:298)
    at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:251)
    at com.dataflair.spark.QueryLog$.main(QueryLog.scala:27)
Caused by: java.lang.VerifyError: Bad return type
Exception Details:
  Location:
    org/apache/spark/sql/hive/orc/DefaultSource.createRelation(Lorg/apache/spark/sql/SQLContext;[Ljava/lang/String;Lscala/Option;Lscala/Option;Lscala/collection/immutable/Map;)Lorg/apache/spark/sql/sources/HadoopFsRelation; @35: areturn
  Reason:
    Type 'org/apache/spark/sql/hive/orc/OrcRelation' (current frame, stack[0]) is not assignable to 'org/apache/spark/sql/sources/HadoopFsRelation' (from method signature)
  Current Frame:
    bci: @35
    flags: { }
    locals: { 'org/apache/spark/sql/hive/orc/OrcRelation' }
  Bytecode:
    0x0000000: b200 1c2b c100 1ebb 000e 592a b700 22b6
    0x0000010: 0026 bb00 2859 2c2d b200 2d19 0419 052b
    0x0000020: b700 30b0
    at java.lang.Class.getDeclaredConstructors0(Native Method)
    at java.lang.Class.privateGetDeclaredConstructors(Unknown Source)
    at java.lang.Class.newInstance(Unknown Source)
    ... 20 more
The path should be correct, but the JSON you provided is invalid. Correct the sample JSON and then try again. You can validate the JSON at https://jsonlint.com/, which points out the invalid parts.
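For reference, a corrected version of the sample record might look like the line below. This is a reconstruction from the fields shown in the question, with the bare NULL tokens replaced by JSON's lowercase null; note that Spark's json reader also expects one complete JSON object per line.

{"IFAM":"EQR","KTM":1430006400000,"COL":21,"DATA":[{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"},{"MLrate":"31","Nrout":"0","up":null,"Crate":"2"},{"MLrate":"30","Nrout":"5","up":null,"Crate":"2"},{"MLrate":"34","Nrout":"0","up":null,"Crate":"2"},{"MLrate":"33","Nrout":"0","up":null,"Crate":"2"},{"MLrate":"30","Nrout":"8","up":null,"Crate":"2"}]}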
I tried this example myself, though, and got the following output:
+---+--------------------+----+-------------+
|COL| DATA|IFAM| KTM|
+---+--------------------+----+-------------+
| 21|[[2,30,0,null], [...| EQR|1430006400000|
+---+--------------------+----+-------------+
import org.apache.spark.{SparkConf, SparkContext}

object Test {
  def main(args: Array[String]): Unit = {
    // Run Spark locally, using all available cores
    val sparkConf = new SparkConf().setAppName("Json Test").setMaster("local[*]")
    val sc = new SparkContext(sparkConf)
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    import sqlContext.implicits._

    // Each line of the input file must contain one complete JSON object
    val path = "/home/test/Desktop/test.json"
    val df = sqlContext.read.json(path)
    df.show()
  }
}
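If you also want to flatten the nested DATA array instead of viewing it as a single column, here is a minimal sketch, assuming the schema inferred above (the nested field names MLrate, Nrout, up, Crate come from the sample JSON):

// Assumes `df` and `sqlContext` from the example above
import sqlContext.implicits._
import org.apache.spark.sql.functions.explode

// Confirm how Spark inferred the nested structure
df.printSchema()

// explode produces one output row per element of the DATA array,
// then the struct fields can be selected as ordinary columns
val flat = df.select($"IFAM", $"KTM", explode($"DATA").as("d"))
  .select($"IFAM", $"KTM", $"d.MLrate", $"d.Nrout", $"d.up", $"d.Crate")
flat.show()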