I am a beginner with Scala and Spark.
Scala version: 2.12.10
Spark version: 3.0.1
I am trying a very simple Spark RDD operation in Scala, but I get an error.
(1) build.sbt
scalaVersion := "2.12.10"
name := "hello-world"
organization := "ch.epfl.scala"
version := "1.0"
libraryDependencies += "org.scala-lang.modules" %% "scala-parser-combinators" % "1.1.2"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.0.1"
libraryDependencies += "org.apache.spark" %% "spark-core" % "3.0.1"
(2) Main.scala
import org.apache.spark.sql.SparkSession

object Main extends App {
  println("Hello, World!")
  implicit val spark = SparkSession.builder()
    .master("spark://centos-master:7077")
    // .master("local[*]")
    .appName("spark-api")
    .getOrCreate()
  val inputrdd = spark.sparkContext.parallelize(Seq(("arth", 10), ("arth", 20), ("samuel", 60), ("jack", 65)))
  println("inputrdd : ", inputrdd)
  val mapped = inputrdd.mapValues(x => (x, 1))
  println("mapped : ", mapped)
  mapped.collect.foreach(println)
}
(3) Where the error occurs
The error seems to happen at the mapped.collect.foreach(println) part.
(4) Error message
21/04/17 20:54:19 INFO DAGScheduler: Job 0 failed: collect at Main.scala:16, took 6.083947 s
[error] (run-main-0) org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 7, 192.168.0.220, executor 0):
java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda
to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in
instance of org.apache.spark.rdd.MapPartitionsRDD
[error] at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2301)
[error] at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1431)
[error] at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2410)
[error] at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2328)
[error] at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2186)
[error] at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1666)
[error] at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2404)
[error] at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2328)
[error] at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2186)
[error] at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1666)
[error] at java.io.ObjectInputStream.readObject(ObjectInputStream.java:502)
[error] at java.io.ObjectInputStream.readObject(ObjectInputStream.java:460)
[error] at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
[error] at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
[error] at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:83)
[error] at org.apache.spark.scheduler.Task.run(Task.scala:127)
[error] at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
[error] at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
[error] at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
[error] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[error] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[error] at java.lang.Thread.run(Thread.java:748)
[error]
[error] Driver stacktrace:
21/04/17 20:54:19 INFO TaskSetManager: Lost task 1.3 in stage 0.0 (TID 6) on 192.168.0.220, executor 0: java.lang.ClassCastException (cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance of org.apache.spark.rdd.MapPartitionsRDD) [duplicate 7]
21/04/17 20:54:19 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
[error] org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 7, 192.168.0.220, executor 0): java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance of org.apache.spark.rdd.MapPartitionsRDD
[error] at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2301)
[error] at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1431)
[error] at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2410)
[error] at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2328)
[error] at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2186)
[error] at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1666)
[error] at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2404)
[error] at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2328)
[error] at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2186)
[error] at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1666)
[error] at java.io.ObjectInputStream.readObject(ObjectInputStream.java:502)
[error] at java.io.ObjectInputStream.readObject(ObjectInputStream.java:460)
[error] at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
[error] at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
[error] at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:83)
[error] at org.apache.spark.scheduler.Task.run(Task.scala:127)
[error] at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
[error] at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
[error] at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
[error] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[error] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[error] at java.lang.Thread.run(Thread.java:748)
[error]
[error] Driver stacktrace:
[error] at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2059)
[error] at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2008)
[error] at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2007)
[error] at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
[error] at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
[error] at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
[error] at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2007)
[error] at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:973)
[error] at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:973)
[error] at scala.Option.foreach(Option.scala:407)
[error] at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:973)
[error] at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2239)
[error] at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2188)
[error] at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2177)
[error] at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
[error] at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:775)
[error] at org.apache.spark.SparkContext.runJob(SparkContext.scala:2099)
[error] at org.apache.spark.SparkContext.runJob(SparkContext.scala:2120)
[error] at org.apache.spark.SparkContext.runJob(SparkContext.scala:2139)
[error] at org.apache.spark.SparkContext.runJob(SparkContext.scala:2164)
[error] at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1004)
[error] at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
[error] at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
[error] at org.apache.spark.rdd.RDD.withScope(RDD.scala:388)
[error] at org.apache.spark.rdd.RDD.collect(RDD.scala:1003)
[error] at Main$.delayedEndpoint$Main$1(Main.scala:16)
[error] at Main$delayedInit$body.apply(Main.scala:2)
[error] at scala.Function0.apply$mcV$sp(Function0.scala:39)
[error] at scala.Function0.apply$mcV$sp$(Function0.scala:39)
[error] at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:17)
[error] at scala.App.$anonfun$main$1$adapted(App.scala:80)
[error] at scala.collection.immutable.List.foreach(List.scala:392)
[error] at scala.App.main(App.scala:80)
[error] at scala.App.main$(App.scala:78)
[error] at Main$.main(Main.scala:2)
[error] at Main.main(Main.scala)
[error] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[error] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[error] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[error] at java.lang.reflect.Method.invoke(Method.java:498)
[error] Caused by: java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance of org.apache.spark.rdd.MapPartitionsRDD
[error] at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2301)
[error] at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1431)
[error] at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2410)
[error] at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2328)
[error] at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2186)
[error] at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1666)
[error] at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2404)
[error] at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2328)
[error] at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2186)
[error] at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1666)
[error] at java.io.ObjectInputStream.readObject(ObjectInputStream.java:502)
[error] at java.io.ObjectInputStream.readObject(ObjectInputStream.java:460)
[error] at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
[error] at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
[error] at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:83)
[error] at org.apache.spark.scheduler.Task.run(Task.scala:127)
[error] at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
[error] at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
[error] at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
[error] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[error] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[error] at java.lang.Thread.run(Thread.java:748)
Do I need additional libraries, or is there a mistake in my code (it runs fine in spark-shell)?
How can I fix this?
You need to submit your JAR to Spark so that your code can actually run on the cluster. spark-shell hides all of this from you behind the scenes.
This answer gives more detail and background: https://stackoverflow.com/a/28367602/1810962
As a workaround, you can use bin/spark-submit and pass --class, --jars, and --driver-class-path to provide the local classpath.
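As a minimal sketch (the JAR path below is an assumption based on the sbt settings in the question, so adjust it to your actual build output), you can also tell the SparkSession to ship the application JAR to the executors via the standard spark.jars property, so the serialized lambdas can be deserialized on the worker side:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("spark://centos-master:7077")
  .appName("spark-api")
  // Assumed sbt package output for name "hello-world", version "1.0", Scala 2.12; adjust to your build.
  .config("spark.jars", "target/scala-2.12/hello-world_2.12-1.0.jar")
  .getOrCreate()

Alternatively, run sbt package and launch the job with something like bin/spark-submit --class Main --master spark://centos-master:7077 target/scala-2.12/hello-world_2.12-1.0.jar (again, the JAR name here is assumed from the build settings above).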