Question:

Unable to run spark-submit with an application jar on a Mesos cluster

齐博厚
2023-03-14

Mesosphere has done a great job of simplifying the process of running Spark on Mesos. I am using this guide to set up a development Mesos cluster on Google Cloud Compute:

https://mesosphere.com/docs/tutorials/run-spark-on-mesos/

I can run the example from the guide with spark-shell (finding numbers less than 10). However, when I try to submit an application that runs fine with Spark locally, it fails with TASK_FAILED messages (i.e. CoarseMesosSchedulerBackend: Mesos task 4 is now TASK_FAILED).

Here is the command I am using with the bundled SparkPi example:

./spark-submit --class org.apache.spark.examples.SparkPi --master mesos://10.173.40.36:5050 ~/spark-1.3.0-bin-hadoop2.4/lib/spark-examples-1.3.0-hadoop2.4.0.jar 100

And the output:

jclouds@development-5159-d9:~/learning-spark$ ~/spark-1.3.0-bin-hadoop2.4/bin/spark-submit --class org.apache.spark.examples.SparkPi --master mesos://10.173.40.36:5050 ~/spark-1.3.0-bin-hadoop2.4/lib/spark-examples-1.3.0-hadoop2.4.0.jar 100
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/03/22 16:44:02 INFO SparkContext: Running Spark version 1.3.0
15/03/22 16:44:02 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/03/22 16:44:03 INFO SecurityManager: Changing view acls to: jclouds
15/03/22 16:44:03 INFO SecurityManager: Changing modify acls to: jclouds
15/03/22 16:44:03 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(jclouds); users with modify permissions: Set(jclouds)
15/03/22 16:44:03 INFO Slf4jLogger: Slf4jLogger started
15/03/22 16:44:03 INFO Remoting: Starting remoting
15/03/22 16:44:03 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@development-5159-d9.c.learning-spark.internal:60301]
15/03/22 16:44:03 INFO Utils: Successfully started service 'sparkDriver' on port 60301.
15/03/22 16:44:03 INFO SparkEnv: Registering MapOutputTracker
15/03/22 16:44:03 INFO SparkEnv: Registering BlockManagerMaster
15/03/22 16:44:03 INFO DiskBlockManager: Created local directory at /tmp/spark-27fad7e3-4ad7-44d6-845f-4a09ac9cce90/blockmgr-a558b7be-0d72-49b9-93fd-5ef8731b314b
15/03/22 16:44:03 INFO MemoryStore: MemoryStore started with capacity 265.0 MB
15/03/22 16:44:04 INFO HttpFileServer: HTTP File server directory is /tmp/spark-de9ac795-381b-4acd-a723-a9a6778773c9/httpd-7115216c-0223-492b-ae6f-4134ba7228ba
15/03/22 16:44:04 INFO HttpServer: Starting HTTP Server
15/03/22 16:44:04 INFO Server: jetty-8.y.z-SNAPSHOT
15/03/22 16:44:04 INFO AbstractConnector: Started SocketConnector@0.0.0.0:36663
15/03/22 16:44:04 INFO Utils: Successfully started service 'HTTP file server' on port 36663.
15/03/22 16:44:04 INFO SparkEnv: Registering OutputCommitCoordinator
15/03/22 16:44:04 INFO Server: jetty-8.y.z-SNAPSHOT
15/03/22 16:44:04 INFO AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
15/03/22 16:44:04 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/03/22 16:44:04 INFO SparkUI: Started SparkUI at http://development-5159-d9.c.learning-spark.internal:4040
15/03/22 16:44:04 INFO SparkContext: Added JAR file:/home/jclouds/spark-1.3.0-bin-hadoop2.4/lib/spark-examples-1.3.0-hadoop2.4.0.jar at http://10.173.40.36:36663/jars/spark-examples-1.3.0-hadoop2.4.0.jar with timestamp 1427042644934
Warning: MESOS_NATIVE_LIBRARY is deprecated, use MESOS_NATIVE_JAVA_LIBRARY instead. Future releases will not support JNI bindings via MESOS_NATIVE_LIBRARY.
Warning: MESOS_NATIVE_LIBRARY is deprecated, use MESOS_NATIVE_JAVA_LIBRARY instead. Future releases will not support JNI bindings via MESOS_NATIVE_LIBRARY.
I0322 16:44:05.035423   308 sched.cpp:137] Version: 0.21.1
I0322 16:44:05.038136   309 sched.cpp:234] New master detected at master@10.173.40.36:5050
I0322 16:44:05.039261   309 sched.cpp:242] No credentials provided. Attempting to register without authentication
I0322 16:44:05.040351   310 sched.cpp:408] Framework registered with 20150322-040336-606645514-5050-2744-0019
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Registered as framework ID 20150322-040336-606645514-5050-2744-0019
15/03/22 16:44:05 INFO NettyBlockTransferService: Server created on 44177
15/03/22 16:44:05 INFO BlockManagerMaster: Trying to register BlockManager
15/03/22 16:44:05 INFO BlockManagerMasterActor: Registering block manager development-5159-d9.c.learning-spark.internal:44177 with 265.0 MB RAM, BlockManagerId(<driver>, development-5159-d9.c.learning-spark.internal, 44177)
15/03/22 16:44:05 INFO BlockManagerMaster: Registered BlockManager
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Mesos task 2 is now TASK_RUNNING
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Mesos task 1 is now TASK_RUNNING
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Mesos task 0 is now TASK_RUNNING
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Mesos task 2 is now TASK_FAILED
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Mesos task 1 is now TASK_FAILED
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Mesos task 0 is now TASK_FAILED
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
15/03/22 16:44:05 INFO SparkContext: Starting job: reduce at SparkPi.scala:35
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Mesos task 3 is now TASK_RUNNING
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Mesos task 4 is now TASK_RUNNING
15/03/22 16:44:05 INFO DAGScheduler: Got job 0 (reduce at SparkPi.scala:35) with 100 output partitions (allowLocal=false)
15/03/22 16:44:05 INFO DAGScheduler: Final stage: Stage 0(reduce at SparkPi.scala:35)
15/03/22 16:44:05 INFO DAGScheduler: Parents of final stage: List()
15/03/22 16:44:05 INFO DAGScheduler: Missing parents: List()
15/03/22 16:44:05 INFO DAGScheduler: Submitting Stage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:31), which has no missing parents
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Mesos task 3 is now TASK_FAILED
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Blacklisting Mesos slave value: "20150322-040336-606645514-5050-2744-S1"
 due to too many failures; is Spark installed on it?
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Mesos task 4 is now TASK_FAILED
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Blacklisting Mesos slave value: "20150322-040336-606645514-5050-2744-S0"
 due to too many failures; is Spark installed on it?
15/03/22 16:44:05 INFO MemoryStore: ensureFreeSpace(1848) called with curMem=0, maxMem=277842493
15/03/22 16:44:05 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1848.0 B, free 265.0 MB)
15/03/22 16:44:05 INFO MemoryStore: ensureFreeSpace(1296) called with curMem=1848, maxMem=277842493
15/03/22 16:44:05 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1296.0 B, free 265.0 MB)
15/03/22 16:44:05 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on development-5159-d9.c.learning-spark.internal:44177 (size: 1296.0 B, free: 265.0 MB)
15/03/22 16:44:05 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
15/03/22 16:44:05 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:839
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Mesos task 5 is now TASK_RUNNING
15/03/22 16:44:05 INFO DAGScheduler: Submitting 100 missing tasks from Stage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:31)
15/03/22 16:44:05 INFO TaskSchedulerImpl: Adding task set 0.0 with 100 tasks
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Mesos task 5 is now TASK_FAILED
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Blacklisting Mesos slave value: "20150322-040336-606645514-5050-2744-S2"
 due to too many failures; is Spark installed on it?
15/03/22 16:44:20 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

I suspect this may have to do with the Mesos slaves not being able to find the application jar, but when I put it in HDFS and supplied the URL instead, spark-submit told me it would skip the remote jar:

jclouds@development-5159-d9:~/learning-spark$ ~/spark-1.3.0-bin-hadoop2.4/bin/spark-submit --class org.apache.spark.examples.SparkPi --master mesos://10.173.40.36:5050 hdfs://10.173.40.36/tmp/spark-examples-1.3.0-hadoop2.4.0.jar 100
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Warning: Skip remote jar hdfs://10.173.40.36/tmp/spark-examples-1.3.0-hadoop2.4.0.jar.
java.lang.ClassNotFoundException: org.apache.spark.examples.SparkPi
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:266)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:538)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
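
That warning appears to be the key: in client mode, spark-submit does not download a remote hdfs:// jar onto the driver's classpath, which is why org.apache.spark.examples.SparkPi can no longer be found. A sketch of a workaround under that reading (the /tmp destination is illustrative): pull the jar back onto the local filesystem so the driver can load it and serve it to the executors over its HTTP file server, as in the first run.

# fetch the jar locally, then submit it as a local path
hdfs dfs -get hdfs://10.173.40.36/tmp/spark-examples-1.3.0-hadoop2.4.0.jar /tmp/
~/spark-1.3.0-bin-hadoop2.4/bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master mesos://10.173.40.36:5050 \
  /tmp/spark-examples-1.3.0-hadoop2.4.0.jar 100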


EDIT: To bring closure to this, hbogert on the Spark user list pointed me toward debugging the Spark logs on one of my slave nodes, and the problem was quite apparent:

jclouds@development-5159-d3:/tmp/mesos/slaves/20150322-040336-606645514-5050-2744-S1/frameworks/20150322-040336-606645514-5050-2744-0037/executors/1/runs/latest$ cat stderr
I0329 20:34:26.107267 10026 exec.cpp:132] Version: 0.21.1
I0329 20:34:26.109591 10031 exec.cpp:206] Executor registered on slave 20150322-040336-606645514-5050-2744-S1
sh: 1: /home/jclouds/spark-1.3.0-bin-hadoop2.4/bin/spark-class: not found

jclouds@development-5159-d3:/tmp/mesos/slaves/20150322-040336-606645514-5050-2744-S1/frameworks/20150322-040336-606645514-5050-2744-0037/executors/1/runs/latest$ cat stdout
Registered executor on 10.217.7.180
Starting task 1
Forked command at 10036
sh -c '... --driver-url akka.tcp://sparkDriver@development-5159-d9.c.learning-spark.internal:54746/user/CoarseGrainedScheduler --executor-id 20150322-040336-606645514-5050-2744-S1 --hostname 10.217.7.180 --cores 10 --app-id 20150322-040336-606645514-5050-2744-0037'
Command exited with status 127 (pid: 10036)
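
In other words, the slaves were trying to launch /home/jclouds/spark-1.3.0-bin-hadoop2.4/bin/spark-class, which does not exist on them (exit status 127 is the shell's "command not found"). One fix is simply to unpack the same Spark build at the same absolute path on every slave, sketched below with illustrative slave addresses; the answer below describes the alternative of letting Mesos fetch a Spark tarball instead.

# a sketch, assuming slaves at these addresses: put the same Spark build
# at the same absolute path (/home/jclouds) on each of them
for slave in 10.217.7.180 10.217.7.181; do
  scp ~/spark-1.3.0-bin-hadoop2.4.tgz jclouds@"$slave":/home/jclouds/
  ssh jclouds@"$slave" tar xzf /home/jclouds/spark-1.3.0-bin-hadoop2.4.tgz -C /home/jclouds
done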

Related:

  • How do I run Hadoop on a Mesos cluster?
  • http://apache-spark-user-list.1001560.n3.nabble.com/spark-submit-failing-on-cluster-td8369.html

1 Answer

白星渊
2023-03-14

It's hard to say without knowing the stderr output from the Mesos sandbox logs, but you generally need to make sure MESOS_NATIVE_LIBRARY (in spark-env.sh) is set correctly and that spark.executor.uri (in spark-defaults.conf) points to the URL of a Spark tarball. Otherwise, Spark has to be installed at the same location on every slave.
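
A minimal sketch of those two settings (the library path and tarball URL are illustrative and depend on your installation; note that the driver log above warns MESOS_NATIVE_LIBRARY is deprecated in favor of MESOS_NATIVE_JAVA_LIBRARY):

# conf/spark-env.sh -- where the Mesos JNI library lives on the submitting machine
export MESOS_NATIVE_JAVA_LIBRARY=/usr/local/lib/libmesos.so

# conf/spark-defaults.conf -- a Spark tarball every slave can download
spark.executor.uri   hdfs://10.173.40.36/tmp/spark-1.3.0-bin-hadoop2.4.tgz

With spark.executor.uri set, each Mesos executor downloads and unpacks Spark into its own sandbox instead of expecting it at a fixed path like /home/jclouds/spark-1.3.0-bin-hadoop2.4 on every slave.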
