Question:

Submitting a Spark job on a YARN cluster from a remote client

曹浩
2023-03-14
spark-submit --jars hdfs:///user/kmansour/elevation/geotrellis-1.2.1-assembly.jar \  
 --class tutorial.CalculateFlowDirection hdfs:///user/kmansour/elevation/demo_2.11-0.2.0.jar hdfs:///user/kmansour/elevation/TIF/DTM_1m_19_E_17_108_*.tif \  
 --deploy-mode cluster \  
 --master yarn

I get stuck at:

INFO yarn.Client: Application report for application_1519070657292_0088 (state: ACCEPTED)

until I get this:

 diagnostics: Application application_1519070657292_0088 failed 2 times due to AM Container for appattempt_1519070657292_0088_000002 exited with  exitCode: 10
    For more detailed output, check application tracking page:http://node1:8088/cluster/app/application_1519070657292_0088Then, click on links to logs of each attempt.
    Diagnostics: Exception from container-launch.
    Container id: container_1519070657292_0088_02_000001
    Exit code: 10
    Stack trace: ExitCodeException exitCode=10:
            at org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
            at org.apache.hadoop.util.Shell.run(Shell.java:482)
            at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
            at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
            at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
            at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
            at java.util.concurrent.FutureTask.run(FutureTask.java:266)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
            at java.lang.Thread.run(Thread.java:748)

When I check the application tracking page, I get the following in stderr:

18/03/13 14:48:05 INFO util.SignalUtils: Registered signal handler for TERM
18/03/13 14:48:05 INFO util.SignalUtils: Registered signal handler for HUP
18/03/13 14:48:05 INFO util.SignalUtils: Registered signal handler for INT
18/03/13 14:48:06 INFO yarn.ApplicationMaster: Preparing Local resources
18/03/13 14:48:08 INFO yarn.ApplicationMaster: ApplicationAttemptId: appattempt_1519070657292_0088_000002
18/03/13 14:48:08 INFO spark.SecurityManager: Changing view acls to: kmansour
18/03/13 14:48:08 INFO spark.SecurityManager: Changing modify acls to: kmansour
18/03/13 14:48:08 INFO spark.SecurityManager: Changing view acls groups to: 
18/03/13 14:48:08 INFO spark.SecurityManager: Changing modify acls groups to: 
18/03/13 14:48:08 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(kmansour); groups with view permissions: Set(); users  with modify permissions: Set(kmansour); groups with modify permissions: Set()
18/03/13 14:48:08 INFO yarn.ApplicationMaster: Waiting for Spark driver to be reachable.
18/03/13 14:50:15 ERROR yarn.ApplicationMaster: Failed to connect to driver at 132.156.9.98:50687, retrying ...
18/03/13 14:50:15 ERROR yarn.ApplicationMaster: Uncaught exception: 
org.apache.spark.SparkException: Failed to connect to driver!
    at org.apache.spark.deploy.yarn.ApplicationMaster.waitForSparkDriver(ApplicationMaster.scala:577)
    at org.apache.spark.deploy.yarn.ApplicationMaster.runExecutorLauncher(ApplicationMaster.scala:433)
    at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:256)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:764)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:67)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:66)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:66)
    at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:762)
    at org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:785)
    at org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)
18/03/13 14:50:15 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 10, (reason: Uncaught exception: org.apache.spark.SparkException: Failed to connect to driver!)
18/03/13 14:50:16 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with FAILED (diag message: Uncaught exception: org.apache.spark.SparkException: Failed to connect to driver!)
18/03/13 14:50:16 INFO yarn.ApplicationMaster: Deleting staging directory hdfs://132.156.9.142:8020/user/kmansour/.sparkStaging/application_1519070657292_0088
18/03/13 14:50:16 INFO util.ShutdownHookManager: Shutdown hook called

Here is my spark-defaults.conf:

spark.eventLog.enabled           true
spark.eventLog.dir               hdfs://132.156.9.142:8020/events
spark.history.fs.logDirectory    hdfs://132.156.9.142:8020/events
spark.serializer                 org.apache.spark.serializer.KryoSerializer
spark.driver.cores               2
spark.driver.memory              5g
spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
spark.executor.instances         4
spark.executor.cores             2
spark.executor.memory            6g
spark.yarn.am.memory             2g
spark.yarn.jars                  hdfs://node1:8020/jars/*.jar

My yarn-site.xml:

<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>node1</value>
    </property>
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>8192</value>
    </property>
    <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>1024</value>
    </property>
    <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>7168</value>
    </property>
    <property>
        <name>yarn.nodemanager.resource.cpu-vcores</name>
        <value>2</value>
    </property>
    <property>
        <name>yarn.nodemanager.pmem-check-enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>yarn.nodemanager.vmem-pmem-ratio</name>
        <value>5</value>
    </property>
</configuration>

And my core-site.xml:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://132.156.9.142:8020</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>C:\Users\kmansour\Documents\hadoop-2.7.4\tmp</value>
    </property>
</configuration>

I am new to all of this, so maybe my reasoning is flawed; any input or advice would help.

1 Answer

孙嘉悦
2023-03-14

You need to change the order of the arguments you pass to spark-submit. In your command:

spark-submit --jars hdfs:///user/kmansour/elevation/geotrellis-1.2.1-assembly.jar \  
 --class tutorial.CalculateFlowDirection hdfs:///user/kmansour/elevation/demo_2.11-0.2.0.jar hdfs:///user/kmansour/elevation/TIF/DTM_1m_19_E_17_108_*.tif \  
 --deploy-mode cluster \  
 --master yarn

you invoke Spark in the default deploy mode (probably yarn-client) and pass --deploy-mode and --master as application arguments, because they come after the jar file location. spark-submit's usage is spark-submit [options] <app jar> [app arguments]: everything after the application jar is handed to your main class instead of being parsed as a spark-submit option. Change it to:

spark-submit --jars hdfs:///user/kmansour/elevation/geotrellis-1.2.1-assembly.jar \  
 --deploy-mode cluster \  
 --master yarn \
 --class tutorial.CalculateFlowDirection hdfs:///user/kmansour/elevation/demo_2.11-0.2.0.jar hdfs:///user/kmansour/elevation/TIF/DTM_1m_19_E_17_108_*.tif  

With this ordering you get true yarn-cluster mode, where the driver runs inside the ApplicationMaster container on the cluster. With your original ordering the job effectively ran in client mode, so the ApplicationMaster kept trying to reach a driver on your remote machine (132.156.9.98:50687) and timed out, which is exactly the "Failed to connect to driver!" error in your stderr.
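
If you want to confirm how spark-submit parsed things before the job even reaches YARN, the --verbose flag makes it print the resolved master, deploy mode, and application arguments. A quick sanity check, reusing the same paths from your command (adjust as needed), might look like:

spark-submit --verbose \
 --jars hdfs:///user/kmansour/elevation/geotrellis-1.2.1-assembly.jar \
 --deploy-mode cluster \
 --master yarn \
 --class tutorial.CalculateFlowDirection hdfs:///user/kmansour/elevation/demo_2.11-0.2.0.jar hdfs:///user/kmansour/elevation/TIF/DTM_1m_19_E_17_108_*.tif

In the verbose output you should see master resolved to yarn and deployMode resolved to cluster before the YARN client starts.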
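You can also see the parsing rule from inside the application: everything after the application jar lands in the args array of your main class. A minimal hypothetical echo app (a sketch to illustrate the rule, not your actual tutorial.CalculateFlowDirection) shows what the original ordering would have delivered:

object ArgsEcho {
  def main(args: Array[String]): Unit = {
    // With the original command this would print the TIF path followed by
    // "--deploy-mode", "cluster", "--master", "yarn", all treated as app arguments.
    args.zipWithIndex.foreach { case (arg, i) => println(s"args($i) = $arg") }
  }
}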
