The symptoms:
1. The job runs fine in local mode.
2. In cluster mode it fails with the following error:
java.io.FileNotFoundException: File does not exist: hdfs://192.168.10.178:9000/user/root/.sparkStaging/application_1569084228812_0100/__spark_libs__6623696109201875604.zip
... ...
19/10/22 12:19:17 INFO yarn.Client:
client token: N/A
diagnostics: Application application_1569084228812_0100 failed 2 times due to AM Container for appattempt_1569084228812_0100_000002 exited with exitCode: -1000
For more detailed output, check application tracking page:http://hadoop-node-master:8088/proxy/application_1569084228812_0100/Then, click on links to logs of each attempt.
Diagnostics: File does not exist: hdfs://192.168.10.178:9000/user/root/.sparkStaging/application_1569084228812_0100/__spark_libs__6623696109201875604.zip
java.io.FileNotFoundException: File does not exist: hdfs://192.168.10.178:9000/user/root/.sparkStaging/application_1569084228812_0100/__spark_libs__6623696109201875604.zip
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1122)
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1114)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1114)
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:251)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:61)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:357)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1692)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:356)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Failing this attempt. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1571717942046
final status: FAILED
tracking URL: http://hadoop-node-master:8088/cluster/app/application_1569084228812_0100
user: root
Exception in thread "main" org.apache.spark.SparkException: Application application_1569084228812_0100 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1132)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1178)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:736)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
19/10/22 12:19:17 INFO util.ShutdownHookManager: Shutdown hook called
19/10/22 12:19:17 INFO util.ShutdownHookManager: Deleting directory /data/server/spark-2.0.2-bin-hadoop2.6/spark-d85fba2d-cb14-4c09-9edc-ca1dc3b4106d
... ...
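The job was submitted to YARN through the following script: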
#!/bin/sh
# Clear the previous output directory so the job's output path does not already exist.
hadoop fs -rm -r hdfs://hadoop-node-master:9000/output_cf
# Note: "--master yarn-cluster" is deprecated in Spark 2.x;
# "--master yarn --deploy-mode cluster" is the equivalent current form.
/data/server/spark/bin/spark-submit \
  --master yarn-cluster \
  --num-executors 2 \
  --executor-memory 1g \
  --executor-cores 2 \
  --class org.vincent.chapter05.cf ./scalatest1008-1.1.jar \
  hdfs://hadoop-node-master:9000/music_uis.data \
  hdfs://hadoop-node-master:9000/output_cf
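The driver code, trimmed to the lines that matter here: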
import org.apache.spark.{SparkConf, SparkContext}

object cf {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
    // conf.setMaster("local[2]") // this line must be commented out when running in cluster mode
    conf.setAppName("CF Spark")
    val sc = new SparkContext(conf)
    val lines = sc.textFile(args(0)) // input path, e.g. hdfs://.../music_uis.data
    val output_path = args(1)        // output path; .toString on a String was redundant
    // ... rest of the job omitted ...
  }
}
The cause was a mismatch between the master configured in the code and the mode the job was actually submitted in: with conf.setMaster("local[2]") still active, the program was told to run locally, so the dependency files were never distributed to the cluster, which is exactly the file the AM container failed to find. A small detail, but an easy one to overlook; a pattern that avoids it is sketched below.
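One way to avoid this class of mistake is to hard-code nothing and let spark-submit decide, falling back to local mode only when no master has been supplied. The sketch below is not the original program, just a minimal illustration; it assumes the input and output paths are still passed as args(0) and args(1):

import org.apache.spark.{SparkConf, SparkContext}

object cf {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("CF Spark")
    // setIfMissing only takes effect when spark.master has not already been
    // set, e.g. when launching from an IDE. When the job goes through
    // spark-submit --master yarn-cluster, that setting takes precedence.
    conf.setIfMissing("spark.master", "local[2]")
    val sc = new SparkContext(conf)
    val lines = sc.textFile(args(0))
    val output_path = args(1)
    // ... rest of the job ...
    sc.stop()
  }
}

With this pattern the same jar runs unmodified both in the IDE and on the cluster, so there is no line to remember to comment out before packaging.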
Next, let's walk through the steps of running a Scala program on Spark.