Question:

Spark pipe does not work on YARN (java.io.IOException: Cannot run program "xxx.py": error=13, Permission denied)

谢承
2023-03-14

I am new to Spark programming. I am trying to use the pipe operator to invoke an external program (a set of files containing compiled C programs plus bash and Python scripts). The code looks like this:

sc.addFile("hdfs://afolder", true)
val infile = sc.textFile("afile.txt").pipe("afolder/abash.sh").take(3)

abash.sh in turn calls the other scripts and programs to process afile.txt.

The error output:

   > 16/05/18 16:04:09 INFO storage.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 212.1 KB, free 212.1 KB)
   > 16/05/18 16:04:09 INFO storage.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 19.5 KB, free 231.6 KB)
   > 16/05/18 16:04:09 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on 210.107.197.201:42777 (size: 19.5 KB, free: 511.1 MB)
   > 16/05/18 16:04:09 INFO spark.SparkContext: Created broadcast 2 from textFile at <console>:27
   > 16/05/18 16:04:09 INFO mapred.FileInputFormat: Total input paths to process : 1
   > 16/05/18 16:04:09 INFO spark.SparkContext: Starting job: take at <console>:27
   > 16/05/18 16:04:09 INFO scheduler.DAGScheduler: Got job 1 (take at <console>:27) with 1 output partitions
   > 16/05/18 16:04:09 INFO scheduler.DAGScheduler: Final stage: ResultStage 1 (take at <console>:27)
   > 16/05/18 16:04:09 INFO scheduler.DAGScheduler: Parents of final stage: List()
   > 16/05/18 16:04:09 INFO scheduler.DAGScheduler: Missing parents: List()
   > 16/05/18 16:04:09 INFO scheduler.DAGScheduler: Submitting ResultStage 1 (PipedRDD[5] at pipe at <console>:27), which has no missing parents
   > 16/05/18 16:04:09 INFO storage.MemoryStore: Block broadcast_3 stored as values in memory (estimated size 3.7 KB, free 235.3 KB)
   > 16/05/18 16:04:09 INFO storage.MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 2.2 KB, free 237.5 KB)
   > 16/05/18 16:04:09 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on 210.107.197.201:42777 (size: 2.2 KB, free: 511.1 MB)
   > 16/05/18 16:04:09 INFO spark.SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:1006
   > 16/05/18 16:04:09 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (PipedRDD[5] at pipe at <console>:27)
   > 16/05/18 16:04:09 INFO cluster.YarnScheduler: Adding task set 1.0 with 1 tasks
   > 16/05/18 16:04:09 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 4, database, partition 0,NODE_LOCAL, 2603 bytes)
   > 16/05/18 16:04:11 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on database:51757 (size: 2.2 KB, free: 511.1 MB)
   > 16/05/18 16:04:11 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 1.0 (TID 4, database): java.io.IOException: Cannot run program "afolder/abash.sh": error=13, Permission denied
   >         at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
   >         at org.apache.spark.rdd.PipedRDD.compute(PipedRDD.scala:119)
   >         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
   >         at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
   >         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
   >         at org.apache.spark.scheduler.Task.run(Task.scala:89)
   >         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
   >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
   >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
   >         at java.lang.Thread.run(Thread.java:745)
   > Caused by: java.io.IOException: error=13, Permission denied
   >         at java.lang.UNIXProcess.forkAndExec(Native Method)
   >         at java.lang.UNIXProcess.<init>(UNIXProcess.java:248)
   >         at java.lang.ProcessImpl.start(ProcessImpl.java:134)
   >         at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
   >         ... 9 more
   > 
   > 16/05/18 16:04:11 INFO scheduler.TaskSetManager: Starting task 0.1 in stage 1.0 (TID 5, database, partition 0,NODE_LOCAL, 2603 bytes)
   > 16/05/18 16:04:12 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on database:52395 (size: 2.2 KB, free: 511.1 MB)
   > 16/05/18 16:04:12 INFO scheduler.TaskSetManager: Lost task 0.1 in stage 1.0 (TID 5) on executor database: java.io.IOException (Cannot run program "afolder/abash.sh": error=13, Permission denied) [duplicate 1]
   > 16/05/18 16:04:12 INFO scheduler.TaskSetManager: Starting task 0.2 in stage 1.0 (TID 6, database, partition 0,NODE_LOCAL, 2603 bytes)
   > 16/05/18 16:04:12 INFO scheduler.TaskSetManager: Lost task 0.2 in stage 1.0 (TID 6) on executor database: java.io.IOException (Cannot run program "afolder/abash.sh": error=13, Permission denied) [duplicate 2]
   > 16/05/18 16:04:12 INFO scheduler.TaskSetManager: Starting task 0.3 in stage 1.0 (TID 7, database, partition 0,NODE_LOCAL, 2603 bytes)
   > 16/05/18 16:04:12 INFO scheduler.TaskSetManager: Lost task 0.3 in stage 1.0 (TID 7) on executor database: java.io.IOException (Cannot run program "afolder/abash.sh": error=13, Permission denied) [duplicate 3]
   > 16/05/18 16:04:12 ERROR scheduler.TaskSetManager: Task 0 in stage 1.0 failed 4 times; aborting job
   > 16/05/18 16:04:12 INFO cluster.YarnScheduler: Removed TaskSet 1.0, whose tasks have all completed, from pool
   > 16/05/18 16:04:12 INFO cluster.YarnScheduler: Cancelling stage 1
   > 16/05/18 16:04:12 INFO scheduler.DAGScheduler: ResultStage 1 (take at <console>:27) failed in 2.955 s
   > 16/05/18 16:04:12 INFO scheduler.DAGScheduler: Job 1 failed: take at <console>:27, took 2.963885 s
   > org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 7, database): java.io.IOException: Cannot run program "afolder/abash.sh": error=13, Permission denied
   >         at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
   >         at org.apache.spark.rdd.PipedRDD.compute(PipedRDD.scala:119)
   >         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
   >         at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
   >         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
   >         at org.apache.spark.scheduler.Task.run(Task.scala:89)
   >         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
   >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
   >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
   >         at java.lang.Thread.run(Thread.java:745)
   > Caused by: java.io.IOException: error=13, Permission denied
   >         at java.lang.UNIXProcess.forkAndExec(Native Method)
   >         at java.lang.UNIXProcess.<init>(UNIXProcess.java:248)
   >         at java.lang.ProcessImpl.start(ProcessImpl.java:134)
   >         at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
   >         ... 9 more
   > 
   > Driver stacktrace:
   >         at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
   >         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
   >         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
   >         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
   >         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
   >         at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
   >         at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
   >         at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
   >         at scala.Option.foreach(Option.scala:236)
   >         at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
   >         at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
   >         at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
   >         at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
   >         at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
   >         at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
   >         at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
   >         at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
   >         at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858)
   >         at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1328)
   >         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
   >         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
   >         at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
   >         at org.apache.spark.rdd.RDD.take(RDD.scala:1302)
   >         at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:27)
   >         at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:32)
   >         at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:34)
   >         at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:36)
   >         at $iwC$$iwC$$iwC$$iwC.<init>(<console>:38)
   >         at $iwC$$iwC$$iwC.<init>(<console>:40)
   >         at $iwC$$iwC.<init>(<console>:42)
   >         at $iwC.<init>(<console>:44)
   >         at <init>(<console>:46)
   >         at .<init>(<console>:50)
   >         at .<clinit>(<console>)
   >         at .<init>(<console>:7)
   >         at .<clinit>(<console>)
   >         at $print(<console>)
   >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   >         at java.lang.reflect.Method.invoke(Method.java:498)
   >         at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
   >         at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
   >         at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
   >         at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
   >         at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
   >         at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
   >         at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
   >         at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
   >         at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
   >         at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
   >         at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
   >         at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
   >         at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
   >         at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
   >         at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
   >         at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
   >         at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
   >         at org.apache.spark.repl.Main$.main(Main.scala:31)
   >         at org.apache.spark.repl.Main.main(Main.scala)
   >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   >         at java.lang.reflect.Method.invoke(Method.java:498)
   >         at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
   >         at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
   >         at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
   >         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
   >         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
   > Caused by: java.io.IOException: Cannot run program "afolder/abash.sh": error=13, Permission denied
   >         at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
   >         at org.apache.spark.rdd.PipedRDD.compute(PipedRDD.scala:119)
   >         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
   >         at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
   >         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
   >         at org.apache.spark.scheduler.Task.run(Task.scala:89)
   >         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
   >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
   >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
   >         at java.lang.Thread.run(Thread.java:745)
   > Caused by: java.io.IOException: error=13, Permission denied
   >         at java.lang.UNIXProcess.forkAndExec(Native Method)
   >         at java.lang.UNIXProcess.<init>(UNIXProcess.java:248)
   >         at java.lang.ProcessImpl.start(ProcessImpl.java:134)
   >         at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
   >         ... 9 more

1 Answer

闻人望
2023-03-14

Try chmod +x abash.sh. error=13 (Permission denied) means the executor located the script (a missing file would give error=2) but it is not marked executable, so ProcessBuilder cannot launch it. The same applies to every script and compiled binary that abash.sh calls in turn.
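A minimal sketch of two ways to act on that advice; the paths mirror the question and the commands are illustrative assumptions, not verified against the asker's cluster. Either set the executable bit on the scripts before uploading the folder to HDFS, or invoke the interpreter explicitly in the pipe command. Because RDD.pipe(String) tokenizes the command string on whitespace, "sh afolder/abash.sh" runs sh with the script as an argument in each task's working directory (where sc.addFile(..., true) materializes the folder), so abash.sh itself no longer needs the exec bit:

// Option A (shell, on the node that holds the sources; illustrative paths):
//   chmod -R +x afolder                  # scripts and compiled C programs alike
//   hdfs dfs -put -f afolder <hdfs-dir>  # re-upload to the path given to addFile

// Option B (Scala sketch): pipe through an explicit interpreter.
sc.addFile("hdfs://afolder", true)
val sample = sc.textFile("afile.txt")
  .pipe("sh afolder/abash.sh")   // tokenized to Seq("sh", "afolder/abash.sh")
  .take(3)

Note that Option B only unblocks abash.sh itself; the compiled C programs it invokes still need their own executable bits, which is why the recursive chmod in Option A is the more thorough route.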

  • 问题内容: 我明白了,不知道这段代码有什么问题。 我正在尝试读取具有绝对路径(仅表示)的文件, 和相对路径(意思是),我希望程序将文件写入给定的任何路径-如果是绝对路径,则应将其写入当前目录;否则,转到给定的路径。 编码: 给出的错误: 我执行代码的方式: 我在这里做错了什么? 问题答案: 您似乎正在尝试使用以下代码替换扩展名: 但是,您似乎混合了数组索引。请尝试以下操作: 请注意在第二行代码中使