I am new to Spark programming. I am trying to use the pipe operator to run a set of external programs (a folder containing compiled C programs plus bash and Python scripts). The code looks like this:
sc.addFile("hdfs://afolder",true)
val infile = sc.textFile("afile.txt").pipe("afolder/abash.sh").take(3)
abash.sh in turn calls the other scripts and programs to process afile.txt.
The error output is:
> 16/05/18 16:04:09 INFO storage.MemoryStore: Block broadcast_2 stored
> as values in memory (estimated size 212.1 KB, free 212.1 KB) 16/05/18
> 16:04:09 INFO storage.MemoryStore: Block broadcast_2_piece0 stored as
> bytes in memory (estimated size 19.5 KB, free 231.6 KB) 16/05/18
> 16:04:09 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in
> memory on 210.107.197.201:42777 (size: 19.5 KB, free: 511.1 MB)
> 16/05/18 16:04:09 INFO spark.SparkContext: Created broadcast 2 from
> textFile at <console>:27 16/05/18 16:04:09 INFO
> mapred.FileInputFormat: Total input paths to process : 1 16/05/18
> 16:04:09 INFO spark.SparkContext: Starting job: take at <console>:27
> 16/05/18 16:04:09 INFO scheduler.DAGScheduler: Got job 1 (take at
> <console>:27) with 1 output partitions 16/05/18 16:04:09 INFO
> scheduler.DAGScheduler: Final stage: ResultStage 1 (take at
> <console>:27) 16/05/18 16:04:09 INFO scheduler.DAGScheduler: Parents
> of final stage: List() 16/05/18 16:04:09 INFO scheduler.DAGScheduler:
> Missing parents: List() 16/05/18 16:04:09 INFO scheduler.DAGScheduler:
> Submitting ResultStage 1 (PipedRDD[5] at pipe at <console>:27), which
> has no missing parents 16/05/18 16:04:09 INFO storage.MemoryStore:
> Block broadcast_3 stored as values in memory (estimated size 3.7 KB,
> free 235.3 KB) 16/05/18 16:04:09 INFO storage.MemoryStore: Block
> broadcast_3_piece0 stored as bytes in memory (estimated size 2.2 KB,
> free 237.5 KB) 16/05/18 16:04:09 INFO storage.BlockManagerInfo: Added
> broadcast_3_piece0 in memory on 210.107.197.201:42777 (size: 2.2 KB,
> free: 511.1 MB) 16/05/18 16:04:09 INFO spark.SparkContext: Created
> broadcast 3 from broadcast at DAGScheduler.scala:1006 16/05/18
> 16:04:09 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from
> ResultStage 1 (PipedRDD[5] at pipe at <console>:27) 16/05/18 16:04:09
> INFO cluster.YarnScheduler: Adding task set 1.0 with 1 tasks 16/05/18
> 16:04:09 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0
> (TID 4, database, partition 0,NODE_LOCAL, 2603 bytes) 16/05/18
> 16:04:11 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in
> memory on database:51757 (size: 2.2 KB, free: 511.1 MB) 16/05/18
> 16:04:11 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 1.0
> (TID 4, database): java.io.IOException: Cannot run program
> "afolder/abash.sh": error=13, Permission denied
> at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
> at org.apache.spark.rdd.PipedRDD.compute(PipedRDD.scala:119)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
> at org.apache.spark.scheduler.Task.run(Task.scala:89)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745) Caused by: java.io.IOException: error=13, Permission denied
> at java.lang.UNIXProcess.forkAndExec(Native Method)
> at java.lang.UNIXProcess.<init>(UNIXProcess.java:248)
> at java.lang.ProcessImpl.start(ProcessImpl.java:134)
> at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
> ... 9 more
>
> 16/05/18 16:04:11 INFO scheduler.TaskSetManager: Starting task 0.1 in
> stage 1.0 (TID 5, database, partition 0,NODE_LOCAL, 2603 bytes)
> 16/05/18 16:04:12 INFO storage.BlockManagerInfo: Added
> broadcast_3_piece0 in memory on database:52395 (size: 2.2 KB, free:
> 511.1 MB) 16/05/18 16:04:12 INFO scheduler.TaskSetManager: Lost task 0.1 in stage 1.0 (TID 5) on executor database: java.io.IOException (Cannot run program "afolder/abash.sh": error=13, Permission denied)
> [duplicate 1] 16/05/18 16:04:12 INFO scheduler.TaskSetManager:
> Starting task 0.2 in stage 1.0 (TID 6, database, partition
> 0,NODE_LOCAL, 2603 bytes) 16/05/18 16:04:12 INFO
> scheduler.TaskSetManager: Lost task 0.2 in stage 1.0 (TID 6) on
> executor database: java.io.IOException (Cannot run program
> "afolder/abash.sh": error=13, Permission denied) [duplicate 2]
> 16/05/18 16:04:12 INFO scheduler.TaskSetManager: Starting task 0.3 in
> stage 1.0 (TID 7, database, partition 0,NODE_LOCAL, 2603 bytes)
> 16/05/18 16:04:12 INFO scheduler.TaskSetManager: Lost task 0.3 in
> stage 1.0 (TID 7) on executor database: java.io.IOException (Cannot
> run program "afolder/abash.sh": error=13, Permission denied)
> [duplicate 3] 16/05/18 16:04:12 ERROR scheduler.TaskSetManager: Task 0
> in stage 1.0 failed 4 times; aborting job 16/05/18 16:04:12 INFO
> cluster.YarnScheduler: Removed TaskSet 1.0, whose tasks have all
> completed, from pool 16/05/18 16:04:12 INFO cluster.YarnScheduler:
> Cancelling stage 1 16/05/18 16:04:12 INFO scheduler.DAGScheduler:
> ResultStage 1 (take at <console>:27) failed in 2.955 s 16/05/18
> 16:04:12 INFO scheduler.DAGScheduler: Job 1 failed: take at
> <console>:27, took 2.963885 s org.apache.spark.SparkException: Job
> aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most
> recent failure: Lost task 0.3 in stage 1.0 (TID 7, database):
> java.io.IOException: Cannot run program "afolder/abash.sh": error=13,
> Permission denied
> at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
> at org.apache.spark.rdd.PipedRDD.compute(PipedRDD.scala:119)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
> at org.apache.spark.scheduler.Task.run(Task.scala:89)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745) Caused by: java.io.IOException: error=13, Permission denied
> at java.lang.UNIXProcess.forkAndExec(Native Method)
> at java.lang.UNIXProcess.<init>(UNIXProcess.java:248)
> at java.lang.ProcessImpl.start(ProcessImpl.java:134)
> at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
> ... 9 more
>
> Driver stacktrace:
> at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
> at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
> at scala.Option.foreach(Option.scala:236)
> at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
> at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858)
> at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1328)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
> at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
> at org.apache.spark.rdd.RDD.take(RDD.scala:1302)
> at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:27)
> at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:32)
> at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:34)
> at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:36)
> at $iwC$$iwC$$iwC$$iwC.<init>(<console>:38)
> at $iwC$$iwC$$iwC.<init>(<console>:40)
> at $iwC$$iwC.<init>(<console>:42)
> at $iwC.<init>(<console>:44)
> at <init>(<console>:46)
> at .<init>(<console>:50)
> at .<clinit>(<console>)
> at .<init>(<console>:7)
> at .<clinit>(<console>)
> at $print(<console>)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
> at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
> at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
> at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
> at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
> at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
> at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
> at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
> at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
> at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
> at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
> at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
> at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
> at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
> at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
> at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
> at org.apache.spark.repl.Main$.main(Main.scala:31)
> at org.apache.spark.repl.Main.main(Main.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
> at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Caused by: java.io.IOException: Cannot run program "afolder/abash.sh":
> error=13, Permission denied
> at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
> at org.apache.spark.rdd.PipedRDD.compute(PipedRDD.scala:119)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
> at org.apache.spark.scheduler.Task.run(Task.scala:89)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745) Caused by: java.io.IOException: error=13, Permission denied
> at java.lang.UNIXProcess.forkAndExec(Native Method)
> at java.lang.UNIXProcess.<init>(UNIXProcess.java:248)
> at java.lang.ProcessImpl.start(ProcessImpl.java:134)
> at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
> ... 9 more
Try `chmod +x abash.sh`. error=13 from ProcessBuilder is EACCES (Permission denied), which usually means the executor found the script but it is not marked executable.
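If the executable bit is being dropped when Spark fetches the folder to the executors, another workaround (a minimal sketch, reusing the names from the question) is to launch the script through an explicit interpreter, so abash.sh itself no longer needs the exec bit:

    // Sketch only: same paths as in the question.
    // Passing the command as a Seq makes each executor run "sh afolder/abash.sh",
    // so the exec bit on abash.sh is not required. Any compiled C programs that
    // abash.sh launches still need their own executable bit.
    sc.addFile("hdfs://afolder", true)

    val first3 = sc.textFile("afile.txt")
      .pipe(Seq("sh", "afolder/abash.sh"))
      .take(3)

Alternatively, restore the permissions on the local copy before uploading the folder to HDFS (e.g. `chmod -R +x` on the directory); which fix applies depends on where the bit is being lost.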