Hey, I'm pretty new to the world of big data. I found this tutorial at http://musicmachinery.com/2011/09/04/how-to-process-a-million-songs-in-20-minutes/, which describes in detail how to run a MapReduce job with mrjob, both locally and on Elastic MapReduce. Now I'm trying to run the same job on my own Hadoop cluster. I ran the job with the following command:
python density.py tiny.dat -r hadoop --hadoop-bin /usr/bin/hadoop > outputmusic
This is what I got:
HADOOP: Running job: job_1369345811890_0245
HADOOP: Job job_1369345811890_0245 running in uber mode : false
HADOOP: map 0% reduce 0%
HADOOP: Task Id : attempt_1369345811890_0245_m_000000_0, Status : FAILED
HADOOP: Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
HADOOP: at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
HADOOP: at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
HADOOP: at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
HADOOP: at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
HADOOP: at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
HADOOP: at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:428)
HADOOP: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
HADOOP: at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157)
HADOOP: at java.security.AccessController.doPrivileged(Native Method)
HADOOP: at javax.security.auth.Subject.doAs(Subject.java:415)
HADOOP: at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
HADOOP: at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152)
HADOOP:
HADOOP: Task Id : attempt_1369345811890_0245_m_000001_0, Status : FAILED
HADOOP: Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
HADOOP: at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
HADOOP: at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
HADOOP: at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
HADOOP: at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
HADOOP: at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
HADOOP: at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:428)
HADOOP: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
HADOOP: at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157)
HADOOP: at java.security.AccessController.doPrivileged(Native Method)
HADOOP: at javax.security.auth.Subject.doAs(Subject.java:415)
HADOOP: at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
HADOOP: at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152)
HADOOP:
HADOOP: Task Id : attempt_1369345811890_0245_m_000000_1, Status : FAILED
HADOOP: Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
HADOOP: at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
HADOOP: at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
HADOOP: at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
HADOOP: at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
HADOOP: at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
HADOOP: at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:428)
HADOOP: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
HADOOP: at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157)
HADOOP: at java.security.AccessController.doPrivileged(Native Method)
HADOOP: at javax.security.auth.Subject.doAs(Subject.java:415)
HADOOP: at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
HADOOP: at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152)
HADOOP:
HADOOP: Container killed by the ApplicationMaster.
HADOOP:
HADOOP:
HADOOP: Task Id : attempt_1369345811890_0245_m_000001_1, Status : FAILED
HADOOP: Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
HADOOP: at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
HADOOP: at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
HADOOP: at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
HADOOP: at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
HADOOP: at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
HADOOP: at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:428)
HADOOP: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
HADOOP: at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157)
HADOOP: at java.security.AccessController.doPrivileged(Native Method)
HADOOP: at javax.security.auth.Subject.doAs(Subject.java:415)
HADOOP: at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
HADOOP: at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152)
HADOOP:
HADOOP: Task Id : attempt_1369345811890_0245_m_000000_2, Status : FAILED
HADOOP: Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
HADOOP: at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
HADOOP: at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
HADOOP: at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
HADOOP: at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
HADOOP: at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
HADOOP: at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:428)
HADOOP: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
HADOOP: at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157)
HADOOP: at java.security.AccessController.doPrivileged(Native Method)
HADOOP: at javax.security.auth.Subject.doAs(Subject.java:415)
HADOOP: at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
HADOOP: at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152)
HADOOP:
HADOOP: Task Id : attempt_1369345811890_0245_m_000001_2, Status : FAILED
HADOOP: Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
HADOOP: at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
HADOOP: at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
HADOOP: at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
HADOOP: at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
HADOOP: at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
HADOOP: at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:428)
HADOOP: at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
HADOOP: at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157)
HADOOP: at java.security.AccessController.doPrivileged(Native Method)
HADOOP: at javax.security.auth.Subject.doAs(Subject.java:415)
HADOOP: at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
HADOOP: at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152)
HADOOP:
HADOOP: map 100% reduce 0%
HADOOP: Job job_1369345811890_0245 failed with state FAILED due to: Task failed task_1369345811890_0245_m_000001
HADOOP: Job failed as tasks failed. failedMaps:1 failedReduces:0
HADOOP:
HADOOP: Counters: 6
HADOOP: Job Counters
HADOOP: Failed map tasks=7
HADOOP: Launched map tasks=8
HADOOP: Other local map tasks=6
HADOOP: Data-local map tasks=2
HADOOP: Total time spent by all maps in occupied slots (ms)=32379
HADOOP: Total time spent by all reduces in occupied slots (ms)=0
HADOOP: Job not Successful!
HADOOP: Streaming Command Failed!
STDOUT: packageJobJar: [] [/usr/lib/hadoop-mapreduce/hadoop-streaming-2.0.0-cdh4.2.1.jar] /tmp/streamjob3272348678857116023.jar tmpDir=null
Traceback (most recent call last):
File "density.py", line 34, in <module>
MRDensity.run()
File "/usr/lib/python2.6/site-packages/mrjob-0.2.4-py2.6.egg/mrjob/job.py", line 344, in run
mr_job.run_job()
File "/usr/lib/python2.6/site-packages/mrjob-0.2.4-py2.6.egg/mrjob/job.py", line 381, in run_job
runner.run()
File "/usr/lib/python2.6/site-packages/mrjob-0.2.4-py2.6.egg/mrjob/runner.py", line 316, in run
self._run()
File "/usr/lib/python2.6/site-packages/mrjob-0.2.4-py2.6.egg/mrjob/hadoop.py", line 175, in _run
self._run_job_in_hadoop()
File "/usr/lib/python2.6/site-packages/mrjob-0.2.4-py2.6.egg/mrjob/hadoop.py", line 325, in _run_job_in_hadoop
raise CalledProcessError(step_proc.returncode, streaming_args)
subprocess.CalledProcessError: Command '['/usr/bin/hadoop', 'jar', '/usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.0.0-mr1-cdh4.2.1.jar', '-cmdenv', 'PYTHONPATH=mrjob.tar.gz', '-input', 'hdfs:///user/E824259/tmp/mrjob/density.E824259.20130611.053850.343441/input', '-output', 'hdfs:///user/E824259/tmp/mrjob/density.E824259.20130611.053850.343441/output', '-cacheFile', 'hdfs:///user/E824259/tmp/mrjob/density.E824259.20130611.053850.343441/files/density.py#density.py', '-cacheArchive', 'hdfs:///user/E824259/tmp/mrjob/density.E824259.20130611.053850.343441/files/mrjob.tar.gz#mrjob.tar.gz', '-mapper', 'python density.py --step-num=0 --mapper --protocol json --output-protocol json --input-protocol raw_value', '-jobconf', 'mapred.reduce.tasks=0']' returned non-zero exit status 1
Note: As suggested on other forums, I have included
#! /usr/bin/python
at the beginning of both of my Python files, density.py and track.py. That seems to have worked for most people, but I still keep getting the error above.
EDIT: I copied the definition of one of the functions density.py uses, which was originally defined in a separate file track.py, into density.py itself. The job then completed successfully. Still, it would be really helpful if someone knows why this happens.
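For reference, this is roughly what the change looked like; the helper name and body below are placeholders, not the actual code from track.py. (Possibly relevant: the streaming command in the traceback ships only density.py and mrjob.tar.gz to the task nodes via -cacheFile/-cacheArchive, but not track.py.)

#! /usr/bin/python
# Sketch only: parse_track() stands in for the real helper that used to live
# in track.py; its actual name and body are not shown above.
from mrjob.job import MRJob

def parse_track(line):
    # Having the helper in this file means the mapper no longer needs
    # "import track", which is not present on the task nodes.
    return line.split('\t')

class MRDensity(MRJob):
    def mapper(self, key, line):
        fields = parse_track(line)
        # ... compute and yield whatever the tutorial's job actually emits ...
        yield 'density', 1

if __name__ == '__main__':
    MRDensity.run()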
Error code 1 is a generic error for Hadoop Streaming. There are two main reasons you can get this error code:

1. Your mapper and reducer scripts are not executable (include #!/usr/bin/python at the beginning of the scripts).
2. Your Python program is simply written wrong: you may have a syntax error or a logic error.

Unfortunately, error code 1 doesn't give you any details about what exactly is wrong with your Python program.
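If you want to rule out the cluster environment itself, a trivial job like the sketch below (mine, not part of the tutorial) should run cleanly with the same -r hadoop command you used for density.py; if even this fails with exit code 1, the problem is the Python/mrjob setup on the task nodes rather than your script.

#! /usr/bin/python
# Minimal smoke-test job (just a sketch): counts input lines.
from mrjob.job import MRJob

class MRSmokeTest(MRJob):
    def mapper(self, key, line):
        yield 'lines', 1

    def reducer(self, key, values):
        yield key, sum(values)

if __name__ == '__main__':
    MRSmokeTest.run()

Save it as, say, smoke_test.py and run it exactly like the job in the question: python smoke_test.py tiny.dat -r hadoop --hadoop-bin /usr/bin/hadoop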
I was stuck on error code 1 myself at one point, and the way I found the bug was to run my mapper script as a standalone Python program: python mapper.py
When I did that, I got a regular Python error telling me I was simply giving a function the wrong type of argument. I fixed the error and everything worked fine after that. So, if possible, run your mapper or reducer script as a standalone Python program and see whether that gives you any insight into what is causing the error.
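For this particular job you can do that directly: the traceback above shows the exact command Hadoop Streaming uses to invoke your mapper. First try the whole job locally, without Hadoop, which surfaces plain Python errors:
python density.py tiny.dat > local_output
and then, if that passes, run just the mapper step the same way Hadoop does (arguments copied from the streaming command in the traceback):
python density.py --step-num=0 --mapper --protocol json --output-protocol json --input-protocol raw_value < tiny.dat
If density.py crashes in either case, that is the same failure the task nodes report as exit code 1. Note, though, that an import which only works because the file happens to sit next to density.py locally (like track.py) will not show up this way; that kind of problem only appears on the cluster.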