1) Running a Python MRJob script on Hadoop fails with the following error:
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
2019-01-14 13:00:53,010 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
2) When a Python MRJob script fails like this, the cause is usually the Python installation or missing libraries on the cluster nodes.
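3) The Java stack trace only says that the streaming subprocess exited with code 1; the actual cause is in the container's stderr. Export the application's logs with the following command and look for the error: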
yarn logs -applicationId application_1545890266346_0066 > yarn.log
4) The relevant part of the log:
Container: container_1545890266346_0066_01_000007 on CDH2_55798
=================================================================
LogType:stderr
Log Upload Time:Tue Jan 15 10:03:24 +0800 2019
LogLength:242
Log Contents:
+ __mrjob_PWD=/HDFS/yarn/local/usercache/hdfs/appcache/application_1545890266346_0066/container_1545890266346_0066_01_000007
+ exec
+ python -c 'import fcntl; fcntl.flock(9, fcntl.LOCK_EX)'
setup-wrapper.sh: line 6: python: command not found
This shows that the python command could not be found on the node running the container.
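In a large yarn.log the one line that matters is easy to miss. Below is a minimal sketch of a scanner that reports suspicious lines from stderr sections together with their container ID. It assumes the "Container:" / "LogType:" layout shown in the log excerpt; the function name scan_yarn_log and its list of patterns are hypothetical and not part of YARN or mrjob.

```python
import re
from typing import Iterable, Iterator, Tuple

# Hypothetical, illustrative list of phrases that usually reveal the real cause.
SUSPICIOUS = re.compile(r"command not found|No module named|Traceback|Error", re.IGNORECASE)


def scan_yarn_log(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (container_id, line) for suspicious lines inside stderr sections."""
    container = "?"
    log_type = None
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("Container:"):
            # Assumed layout: "Container: <container_id> on <host>_<port>"
            parts = line.split()
            container = parts[1] if len(parts) > 1 else "?"
            log_type = None
        elif line.startswith("LogType:"):
            # Track which log (stderr, stdout, syslog, ...) we are currently reading.
            log_type = line[len("LogType:"):].strip()
        elif log_type == "stderr" and SUSPICIOUS.search(line):
            yield container, line


# Usage, assuming yarn.log was produced by the yarn logs command:
#   with open("yarn.log", encoding="utf-8", errors="replace") as f:
#       for container, line in scan_yarn_log(f):
#           print(container, line)
```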
5) Solution:
First, install the mrjob library on the nodes of the Hadoop cluster: pip install mrjob
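Because the failing line in setup-wrapper.sh calls a bare python, it can help to confirm on each node that python resolves on PATH and that this interpreter can import mrjob. The sketch below does this with shutil.which and a child process; check_node_env is a hypothetical helper name. The PATH seen by the user YARN containers run as may differ from your login shell's, so the result is only a hint unless you run it as that user.

```python
import shutil
import subprocess
from typing import Optional, Tuple


def check_node_env(interpreter: str = "python", module: str = "mrjob") -> Tuple[Optional[str], bool]:
    """Return (resolved interpreter path or None, whether that interpreter can import `module`)."""
    # Resolve the name the way a shell looks up a bare command on PATH.
    path = shutil.which(interpreter)
    if path is None:
        return None, False
    # Ask that interpreter itself to import the module; exit code 0 means success.
    result = subprocess.run(
        [path, "-c", "import importlib, sys; importlib.import_module(sys.argv[1])", module],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        timeout=60,
    )
    return path, result.returncode == 0


# Usage (on a cluster node): print(check_node_env())
# (None, False) would match the "python: command not found" seen in the log.
```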
1. Put the following lines at the top of the .py script:
#!/usr/bin/python
# encoding:utf-8
2. Make the script executable:
chmod +x mrtop.py
3. Run the script:
./mrtop.py -r hadoop hdfs:///tmp/wordcount/data1
4. The job succeeds:
job output is in hdfs:///user/root/tmp/mrjob/mrtop.root.20190115.025644.808237/output
Streaming final output from hdfs:///user/root/tmp/mrjob/mrtop.root.20190115.025644.808237/output...
"xiaojun" 2
"python" 2
Removing HDFS temp directory hdfs:///user/root/tmp/mrjob/mrtop.root.20190115.025644.808237...
Removing temp directory /tmp/mrtop.root.20190115.025644.808237...
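Running the script directly with ./ only works if its #! line names a file that is actually an executable interpreter and the script has execute permission (steps 1 and 2). A small helper can check both before submitting. This is a sketch covering only those failure modes; check_shebang is a hypothetical name, and it does not check anything on the cluster nodes.

```python
import os
from typing import Optional, Tuple


def check_shebang(script_path: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (interpreter, problem); problem is None if the #! line looks usable."""
    with open(script_path, "rb") as f:
        first = f.readline().decode("utf-8", errors="replace").strip()
    if not first.startswith("#!"):
        return None, "no shebang line"
    parts = first[2:].split()
    if not parts:
        return None, "empty shebang"
    interpreter = parts[0]
    # A directory or a missing path here makes ./script fail with "bad interpreter".
    if not os.path.isfile(interpreter):
        return interpreter, "interpreter is not a regular file"
    if not os.access(interpreter, os.X_OK):
        return interpreter, "interpreter is not executable"
    # Equivalent of forgetting chmod +x.
    if not os.access(script_path, os.X_OK):
        return interpreter, "script lacks execute permission (chmod +x)"
    return interpreter, None


# Usage: print(check_shebang("mrtop.py"))
```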
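The streamed output lines ("xiaojun" 2, "python" 2) look like mrjob's default JSON output protocol: a JSON-encoded key and a JSON-encoded value. Assuming the two are separated by a tab and the keys are hashable (true for word counts), a minimal parser can turn saved output back into a Python dict. parse_job_output is a hypothetical helper, not part of mrjob.

```python
import json
from typing import Any, Dict, Iterable


def parse_job_output(lines: Iterable[str]) -> Dict[Any, Any]:
    """Parse tab-separated, JSON-encoded key/value lines into a dict."""
    result: Dict[Any, Any] = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Assumption: exactly one tab separates the key from the value.
        key_text, value_text = line.split("\t", 1)
        result[json.loads(key_text)] = json.loads(value_text)
    return result
```

If the output was saved with spaces instead of tabs, the split will raise ValueError, which makes the format mismatch visible rather than silently mis-parsing.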