
Troubleshooting a Python MRJob Error on Hadoop

章侯林
2023-12-01

1) Running a Python MRJob script on Hadoop fails with the following error:

java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
        at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
        at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
        at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
        at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
2019-01-14 13:00:53,010 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
        at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:322)
        at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
        at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
        at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

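2) Errors like this when running a Python MRJob script are usually related to the Python installation and libraries on the cluster. The Java exception itself only reports that the Python subprocess exited with code 1; the actual cause is in the task's stderr log.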

3) Use the following command to export the application's logs and find where the error occurs.

yarn logs -applicationId application_1545890266346_0066 > yarn.log
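If the export needs to be scripted, for example to collect logs for several failed applications, the same command can be wrapped in Python. This is only a sketch: the function names `yarn_logs_command` and `dump_yarn_logs` are hypothetical, and it assumes the `yarn` CLI is on the PATH of the machine where it runs.

```python
import subprocess


def yarn_logs_command(application_id):
    """Build the argument list for `yarn logs -applicationId <id>`."""
    return ["yarn", "logs", "-applicationId", application_id]


def dump_yarn_logs(application_id, out_path="yarn.log"):
    """Run `yarn logs` and write its stdout to out_path; return the exit code."""
    with open(out_path, "wb") as out:
        result = subprocess.run(yarn_logs_command(application_id), stdout=out)
    return result.returncode
```

Calling `dump_yarn_logs("application_1545890266346_0066")` would produce the same yarn.log file as the shell redirect above.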

4) The log contains the following:

Container: container_1545890266346_0066_01_000007 on CDH2_55798
=================================================================
LogType:stderr
Log Upload Time:Tue Jan 15 10:03:24 +0800 2019
LogLength:242
Log Contents:
+ __mrjob_PWD=/HDFS/yarn/local/usercache/hdfs/appcache/application_1545890266346_0066/container_1545890266346_0066_01_000007
+ exec
+ python -c 'import fcntl; fcntl.flock(9, fcntl.LOCK_EX)'
setup-wrapper.sh: line 6: python: command not found

This shows that the python command was not found on the node running the container.
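An exported yarn.log can be long, so it may be quicker to search it for this kind of message than to read it by eye. The following is a minimal sketch, assuming the log uses the same `Container: ... on ...` header and `command not found` wording shown above; the function name `find_missing_commands` is hypothetical.

```python
import re

# Shell error such as "setup-wrapper.sh: line 6: python: command not found"
NOT_FOUND = re.compile(
    r"^[^:\n]+: line \d+: (?P<cmd>[^:\n]+): command not found\s*$"
)
# Container header such as "Container: container_..._000007 on CDH2_55798"
CONTAINER = re.compile(r"^Container: (?P<id>\S+) on (?P<host>\S+)")


def find_missing_commands(log_text):
    """Return (container_id, host, command) for each 'command not found' line."""
    hits = []
    container, host = None, None
    for line in log_text.splitlines():
        header = CONTAINER.match(line)
        if header:
            # Remember which container the following log lines belong to.
            container, host = header.group("id"), header.group("host")
            continue
        error = NOT_FOUND.match(line)
        if error:
            hits.append((container, host, error.group("cmd")))
    return hits
```

For example, `find_missing_commands(open("yarn.log", errors="replace").read())` lists every container and host where a command was missing, which helps tell whether the problem affects one node or all of them.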

5) The fix is as follows:

First, install the mrjob library on the Hadoop cluster: pip install mrjob
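Before rerunning the job, it can help to confirm on a node that a `python` command resolves and that mrjob is importable. The sketch below uses only the standard library; the function name `check_environment` is hypothetical. Note two assumptions: the PATH seen in an interactive shell may differ from the PATH inside a YARN container, and the module check applies to whichever interpreter runs this script, which is not necessarily the one named `python`.

```python
import importlib.util
import shutil


def check_environment(command="python", module="mrjob"):
    """Report where `command` resolves on PATH and whether `module` is importable."""
    return {
        # Full path of the executable, or None if it is not on PATH
        "python_path": shutil.which(command),
        # True if the current interpreter can locate the module
        "mrjob_installed": importlib.util.find_spec(module) is not None,
    }


if __name__ == "__main__":
    print(check_environment())
```

A `python_path` of `None` corresponds to the "python: command not found" message in the container log.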

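1. At the top of the .py script, add an interpreter line and an encoding declaration. The shebang must point at the actual Python executable on the cluster nodes, commonly /usr/bin/python:

    #!/usr/bin/python
    # encoding:utf-8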

2. Make the script executable:

    chmod +x mrtop.py

3. Run the script:

    ./mrtop.py -r hadoop hdfs:///tmp/wordcount/data1

4. The job completes successfully:

job output is in hdfs:///user/root/tmp/mrjob/mrtop.root.20190115.025644.808237/output
Streaming final output from hdfs:///user/root/tmp/mrjob/mrtop.root.20190115.025644.808237/output...
"xiaojun"       2
"python"        2
Removing HDFS temp directory hdfs:///user/root/tmp/mrjob/mrtop.root.20190115.025644.808237...
Removing temp directory /tmp/mrtop.root.20190115.025644.808237... 
