当前位置: 首页 > 面试题库 >

Hadoop 1.2.1-多节点群集-Reducer阶段因Wordcount程序而挂起?

王才英
2023-03-14
问题内容

我的问题在这里听起来可能是多余的,但先前问题的解决方案都是临时的。我尝试过的人很少,但还没有运气。

最终,我正在使用hadoop-1.2.1(在ubuntu 14上),最初我有单节点设置,并且在那里成功运行了WordCount程序。然后根据本教程向它添加了另一个节点。它成功启动,没有任何错误,但是现在,当我运行相同的WordCount程序时,它处于还原阶段。我查看了任务跟踪器日志,如下所示:

INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201509110037_0001_m_000002_0 task's state:UNASSIGNED
INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201509110037_0001_m_000002_0 which needs 1 slots
INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 2 and trying to launch attempt_201509110037_0001_m_000002_0 which needs 1 slots
INFO org.apache.hadoop.mapred.JobLocalizer: Initializing user hadoopuser on this TT.
INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_201509110037_0001_m_18975496
INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_201509110037_0001_m_18975496 spawned.
INFO org.apache.hadoop.mapred.TaskController: Writing commands to /app/hadoop/tmp/mapred/local/ttprivate/taskTracker/hadoopuser/jobcache/job_201509110037_0001/attempt_201509110037_0001_m_000002_0/taskjvm.sh
INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_201509110037_0001_m_18975496 given task: attempt_201509110037_0001_m_000002_0
INFO org.apache.hadoop.mapred.TaskTracker: attempt_201509110037_0001_m_000002_0 0.0% hdfs://HadoopMaster:54310/input/file02:25+3
INFO org.apache.hadoop.mapred.TaskTracker: Task attempt_201509110037_0001_m_000002_0 is done.
INFO org.apache.hadoop.mapred.TaskTracker: reported output size for attempt_201509110037_0001_m_000002_0  was 6
INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 2
INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_201509110037_0001_m_18975496 exited with exit code 0. Number of tasks it ran: 1
INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201509110037_0001_r_000000_0 task's state:UNASSIGNED
INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201509110037_0001_r_000000_0 which needs 1 slots
INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 2 and trying to launch attempt_201509110037_0001_r_000000_0 which needs 1 slots
INFO org.apache.hadoop.io.nativeio.NativeIO: Initialized cache for UID to User mapping with a cache timeout of 14400 seconds.
INFO org.apache.hadoop.io.nativeio.NativeIO: Got UserName hadoopuser for UID 10 from the native implementation
INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_201509110037_0001_r_18975496
INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_201509110037_0001_r_18975496 spawned.
INFO org.apache.hadoop.mapred.TaskController: Writing commands to /app/hadoop/tmp/mapred/local/ttprivate/taskTracker/hadoopuser/jobcache/job_201509110037_0001/attempt_201509110037_0001_r_000000_0/taskjvm.sh
INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_201509110037_0001_r_18975496 given task: attempt_201509110037_0001_r_000000_0
INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 127.0.1.1:500, dest: 127.0.0.1:55946, bytes: 6, op: MAPRED_SHUFFLE, cliID: attempt_201509110037_0001_m_000002_0, duration: 7129894
INFO org.apache.hadoop.mapred.TaskTracker: attempt_201509110037_0001_r_000000_0 0.11111112% reduce > copy (1 of 3 at 0.00 MB/s) > 
INFO org.apache.hadoop.mapred.TaskTracker: attempt_201509110037_0001_r_000000_0 0.11111112% reduce > copy (1 of 3 at 0.00 MB/s) > 
INFO org.apache.hadoop.mapred.TaskTracker: attempt_201509110037_0001_r_000000_0 0.11111112% reduce > copy (1 of 3 at 0.00 MB/s) > 
INFO org.apache.hadoop.mapred.TaskTracker: attempt_201509110037_0001_r_000000_0 0.11111112% reduce > copy (1 of 3 at 0.00 MB/s) > 
INFO org.apache.hadoop.mapred.TaskTracker: attempt_201509110037_0001_r_000000_0 0.11111112% reduce > copy (1 of 3 at 0.00 MB/s) > 
INFO org.apache.hadoop.mapred.TaskTracker: attempt_201509110037_0001_r_000000_0 0.11111112% reduce > copy (1 of 3 at 0.00 MB/s) >

也在我运行程序的控制台上它挂在-

00:39:24 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
00:39:24 INFO util.NativeCodeLoader: Loaded the native-hadoop library
00:39:24 WARN snappy.LoadSnappy: Snappy native library not loaded
00:39:24 INFO mapred.FileInputFormat: Total input paths to process : 2
00:39:24 INFO mapred.JobClient: Running job: job_201509110037_0001
00:39:25 INFO mapred.JobClient:  map 0% reduce 0%
00:39:28 INFO mapred.JobClient:  map 100% reduce 0%
00:39:35 INFO mapred.JobClient:  map 100% reduce 11%

和我的配置文件如下:

//core-site.xml

<configuration>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>

<property>
  <name>fs.default.name</name>
  <value>hdfs://HadoopMaster:54310</value>
  <description>The name of the default file system.  A URI whose
  scheme and authority determine the FileSystem implementation.  The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class.  The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>
</configuration>

//hdfs-site.xml

<configuration>
<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.
  </description>
</property>   
</configuration>

//mapred-site.xml

<configuration>
<property>
  <name>mapred.job.tracker</name>
  <value>HadoopMaster:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at.  If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>
<property>
<name>mapred.reduce.slowstart.completed.maps</name>
  <value>0.80</value>
</property>    
</configuration>

/ etc / hosts

127.0.0.1 localhost
127.0.1.1 M-1947

#HADOOP CLUSTER SETUP
172.50.88.54 HadoopMaster
172.50.88.60 HadoopSlave1

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

/ etc /主机名

M-1947

//大师

Hadoop大师

//奴隶

Hadoop大师

Hadoop从站1

我已经为此苦苦挣扎很长时间了,感谢您的帮助。谢谢 !


问题答案:

解决了..虽然,同一问题在论坛上有多个问题,但是根据我的说法,已验证的解决方案是群集中任何节点的主机名解析应该是正确的(而且此问题不取决于群集的大小)。

实际上是 dns-lookup 的问题,请确保进行以下更改以解决上述问题-

  1. 尝试使用’$ hostname’在每台机器上打印主机名

  2. 检查为每台机器打印的主机名是否与在相应机器的主/从文件中输入的主机名相同。

  3. 如果不匹配,请通过在/ etc / hostname文件中进行更改来重命名主机,然后重新 引导系统。

例子:-

/ etc / hosts 文件中(假设在hadoop群集的主计算机上)

127.0.0.1本地主机

127.0.1.1约翰机

#Hadoop集群

172.50.88.21 Hadoop大师

172.50.88.22 Hadoop从1

172.50.88.23 Hadoop从站2

那么它的-> / etc / hostname文件 (在主机上)应包含以下条目(上述问题得以解决)

Hadoop大师

类似地,验证每个从节点的/ etc / hostname文件。



 类似资料:
  • 我有一个由1B记录组成的大型数据集,由于Apache spark提供的可扩展性,我想使用它运行分析,但我在这里看到了一个反模式。我向spark集群添加的节点越多,完成时间就越长。数据存储是Cassandra,查询由Zeppelin运行。我尝试过许多不同的查询,但甚至是对<code>数据帧的简单查询。count()的行为如下。 这是齐柏林飞艇笔记本临时表有18M记录 当针对不同数量进行测试时。这些是

  • 因为每个 Disque 节点都会将自己的配置信息储存在 disque-server 运行的文件夹里面, 而同一个文件夹只能有一份这样的配置信息, 所以如果我们打算同时运行多个节点, 那么就必须在不同的文件夹里面运行 disque-server , 并为每个节点指定不同的端口。 假设我们现在打算运行三个 Disque 节点, 那么首先要做的就是创建三个文件夹, 然后分别在这些文件夹里面运行 disq

  • 我正试图在hadoop中设置多节点集群,如何将0个数据阳极作为活动数据阳极,而我的hdfs显示了0个字节的分配 但是nodemanager后台进程正在datanodes上运行 `

  • 我需要在不同的机器上配置一个Kafka集群,但它不起作用,当我启动生产者和消费者时,将显示以下错误: 你能帮帮我吗。

  • 我有 2 个 docker 容器运行我的 Web 应用程序和机器学习应用程序,都使用 h2o。最初,我既调用 h2o.init() 又指向同一个 IP:PORT,因此初始化了一个具有一个节点的 h2o 集群。 考虑到我已经训练了一个模型,现在我正在训练第二个模型。在此训练过程中,如果web应用程序调用h2o集群(例如,从第一个模型请求预测),它将终止训练过程(错误消息如下),这是无意的。我尝试为每

  • 每当我尝试使用节点驱动程序进行测试时,我发现在公证点,我的流将挂起。 检查节点日志后,发现无法访问公证人的消息代理: [信息]09:33:26653[nioEventLoopGroup-3-3](AMQPClient.kt:91)内蒂。AMQPClient。运行-重试连接{} [信息]09:33:26657[nioEventLoopGroup-3-4](AMQPClient.kt:76)内蒂。AM