Question:

Hadoop reducer output is read again by the reducer

韩阳飙

2023-03-14

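I am just testing the word count example on a 3-machine cluster. My code is the same as this example, except for the following two lines, which I added to the reducer: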

System.out.println(key);
key.set(key + " - Key in Reducer");

However, the keys in the reducer output have the suffix appended twice:

3M3WI - Key in Reducer - Key in Reducer
3M3WIG - Key in Reducer - Key in Reducer
3M3WL - Key in Reducer - Key in Reducer
3M3WNWPLG - Key in Reducer - Key in Reducer
3M3WQ - Key in Reducer - Key in Reducer
3M3WQNG.K78QJ0WN, - Key in Reducer - Key in Reducer
3M3WWR - Key in Reducer - Key in Reducer
3M3WX - Key in Reducer - Key in Reducer
3M3X - Key in Reducer - Key in Reducer
3M3X,. - Key in Reducer - Key in Reducer
3M3X.KZA8J - Key in Reducer - Key in Reducer
3M3X1 - Key in Reducer - Key in Reducer
3M3X8RC - Key in Reducer - Key in Reducer
3M3XC - Key in Reducer - Key in Reducer
3M3XCBD9R337PK - Key in Reducer - Key in Reducer
3M3XD - Key in Reducer - Key in Reducer
3M3XLW - Key in Reducer - Key in Reducer
3M3XML - Key in Reducer - Key in Reducer
3M3XN - Key in Reducer - Key in Reducer
3M3XU - Key in Reducer - Key in Reducer
3M3XX - Key in Reducer - Key in Reducer
3M3XZ - Key in Reducer - Key in Reducer
3M3Y - Key in Reducer - Key in Reducer
3M3YAIJL - Key in Reducer - Key in Reducer

package test;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class WordCount {

    public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
      private final static IntWritable one = new IntWritable(1);
      private Text word = new Text();

      public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
          word.set(tokenizer.nextToken());
          output.collect(word, one);
        }
      }
    }

    public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
      public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        int sum = 0;
        while (values.hasNext()) {
          sum += values.next().get();
        }
        System.out.println(key);
        key.set(key+" - Key in Reducer");
        output.collect(key, new IntWritable(sum));
      }
    }

    public static void main(String[] args) throws Exception {
      JobConf conf = new JobConf(WordCount.class);
      conf.setJobName("wordcount");

      conf.setOutputKeyClass(Text.class);
      conf.setOutputValueClass(IntWritable.class);

      conf.setMapperClass(Map.class);
      conf.setCombinerClass(Reduce.class);
      conf.setReducerClass(Reduce.class);

      conf.setInputFormat(TextInputFormat.class);
      conf.setOutputFormat(TextOutputFormat.class);

      FileInputFormat.setInputPaths(conf, new Path(args[0]));
      FileOutputFormat.setOutputPath(conf, new Path(args[1]));

      JobClient.runJob(conf);
    }
}

1 answer

谭玉泽

2023-03-14

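Comment out conf.setCombinerClass(Reduce.class); and it should be fine. This happens because you are using the reducer as the combiner.

When a combiner is configured, the output of map() is first fed to combine(). The output of combine() is then sent to reduce() on the reducer machines. Because your combiner is the same Reduce class, it has already appended " - Key in Reducer" to the key, so the actual input to reduce() contains that suffix, and reduce() then appends it a second time. That is why " - Key in Reducer" shows up twice in the output.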
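
To make the effect easier to see, here is a minimal sketch in plain Java that simulates the combine-then-reduce flow without any Hadoop dependency. The class name CombinerSuffixDemo and the methods suffixingReduce, sumOnly, and run are hypothetical names made up for this illustration; they are not part of the Hadoop API, and the real framework does much more (shuffling, partitioning, sorting) than this toy pipeline.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone simulation; not Hadoop code.
public class CombinerSuffixDemo {
    static final String SUFFIX = " - Key in Reducer";

    // Mirrors the question's Reduce class: sums the counts AND rewrites the key.
    static Map.Entry<String, Integer> suffixingReduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return new SimpleEntry<>(key + SUFFIX, sum);
    }

    // A combiner-style function: sums the counts and leaves the key untouched.
    static Map.Entry<String, Integer> sumOnly(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return new SimpleEntry<>(key, sum);
    }

    // Simulates one key passing through a combine step and then the reduce step.
    static String run(String word, List<Integer> mapCounts, boolean reuseReducerAsCombiner) {
        Map.Entry<String, Integer> combined = reuseReducerAsCombiner
                ? suffixingReduce(word, mapCounts)   // same class used as combiner
                : sumOnly(word, mapCounts);          // separate, key-preserving combiner
        Map.Entry<String, Integer> reduced =
                suffixingReduce(combined.getKey(), Arrays.asList(combined.getValue()));
        return reduced.getKey() + "\t" + reduced.getValue();
    }

    public static void main(String[] args) {
        // prints "3M3X - Key in Reducer - Key in Reducer	3"
        System.out.println(run("3M3X", Arrays.asList(1, 1, 1), true));
        // prints "3M3X - Key in Reducer	3"
        System.out.println(run("3M3X", Arrays.asList(1, 1, 1), false));
    }
}
```

The counts stay correct in both cases because summing can safely be applied more than once; only the key rewrite is not safe to repeat. Hadoop may also run a combiner zero, one, or several times for the same key, so if you want to keep a combiner, a reasonable approach is a separate class that only sums and never changes the key, with the key rewrite done in the reducer alone.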
