Question:

Hadoop MapReduce: context.write changes values

邓崇凛

2023-03-14

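I'm new to Hadoop and to writing MapReduce jobs, and I've run into a problem: the reducer's context.write method seems to be changing correct values into incorrect ones.

The job has to:

Count the total number of words (int wordCount)
Count the number of distinct words (int counter_dist)
Count the number of words that start with "z" or "Z" (int counter_startZ)
Count the number of words that appear fewer than 4 times (int counter_less4)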

All of this has to be done in a single MapReduce job.

The text file being analyzed:

Hello how zou zou zou zou how are you
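As a reference point, here is a small plain-Java sketch (no Hadoop involved) that computes the four numbers for this sample line in memory. It assumes "words that start with Z" means every occurrence, which is how counter_startZ in the reducer counts it, and that words are case-sensitive like Text keys; the class name ExpectedStats is made up for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

public class ExpectedStats {

    // Returns {total words, distinct words, occurrences starting with z/Z,
    //          distinct words that appear fewer than 4 times}.
    static int[] compute(String text) {
        Map<String, Integer> counts = new TreeMap<>(); // word -> occurrences (case-sensitive)
        int total = 0;
        int startZ = 0;
        for (String word : text.trim().split("\\s+")) {
            total++;
            // Counts every occurrence, matching counter_startZ in the question's reducer.
            if (word.startsWith("z") || word.startsWith("Z")) {
                startZ++;
            }
            counts.merge(word, 1, Integer::sum);
        }
        int less4 = 0;
        for (int c : counts.values()) {
            if (c < 4) {
                less4++;
            }
        }
        return new int[] {total, counts.size(), startZ, less4};
    }

    public static void main(String[] args) {
        int[] s = compute("Hello how zou zou zou zou how are you");
        System.out.println("Total words: " + s[0]);
        System.out.println("Distinct words: " + s[1]);
        System.out.println("Starts with Z: " + s[2]);
        System.out.println("Appears less than 4 times: " + s[3]);
    }
}
```

For this input it prints 9, 5, 4 and 4, which matches the ###CLEANUP lines in the debug log further down.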

public class WordCountMapper extends Mapper <Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            String hasKey = itr.nextToken();
            word.set(hasKey);
            context.write(word, one);
        }

    }
}

public class WordCountReducer extends Reducer <Text, IntWritable, Text, IntWritable> {

    int wordCount = 0; // Total number of words
    int counter_dist = 0; // Number of distinct words in the corpus
    int counter_startZ = 0; // Number of words that start with letter Z
    int counter_less4 = 0; // Number of words that appear less than 4 

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        int repeatedWords = 0;
        System.out.println("###Reduce method starts");
        System.out.println("Values: wordCount:" + wordCount + " counter_dist:" + counter_dist + " counter_startZ:" + counter_startZ + " counter_less4:" + counter_less4 + " (start)");
        for (IntWritable val : values){
            System.out.println("Key: " + key.toString());
            repeatedWords++;
            wordCount += val.get();
            if(key.toString().startsWith("z") || key.toString().startsWith("Z")){
            counter_startZ++;
            }
            System.out.println("Values: wordCount:" + wordCount + " counter_dist:" + counter_dist + " counter_startZ:" + counter_startZ + " counter_less4:" + counter_less4 + " (end of loop)");
        }
        counter_dist++;

        if(repeatedWords < 4){
            counter_less4++;
        }

        System.out.println("Values: wordCount:" + wordCount + " counter_dist:" + counter_dist + " counter_startZ:" + counter_startZ + " counter_less4:" + counter_less4 + " repeatedWords:" + repeatedWords + " (end)");
        System.out.println("###Reduce method ends\n");
    }


    @Override
    public void cleanup(Context context) throws IOException, InterruptedException{
        System.out.println("###CLEANUP: wordCount: " + wordCount);
        System.out.println("###CLEANUP: counter_dist: " + counter_dist);
        System.out.println("###CLEANUP: counter_startZ: " + counter_startZ);
        System.out.println("###CLEANUP: counter_less4: " + counter_less4);

        context.write(new Text("Total words: "), new IntWritable(wordCount));
        context.write(new Text("Distinct words: "), new IntWritable(counter_dist));
        context.write(new Text("Starts with Z: "), new IntWritable(counter_startZ));
        context.write(new Text("Appears less than 4 times:"), new IntWritable(counter_less4));
    }


}

The stdout log I'm using for debugging:

###Reduce method starts
Values: wordCount:0 counter_dist:0 counter_startZ:0 counter_less4:0 (start)
Key: Hello
Values: wordCount:1 counter_dist:0 counter_startZ:0 counter_less4:0 (end of loop)
Values: wordCount:1 counter_dist:1 counter_startZ:0 counter_less4:1 repeatedWords:1 (end)
###Reduce method ends

###Reduce method starts
Values: wordCount:1 counter_dist:1 counter_startZ:0 counter_less4:1 (start)
Key: are
Values: wordCount:2 counter_dist:1 counter_startZ:0 counter_less4:1 (end of loop)
Values: wordCount:2 counter_dist:2 counter_startZ:0 counter_less4:2 repeatedWords:1 (end)
###Reduce method ends

###Reduce method starts
Values: wordCount:2 counter_dist:2 counter_startZ:0 counter_less4:2 (start)
Key: how
Values: wordCount:3 counter_dist:2 counter_startZ:0 counter_less4:2 (end of loop)
Key: how
Values: wordCount:4 counter_dist:2 counter_startZ:0 counter_less4:2 (end of loop)
Values: wordCount:4 counter_dist:3 counter_startZ:0 counter_less4:3 repeatedWords:2 (end)
###Reduce method ends

###Reduce method starts
Values: wordCount:4 counter_dist:3 counter_startZ:0 counter_less4:3 (start)
Key: you
Values: wordCount:5 counter_dist:3 counter_startZ:0 counter_less4:3 (end of loop)
Values: wordCount:5 counter_dist:4 counter_startZ:0 counter_less4:4 repeatedWords:1 (end)
###Reduce method ends

###Reduce method starts
Values: wordCount:5 counter_dist:4 counter_startZ:0 counter_less4:4 (start)
Key: zou
Values: wordCount:6 counter_dist:4 counter_startZ:1 counter_less4:4 (end of loop)
Key: zou
Values: wordCount:7 counter_dist:4 counter_startZ:2 counter_less4:4 (end of loop)
Key: zou
Values: wordCount:8 counter_dist:4 counter_startZ:3 counter_less4:4 (end of loop)
Key: zou
Values: wordCount:9 counter_dist:4 counter_startZ:4 counter_less4:4 (end of loop)
Values: wordCount:9 counter_dist:5 counter_startZ:4 counter_less4:4 repeatedWords:4 (end)
###Reduce method ends

###CLEANUP: wordCount: 9
###CLEANUP: counter_dist: 5
###CLEANUP: counter_startZ: 4
###CLEANUP: counter_less4: 4

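As the log shows, all the values are correct and everything works. However, when I open the output directory in HDFS and read the "part-r-00000" file, the output that context.write wrote there is completely different: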

Total words: 22
Distinct words: 4
Starts with Z: 0
Appears less than 4 times: 4

Answer:

滕弘新

2023-03-14

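Never rely on the cleanup() method for critical program logic. cleanup() is called once at the end of every task that runs this class, so its results depend on how many such tasks the framework spawns, which you cannot predict. That makes any logic placed there unreliable.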
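The exact numbers in part-r-00000 are consistent with one concrete way cleanup() can run more often than expected. The question does not show the driver, so this is an assumption: if the driver also registers WordCountReducer as the combiner (job.setCombinerClass(WordCountReducer.class), as the standard Hadoop WordCount example does with its own reducer), cleanup() first runs inside the combiner, and its four summary records are shuffled to the real reducer, which then treats the labels as words. The plain-Java sketch below (no Hadoop; CombinerCleanupDemo and SummaryReducer are made-up names) replays the reducer's logic twice in that order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CombinerCleanupDemo {

    // Same fields and per-key logic as the question's WordCountReducer, minus the Hadoop types.
    static final class SummaryReducer {
        int wordCount = 0, counterDist = 0, counterStartZ = 0, counterLess4 = 0;

        void reduce(String key, List<Integer> values) {
            int repeatedWords = 0;
            for (int val : values) {
                repeatedWords++;
                wordCount += val;
                if (key.startsWith("z") || key.startsWith("Z")) {
                    counterStartZ++;
                }
            }
            counterDist++;
            if (repeatedWords < 4) {
                counterLess4++;
            }
        }

        // The four records cleanup() passes to context.write.
        Map<String, Integer> cleanup() {
            Map<String, Integer> out = new LinkedHashMap<>();
            out.put("Total words: ", wordCount);
            out.put("Distinct words: ", counterDist);
            out.put("Starts with Z: ", counterStartZ);
            out.put("Appears less than 4 times:", counterLess4);
            return out;
        }
    }

    // One pass of the class: reduce() once per key in sorted key order, then cleanup().
    static Map<String, Integer> runPass(Map<String, List<Integer>> grouped) {
        SummaryReducer r = new SummaryReducer();
        for (Map.Entry<String, List<Integer>> e : new TreeMap<>(grouped).entrySet()) {
            r.reduce(e.getKey(), e.getValue());
        }
        return r.cleanup();
    }

    // Hypothetical setup: the same class runs first as the combiner, then as the reducer.
    static Map<String, Integer> simulate(String text) {
        Map<String, List<Integer>> mapOutput = new TreeMap<>();
        for (String w : text.trim().split("\\s+")) {
            mapOutput.computeIfAbsent(w, k -> new ArrayList<>()).add(1); // mapper emits (word, 1)
        }
        // Combiner pass: reduce() writes nothing, so only the four summary records leave it.
        Map<String, Integer> combinerOut = runPass(mapOutput);
        Map<String, List<Integer>> reducerInput = new TreeMap<>();
        for (Map.Entry<String, Integer> e : combinerOut.entrySet()) {
            reducerInput.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        // Reducer pass: the summary labels are now counted as if they were words.
        return runPass(reducerInput);
    }

    public static void main(String[] args) {
        simulate("Hello how zou zou zou zou how are you")
            .forEach((k, v) -> System.out.println(k + "\t" + v));
    }
}
```

For the sample line it ends with Total words 22 (4 + 5 + 4 + 9), Distinct words 4, Starts with Z 0 and Appears less than 4 times 4, the same four values as part-r-00000. Under that assumption, the debug log in the question would be the combiner's log rather than the final reducer's.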

Move both the initialization and the context.write calls into the reduce method, that is, these lines:

int wordCount = 0; // Total number of words
int counter_dist = 0; // Number of distinct words in the corpus
int counter_startZ = 0; // Number of words that start with letter Z
int counter_less4 = 0; // Number of words that appear less than 4 times

context.write(new Text("Total words: "), new IntWritable(wordCount));
context.write(new Text("Distinct words: "), new IntWritable(counter_dist));
context.write(new Text("Starts with Z: "), new IntWritable(counter_startZ));
context.write(new Text("Appears less than 4 times:"), new IntWritable(counter_less4));

A more robust option is to use Hadoop counters. The total number of words and the number of words starting with Z can be counted in the mapper:

public class WordCountMapper extends Mapper <Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            String hasKey = itr.nextToken();
            word.set(hasKey);
            context.getCounter("my_counters", "TOTAL_WORDS").increment(1);
            if (hasKey.toUpperCase().startsWith("Z")) {
                context.getCounter("my_counters", "Z_WORDS").increment(1);
            }
            context.write(word, one);
        }
    }
}

The number of distinct words and the number of words that appear fewer than 4 times can be counted with counters in the reducer:

public class WordCountReducer extends Reducer <Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        int wordCount= 0;
        context.getCounter("my_counters", "DISTINCT_WORDS").increment(1);
        for (IntWritable val : values){
            wordCount += val.get();
        }
        if (wordCount < 4) {
            context.getCounter("my_counters", "WORDS_LESS_THAN_4").increment(1);
        }
    }
}

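Read the counters in the Driver class. The code below goes after the line that submits the job and waits for it to finish (job.waitForCompletion(true)):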

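import org.apache.hadoop.mapreduce.Counter;
import org.apache.hadoop.mapreduce.CounterGroup;

// After job.waitForCompletion(true) has returned:
CounterGroup group = job.getCounters().getGroup("my_counters");
for (Counter counter : group) {
    System.out.println(counter.getName() + "=" + counter.getValue());
}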
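Why counters avoid the problem can be sketched without Hadoop: each task's instance fields start at zero and its cleanup() writes only that task's partial totals into its own output file, while counter increments from all tasks are summed into the job-wide values that job.getCounters() returns. The plain-Java simulation below assumes two reduce tasks and a split based on String.hashCode() (both hypothetical, since Hadoop's partitioner hashes Text keys differently); CounterAggregationDemo and Result are made-up names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CounterAggregationDemo {

    static final class Result {
        // {distinct, less4} that each task's cleanup() would write to its own part-r-NNNNN file.
        final List<int[]> perTaskSummaries = new ArrayList<>();
        // Job-wide counters: increments from every task are summed, like job.getCounters().
        final Map<String, Long> jobCounters = new TreeMap<>();
    }

    static Result simulate(String text, int numReduceTasks) {
        // Per-word totals, i.e. what each reduce() call sums from its values.
        Map<String, Integer> totals = new TreeMap<>();
        for (String w : text.trim().split("\\s+")) {
            totals.merge(w, 1, Integer::sum);
        }
        // Hash-partition the keys across tasks. Illustrative only: String.hashCode()
        // is not Hadoop's Text.hashCode(), so the real split may differ.
        List<Map<String, Integer>> partitions = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            partitions.add(new TreeMap<>());
        }
        for (Map.Entry<String, Integer> e : totals.entrySet()) {
            int p = (e.getKey().hashCode() & Integer.MAX_VALUE) % numReduceTasks;
            partitions.get(p).put(e.getKey(), e.getValue());
        }
        Result result = new Result();
        for (Map<String, Integer> partition : partitions) {
            int distinct = 0; // per-task state, starts from zero in every task
            int less4 = 0;
            for (int count : partition.values()) {
                distinct++;
                result.jobCounters.merge("DISTINCT_WORDS", 1L, Long::sum);
                if (count < 4) {
                    less4++;
                    result.jobCounters.merge("WORDS_LESS_THAN_4", 1L, Long::sum);
                }
            }
            result.perTaskSummaries.add(new int[] {distinct, less4});
        }
        return result;
    }

    public static void main(String[] args) {
        Result r = simulate("Hello how zou zou zou zou how are you", 2);
        for (int i = 0; i < r.perTaskSummaries.size(); i++) {
            int[] s = r.perTaskSummaries.get(i);
            System.out.println("task " + i + " cleanup(): distinct=" + s[0] + " less4=" + s[1]);
        }
        System.out.println("job counters: " + r.jobCounters);
    }
}
```

Whatever the task count, the counters come out as DISTINCT_WORDS=5 and WORDS_LESS_THAN_4=4, whereas the per-task cleanup() summaries only reach those totals if you add up the part files yourself.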
