I ran a simple wordcount MapReduce example with a combiner, making a small change to the combiner's output, and the combiner's output is not merged by the reducer. The scenario is as follows.
public class wordcountcombiner extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException
    {
        int sum = 0;
        for (IntWritable val : values)
        {
            sum += val.get();
        }
        context.write(key, new IntWritable(sum));
        Text t = new Text("different"); // Added my own output
        context.write(t, new IntWritable(1)); // Added my own output
    }
}
In the combiner I added two extra lines that output a different word with a count of 1, and the reducer does not sum the counts for the word "different". The output is pasted below:
"different" 1
different 1
different 1
I 2
different 1
In 1
different 1
MapReduce 1
different 1
The 1
different 1
...
How can this happen?
Driver:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        // Configure and submit the word-count job.
        Job job = Job.getInstance(new Configuration());
        job.setJarByClass(wordcountmapper.class);
        job.setJobName("Word Count");
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.setMapperClass(wordcountmapper.class);
        job.setCombinerClass(wordcountcombiner.class);
        job.setReducerClass(wordcountreducer.class);
        job.getConfiguration().set("fs.file.impl", "com.conga.services.hadoop.patch.HADOOP_7682.WinLocalFileSystem");
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
Mapper:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class wordcountmapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private Text word = new Text();
    private IntWritable one = new IntWritable(1);

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException
    {
        // Tokenize the line and emit (word, 1) for every token.
        String line = value.toString();
        StringTokenizer token = new StringTokenizer(line);
        while (token.hasMoreTokens())
        {
            word.set(token.nextToken());
            context.write(word, one);
        }
    }
}
Combiner:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class wordcountcombiner extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException
    {
        // Sum the counts for this key, then emit the extra "different" record.
        int sum = 0;
        for (IntWritable val : values)
        {
            sum += val.get();
        }
        context.write(key, new IntWritable(sum));
        Text t = new Text("different");
        context.write(t, new IntWritable(1));
    }
}
Reducer:
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class wordcountreducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException
    {
        // Sum all counts received for this key and emit the total.
        int sum = 0;
        for (IntWritable val : values)
        {
            sum += val.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
The output is normal, because you have two lines doing the wrong thing. Why do you have this code:
Text t = new Text("different"); // Added my own output
context.write(t, new IntWritable(1)); // Added my own output
In the reducer you do the sum, and then you add "different 1" to the output...
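For reference, here is a minimal sketch of the combiner with those two extra lines removed (assuming everything else in the job stays exactly as posted). A combiner should only pre-aggregate the mapper's output, emitting the same keys and value types it receives, so the reducer can safely merge what it produces:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Pre-aggregates per-mapper counts without inventing new keys, so the
// reducer only ever sees (word, partialCount) pairs it can merge.
public class wordcountcombiner extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException
    {
        int sum = 0;
        for (IntWritable val : values)
        {
            sum += val.get();
        }
        context.write(key, new IntWritable(sum));
    }
}

Since this version is now identical to wordcountreducer, you could just as well pass wordcountreducer.class to job.setCombinerClass(...).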