Question:

hadoop-mapreduce reducer-combiner input

乐正烨熠

2023-03-14

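I'm learning some MapReduce and have run into a problem. The situation is this: I have two files. "users" contains a list of users with some user data (gender, age, country, etc.). The file looks like this: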

user_000003  m  22  United States   Oct 30, 2005

"songs" contains data on the songs every user has listened to (user ID, listening date and time, artist ID, artist name, song ID, song title):

user_000999 2008-12-11T22:52:33Z    b7ffd2af-418f-4be2-bdd1-22f8b48613da    Nine Inch Nails 1d1bb32a-5bc6-4b6f-88cc-c043f6c52509    7 Ghosts I

The goal is to find the k most popular songs in certain countries. Both k and the list of countries are provided as input.

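I decided to use the MultipleInputs class for the mappers, so that one mapper emits key-value pairs of user ID and country, and the other emits pairs of user ID and a string built from the artist name and song title. As far as I understand, I should then be able to read, in the reducer, all the values paired with a given key (so the list of values associated with a userID should contain the country and some number of songs), and write out a set of files to be read by another MapReduce job.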
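To make the expectation above concrete, here is a minimal plain-Java sketch of the grouping that the shuffle performs between the mappers and the reducer. It is not Hadoop code: `ShuffleSketch` and `groupByKey` are hypothetical names, and the pairing of sample songs with user IDs is made up for illustration. In a real job the order of values within one key is not guaranteed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the shuffle phase; not Hadoop code.
class ShuffleSketch {

    // Groups (key, value) pairs by key, the way a reducer receives them.
    static Map<String, List<String>> groupByKey(List<String[]> pairs) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String[] pair : pairs) {
            grouped.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<String[]> mapOutput = new ArrayList<>();
        // As UserMapper would emit: (userID, country)
        mapOutput.add(new String[] {"user_000003", "United States"});
        // As SongMapper would emit: (userID, value containing the " ||| " marker)
        mapOutput.add(new String[] {"user_000003", "Heresy |||  Nine Inch Nails"});
        mapOutput.add(new String[] {"user_000999", "7 Ghosts I |||  Nine Inch Nails"});

        // Each key maps to the list of all values emitted for it by either mapper.
        System.out.println(groupByKey(mapOutput));
    }
}
```

The country and the songs of one user end up in the same list only after this grouping step, that is, only in the reducer.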

I'm fairly sure the mappers do their job, because I was able to write their output with the reducer.

Update: the files are passed to the mappers with the following code:

Job job = Job.getInstance(conf);

MultipleInputs.addInputPath(job, new Path(songsFile), TextInputFormat.class, SongMapper.class);
MultipleInputs.addInputPath(job, new Path(usersFile), TextInputFormat.class, UserMapper.class);

FileOutputFormat.setOutputPath(job, new Path(outFile));
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);

job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);

job.setJarByClass(Songs.class);

job.setCombinerClass(Combiner.class);
job.setReducerClass(UserSongReducer.class);

Mapper code:

public static class UserMapper extends Mapper<LongWritable, Text, Text, Text>{

        //empty cleanUp()

        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {

            String record = value.toString();               
            String[] userData = record.split("\t");
            if(userData.length>3 && !userData[3].equals(""))
            {
                context.write(new Text(userData[0]), new Text(userData[3]));
            }        


        }

        //empty run() and setup()

    }

    public static class SongMapper extends Mapper<LongWritable, Text, Text, Text>{

        //empty cleanUp()

        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {

            String record = value.toString();               
            String[] songData = record.split("\t");
            // Index 5 (song title) is read below, so the record needs at least 6 fields.
            if(songData.length>5 && !songData[3].equals(""))
            {
                context.write(new Text(songData[0]), new Text(songData[5]+" |||  "+songData[3]));
            }        


        }

        //empty run() and setup()

    }

Combiner code:

public static class Combiner extends Reducer<Text, Text, Text, Text> 
    {

        private boolean isCountryAllowed(String toCheck, String[] countries)
        {

            for(int i=0; i<countries.length;i++)
            {
                if(toCheck.equals(countries[i]))
                    return true;
            }
            return false;
        }

        public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException 
        {



            ArrayList<String> list = new ArrayList<String>();



            String country = "foo";
            for(Text value : values) 
            {
                if(!value.toString().contains(" ||| "))
                {
                    country = value.toString();
                }else
                {
                    list.add(value.toString());
                }

            }

            if(isCountryAllowed(country, context.getConfiguration().getStrings("countries")))
            {

                for (String listVal : list) 
                {
                    context.write(new Text(country), 
                            new Text(listVal));
                }
            }

         }
    }

The problem shows up when I try to write the pairs out with the reducer:

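public static class UserSongReducer extends Reducer<Text, Text, Text, Text>
    {
        // Identity reducer: writes every (key, value) pair it receives unchanged.
        public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException 
        {
            for (Text value : values) 
            {
                context.write(key, value);
            }
        }
    }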

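I use "|||" to build the artist/title string. The problem is that country is still "foo". I would expect to see at least one line of output with the correct country as the key, but the key is always "foo" (2.5 KB songs file):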

foo Deep Dish |||  Fuck Me Im Famous (Pacha Ibiza)-09-28-2007
foo Vnv Nation |||  Kingdom
foo Les Fleur De Lys |||  Circles
foo Home Video |||  Penguin
foo Of Montreal |||  Will You Come And Fetch Me
foo Godspeed You! Black Emperor |||  Bbf3
foo Alarum |||  Sustained Connection
foo Sneaker Pimps |||  Walking Zero
foo Cecilio And Kapono |||  I Love You
foo Garbage |||  Vow
foo The Brian Setzer Orchestra |||  Gettin' In The Mood
foo Nitin Sawhney |||  Sunset (J-Walk Remix)
foo Nine Inch Nails |||  Heresy
foo Collective Soul |||  Crowded Head
foo Vicarious Bliss |||  Limousine
foo Noisettes |||  Malice In Wonderland
foo Black Rebel Motorcycle Club |||  Lien On Your Dreams
foo Mae |||  Brink Of Disaster
foo Michael Andrews |||  Rosie Darko
foo A Perfect Circle |||  Blue

What am I doing wrong?

P.S. If I use a custom combiner, I should be able to avoid the second job. Does a combiner behave exactly like a reducer?

1 answer

曹鹏海

2023-03-14

From your code, I would expect country to be "foo". What exactly are you trying to achieve?

// In your mapper, the value is written with the ||| marker:

context.write(new Text(songData[0]), new Text(songData[5]+" |||  "+songData[3]));

// So every value written by SongMapper contains |||, and the branch below
// never replaces the initial value of country, which is "foo":
String country = "foo";
    if(!value.toString().contains(" ||| ")) // never true for these values

Then you write out the country variable:

 context.write(new Text(country),
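One likely reason the song values never meet a country value is where the join runs. A combiner only sees the output of a single map task, and Hadoop may run it zero, one, or several times. With MultipleInputs and TextInputFormat, each map task reads a split of one file, so a task running SongMapper emits only "|||" values. The country for a user and that user's songs are brought together only in the reducer, and a combiner is generally expected to leave keys unchanged. The following is a minimal plain-Java sketch of the join logic under those assumptions. It is not Hadoop code: `JoinSketch` and `joinForUser` are hypothetical names, and the sample values are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical plain-Java model of the join logic; not Hadoop code.
class JoinSketch {

    static final String MARKER = " ||| ";

    // Takes every value seen for one user ID and returns "country<TAB>song" lines.
    // Returns an empty list when no country value was seen or it is not allowed.
    static List<String> joinForUser(List<String> values, Set<String> allowedCountries) {
        String country = null;
        List<String> songs = new ArrayList<>();
        for (String value : values) {
            if (value.contains(MARKER)) {
                songs.add(value);   // shape of a SongMapper value
            } else {
                country = value;    // shape of a UserMapper value
            }
        }
        List<String> out = new ArrayList<>();
        if (country != null && allowedCountries.contains(country)) {
            for (String song : songs) {
                out.add(country + "\t" + song);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> allowed = new HashSet<>(Arrays.asList("United States"));

        // What a reducer sees: values from both mappers for the same user ID.
        System.out.println(joinForUser(
                Arrays.asList("United States", "Heresy |||  Nine Inch Nails"), allowed));

        // What a combiner on a songs-only map task sees: no country value at all.
        System.out.println(joinForUser(
                Arrays.asList("Heresy |||  Nine Inch Nails"), allowed));
    }
}
```

Moving this logic from the combiner into the reducer (and dropping the `setCombinerClass` call, or using a combiner that keeps the user ID as key) would be one way to apply it in the job. Counting the songs per country would still need a second grouping step keyed by country.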
