Question:

Hadoop method to send output to multiple directories

嵇俊德

2023-03-14

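My job processes data by date and needs to write its output into a specific folder structure, with paths such as 2013/03/etl, and so on.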

At any time I can have at most 12 months of data, so I use the MultipleOutputs class to create 12 outputs in the driver with the following function:

public void createOutputs(){
    Calendar c = Calendar.getInstance();
    String monthStr, pathStr;

    // Create multiple outputs for last 12 months
    // TODO make 12 configurable
    for(int i = 0; i < 12; ++i ){
        //Get month and add 1 as month is 0 based index
        int month = c.get(Calendar.MONTH)+1; 
        //Add leading 0
        monthStr = month >= 10 ? "" + month : "0" + month;
        // Generate the named output in the format 201303etl (destined for 2013/03/etl;
        // named outputs may only contain letters and digits)
        pathStr = c.get(Calendar.YEAR) + "" + monthStr + "etl";
        // Add the named output
        MultipleOutputs.addNamedOutput(config, pathStr );  
        // Move to previous month
        c.add(Calendar.MONTH, -1); 
    }
}
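
The same loop can also be written with java.time, which avoids the hand-rolled zero padding. This is only a minimal sketch, not the asker's code: the class name NamedOutputNames and the method lastMonths are hypothetical, and the sketch only builds the names. Each name would still have to be registered in the driver with MultipleOutputs.addNamedOutput.

```java
import java.time.YearMonth;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Hadoop): builds the named-output
// strings for the most recent `count` months, newest first.
final class NamedOutputNames {

    static List<String> lastMonths(YearMonth start, int count) {
        List<String> names = new ArrayList<>();
        YearMonth current = start;
        for (int i = 0; i < count; i++) {
            // %02d zero-pads the month, so March becomes "03" and October stays "10"
            names.add(String.format("%04d%02detl",
                    current.getYear(), current.getMonthValue()));
            // Move to the previous month; year rollover is handled by YearMonth
            current = current.minusMonths(1);
        }
        return names;
    }

    public static void main(String[] args) {
        // Print the 12 names ending at the current month
        for (String name : lastMonths(YearMonth.now(), 12)) {
            System.out.println(name);
        }
    }
}
```

Passing the starting month in as a parameter, rather than reading the clock inside the loop, makes the helper easy to check against a fixed date.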

In the reducer, I added a cleanup function to move the generated output to the appropriate directories.

protected void cleanup(Context context) throws IOException, InterruptedException {
    // Custom function to recursively process data
    moveFiles(FileSystem.get(new Configuration()), new Path("/MyOutputPath"));
}
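
A custom function like moveFiles needs some rule for turning a generated file name into a destination directory. The real moveFiles is not shown in the question, so the following is only a sketch of one possible rule, assuming a year/month/etl target layout such as 2013/07/etl. The names TargetDirs and targetDir are hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: maps a named-output file such as
// "201307etl-r-00000" to a relative target directory "2013/07/etl".
final class TargetDirs {

    // Four-digit year, two-digit month, a lowercase suffix,
    // then the "-r-nnnnn" part suffix of a reducer output file
    private static final Pattern NAMED_OUTPUT =
            Pattern.compile("^(\\d{4})(\\d{2})([a-z]+)-r-\\d{5}$");

    static Optional<String> targetDir(String fileName) {
        Matcher m = NAMED_OUTPUT.matcher(fileName);
        if (!m.matches()) {
            // Not a named-output file, e.g. "part-r-00000" or "_logs"
            return Optional.empty();
        }
        return Optional.of(m.group(1) + "/" + m.group(2) + "/" + m.group(3));
    }

    public static void main(String[] args) {
        System.out.println(targetDir("201307etl-r-00000"));
        System.out.println(targetDir("part-r-00000"));
    }
}
```

Returning an empty Optional for files that do not match lets the caller skip entries such as part-r-00000 instead of moving them by mistake.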

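Problem: the reducer's cleanup function runs before the output is moved from the _temporary directory to the output directory: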

MyMapReduce: filepath:hdfs://localhost:8020/dev/test
MyMapReduce: filepath:hdfs://localhost:8020/dev/test/_logs
MyMapReduce: filepath:hdfs://localhost:8020/dev/test/_logs/history/job_201310301015_0224_1383763613843_371979_HtmlEtl
MyMapReduce: filepath:hdfs://localhost:8020/dev/test/_temporary
MyMapReduce: filepath:hdfs://localhost:8020/dev/test/_temporary/_attempt_201310301015_0224_r_000000_0
MyMapReduce: filepath:hdfs://localhost:8020/dev/test/_temporary/_attempt_201310301015_0224_r_000000_0/201307etl-r-00000
MyMapReduce: filepath:hdfs://localhost:8020/dev/test/_temporary/_attempt_201310301015_0224_r_000000_0/part-r-00000

1 answer

萧安怡

2023-03-14

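You should not need a second job. I am currently using MultipleOutputs to create a large number of output directories in one of my programs. Although there are more than 30 directories, I am able to use only a couple of MultipleOutputs objects. This is because you can set the output directory when you write, so it only has to be determined at the moment you need it. You only actually need more than one namedOutput if you want to output in different formats (for example, one with key: Text.class, value: Text.class and one with key: Text.class, value: IntWritable.class).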

Setup:

MultipleOutputs.addNamedOutput(job, "Output", TextOutputFormat.class, Text.class, Text.class);

Setup in the reducer:

mout = new MultipleOutputs<Text, Text>(context);

Writing from the reducer:

String key; //set to whatever output key will be
String value; //set to whatever output value will be
String outputFileName; //set to absolute path to file where this should write

mout.write("Output",new Text(key),new Text(value),outputFileName);

For example, to choose the directory by year and month:

int year;//extract year from data
int month;//extract month from data
String baseFileName; //parent directory to all outputs from this job
String outputFileName = baseFileName + "/" + year + "/" + month;

mout.write("Output",new Text(key),new Text(value),outputFileName);
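
The path-building step can be isolated in a small helper. The sketch below is an illustration under assumptions, not the answerer's code: the class MonthlyOutputPath, the method forDate, the ISO date string, and the year/month/etl layout with a "part" file prefix are all hypothetical choices. It zero-pads the month, which the plain string concatenation does not.

```java
import java.time.LocalDate;

// Hypothetical helper: derives the baseOutputPath argument for
// MultipleOutputs.write from a record's date.
final class MonthlyOutputPath {

    static String forDate(String baseDir, LocalDate date) {
        // Zero-pad the month so directories sort as 01..12.
        // The trailing "part" is the file-name prefix inside the directory.
        return String.format("%s/%04d/%02d/etl/part",
                baseDir, date.getYear(), date.getMonthValue());
    }

    public static void main(String[] args) {
        // ISO-8601 date as it might appear in a record (assumed format)
        LocalDate date = LocalDate.parse("2013-03-14");
        System.out.println(forDate("/MyOutputPath", date));
    }
}
```

In the reducer, the returned string would be passed as the last argument of mout.write(...). Hadoop appends a suffix such as -r-00000 to it, so the last path component acts as a file-name prefix rather than a directory. The MultipleOutputs instance should also be closed with mout.close() in the reducer's cleanup method.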

