Question:

MapReduce program cannot load a CSV file into an HBase table

徐文斌
2023-03-14

I am running the command below to load a CSV file with a MapReduce program. The job runs successfully, but when I scan the HBase table it shows 0 rows.
Here is the console log from the run:

  [hadoop@01HW394491 ~]$ HADOOP_CLASSPATH='hbase classpath' hadoop jar Desktop/bulk.jar /user/hadoop/3.csv /user/hadoop/load bulk
        13/06/07 15:59:00 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.3.3-cdh3u1--1, built on 07/18/2011 15:17 GMT
        13/06/07 15:59:00 INFO zookeeper.ZooKeeper: Client environment:host.name=01HW394491
        13/06/07 15:59:00 INFO zookeeper.ZooKeeper: Client environment:java.version=1.6.0_0
        13/06/07 15:59:00 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Sun Microsystems Inc.
        13/06/07 15:59:00 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0/jre
        hbase.mapreduce.inputtable
        13/06/07 15:59:00 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
        13/06/07 15:59:02 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=172.29.179.59:2181 sessionTimeout=180000 watcher=hconnection
        13/06/07 15:59:02 INFO zookeeper.ClientCnxn: Opening socket connection to server /172.29.179.59:2181
        13/06/07 15:59:02 INFO zookeeper.ClientCnxn: Socket connection established to 01HW394491/172.29.179.59:2181, initiating session
        13/06/07 15:59:02 INFO zookeeper.ClientCnxn: Session establishment complete on server 01HW394491/172.29.179.59:2181, sessionid = 0x13f1e28c4b4000a, negotiated timeout = 180000
        13/06/07 15:59:03 INFO mapred.JobClient: Running job: job_201306071546_0001
        13/06/07 15:59:04 INFO mapred.JobClient:  map 0% reduce 0%
        13/06/07 15:59:11 INFO mapred.JobClient:  map 100% reduce 0%
        13/06/07 15:59:18 INFO mapred.JobClient:  map 100% reduce 33%
        13/06/07 15:59:19 INFO mapred.JobClient:  map 100% reduce 100%
        13/06/07 15:59:19 INFO mapred.JobClient: Job complete: job_201306071546_0001
        13/06/07 15:59:19 INFO mapred.JobClient: Counters: 21
        13/06/07 15:59:19 INFO mapred.JobClient:   Job Counters 
        13/06/07 15:59:19 INFO mapred.JobClient:     Launched reduce tasks=1
        13/06/07 15:59:19 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=5499
        13/06/07 15:59:19 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
        13/06/07 15:59:19 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
        13/06/07 15:59:19 INFO mapred.JobClient:     Rack-local map tasks=1
        13/06/07 15:59:19 INFO mapred.JobClient:     Launched map tasks=1
        13/06/07 15:59:19 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=7561
        13/06/07 15:59:19 INFO mapred.JobClient:   FileSystemCounters
        13/06/07 15:59:19 INFO mapred.JobClient:     FILE_BYTES_READ=159
        13/06/07 15:59:19 INFO mapred.JobClient:     HDFS_BYTES_READ=63
        13/06/07 15:59:19 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=127600
        13/06/07 15:59:19 INFO mapred.JobClient:   Map-Reduce Framework
        13/06/07 15:59:19 INFO mapred.JobClient:     Reduce input groups=0
        13/06/07 15:59:19 INFO mapred.JobClient:     Combine output records=0
        13/06/07 15:59:19 INFO mapred.JobClient:     Map input records=0
        13/06/07 15:59:19 INFO mapred.JobClient:     Reduce shuffle bytes=6
        13/06/07 15:59:19 INFO mapred.JobClient:     Reduce output records=0
        13/06/07 15:59:19 INFO mapred.JobClient:     Spilled Records=0
        13/06/07 15:59:19 INFO mapred.JobClient:     Map output bytes=0
        13/06/07 15:59:19 INFO mapred.JobClient:     Combine input records=0
        13/06/07 15:59:19 INFO mapred.JobClient:     Map output records=0
        13/06/07 15:59:19 INFO mapred.JobClient:     SPLIT_RAW_BYTES=63
        13/06/07 15:59:19 INFO mapred.JobClient:     Reduce input records=0
        13/06/07 15:59:19 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=172.29.179.59:2181 sessionTimeout=180000 watcher=hconnection
        13/06/07 15:59:19 INFO zookeeper.ClientCnxn: Opening socket connection to server /172.29.179.59:2181
        13/06/07 15:59:19 INFO zookeeper.ClientCnxn: Socket connection established to 01HW394491/172.29.179.59:2181, initiating session
        13/06/07 15:59:19 INFO zookeeper.ClientCnxn: Session establishment complete on server 01HW394491/172.29.179.59:2181, sessionid = 0x13f1e28c4b4000c, negotiated timeout = 180000
        13/06/07 15:59:19 WARN mapreduce.LoadIncrementalHFiles: Skipping non-directory hdfs://01HW394491:9000/user/hadoop/load/_SUCCESS
The driver code:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;



/**
 * HBase bulk import example<br>
 * Data preparation MapReduce job driver
 * <ol>
 * <li>args[0]: HDFS input path
 * <li>args[1]: HDFS output path
 * <li>args[2]: HBase table name
 * </ol>
 */

public class Driver {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        //conf.set("hbase.table.name", "bulk");
        conf.set("hbase.mapreduce.inputtable", args[2]);

        conf.set("hbase.zookeeper.quorum","172.29.179.59");
        conf.set("hbase.zookeeper.property.clientPort", "2181");
        //conf.set("hbase.master", "172.29.179.59:60000");
        //conf.set("hbase.zookeeper.quorum","ibm-r1-node2.apache-nextgen.com");
        HBaseConfiguration.addHbaseResources(conf);

        Job job = new Job(conf, "HBase Bulk Import Example");
        job.setJarByClass(HBaseKVMapper.class);


        job.setMapperClass(HBaseKVMapper.class);
        job.setMapOutputKeyClass(ImmutableBytesWritable.class);
        job.setMapOutputValueClass(KeyValue.class);

        // The input format is TableInputFormat, which reads rows from the HBase
        // table named by hbase.mapreduce.inputtable rather than from files on HDFS.
        job.setInputFormatClass(TableInputFormat.class);

        HTable hTable = new HTable(args[2]);

        // HTable hTable = new HTable("bulkdata");

        // Auto configure partitioner and reducer
        HFileOutputFormat.configureIncrementalLoad(job, hTable);

        FileInputFormat.addInputPath(job, new Path(args[0]));

        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        /*
         * FileInputFormat.addInputPath(job, new
         * Path("hdfs://localhost:9000/user/685536/input1.csv"));
         * 
         * FileOutputFormat.setOutputPath(job, new
         * Path("hdfs://localhost:9000/user/685536/outputs12348"));
         * 
         */
         System.out.println(TableInputFormat.INPUT_TABLE);
        job.waitForCompletion(true);

        // Load generated HFiles into table
        LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
        loader.doBulkLoad(new Path(args[1]), hTable);

        // loader.doBulkLoad(new
        // Path("hdfs://localhost:9000/user/685536/outputs12348"), hTable);
    }
}
The mapper code:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

/**
 * HBase bulk import example
 * <p>
 * Parses Facebook and Twitter messages from CSV files and outputs
 * <ImmutableBytesWritable, KeyValue>.
 * <p>
 * The ImmutableBytesWritable key is used by the TotalOrderPartitioner to map it
 * into the correct HBase table region.
 * <p>
 * The KeyValue value holds the HBase mutation information (column family,
 * column, and value)
 */
public class HBaseKVMapper extends
        Mapper<LongWritable, Text, ImmutableBytesWritable, KeyValue> {

    final static byte[] SRV_COL_FAM = "fields".getBytes();
    String tableName = "";
    ImmutableBytesWritable hKey = new ImmutableBytesWritable();
    KeyValue kv;

    /** {@inheritDoc} */
    @Override
    protected void setup(Context context) throws IOException,
            InterruptedException {
        Configuration c = context.getConfiguration();

        tableName = c.get("hbase.mapreduce.inputtable");
    }

    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {

        String fields[] = { "", "", "", "", "", "" };

        String field = value.toString();

        fields = field.split(",");
        String fieldValue = fields[1];
        FileSplit fileSplit = (FileSplit) context.getInputSplit();
        String filename = fileSplit.getPath().getName();
        // The row key is the input file name, so every line of the CSV is
        // written under the same row key.
        hKey.set((filename).getBytes());

        for(int i=2 ; i<fields.length ; i++){
            fieldValue = fieldValue.concat(","+fields[i])  ;    
        }

        //fieldValue = fieldValue.substring(0,(fieldValue.length())-2);
        System.out.println(fieldValue);

        kv = new KeyValue(hKey.get(), SRV_COL_FAM, Bytes.toBytes(fields[0]),
                Bytes.toBytes(fieldValue));
        context.write(hKey, kv);

    }
}

1 answer

沈宇定
2023-03-14

Try: HADOOP_CLASSPATH='hbase classpath' hadoop jar Desktop/bulk.jar /user/hadoop /user/hadoop/load bulk

I am not sure whether MapReduce accepts an exact file name as the input path. It may need a Path that points to a folder (a directory path) instead.
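
To make that concrete, here is a minimal driver sketch along the lines of this answer, with args[0] pointing at the HDFS directory that contains the CSV (e.g. /user/hadoop) rather than at the file itself. The class name DriverSketch is only illustrative, and the explicit TextInputFormat line is my own assumption, not part of the answer: the posted mapper consumes LongWritable/Text pairs, which is what a text file input format produces, whereas the posted driver configures TableInputFormat.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * Sketch only: args[0] = HDFS input directory (e.g. /user/hadoop),
 * args[1] = HDFS output directory, args[2] = HBase table name.
 */
public class DriverSketch {

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Same ZooKeeper settings as the original driver.
        conf.set("hbase.zookeeper.quorum", "172.29.179.59");
        conf.set("hbase.zookeeper.property.clientPort", "2181");

        Job job = new Job(conf, "HBase Bulk Import Example");
        job.setJarByClass(HBaseKVMapper.class);
        job.setMapperClass(HBaseKVMapper.class);
        job.setMapOutputKeyClass(ImmutableBytesWritable.class);
        job.setMapOutputValueClass(KeyValue.class);

        // Assumption: read the CSV from HDFS as plain text lines
        // (LongWritable offset / Text line), matching the mapper's signature.
        job.setInputFormatClass(TextInputFormat.class);

        // Per this answer: pass the directory that holds the CSV,
        // e.g. /user/hadoop, instead of the file /user/hadoop/3.csv.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Auto-configure the partitioner and reducer for the target table.
        HTable hTable = new HTable(conf, args[2]);
        HFileOutputFormat.configureIncrementalLoad(job, hTable);

        if (job.waitForCompletion(true)) {
            // Only bulk-load the generated HFiles if the job actually succeeded.
            LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
            loader.doBulkLoad(new Path(args[1]), hTable);
        }
    }
}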
