Question:

Hadoop wordcount example

洪安顺
2023-03-14

I'm new to Hadoop and have just installed Hadoop 2.6.

16/04/30 20:30:33 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/04/30 20:30:34 WARN mapreduce.JobSubmitter: No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
16/04/30 20:30:34 INFO input.FileInputFormat: Total input paths to process : 1
16/04/30 20:30:34 INFO mapreduce.JobSubmitter: number of splits:1
16/04/30 20:30:34 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1461971181442_0005
16/04/30 20:30:34 INFO mapred.YARNRunner: Job jar is not present. Not adding any jar to the list of resources.
16/04/30 20:30:34 INFO impl.YarnClientImpl: Submitted application application_1461971181442_0005
16/04/30 20:30:34 INFO mapreduce.Job: The url to track the job: http://yoni-Lenovo-Z40-70:8088/proxy/application_1461971181442_0005/
16/04/30 20:30:34 INFO mapreduce.Job: Running job: job_1461971181442_0005
16/04/30 20:30:41 INFO mapreduce.Job: Job job_1461971181442_0005 running in uber mode : false
16/04/30 20:30:41 INFO mapreduce.Job:  map 0% reduce 0%
16/04/30 20:30:46 INFO mapreduce.Job:  map 100% reduce 0%
16/04/30 20:30:51 INFO mapreduce.Job:  map 100% reduce 100%
16/04/30 20:30:52 INFO mapreduce.Job: Job job_1461971181442_0005 completed successfully
16/04/30 20:30:52 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=6
        FILE: Number of bytes written=211511
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=170
        HDFS: Number of bytes written=86
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters 
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=2923
        Total time spent by all reduces in occupied slots (ms)=2526
        Total time spent by all map tasks (ms)=2923
        Total time spent by all reduce tasks (ms)=2526
        Total vcore-seconds taken by all map tasks=2923
        Total vcore-seconds taken by all reduce tasks=2526
        Total megabyte-seconds taken by all map tasks=2993152
        Total megabyte-seconds taken by all reduce tasks=2586624
    Map-Reduce Framework
        Map input records=1
        Map output records=0
        Map output bytes=0
        Map output materialized bytes=6
        Input split bytes=116
        Combine input records=0
        Combine output records=0
        Reduce input groups=0
        Reduce shuffle bytes=6
        Reduce input records=0
        Reduce output records=0
        Spilled Records=0
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=166
        CPU time spent (ms)=1620
        Physical memory (bytes) snapshot=426713088
        Virtual memory (bytes) snapshot=3818450944
        Total committed heap usage (bytes)=324009984
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters 
        Bytes Read=54
    File Output Format Counters 
        Bytes Written=86
16/04/30 20:30:52 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/04/30 20:30:52 WARN mapreduce.JobSubmitter: No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
16/04/30 20:30:52 INFO input.FileInputFormat: Total input paths to process : 1
16/04/30 20:30:52 INFO mapreduce.JobSubmitter: number of splits:1
16/04/30 20:30:52 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1461971181442_0006
16/04/30 20:30:52 INFO mapred.YARNRunner: Job jar is not present. Not adding any jar to the list of resources.
16/04/30 20:30:52 INFO impl.YarnClientImpl: Submitted application application_1461971181442_0006
16/04/30 20:30:52 INFO mapreduce.Job: The url to track the job: http://yoni-Lenovo-Z40-70:8088/proxy/application_1461971181442_0006/
16/04/30 20:30:52 INFO mapreduce.Job: Running job: job_1461971181442_0006
16/04/30 20:31:01 INFO mapreduce.Job: Job job_1461971181442_0006 running in uber mode : false
16/04/30 20:31:01 INFO mapreduce.Job:  map 0% reduce 0%
16/04/30 20:31:07 INFO mapreduce.Job:  map 100% reduce 0%
16/04/30 20:31:12 INFO mapreduce.Job:  map 100% reduce 100%
16/04/30 20:31:13 INFO mapreduce.Job: Job job_1461971181442_0006 completed successfully
16/04/30 20:31:13 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=6
        FILE: Number of bytes written=210495
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=216
        HDFS: Number of bytes written=0
        HDFS: Number of read operations=7
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters 
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=3739
        Total time spent by all reduces in occupied slots (ms)=3133
        Total time spent by all map tasks (ms)=3739
        Total time spent by all reduce tasks (ms)=3133
        Total vcore-seconds taken by all map tasks=3739
        Total vcore-seconds taken by all reduce tasks=3133
        Total megabyte-seconds taken by all map tasks=3828736
        Total megabyte-seconds taken by all reduce tasks=3208192
    Map-Reduce Framework
        Map input records=0
        Map output records=0
        Map output bytes=0
        Map output materialized bytes=6
        Input split bytes=130
        Combine input records=0
        Combine output records=0
        Reduce input groups=0
        Reduce shuffle bytes=6
        Reduce input records=0
        Reduce output records=0
        Spilled Records=0
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=125
        CPU time spent (ms)=1010
        Physical memory (bytes) snapshot=427823104
        Virtual memory (bytes) snapshot=3819626496
        Total committed heap usage (bytes)=324534272
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters 
        Bytes Read=86
    File Output Format Counters 
        Bytes Written=0

hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar grep /user/yoni/input /user/yoni/output101 'dfs[a-z.]+'

with everything set up in pseudo-distributed mode, just like in all the basic tutorials.
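The grep example counts occurrences of the given regex across the input files. The matching step can be previewed locally without Hadoop; the sample file path below is made up purely for illustration:

```shell
# Create a tiny sample input file (hypothetical path, for illustration only).
printf 'dfs.replication\ndfs.namenode.name.dir\nno match here\n' > /tmp/grep-sample.txt

# What the Hadoop grep example computes, done locally:
# extract every match of dfs[a-z.]+ and count each distinct match.
grep -Eo 'dfs[a-z.]+' /tmp/grep-sample.txt | sort | uniq -c | sort -nr
```

If this local run finds no matches in your input files, the Hadoop job's output will be empty too, which is consistent with the "Map output records=0" line in the log above.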

1 answer

印飞捷
2023-03-14

For this example, you should put all the xml files under hadoop-2.6.4/etc/hadoop into an HDFS folder named 'input' under the correct user's home directory (here, 'yoni').
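A minimal sketch of that setup, run from the Hadoop install directory (this assumes the HDFS daemons are already running, and uses the paths from this answer):

```shell
# Create the input directory under the user's HDFS home
# and load the Hadoop config files into it.
bin/hdfs dfs -mkdir -p /user/yoni/input
bin/hdfs dfs -put etc/hadoop/*.xml /user/yoni/input

# Verify the files actually landed in HDFS.
bin/hdfs dfs -ls /user/yoni/input
```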

So, first check the HDFS daemon status by browsing http://localhost:50070 (the default).

Second, check the status of your files with bin/hdfs dfs -ls /user/yoni/input and bin/hdfs fsck / -files -blocks.

If everything is in order, it should work.

Hadoop MapReduce Next Generation - Setting up a Single Node Cluster
