Terminology:
云霄飞车 ("Roller Coaster"): Hive's own estimate of the number of reducers for an MR job can be unreasonably low, so the job runs very slowly. The 云霄飞车 project optimizes Hive's reducer-count estimation.
map_input_bytes: size of the map input, in bytes.
map_output_bytes: size of the map output, in bytes.
Background:
Phase one of 云霄飞车 had one limitation: it could only optimize MR jobs whose reducer count was greater than 1, because it could not tell whether a reducer count of 1 had been fixed at compile time or estimated from the map input. A compile-time count of 1 must not be changed, or the results would be wrong; an estimated count of 1 should be optimized, especially when map_output_bytes is far larger than map_input_bytes, where leaving a single reducer makes the reduce phase extremely slow.
Solution:
Determine whether a reducer count of 1 was fixed at compile time or estimated from map_input_bytes. Implementation: after compilation, collect the jobs whose reducer count of 1 was fixed by the compiler; at optimization time, if a job with reducer count 1 is not in that collected set, its count was estimated and the job is optimized; otherwise it is left alone.
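A minimal sketch of this bookkeeping, assuming a hypothetical ReducerOneClassifier class (the names are ours, not Hive's or the project's actual code):

import java.util.HashSet;
import java.util.Set;

public class ReducerOneClassifier {
    // Filled in right after compilation: IDs of jobs whose reducer count
    // of 1 was fixed by the compiler (e.g., ORDER BY, global aggregation).
    private final Set<String> compileTimeSingleReducerJobs = new HashSet<>();

    public void recordCompileTimeSingleReducer(String jobId) {
        compileTimeSingleReducerJobs.add(jobId);
    }

    // At optimization time: a reducer count of 1 may only be re-estimated
    // if it did NOT come from the compiler.
    public boolean mayOptimize(String jobId, int reducerCount) {
        if (reducerCount != 1) {
            return true; // counts > 1 are always estimates, safe to re-estimate
        }
        return !compileTimeSingleReducerJobs.contains(jobId);
    }
}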
Optimization algorithm:
Hive estimates the reducer count as follows:
(1) Check whether the job needs a reduce phase at all. If not, set the reducer count to 0 and stop; otherwise go to step (2).
(2) Check whether the reducer count was fixed to 1 at compile time. If so, set it to 1 and stop; if not, go to step (3).
(3) Check whether the reducer count was set manually. If so, use that value and stop; if not, go to step (4).
(4) Estimate the reducer count from the map input size (map_input_bytes): by default, one reducer per 1 GB of input, and set the job's reducer count to this estimate (a simplified sketch follows).
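A minimal sketch of step (4)'s default estimate (simplified; Hive's actual code differs): bytesPerReducer mirrors hive.exec.reducers.bytes.per.reducer and maxReducers mirrors hive.exec.reducers.max, both quoted in the logs below.

// One reducer per bytesPerReducer bytes of map input,
// at least 1, capped at maxReducers.
public static int estimateReducers(long mapInputBytes,
                                   long bytesPerReducer,
                                   int maxReducers) {
    int reducers = (int) ((mapInputBytes + bytesPerReducer - 1) / bytesPerReducer);
    reducers = Math.max(1, reducers);
    return Math.min(maxReducers, reducers);
}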
云霄飞车 optimizes step (4), i.e., the logic that estimates the reducer count from the input size. For MR jobs where Hive estimates more than 1 reducer, the count is re-estimated directly with the algorithm below; for jobs where Hive estimates exactly 1 reducer, the project first checks whether that count was fixed at compile time: if so, the job is not touched, otherwise the same algorithm is applied. The re-estimation is driven by map_output_bytes, as in the following table (a sketch follows the table):
map_output_bytes | Formula                          | Reducer range
0 - 30 GB        | datasize / 128MB                 | 1 - 240
30 GB - 100 GB   | 240 + (datasize - 30GB) / 512MB  | 240 - 380
100 GB - 500 GB  | 380 + (datasize - 100GB) / 1GB   | 380 - 780
> 500 GB         | 780 + (datasize - 500GB) / 2GB   | 780+
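The tiered formula translates directly into code; the sketch below follows the table exactly (the method name is ours, not the project's actual implementation):

// Re-estimate the reducer count from map_output_bytes using the
// tiered table above; within each tier the count grows linearly.
public static int tieredReducers(long mapOutputBytes) {
    final long MB = 1024L * 1024, GB = 1024L * MB;
    if (mapOutputBytes <= 30 * GB) {
        return (int) Math.max(1, mapOutputBytes / (128 * MB));        // 1 - 240
    } else if (mapOutputBytes <= 100 * GB) {
        return (int) (240 + (mapOutputBytes - 30 * GB) / (512 * MB)); // 240 - 380
    } else if (mapOutputBytes <= 500 * GB) {
        return (int) (380 + (mapOutputBytes - 100 * GB) / GB);        // 380 - 780
    } else {
        return (int) (780 + (mapOutputBytes - 500 * GB) / (2 * GB));  // 780+
    }
}

The following HiveQL query and its pre-optimization execution log illustrate the original problem: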
INSERT OVERWRITE TABLE tdl_en_dm_account_kw_effect_smt0_tmp5
SELECT a.keyword
,indexation(coalesce(b.search_pv_index, cast(0 as bigint)), '3.14,1.8','0,50,100,1000,5000','1,10,30,100,1000') as search_pv_index
,coalesce(c.gs_tp_member_set_cnt, cast(0 as bigint)) as gs_tp_member_set_cnt
,(case when d.keyword is not null then '1' else '0' end) as is_ban_kw
,dummy_string(200) as dummy_string
FROM (SELECT keyword
FROM tdl_en_dm_account_kw_effect_smt0_tmp0
GROUP BY keyword
) a
LEFT OUTER JOIN
(SELECT trim(upper(keyword)) as keyword
,sum(coalesce(spv, cast (0 as bigint))) as search_pv_index
FROM adl_en_kw_effect_se_norm_fdt0
WHERE hp_stat_date <= '2012-07-31'
AND hp_stat_date >= '2012-07-01'
GROUP BY trim(upper(keyword))
) b
ON (a.keyword = b.keyword)
LEFT OUTER JOIN
(SELECT trim(upper(keyword)) as keyword
,count(distinct admin_member_seq) as gs_tp_member_set_cnt
FROM idl_en_kw_cpt_mem_set_fdt0
WHERE hp_stat_date = '2012-07-31'
AND service_type_id in ('cgs','hkgs','twgs','tp')
AND keyword is not null
GROUP BY trim(upper(keyword))
) c
ON (a.keyword = c.keyword)
LEFT OUTER JOIN
(SELECT trim(upper(keyword)) as keyword
FROM bdl_en07_ipr_keyword_dw_c
WHERE type1 = 'ban'
GROUP BY trim(upper(keyword))
) d
ON (a.keyword = d.keyword);
Total MapReduce jobs = 5
Launching Job 1 out of 5
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
Cannot run job locally: Input Size (= 698539343) is larger than hive.exec.mode.local.auto.inputbytes.max (= -1)
Warning:Can't find tianshu info:mark id missing!
Starting Job = job_201208241319_2406874, Tracking URL = http://hdpjt:50030/jobdetails.jsp?jobid=job_201208241319_2406874
Kill Command = /dhwdata/hadoop/bin/../bin/hadoop job -Dmapred.job.tracker=hdpjt:9001 -kill job_201208241319_2406874
Hadoop job information for Stage-1: number of mappers: 31; number of reducers: 1
2012-09-10 16:26:40,644 Stage-1 map = 0%, reduce = 0%
2012-09-10 16:26:51,523 Stage-1 map = 11%, reduce = 0%
2012-09-10 16:27:02,736 Stage-1 map = 57%, reduce = 0%
2012-09-10 16:27:17,953 Stage-1 map = 99%, reduce = 0%
2012-09-10 16:27:41,117 Stage-1 map = 100%, reduce = 17%
2012-09-10 16:28:09,655 Stage-1 map = 100%, reduce = 45%
2012-09-10 16:28:41,003 Stage-1 map = 100%, reduce = 74%
2012-09-10 16:29:01,683 Stage-1 map = 100%, reduce = 79%
2012-09-10 16:29:04,744 Stage-1 map = 100%, reduce = 82%
2012-09-10 16:29:10,280 Stage-1 map = 100%, reduce = 85%
2012-09-10 16:29:23,987 Stage-1 map = 100%, reduce = 87%
2012-09-10 16:29:33,265 Stage-1 map = 100%, reduce = 90%
2012-09-10 16:29:42,898 Stage-1 map = 100%, reduce = 93%
2012-09-10 16:29:58,016 Stage-1 map = 100%, reduce = 99%
Ended Job = job_201208241319_2406874
Launching Job 2 out of 5
Number of reduce tasks not specified. Estimated from input data size: 65
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
Cannot run job locally: Input Size (= 64928671778) is larger than hive.exec.mode.local.auto.inputbytes.max (= -1)
Warning:Can't find tianshu info:mark id missing!
Starting Job = job_201208241319_2407439, Tracking URL = http://hdpjt:50030/jobdetails.jsp?jobid=job_201208241319_2407439
Kill Command = /dhwdata/hadoop/bin/../bin/hadoop job -Dmapred.job.tracker=hdpjt:9001 -kill job_201208241319_2407439
Hadoop job information for Stage-3: number of mappers: 333; number of reducers: 65
2012-09-10 16:31:48,096 Stage-3 map = 0%, reduce = 0%
2012-09-10 16:31:58,278 Stage-3 map = 1%, reduce = 0%
2012-09-10 16:32:00,878 Stage-3 map = 4%, reduce = 0%
2012-09-10 16:32:03,450 Stage-3 map = 8%, reduce = 0%
2012-09-10 16:32:05,322 Stage-3 map = 14%, reduce = 0%
2012-09-10 16:32:07,365 Stage-3 map = 22%, reduce = 0%
2012-09-10 16:32:08,801 Stage-3 map = 29%, reduce = 0%
2012-09-10 16:32:10,335 Stage-3 map = 35%, reduce = 0%
2012-09-10 16:32:13,453 Stage-3 map = 43%, reduce = 0%
2012-09-10 16:32:16,894 Stage-3 map = 63%, reduce = 0%
2012-09-10 16:32:20,426 Stage-3 map = 77%, reduce = 0%
2012-09-10 16:32:27,855 Stage-3 map = 90%, reduce = 0%
2012-09-10 16:32:36,965 Stage-3 map = 99%, reduce = 0%
2012-09-10 16:32:43,084 Stage-3 map = 100%, reduce = 0%
2012-09-10 16:32:47,360 Stage-3 map = 100%, reduce = 18%
2012-09-10 16:32:51,149 Stage-3 map = 100%, reduce = 31%
2012-09-10 16:32:53,988 Stage-3 map = 100%, reduce = 38%
2012-09-10 16:32:56,459 Stage-3 map = 100%, reduce = 42%
2012-09-10 16:32:59,834 Stage-3 map = 100%, reduce = 54%
2012-09-10 16:33:03,535 Stage-3 map = 100%, reduce = 63%
2012-09-10 16:33:08,789 Stage-3 map = 100%, reduce = 73%
2012-09-10 16:33:14,299 Stage-3 map = 100%, reduce = 92%
2012-09-10 16:33:18,423 Stage-3 map = 100%, reduce = 99%
2012-09-10 16:33:22,124 Stage-3 map = 100%, reduce = 100%
Ended Job = job_201208241319_2407439
Launching Job 3 out of 5
Number of reduce tasks not specified. Estimated from input data size: 3
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
Cannot run job locally: Input Size (= 2711959479) is larger than hive.exec.mode.local.auto.inputbytes.max (= -1)
Warning:Can't find tianshu info:mark id missing!
Starting Job = job_201208241319_2407819, Tracking URL = http://hdpjt:50030/jobdetails.jsp?jobid=job_201208241319_2407819
Kill Command = /dhwdata/hadoop/bin/../bin/hadoop job -Dmapred.job.tracker=hdpjt:9001 -kill job_201208241319_2407819
Hadoop job information for Stage-4: number of mappers: 10; number of reducers: 3
2012-09-10 16:33:44,219 Stage-4 map = 0%, reduce = 0%
2012-09-10 16:34:01,388 Stage-4 map = 1%, reduce = 0%
2012-09-10 16:34:11,607 Stage-4 map = 6%, reduce = 0%
2012-09-10 16:34:17,661 Stage-4 map = 11%, reduce = 0%
2012-09-10 16:34:23,270 Stage-4 map = 14%, reduce = 0%
2012-09-10 16:34:32,606 Stage-4 map = 17%, reduce = 0%
2012-09-10 16:34:44,748 Stage-4 map = 22%, reduce = 0%
2012-09-10 16:35:01,395 Stage-4 map = 32%, reduce = 0%
2012-09-10 16:35:18,943 Stage-4 map = 43%, reduce = 0%
2012-09-10 16:35:38,716 Stage-4 map = 54%, reduce = 0%
2012-09-10 16:36:01,974 Stage-4 map = 73%, reduce = 0%
2012-09-10 16:36:21,750 Stage-4 map = 97%, reduce = 0%
2012-09-10 16:36:40,284 Stage-4 map = 100%, reduce = 4%
2012-09-10 16:36:58,595 Stage-4 map = 100%, reduce = 21%
2012-09-10 16:37:17,022 Stage-4 map = 100%, reduce = 52%
2012-09-10 16:37:29,315 Stage-4 map = 100%, reduce = 69%
2012-09-10 16:37:39,690 Stage-4 map = 100%, reduce = 72%
2012-09-10 16:37:50,249 Stage-4 map = 100%, reduce = 75%
2012-09-10 16:38:05,929 Stage-4 map = 100%, reduce = 81%
2012-09-10 16:38:17,927 Stage-4 map = 100%, reduce = 84%
2012-09-10 16:38:27,357 Stage-4 map = 100%, reduce = 87%
2012-09-10 16:38:36,761 Stage-4 map = 100%, reduce = 88%
2012-09-10 16:38:46,276 Stage-4 map = 100%, reduce = 92%
2012-09-10 16:38:53,322 Stage-4 map = 100%, reduce = 95%
2012-09-10 16:39:00,616 Stage-4 map = 100%, reduce = 96%
2012-09-10 16:39:12,326 Stage-4 map = 100%, reduce = 99%
2012-09-10 16:39:21,258 Stage-4 map = 100%, reduce = 100%
Ended Job = job_201208241319_2407819
Launching Job 4 out of 5
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
Cannot run job locally: Input Size (= 2497170) is larger than hive.exec.mode.local.auto.inputbytes.max (= -1)
Warning:Can't find tianshu info:mark id missing!
Starting Job = job_201208241319_2408468, Tracking URL = http://hdpjt:50030/jobdetails.jsp?jobid=job_201208241319_2408468
Kill Command = /dhwdata/hadoop/bin/../bin/hadoop job -Dmapred.job.tracker=hdpjt:9001 -kill job_201208241319_2408468
Hadoop job information for Stage-5: number of mappers: 2; number of reducers: 1
2012-09-10 16:40:04,701 Stage-5 map = 0%, reduce = 0%
2012-09-10 16:40:26,284 Stage-5 map = 100%, reduce = 0%
2012-09-10 16:40:48,103 Stage-5 map = 100%, reduce = 100%
Ended Job = job_201208241319_2408468
Launching Job 5 out of 5
Number of reduce tasks not specified. Estimated from input data size: 2
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
Cannot run job locally: Input Size (= 1067723025) is larger than hive.exec.mode.local.auto.inputbytes.max (= -1)
Warning:Can't find tianshu info:mark id missing!
Starting Job = job_201208241319_2408626, Tracking URL = http://hdpjt:50030/jobdetails.jsp?jobid=job_201208241319_2408626
Kill Command = /dhwdata/hadoop/bin/../bin/hadoop job -Dmapred.job.tracker=hdpjt:9001 -kill job_201208241319_2408626
Hadoop job information for Stage-2: number of mappers: 70; number of reducers: 2
2012-09-10 16:42:40,831 Stage-2 map = 0%, reduce = 0%
2012-09-10 16:43:02,831 Stage-2 map = 94%, reduce = 0%
2012-09-10 16:43:25,577 Stage-2 map = 96%, reduce = 9%
2012-09-10 16:43:38,820 Stage-2 map = 96%, reduce = 17%
2012-09-10 16:43:46,859 Stage-2 map = 97%, reduce = 28%
2012-09-10 16:43:50,491 Stage-2 map = 97%, reduce = 31%
2012-09-10 16:43:57,931 Stage-2 map = 98%, reduce = 31%
2012-09-10 16:44:07,289 Stage-2 map = 99%, reduce = 31%
2012-09-10 16:44:14,606 Stage-2 map = 99%, reduce = 32%
2012-09-10 16:44:26,118 Stage-2 map = 99%, reduce = 33%
2012-09-10 16:44:29,891 Stage-2 map = 100%, reduce = 33%
2012-09-10 16:45:04,755 Stage-2 map = 100%, reduce = 52%
2012-09-10 16:45:14,944 Stage-2 map = 100%, reduce = 67%
2012-09-10 16:45:57,172 Stage-2 map = 100%, reduce = 68%
2012-09-10 16:46:55,271 Stage-2 map = 100%, reduce = 69%
2012-09-10 16:47:34,879 Stage-2 map = 100%, reduce = 70%
2012-09-10 16:48:51,459 Stage-2 map = 100%, reduce = 71%
2012-09-10 16:49:40,682 Stage-2 map = 100%, reduce = 72%
2012-09-10 16:50:31,918 Stage-2 map = 100%, reduce = 73%
2012-09-10 16:51:17,001 Stage-2 map = 100%, reduce = 74%
2012-09-10 16:52:16,802 Stage-2 map = 100%, reduce = 75%
2012-09-10 16:53:26,683 Stage-2 map = 100%, reduce = 76%
2012-09-10 16:54:28,473 Stage-2 map = 100%, reduce = 77%
2012-09-10 16:54:40,219 Stage-2 map = 100%, reduce = 78%
2012-09-10 16:55:15,820 Stage-2 map = 100%, reduce = 79%
2012-09-10 16:56:15,632 Stage-2 map = 100%, reduce = 80%
2012-09-10 16:56:58,645 Stage-2 map = 100%, reduce = 81%
2012-09-10 16:57:34,794 Stage-2 map = 100%, reduce = 82%
2012-09-10 16:58:12,770 Stage-2 map = 100%, reduce = 83%
2012-09-10 16:59:09,950 Stage-2 map = 100%, reduce = 84%
2012-09-10 16:59:56,071 Stage-2 map = 100%, reduce = 85%
2012-09-10 17:00:51,556 Stage-2 map = 100%, reduce = 86%
2012-09-10 17:01:52,019 Stage-2 map = 100%, reduce = 87%
2012-09-10 17:02:33,026 Stage-2 map = 100%, reduce = 88%
2012-09-10 17:03:42,677 Stage-2 map = 100%, reduce = 89%
2012-09-10 17:04:33,151 Stage-2 map = 100%, reduce = 90%
2012-09-10 17:05:21,476 Stage-2 map = 100%, reduce = 91%
2012-09-10 17:05:57,097 Stage-2 map = 100%, reduce = 92%
2012-09-10 17:06:39,520 Stage-2 map = 100%, reduce = 93%
2012-09-10 17:07:28,118 Stage-2 map = 100%, reduce = 94%
2012-09-10 17:08:10,033 Stage-2 map = 100%, reduce = 95%
2012-09-10 17:09:03,468 Stage-2 map = 100%, reduce = 96%
2012-09-10 17:09:42,495 Stage-2 map = 100%, reduce = 97%
2012-09-10 17:10:36,427 Stage-2 map = 100%, reduce = 98%
2012-09-10 17:11:27,875 Stage-2 map = 100%, reduce = 99%
2012-09-10 17:12:33,050 Stage-2 map = 100%, reduce = 99%
2012-09-10 17:12:54,651 Stage-2 map = 100%, reduce = 100%
Ended Job = job_201208241319_2408626
Loading data to table tdl_en_dm_account_kw_effect_smt0_tmp5
27964711 Rows loaded to tdl_en_dm_account_kw_effect_smt0_tmp5
OK
Time taken: 2911.069 seconds
Analyzing the execution shows that most of the time is spent in the reduce phase; note Stage-2 above, whose reduce progress crawls from 52% at 16:45 to 100% at 17:12. Hive computes the reducer count from the reduce input size alone, roughly ceil(input_bytes / hive.exec.reducers.bytes.per.reducer), i.e. about one reducer per GB by default (Job 2's 64,928,671,778-byte input yields 65 reducers, consistent with a default of 1,000,000,000 bytes per reducer). This estimate is too low here, so the job runs slowly; with spare cluster capacity, raising the reducer count improves throughput:
set mapred.reduce.tasks=200;
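(As the log's own hints suggest, an alternative to hard-coding a count is to lower the average load per reducer, e.g. set hive.exec.reducers.bytes.per.reducer=256000000; — that value is only illustrative. For this test a fixed count of 200 was used.)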
Re-running the same SQL with this setting gives the following:
Total MapReduce jobs = 5
Launching Job 1 out of 5
Number of reduce tasks not specified. Defaulting to jobconf value of: 200
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
Cannot run job locally: Input Size (= 698539343) is larger than hive.exec.mode.local.auto.inputbytes.max (= -1)
Warning:Can't find tianshu info:mark id missing!
Starting Job = job_201208241319_2418716, Tracking URL = http://hdpjt:50030/jobdetails.jsp?jobid=job_201208241319_2418716
Kill Command = /dhwdata/hadoop/bin/../bin/hadoop job -Dmapred.job.tracker=hdpjt:9001 -kill job_201208241319_2418716
Hadoop job information for Stage-1: number of mappers: 31; number of reducers: 200
2012-09-10 18:27:38,519 Stage-1 map = 4%, reduce = 0%
2012-09-10 18:27:54,686 Stage-1 map = 86%, reduce = 0%
2012-09-10 18:28:12,664 Stage-1 map = 100%, reduce = 1%
2012-09-10 18:28:32,951 Stage-1 map = 100%, reduce = 97%
Ended Job = job_201208241319_2418716
Launching Job 2 out of 5
Number of reduce tasks not specified. Defaulting to jobconf value of: 200
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
Cannot run job locally: Input Size (= 64928671778) is larger than hive.exec.mode.local.auto.inputbytes.max (= -1)
Warning:Can't find tianshu info:mark id missing!
Starting Job = job_201208241319_2418954, Tracking URL = http://hdpjt:50030/jobdetails.jsp?jobid=job_201208241319_2418954
Kill Command = /dhwdata/hadoop/bin/../bin/hadoop job -Dmapred.job.tracker=hdpjt:9001 -kill job_201208241319_2418954
Hadoop job information for Stage-3: number of mappers: 333; number of reducers: 200
2012-09-10 18:29:58,542 Stage-3 map = 41%, reduce = 0%
2012-09-10 18:30:18,341 Stage-3 map = 97%, reduce = 0%
2012-09-10 18:30:39,798 Stage-3 map = 100%, reduce = 30%
2012-09-10 18:30:57,445 Stage-3 map = 100%, reduce = 33%
2012-09-10 18:31:30,148 Stage-3 map = 100%, reduce = 48%
2012-09-10 18:31:36,229 Stage-3 map = 100%, reduce = 82%
2012-09-10 18:31:40,261 Stage-3 map = 100%, reduce = 95%
2012-09-10 18:31:43,385 Stage-3 map = 100%, reduce = 98%
2012-09-10 18:31:46,417 Stage-3 map = 100%, reduce = 99%
2012-09-10 18:31:49,988 Stage-3 map = 100%, reduce = 100%
Ended Job = job_201208241319_2418954
Launching Job 3 out of 5
Number of reduce tasks not specified. Defaulting to jobconf value of: 200
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
Cannot run job locally: Input Size (= 2711959479) is larger than hive.exec.mode.local.auto.inputbytes.max (= -1)
Warning:Can't find tianshu info:mark id missing!
Starting Job = job_201208241319_2419277, Tracking URL = http://hdpjt:50030/jobdetails.jsp?jobid=job_201208241319_2419277
Kill Command = /dhwdata/hadoop/bin/../bin/hadoop job -Dmapred.job.tracker=hdpjt:9001 -kill job_201208241319_2419277
Hadoop job information for Stage-4: number of mappers: 10; number of reducers: 200
2012-09-10 18:32:39,666 Stage-4 map = 0%, reduce = 0%
2012-09-10 18:32:51,789 Stage-4 map = 2%, reduce = 0%
2012-09-10 18:33:11,546 Stage-4 map = 13%, reduce = 0%
2012-09-10 18:33:32,475 Stage-4 map = 23%, reduce = 0%
2012-09-10 18:33:49,567 Stage-4 map = 33%, reduce = 0%
2012-09-10 18:34:05,118 Stage-4 map = 36%, reduce = 0%
2012-09-10 18:34:25,977 Stage-4 map = 48%, reduce = 0%
2012-09-10 18:34:38,126 Stage-4 map = 55%, reduce = 0%
2012-09-10 18:34:46,751 Stage-4 map = 63%, reduce = 0%
2012-09-10 18:34:52,980 Stage-4 map = 67%, reduce = 0%
2012-09-10 18:34:55,887 Stage-4 map = 73%, reduce = 0%
2012-09-10 18:35:03,626 Stage-4 map = 78%, reduce = 0%
2012-09-10 18:35:09,209 Stage-4 map = 82%, reduce = 0%
2012-09-10 18:35:13,249 Stage-4 map = 84%, reduce = 0%
2012-09-10 18:35:17,927 Stage-4 map = 85%, reduce = 0%
2012-09-10 18:35:24,694 Stage-4 map = 89%, reduce = 0%
2012-09-10 18:35:32,634 Stage-4 map = 90%, reduce = 0%
2012-09-10 18:35:34,874 Stage-4 map = 91%, reduce = 0%
2012-09-10 18:35:37,460 Stage-4 map = 93%, reduce = 0%
2012-09-10 18:35:39,766 Stage-4 map = 95%, reduce = 0%
2012-09-10 18:35:42,091 Stage-4 map = 97%, reduce = 0%
2012-09-10 18:35:51,546 Stage-4 map = 100%, reduce = 0%
2012-09-10 18:35:57,990 Stage-4 map = 100%, reduce = 11%
2012-09-10 18:36:11,144 Stage-4 map = 100%, reduce = 90%
2012-09-10 18:36:24,157 Stage-4 map = 100%, reduce = 99%
2012-09-10 18:36:45,706 Stage-4 map = 100%, reduce = 100%
Ended Job = job_201208241319_2419277
Launching Job 4 out of 5
Number of reduce tasks not specified. Defaulting to jobconf value of: 200
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
Cannot run job locally: Input Size (= 2497056) is larger than hive.exec.mode.local.auto.inputbytes.max (= -1)
Warning:Can't find tianshu info:mark id missing!
Starting Job = job_201208241319_2419707, Tracking URL = http://hdpjt:50030/jobdetails.jsp?jobid=job_201208241319_2419707
Kill Command = /dhwdata/hadoop/bin/../bin/hadoop job -Dmapred.job.tracker=hdpjt:9001 -kill job_201208241319_2419707
Hadoop job information for Stage-5: number of mappers: 2; number of reducers: 200
2012-09-10 18:37:20,531 Stage-5 map = 0%, reduce = 0%
2012-09-10 18:37:30,908 Stage-5 map = 100%, reduce = 0%
2012-09-10 18:37:45,810 Stage-5 map = 100%, reduce = 86%
2012-09-10 18:37:54,667 Stage-5 map = 100%, reduce = 99%
Ended Job = job_201208241319_2419707
Launching Job 5 out of 5
Number of reduce tasks not specified. Defaulting to jobconf value of: 200
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
Cannot run job locally: Input Size (= 1327722733) is larger than hive.exec.mode.local.auto.inputbytes.max (= -1)
Warning:Can't find tianshu info:mark id missing!
Starting Job = job_201208241319_2419881, Tracking URL = http://hdpjt:50030/jobdetails.jsp?jobid=job_201208241319_2419881
Kill Command = /dhwdata/hadoop/bin/../bin/hadoop job -Dmapred.job.tracker=hdpjt:9001 -kill job_201208241319_2419881
Hadoop job information for Stage-2: number of mappers: 800; number of reducers: 200
2012-09-10 18:40:52,642 Stage-2 map = 79%, reduce = 0%
2012-09-10 18:41:09,558 Stage-2 map = 100%, reduce = 19%
2012-09-10 18:41:14,070 Stage-2 map = 100%, reduce = 34%
2012-09-10 18:41:16,301 Stage-2 map = 100%, reduce = 48%
2012-09-10 18:41:18,580 Stage-2 map = 100%, reduce = 60%
2012-09-10 18:41:20,193 Stage-2 map = 100%, reduce = 68%
2012-09-10 18:41:21,253 Stage-2 map = 100%, reduce = 73%
2012-09-10 18:41:23,210 Stage-2 map = 100%, reduce = 77%
2012-09-10 18:41:25,600 Stage-2 map = 100%, reduce = 83%
2012-09-10 18:41:28,022 Stage-2 map = 100%, reduce = 89%
2012-09-10 18:41:31,500 Stage-2 map = 100%, reduce = 93%
2012-09-10 18:41:36,121 Stage-2 map = 100%, reduce = 98%
2012-09-10 18:41:40,743 Stage-2 map = 100%, reduce = 100%
Ended Job = job_201208241319_2419881
Loading data to table tdl_en_dm_account_kw_effect_smt0_tmp5
53095125 Rows loaded to tdl_en_dm_account_kw_effect_smt0_tmp5
OK
Time taken: 1020.148 seconds
Performance improved by roughly 3x (2911 seconds down to 1020 seconds).