HIVE_HIVE中的内置UDTF_EXPLODE 和 LATERAL VIEW

扶珂
2023-12-01

参考文章:

1.【学习笔记】hive 之行拆列explode

https://yq.aliyun.com/articles/654743

   HIVE 中内置了 EXPLODE 和 LATERAL VIEW,这两个 Function 非常的有用。尤其是在行转列的情形下。

 

2.hive中的lateral view 与 explode函数的使用

https://blog.csdn.net/guodong2k/article/details/79459282

 

 

本篇文章我们也是用了 str_to_map 函数,不熟悉的同学,可以先看下我的 str_to_map 详解。

https://blog.csdn.net/u010003835/article/details/106632517

 

构建测试数据

USE data_warehouse_test;
 
CREATE TABLE IF NOT EXISTS explode_laterview_org(
	day1_num BIGINT
	,day2_num BIGINT
	,day3_num BIGINT
	,day4_num BIGINT
	,day5_num BIGINT
	,day6_num BIGINT
	,day7_num BIGINT
	,campaign_name STRING
	,campaign_id BIGINT
);
 
 
INSERT OVERWRITE TABLE explode_laterview_org VALUES 
(40, 20, 10, 4, 4, 2, 1, 'zoo', 2 )
,(100, 80, 53, 40, 7, 6, 5, 'moji', 3)
;

 

EXPLODE 

EXPLODE 主要用来拆解列或者行的,下面我们看下函数介绍:

desc function explode;

+----------------------------------------------------+
|                      tab_name                      |
+----------------------------------------------------+
| explode(a) - separates the elements of array a into multiple rows, or the elements of a map into multiple rows and columns  |
+----------------------------------------------------+
 

 

函数解释 :

   可以看到EXPLODE 函数一共处理两种情况,可以把 Array 切分成 单列多行,也可以把 Map 切分成 key,value 组成的多列多行。

 

案例

   下面我们就上面的数据做一个测试。SQL 如下


SELECT EXPLODE(
			STR_TO_MAP(
					CONCAT(
						'day1_num=',CAST (day1_num AS STRING),
						'&day2_num=',CAST (day2_num AS STRING),
						'&day3_num=',CAST (day3_num AS STRING),
						'&day4_num=',CAST (day4_num AS STRING),
						'&day5_num=',CAST (day5_num AS STRING),
						'&day6_num=',CAST (day6_num AS STRING),
						'&day7_num=',CAST (day7_num AS STRING)
					)
				,'&', '=')
		)
FROM explode_laterview_org
;


SELECT EXPLODE(
			SPLIT(
					CONCAT(
						CAST (day1_num AS STRING),
						',',CAST (day2_num AS STRING),
						',',CAST (day3_num AS STRING),
						',',CAST (day4_num AS STRING),
						',',CAST (day5_num AS STRING),
						',',CAST (day6_num AS STRING),
						',',CAST (day7_num AS STRING)
					)
				,',')
		)
FROM explode_laterview_org
;

 

输出

SQL1

+-----------+--------+
|    key    | value  |
+-----------+--------+
| day1_num  | 40     |
| day2_num  | 20     |
| day3_num  | 10     |
| day4_num  | 4      |
| day5_num  | 4      |
| day6_num  | 2      |
| day7_num  | 1      |
| day1_num  | 100    |
| day2_num  | 80     |
| day3_num  | 53     |
| day4_num  | 40     |
| day5_num  | 7      |
| day6_num  | 6      |
| day7_num  | 5      |
+-----------+--------+

 

SQL2

+------+
| col  |
+------+
| 40   |
| 20   |
| 10   |
| 4    |
| 4    |
| 2    |
| 1    |
| 100  |
| 80   |
| 53   |
| 40   |
| 7    |
| 6    |
| 5    |
+------+

 

 

注意

  使用 EXPLODE ,需要注意一下几点

1、No other expressions are allowed in SELECT

如果只是单一使用了EXPLODE, SELECT 不可以使用其他的表达式

SELECT pageid, explode(adid_list) AS myCol... is not supported

 

2、UDTF's can't be nested

EXPLODE 类似的 UDTF 函数不能嵌套

SELECT explode(explode(adid_list)) AS myCol... is not supported

 

3.GROUP BY / CLUSTER BY / DISTRIBUTE BY / SORT BY is not supported

EXPLODE 不可以聚合函数,排序等组合

SELECT explode(adid_list) AS myCol ... GROUP BY myCol is not supported

 

LATERAL VIEW

    既然EXPLODE 限制这么多,那么很多情况下是不能满足我们需求的。 这个时候使用 LATERAL VIEW 可以去除 EXPLODE 的这些限制。

    LATERAL VIEW 被解释为 侧视图,先看下LATERAL VIEW的介绍

    LATERAL 按照我的理解,其实是和原来的其他列做JOIN。 因为原来的一行还有其他列,对其中一行进行了拆分,还希望带着其他列的信息,这个时候就可以使用 LATERAL VIEW。

 

比如针对上面的测试数据,想转换为以下的数据形式

campagin_id, campagin_name, type, num

1,'zoo','day1_num ',40

1,'zoo' ,'day2_num',30

...

 

这个时候,我们就可以结合上面的 EXPLODE 写出以下的语句

SELECT 
	campaign_id, campaign_name, type, num
FROM explode_laterview_org
LATERAL VIEW
	EXPLODE(
			STR_TO_MAP(
					CONCAT(
						'day1_num=',CAST (day1_num AS STRING),
						'&day2_num=',CAST (day2_num AS STRING),
						'&day3_num=',CAST (day3_num AS STRING),
						'&day4_num=',CAST (day4_num AS STRING),
						'&day5_num=',CAST (day5_num AS STRING),
						'&day6_num=',CAST (day6_num AS STRING),
						'&day7_num=',CAST (day7_num AS STRING)
					)
				,'&', '=')
		) lateral_table AS type, num
;

结果

+--------------+----------------+-----------+------+
| campaign_id  | campaign_name  |   type    | num  |
+--------------+----------------+-----------+------+
| 2            | zoo            | day1_num  | 40   |
| 2            | zoo            | day2_num  | 20   |
| 2            | zoo            | day3_num  | 10   |
| 2            | zoo            | day4_num  | 4    |
| 2            | zoo            | day5_num  | 4    |
| 2            | zoo            | day6_num  | 2    |
| 2            | zoo            | day7_num  | 1    |
| 3            | moji           | day1_num  | 100  |
| 3            | moji           | day2_num  | 80   |
| 3            | moji           | day3_num  | 53   |
| 3            | moji           | day4_num  | 40   |
| 3            | moji           | day5_num  | 7    |
| 3            | moji           | day6_num  | 6    |
| 3            | moji           | day7_num  | 5    |
+--------------+----------------+-----------+------+
 

 

额外提示:

如果有多个拆分列,需要进行笛卡尔积的组合。 可以多重使用 LATERAL VIEW

select goods_id2,sale_info,area2
from explode_lateral_view 
LATERAL VIEW explode(split(goods_id,','))goods as goods_id2 
LATERAL VIEW explode(split(area,','))area as area2;
 

 

 

 

 类似资料: