当前位置: 首页 > 面试题库 >

为什么此查询不会发生分区消除?

扶冠宇
2023-03-14
问题内容

我有一个配置单元表,该配置表按年,月,日和小时划分。我需要对其执行查询以获取最近7天的数据。这是在Hive 0.14.0.2.2.4.2-2。我的查询当前看起来像这样:

SELECT COUNT(column_name) from table_name 
where year >= year(date_sub(from_unixtime(unix_timestamp()), 7)) 
AND month >= month(date_sub(from_unixtime(unix_timestamp()), 7)) 
AND day >= day(date_sub(from_unixtime(unix_timestamp()), 7));

这需要很长时间。当我用上面的实际数字代替时,请说:

SELECT COUNT(column_name) from table_name 
where year >= 2017
AND month >= 2
AND day >= 13

它会在几分钟内完成。有什么办法可以更改上面的脚本,以便实际上只包含查询中的数字而不是函数?

我尝试使用set像:

set yearLimit = year(date_sub(from_unixtime(unix_timestamp()), 7));

SELECT COUNT(column_name) from table_name 
where year >= ${hiveconf:yearLimit}
AND month >= month(date_sub(from_unixtime(unix_timestamp()), 7)) 
AND day >= day(date_sub(from_unixtime(unix_timestamp()), 7));

但这不能解决问题。


问题答案:

解决方案

select      count (column_name)

from        table_name

where       year  >= year  (date_sub (current_date,7)) 
        and month >= month (date_sub (current_date,7)) 
        and day   >= day   (date_sub (current_date,7))
;

原始查询出了什么问题?

unix_timestamp()

以秒为单位获取当前的Unix时间戳。此函数不是确定性的,其值在查询执行范围内也不是固定的,因此会阻止对查询的适当优化-自2.0版开始不推荐使用此函数,而推荐使用CURRENT_TIMESTAMP常量

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF

(我只是稍微更改了文档 :-)

由于unix_timestamp()的值在执行期间可能会更改,因此应针对每一行对表达式进行求值,因此可以避免删除分区。

为什么使用SET没有用?

set只是一种文本替换机制。
期间没有任何计算set
唯一发生的事情是为变量分配了 文本
执行查询之前,变量占位符(${hiveconf:...})被分配的 文本 替换。
只有这样,查询才会被解析和执行。


hive> set a=sele;
hive> set b=ct 1+;
hive> set c=1;
hive> ${hiveconf:a}${hiveconf:b}${hiveconf:c};
OK
2

演示版

create table table_name (column_name int) partitioned by (year int,month int,day int);

set hive.exec.dynamic.partition.mode=nonstrict;


insert into table_name partition (year,month,day)

select  pos
       ,year(dt)
       ,month(dt)
       ,day(dt)

from   (select  pe.pos
               ,date_sub (current_date,pe.pos) as dt

        from    (select 1) x 
                lateral view posexplode (split (space (99),' ')) pe
        ) t
;
explain dependency

select      count (column_name)

from        table_name

where       year  >= year  (date_sub (from_unixtime (unix_timestamp ()),7)) 
        and month >= month (date_sub (from_unixtime (unix_timestamp ()),7)) 
        and day   >= day   (date_sub (from_unixtime (unix_timestamp ()),7))
;

{“ input_partitions”:[{“ partitionName”:“ default @ table_name @ year = 2016
/ month = 11 / day = 14”},{“ partitionName”:“ default @ table_name @ year =
2016 / month = 11 / day = 15“},{” partitionName“:” default @ table_name @
year = 2016 / month = 11 / day = 16“},{” partitionName“:” default @
table_name @ year = 2016 / month = 11 / day = 17“ },{“ partitionName”:“
default @ table_name @ year = 2016 / month = 11 / day = 18”}},{“
partitionName”:“ default @ table_name @ year = 2016 / month = 11 / day =
19”}, {“ partitionName”:“ default @ table_name @ year = 2016 / month = 11 /
day = 20”},{“ partitionName”:“ default @ table_name @ year = 2016 / month =
11 / day = 21”},{“ partitionName“:” default @ table_name @ year = 2016 /
month = 11 / day = 22“},{” partitionName“:”default @ table_name @ year =
2016 / month = 11 / day = 23“},{” partitionName“:” default @ table_name @
year = 2016 / month = 11 / day = 24“},{” partitionName“:” default @
table_name @ year = 2016 / month = 11 / day = 25“},{” partitionName“:”
default @ table_name @ year = 2016 / month = 11 / day = 26“},{”
partitionName“:” default @ table_name @ year = 2016 / month = 11 / day =
27“},{” partitionName“:” default @ table_name @ year = 2016 / month = 11 /
day = 28“},{” partitionName“:” default @ table_name @ year = 2016 / month =
11 / day = 29“},{” partitionName“:” default @ table_name @ year = 2016 /
month = 11 / day = 30“},{” partitionName“:” default @ table_name @ year =
2016 / month = 12 / day = 1“},{” partitionName“:” default @ table_name @
year = 2016 / month = 12 / day = 10“},{” partitionName“:”default @
table_name @ year = 2016 / month = 12 / day = 11“},{” partitionName“:”
default @ table_name @ year = 2016 / month = 12 / day = 12“},{”
partitionName“:” default @ table_name @ year = 2016 / month = 12 / day =
13“},{” partitionName“:” default @ table_name @ year = 2016 / month = 12 /
day = 14“},{” partitionName“:” default @ table_name @ year = 2016 / month =
12 / day = 15“},{” partitionName“:” default @ table_name @ year = 2016 /
month = 12 / day = 16“},{” partitionName“:” default @ table_name @ year =
2016 / month = 12 / day = 17“},{” partitionName“:” default @ table_name @
year = 2016 / month = 12 / day = 18“},{” partitionName“:” default @
table_name @ year = 2016 / month = 12 / day = 19“},{” partitionName“:”
default @ table_name @ year = 2016 / month = 12 / day = 2“},{”
partitionName“:”default @ table_name @ year = 2016 / month = 12 / day =
20“},{” partitionName“:” default @ table_name @ year = 2016 / month = 12 /
day = 21“},{” partitionName“:” default @ table_name @ year = 2016 / month =
12 / day = 22“},{” partitionName“:” default @ table_name @ year = 2016 /
month = 12 / day = 23“},{” partitionName“:” default @ table_name @ year =
2016 / month = 12 / day = 24“},{” partitionName“:” default @ table_name @
year = 2016 / month = 12 / day = 25“},{” partitionName“:” default @
table_name @ year = 2016 / month = 12 / day = 26“},{” partitionName“:”
default @ table_name @ year = 2016 / month = 12 / day = 27“},{”
partitionName“:” default @ table_name @ year = 2016 / month = 12 / day =
28“},{” partitionName“:” default @ table_name @ year = 2016 / month = 12 /
day = 29“},{” partitionName“:”default @ table_name @ year = 2016 / month =
12 / day = 3“},{” partitionName“:” default @ table_name @ year = 2016 /
month = 12 / day = 30“},{” partitionName“:” default @ table_name @ year =
2016 / month = 12 / day = 31“},{” partitionName“:” default @ table_name @
year = 2016 / month = 12 / day = 4“},{” partitionName“:” default @
table_name @ year = 2016 / month = 12 / day = 5“},{” partitionName“:”
default @ table_name @ year = 2016 / month = 12 / day = 6“},{”
partitionName“:” default @ table_name @ year = 2016 / month = 12 / day =
7“},{” partitionName“:” default @ table_name @ year = 2016 / month = 12 /
day = 8“},{” partitionName“:” default @ table_name @ year = 2016 / month =
12 / day = 9“},{” partitionName“:” default @ table_name @ year = 2017 /
month = 1 / day = 1“},{” partitionName“:”default @ table_name @ year = 2017
/ month = 1 / day = 10“},{” partitionName“:” default @ table_name @ year =
2017 / month = 1 / day = 11“},{” partitionName“:” default @ table_name @
year = 2017 / month = 1 / day = 12“},{” partitionName“:” default @
table_name @ year = 2017 / month = 1 / day = 13“},{” partitionName“:”
default @ table_name @ year = 2017 / month = 1 / day = 14“},{”
partitionName“:” default @ table_name @ year = 2017 / month = 1 / day =
15“},{” partitionName“:” default @ table_name @ year = 2017 / month = 1 /
day = 16“},{” partitionName“:” default @ table_name @ year = 2017 / month =
1 / day = 17“},{” partitionName“:” default @ table_name @ year = 2017 /
month = 1 / day = 18“},{” partitionName“:” default @ table_name @ year =
2017 / month = 1 / day = 19“},{” partitionName“:”default @ table_name @ year
= 2017 / month = 1 / day = 2“},{” partitionName“:” default @ table_name @
year = 2017 / month = 1 / day = 20“},{” partitionName“:” default @
table_name @ year = 2017 / month = 1 / day = 21“},{” partitionName“:”
default @ table_name @ year = 2017 / month = 1 / day = 22“},{”
partitionName“:” default @ table_name @ year = 2017 / month = 1 / day =
23“},{” partitionName“:” default @ table_name @ year = 2017 / month = 1 /
day = 24“},{” partitionName“:” default @ table_name @ year = 2017 / month =
1 / day = 25“},{” partitionName“:” default @ table_name @ year = 2017 /
month = 1 / day = 26“},{” partitionName“:” default @ table_name @ year =
2017 / month = 1 / day = 27“},{” partitionName“:” default @ table_name @
year = 2017 / month = 1 / day = 28“},{” partitionName“:”default @ table_name
@ year = 2017 / month = 1 / day = 29“},{” partitionName“:” default @
table_name @ year = 2017 / month = 1 / day = 3“},{” partitionName“:” default
@ table_name @ year = 2017 / month = 1 / day = 30“},{” partitionName“:”
default @ table_name @ year = 2017 / month = 1 / day = 31“},{”
partitionName“:” default @ table_name @ year = 2017 / month = 1 / day =
4“},{” partitionName“:” default @ table_name @ year = 2017 / month = 1 / day
= 5“},{” partitionName“:” default @ table_name @ year = 2017 / month = 1 /
day = 6“},{” partitionName“:” default @ table_name @ year = 2017 / month = 1
/ day = 7“},{” partitionName“:” default @ table_name @ year = 2017 / month =
1 / day = 8“},{” partitionName“:” default @ table_name @ year = 2017 / month
= 1 / day = 9“},{” partitionName“:”default @ table_name @ year = 2017 /
month = 2 / day = 1“},{” partitionName“:” default @ table_name @ year = 2017
/ month = 2 / day = 10“},{” partitionName“:” default @ table_name @ year =
2017 / month = 2 / day = 11“},{” partitionName“:” default @ table_name @
year = 2017 / month = 2 / day = 12“},{” partitionName“:” default @
table_name @ year = 2017 / month = 2 / day = 13“},{” partitionName“:”
default @ table_name @ year = 2017 / month = 2 / day = 14“},{”
partitionName“:” default @ table_name @ year = 2017 / month = 2 / day =
15“},{” partitionName“:” default @ table_name @ year = 2017 / month = 2 /
day = 16“},{” partitionName“:” default @ table_name @ year = 2017 / month =
2 / day = 17“},{” partitionName“:” default @ table_name @ year = 2017 /
month = 2 / day = 18“},{” partitionName“:”default @ table_name @ year = 2017
/ month = 2 / day = 19“},{” partitionName“:” default @ table_name @ year =
2017 / month = 2 / day = 2“},{” partitionName“:” default @ table_name @ year
= 2017 / month = 2 / day = 20“},{” partitionName“:” default @ table_name @
year = 2017 / month = 2 / day = 21“},{” partitionName“:” default @
table_name @ year = 2017 / month = 2 / day = 3“},{” partitionName“:” default
@ table_name @ year = 2017 / month = 2 / day = 4“},{” partitionName“:”
default @ table_name @ year = 2017 / month = 2 / day = 5“},{”
partitionName“:” default @ table_name @ year = 2017 / month = 2 / day =
6“},{” partitionName“:” default @ table_name @ year = 2017 / month = 2 / day
= 7“},{” partitionName“:” default @ table_name @ year = 2017 / month = 2 /
day = 8“},{” partitionName“:”default @ table_name @ year = 2017 / month = 2
/ day = 9“}],” input_tables“:[{” tablename“:” default @ table_name“,”
tabletype“:” MANAGED_TABLE“}]}}

explain dependency

select      count (column_name)

from        table_name

where       year  >= year  (date_sub (current_date,7)) 
        and month >= month (date_sub (current_date,7)) 
        and day   >= day   (date_sub (current_date,7))
;

{“ input_partitions”:[{“ partitionName”:“ default @ table_name @ year = 2017
/ month = 2 / day = 14”},{“ partitionName”:“ default @ table_name @ year =
2017 / month = 2 / day = 15“},{” partitionName“:” default @ table_name @
year = 2017 / month = 2 / day = 16“},{” partitionName“:” default @
table_name @ year = 2017 / month = 2 / day = 17“ },{“ partitionName”:“
default @ table_name @ year = 2017 / month = 2 / day = 18”},{“
partitionName”:“ default @ table_name @ year = 2017 / month = 2 / day =
19”}, {“ partitionName”:“ default @ table_name @ year = 2017 / month = 2 /
day = 20”},{“ partitionName”:“ default @ table_name @ year = 2017 / month =
2 / day = 21”}],“ input_tables“:[{” tablename“:” default @ table_name“,”
tabletype“:” MANAGED_TABLE“}]}



 类似资料:
  • 问题内容: 假设我有一个查询发送到我的SQL-Server数据库,它花费了30秒钟以上,并且我的程序引发了SQL Query Timeout异常。查询是否仍然在我的数据库中徘徊,或者在引发异常后立即终止? 问题答案: 客户端使用注意事件向服务器发出查询超时信号。注意事件只是SQL Server客户端可以发送给它的TDS数据包的一种不同类型。除了连接/断开连接,T-SQL批处理和RPC事件外,客户端

  • 因此,我尝试在单个查询中,仅在行不存在的情况下插入行。 我的疑问如下: 有时(非常罕见,但仍然如此),它会生成以下错误: 违反主键约束“主键用户角色”。无法在对象“dbo”中插入重复键。用户的角色。重复的键值为(29851,1)。 是。下面是表的架构的完整SQL: 背景: 这是由托管在Apache服务器上的PHP脚本执行的,在数百次事件中,“随机”发生一次(很可能与并发相关)。 更多信息: 提供:

  • 问题内容: 我有一个查询,在我认为可能不使用索引的情况下,出于好奇,我将其重现: 创建一个具有1.000.000行(在中有10个不同的值,在中有500个字节的数据)的。 创建一个索引并收集表统计信息: 尝试获取和的不同值: 不使用索引,前提是提示不会更改。 我想在这种情况下不能使用索引,但是为什么呢? 问题答案: 我运行了Peter的原始内容并复制了他的结果。然后我应用了dcp的建议… 之所以如此

  • 死锁描述了另外两个线程因为永远等待对方而被阻塞的情况。当死锁发生时,程序永远挂起,你唯一能做的就是杀死程序。 为什么在下面给出的示例生产者-消费者问题中没有发生死锁: 我想知道为什么当同步对象正在等待其他线程释放锁时,在同步块中调用等待方法不会导致死锁?

  • 问题内容: 为什么投给打破,当是? 问题答案: 不幸的是,没有办法对中的数组进行这样的转换。您将必须迭代数组并分别转换每个对象。 原因是安全类型,根本无法确保无需迭代遍历数组的内容就可以将其转换为Uri,这就是为什么必须对其进行迭代并分别进行转换的原因。 基本上因为可以被其他对象继承,所以不能保证数组仅包含对象。但是强制转换为超类型将是可行的,因为那样的话,类型安全就可以了。

  • 问题内容: 我正在编写一个PL / SQL过程,该过程需要动态生成一些查询,其中之一涉及使用来自查询的结果作为参数来创建临时表。 它可以正确编译,但是即使使用非常简单的查询,例如: IT抛出。如果我手动运行创建的查询,它将正确运行。在这一点上,我能够确定是什么导致了问题。 问题答案: 尝试丢掉“;” 从您立即执行的字符串内部。