Lightdb 从 23.1 版本开始支持 oracle 中的unnest/no_unnest hint。
在 Lightdb 中 unnest 只支持语法
Lightdb 根据子查询的位置和作用把子查询分为了两类,一类称为子链接(sublink),一类称为子查询(subquery)。而在 Oracle 中这两种都统称为子查询。
一般来说以范围表形式存在的称为子查询,如下所示(select * from test2 )
即为子查询:
select * from test1, (select * from test2 ) x;
已表达式存在的是子链接,如下所示(select * from test2 where test1.key1=test2.key1)
即为子链接
select * from test1 where exists (select * from test2 where test1.key1=test2.key1);
目前优化器有两种方式对SQL的执行进行优化,分别为 RBO(基于规则的优化) 和 CBO(基于代价的优化)。
在 Oracle 中 CBO(在 Oracle 10g中完全取代 RBO)会对提升子查询与不提升子查询的 cost 进行比较,然后获取更优的路径。
而在 Lightdb 中优化器采用了 RBO 与 CBO 结合的方式, 对于提升子查询这块采用了 RBO 的方式,也即只要子查询可以提升就一定会提升。
目前 Lightdb 可以对符合条件的 exists类型(exists, not exists)和 any(in, any, some)类型的子链接进行提升。不支持对 not in(any 类型) 的提升。
对于 any 类型的子链接,先把子链接提升到 from 中,然后再由提升子查询流程判断是否能继续提升,如下 SQL 由于 limit 即不能继续提升:
explain select * from test1 where key1 in (select key1 from test2 limit1);
QUERY PLAN
---------------------------------------------------------------------------
Hash Semi Join (cost=0.04..38.69 rows=11 width=8)
Hash Cond: (test1.key1 = test2.key1)
-> Seq Scan on test1 (cost=0.00..32.60 rows=2260 width=8)
-> Hash (cost=0.02..0.02 rows=1 width=4)
-> Limit (cost=0.00..0.01 rows=1 width=4)
-> Seq Scan on test2 (cost=0.00..32.60 rows=2260 width=4)
(6 rows)
explain select * from test1, (select key1 from test2 limit1) x where test1.key1=x.key1;
对于exists类型的子链接,可以直接把子链接中的表与 from 中的表进行关联(exists 语义可以忽略子链接中的许多条件,如limit, 不影响结果), 提升效果如下所示:
select * from test1 where exists (select * from test2 where test1.key1=test2.key1);
提升后 SQL 形式(不能直接执行,内部形式)
select * from test1 semijoin test2 where test1.key1=test2.key1;
在生成 semijoin 的路径时,可能可以会计算 innerjoin 的路径,然后比较 cost,取最优的.
We might have a normal semijoin, or a case where we don’t have
enough rels to do the semijoin but can unique-ify the RHS and
then do an innerjoin (see comments in join_is_legal). In the
latter case we can’t apply JOIN_SEMI joining.
If we know how to unique-ify the RHS and one input rel is
exactly the RHS (not a superset) we can consider unique-ifying
it and then doing a regular join. (The create_unique_path
check here is probably redundant with what join_is_legal did,
but if so the check is cheap because it’s cached. So test
anyway to be sure.)
lightdb@postgres=# explain select * from test1 where exists(select/*+ no_unnest*/ * from test2 where test1.key1=test2.key1);
QUERY PLAN
---------------------------------------------------------------------------
--------
Seq Scan on test1 @"lt#1" (cost=0.00..1.05 rows=1 width=8)
Filter: (alternatives: SubPlan 1 or hashed SubPlan 2)
SubPlan 1
-> Seq Scan on test2 @"lt#0" (cost=0.00..170.00 rows=10000 width=0)
Filter: (test1.key1 = key1)
SubPlan 2
-> Seq Scan on test2 test2_1 @"lt#0" (cost=0.00..145.00 rows=10000 w
idth=4)
(7 rows)
Time: 1.305 ms
Lightdb 的 filer 有两种模式:
在使用 no_unnest 不提升子链接后,执行计划中可能会显示为Filter: (alternatives: SubPlan 1 or hashed SubPlan 2)
, 然后同时展示两种 subplan。explain analyze
时会显示实际执行了那个 subplan。
lightdb@postgres=# explain analyze select * from test1 where exists(select/*+ no_unnest*/ * from test2 where test1.key1=test2.key1);
QUERY PLAN
---------------------------------------------------------------------------
------------------------------------------
Seq Scan on test1 @"lt#1" (cost=0.00..1.05 rows=1 width=8) (actual time=0
.025..0.030 rows=2 loops=1)
Filter: (alternatives: SubPlan 1 or hashed SubPlan 2)
SubPlan 1
-> Seq Scan on test2 @"lt#0" (cost=0.00..170.00 rows=10000 width=0)
(actual time=0.005..0.005 rows=1 loops=2)
Filter: (test1.key1 = key1)
SubPlan 2
-> Seq Scan on test2 test2_1 @"lt#0" (cost=0.00..145.00 rows=10000 w
idth=4) (never executed)
Planning Time: 0.200 ms
Execution Time: 0.079 ms
(9 rows)
Time: 1.401 ms
这是由于在 Lightdb 中的 plan 阶段,是先执行 subplan 的执行计划生成, 再执行外表的执行计划生成, 此时还不能获取 subplan 需要执行的次数(即外表的行数)。subplan 的选择是在执行阶段进行(ExecInitAlternativeSubPlan函数中选择)
no_unnest hint 用于
lightdb@postgres=# create table test1 (key1 int, key2 int);
CREATE TABLE
lightdb@postgres=# create table test2 (key1 int, key2 int);
CREATE TABLE
lightdb@postgres=# EXPLAIN (COSTS false) select * from test1 where exists(select * from test2 where test1.key1=test2.key1);
QUERY PLAN
----------------------------------------
Hash Join
Hash Cond: (test1.key1 = test2.key1)
-> Seq Scan on test1
-> Hash
-> HashAggregate
Group Key: test2.key1
-> Seq Scan on test2
(7 rows)
lightdb@postgres=# EXPLAIN (COSTS false) select * from test1 where exists(select/*+no_unnest*/ * from test2 where test1.key1=test2.key1);
QUERY PLAN
---------------------------------------------------------
Seq Scan on test1 @"lt#1"
Filter: (alternatives: SubPlan 1 or hashed SubPlan 2)
SubPlan 1
-> Seq Scan on test2 @"lt#0"
Filter: (test1.key1 = key1)
SubPlan 2
-> Seq Scan on test2 test2_1 @"lt#0"
(7 rows)
lightdb@postgres=# EXPLAIN (COSTS false) select /*+no_unnest(@qb)*/* from test1 where exists(select/*+qb_name(qb)*/ * from test2 where test1.key1=test2.key1);
QUERY PLAN
---------------------------------------------------------
Seq Scan on test1 @"lt#0"
Filter: (alternatives: SubPlan 1 or hashed SubPlan 2)
SubPlan 1
-> Seq Scan on test2 @qb
Filter: (test1.key1 = key1)
SubPlan 2
-> Seq Scan on test2 test2_1 @qb
(7 rows)
lightdb@postgres=#
下面对可能使用到的情况进行介绍,具体使用需要具体分析,no_unnest hint 只是提供了一种选择。
由于使用filter只需匹配到一行即可,提升后可能需要对多行进行查找,反而性能变差。
比如 filter 中表的第一行一定符合条件,那么每次只需匹配第一行即可(非hashed subplan)。
在如下极端场景(非实际应用场景)即可以使用 no_unnest
create table test1 (key1 int, key2 int);
create table test2 (key1 int, key2 int);
insert into test1 select 1, generate_series(1, 2);
insert into test2 select 1, generate_series(1, 10000);
analyze test2;
不对test1 进行analyze, 在 analyze tets1;
后,提升后会使用nestloop semijoin,也执行匹配第一行即可。
不使用 no_unnest:
lightdb@postgres=# explain analyze select * from test1 where exists(select/* no_unnest*/ * from test2 where test1.key1=test2.key1 limit 1);
QUERY PLAN
---------------------------------------------------------------------------
------------------------------------------------
Hash Join (cost=170.02..221.13 rows=1130 width=8) (actual time=3.492..3.4
96 rows=2 loops=1)
Hash Cond: (test1.key1 = test2.key1)
-> Seq Scan on test1 (cost=0.00..32.60 rows=2260 width=8) (actual time
=0.006..0.007 rows=2 loops=1)
-> Hash (cost=170.01..170.01 rows=1 width=4) (actual time=3.478..3.479
rows=1 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> HashAggregate (cost=170.00..170.01 rows=1 width=4) (actual ti
me=3.472..3.473 rows=1 loops=1)
Group Key: test2.key1
Batches: 1 Memory Usage: 24kB
-> Seq Scan on test2 (cost=0.00..145.00 rows=10000 width=4
) (actual time=0.006..1.476 rows=10000 loops=1)
Planning Time: 0.236 ms
Execution Time: 3.546 ms
(11 rows)
Time: 4.816 ms
使用 no_unnest 后会更快:
lightdb@postgres=# explain analyze select * from test1 where exists(select/*+ no_unnest*/ * from test2 where test1.key1=test2.key1 limit 1);
QUERY PLAN
---------------------------------------------------------------------------
------------------------------------------
Seq Scan on test1 @"lt#1" (cost=0.00..71.02 rows=1130 width=8) (actual ti
me=0.023..0.030 rows=2 loops=1)
Filter: (alternatives: SubPlan 1 or hashed SubPlan 2)
SubPlan 1
-> Seq Scan on test2 @"lt#0" (cost=0.00..170.00 rows=10000 width=0)
(actual time=0.007..0.007 rows=1 loops=2)
Filter: (test1.key1 = key1)
SubPlan 2
-> Seq Scan on test2 test2_1 @"lt#0" (cost=0.00..145.00 rows=10000 w
idth=4) (never executed)
Planning Time: 0.216 ms
Execution Time: 0.072 ms
(9 rows)
Time: 1.056 ms