lightdb-no_unnest hint

微生运浩

2023-12-01

Lightdb no_unnest hint

Lightdb 从 23.1 版本开始支持 oracle 中的unnest/no_unnest hint。

unnest：让子查询展开查询，和外部的查询进行关联、合并，从而得到执行计划
no_unnest：让子查询不展开，这时子查询作为FILTER条件来过滤外部查询

在 Lightdb 中 unnest 只支持语法

一. Lightdb 提升子链接简述

1.1 子链接介绍

Lightdb 根据子查询的位置和作用把子查询分为了两类，一类称为子链接（sublink），一类称为子查询（subquery）。而在 Oracle 中这两种都统称为子查询。

一般来说以范围表形式存在的称为子查询，如下所示(select * from test2 )即为子查询:

select * from test1, (select * from test2 ) x;

已表达式存在的是子链接，如下所示(select * from test2 where test1.key1=test2.key1)即为子链接

select * from test1 where exists (select * from test2 where test1.key1=test2.key1);

1.2 提升子链接

目前优化器有两种方式对SQL的执行进行优化，分别为 RBO（基于规则的优化）和 CBO（基于代价的优化）。

1.2.1 Oracle 与 Lightdb 区别

在 Oracle 中 CBO（在 Oracle 10g中完全取代 RBO）会对提升子查询与不提升子查询的 cost 进行比较，然后获取更优的路径。
而在 Lightdb 中优化器采用了 RBO 与 CBO 结合的方式, 对于提升子查询这块采用了 RBO 的方式，也即只要子查询可以提升就一定会提升。

1.2.2 目前 Lightdb 可以提升的子链接形式

目前 Lightdb 可以对符合条件的 exists类型（exists, not exists）和 any（in, any, some）类型的子链接进行提升。不支持对 not in（any 类型）的提升。

1.2.2.1 any 类型

对于 any 类型的子链接，先把子链接提升到 from 中，然后再由提升子查询流程判断是否能继续提升，如下 SQL 由于 limit 即不能继续提升：

explain select * from test1 where key1 in (select key1 from test2 limit1);
                                QUERY PLAN                                 
---------------------------------------------------------------------------
 Hash Semi Join  (cost=0.04..38.69 rows=11 width=8)
   Hash Cond: (test1.key1 = test2.key1)
   ->  Seq Scan on test1  (cost=0.00..32.60 rows=2260 width=8)
   ->  Hash  (cost=0.02..0.02 rows=1 width=4)
         ->  Limit  (cost=0.00..0.01 rows=1 width=4)
               ->  Seq Scan on test2  (cost=0.00..32.60 rows=2260 width=4)
(6 rows)

explain select * from test1, (select key1 from test2 limit1) x where test1.key1=x.key1;

1.2.2.1 exists 类型

对于exists类型的子链接，可以直接把子链接中的表与 from 中的表进行关联（exists 语义可以忽略子链接中的许多条件，如limit，不影响结果），提升效果如下所示：

select * from test1 where exists (select * from test2 where test1.key1=test2.key1);

提升后 SQL 形式（不能直接执行，内部形式）
select * from test1 semijoin test2 where test1.key1=test2.key1;

在生成 semijoin 的路径时，可能可以会计算 innerjoin 的路径，然后比较 cost，取最优的.

We might have a normal semijoin, or a case where we don’t have
enough rels to do the semijoin but can unique-ify the RHS and
then do an innerjoin (see comments in join_is_legal). In the
latter case we can’t apply JOIN_SEMI joining.
If we know how to unique-ify the RHS and one input rel is
exactly the RHS (not a superset) we can consider unique-ifying
it and then doing a regular join. (The create_unique_path
check here is probably redundant with what join_is_legal did,
but if so the check is cheap because it’s cached. So test
anyway to be sure.)

1.3 filter subplan

lightdb@postgres=# explain  select * from test1 where exists(select/*+ no_unnest*/ * from test2 where test1.key1=test2.key1);
                                    QUERY PLAN                             
        
---------------------------------------------------------------------------
--------
 Seq Scan on test1 @"lt#1"  (cost=0.00..1.05 rows=1 width=8)
   Filter: (alternatives: SubPlan 1 or hashed SubPlan 2)
   SubPlan 1
     ->  Seq Scan on test2 @"lt#0"  (cost=0.00..170.00 rows=10000 width=0)
           Filter: (test1.key1 = key1)
   SubPlan 2
     ->  Seq Scan on test2 test2_1 @"lt#0"  (cost=0.00..145.00 rows=10000 w
idth=4)
(7 rows)

Time: 1.305 ms

Lightdb 的 filer 有两种模式：

SubPlan ：类似 nestloop，根据查询条件查找表，看是否查到数据
hashed SubPlan：先创建 hash table，然后根据查询条件查找是否存在数据，只能用于ANY subplan，在某些情况下 exists 可以转换为 any(convert_EXISTS_to_ANY)。

在使用 no_unnest 不提升子链接后，执行计划中可能会显示为Filter: (alternatives: SubPlan 1 or hashed SubPlan 2), 然后同时展示两种 subplan。explain analyze 时会显示实际执行了那个 subplan。

lightdb@postgres=# explain analyze select * from test1 where exists(select/*+ no_unnest*/ * from test2 where test1.key1=test2.key1);
                                                     QUERY PLAN            
                                          
---------------------------------------------------------------------------
------------------------------------------
 Seq Scan on test1 @"lt#1"  (cost=0.00..1.05 rows=1 width=8) (actual time=0
.025..0.030 rows=2 loops=1)
   Filter: (alternatives: SubPlan 1 or hashed SubPlan 2)
   SubPlan 1
     ->  Seq Scan on test2 @"lt#0"  (cost=0.00..170.00 rows=10000 width=0) 
(actual time=0.005..0.005 rows=1 loops=2)
           Filter: (test1.key1 = key1)
   SubPlan 2
     ->  Seq Scan on test2 test2_1 @"lt#0"  (cost=0.00..145.00 rows=10000 w
idth=4) (never executed)
 Planning Time: 0.200 ms
 Execution Time: 0.079 ms
(9 rows)

Time: 1.401 ms

这是由于在 Lightdb 中的 plan 阶段，是先执行 subplan 的执行计划生成，再执行外表的执行计划生成，此时还不能获取 subplan 需要执行的次数（即外表的行数）。subplan 的选择是在执行阶段进行（ExecInitAlternativeSubPlan函数中选择）

二. no_unnest hint 使用案例

no_unnest hint 用于

2.1 环境准备

lightdb@postgres=# create table test1 (key1 int, key2 int);
CREATE TABLE
lightdb@postgres=# create table test2 (key1 int, key2 int);
CREATE TABLE

2.2 hint 使用前后执行计划

lightdb@postgres=# EXPLAIN (COSTS false) select * from test1 where exists(select * from test2 where test1.key1=test2.key1);
               QUERY PLAN               
----------------------------------------
 Hash Join
   Hash Cond: (test1.key1 = test2.key1)
   ->  Seq Scan on test1
   ->  Hash
         ->  HashAggregate
               Group Key: test2.key1
               ->  Seq Scan on test2
(7 rows)

lightdb@postgres=# EXPLAIN (COSTS false) select * from test1 where exists(select/*+no_unnest*/ * from test2 where test1.key1=test2.key1);
                       QUERY PLAN                        
---------------------------------------------------------
 Seq Scan on test1 @"lt#1"
   Filter: (alternatives: SubPlan 1 or hashed SubPlan 2)
   SubPlan 1
     ->  Seq Scan on test2 @"lt#0"
           Filter: (test1.key1 = key1)
   SubPlan 2
     ->  Seq Scan on test2 test2_1 @"lt#0"
(7 rows)

lightdb@postgres=# EXPLAIN (COSTS false) select /*+no_unnest(@qb)*/* from test1 where exists(select/*+qb_name(qb)*/ * from test2 where test1.key1=test2.key1);
                       QUERY PLAN                        
---------------------------------------------------------
 Seq Scan on test1 @"lt#0"
   Filter: (alternatives: SubPlan 1 or hashed SubPlan 2)
   SubPlan 1
     ->  Seq Scan on test2 @qb
           Filter: (test1.key1 = key1)
   SubPlan 2
     ->  Seq Scan on test2 test2_1 @qb
(7 rows)

lightdb@postgres=#

2.3 使用场景

下面对可能使用到的情况进行介绍，具体使用需要具体分析，no_unnest hint 只是提供了一种选择。

2.3.1 简单介绍

由于使用filter只需匹配到一行即可，提升后可能需要对多行进行查找，反而性能变差。
比如 filter 中表的第一行一定符合条件，那么每次只需匹配第一行即可（非hashed subplan）。

在如下极端场景（非实际应用场景）即可以使用 no_unnest

2.3.1 环境搭建

create table test1 (key1 int, key2 int);
create table test2 (key1 int, key2 int);
insert into test1 select 1,  generate_series(1, 2);
insert into test2 select 1,  generate_series(1, 10000);
analyze test2;

不对test1 进行analyze, 在 analyze tets1; 后，提升后会使用nestloop semijoin，也执行匹配第一行即可。

2.3.1 使用对比

不使用 no_unnest:

lightdb@postgres=# explain analyze select * from test1 where exists(select/* no_unnest*/ * from test2 where test1.key1=test2.key1 limit 1);
                                                        QUERY PLAN         
                                                
---------------------------------------------------------------------------
------------------------------------------------
 Hash Join  (cost=170.02..221.13 rows=1130 width=8) (actual time=3.492..3.4
96 rows=2 loops=1)
   Hash Cond: (test1.key1 = test2.key1)
   ->  Seq Scan on test1  (cost=0.00..32.60 rows=2260 width=8) (actual time
=0.006..0.007 rows=2 loops=1)
   ->  Hash  (cost=170.01..170.01 rows=1 width=4) (actual time=3.478..3.479
 rows=1 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         ->  HashAggregate  (cost=170.00..170.01 rows=1 width=4) (actual ti
me=3.472..3.473 rows=1 loops=1)
               Group Key: test2.key1
               Batches: 1  Memory Usage: 24kB
               ->  Seq Scan on test2  (cost=0.00..145.00 rows=10000 width=4
) (actual time=0.006..1.476 rows=10000 loops=1)
 Planning Time: 0.236 ms
 Execution Time: 3.546 ms
(11 rows)

Time: 4.816 ms

使用 no_unnest 后会更快:

lightdb@postgres=# explain analyze select * from test1 where exists(select/*+ no_unnest*/ * from test2 where test1.key1=test2.key1 limit 1);
                                                     QUERY PLAN            
                                          
---------------------------------------------------------------------------
------------------------------------------
 Seq Scan on test1 @"lt#1"  (cost=0.00..71.02 rows=1130 width=8) (actual ti
me=0.023..0.030 rows=2 loops=1)
   Filter: (alternatives: SubPlan 1 or hashed SubPlan 2)
   SubPlan 1
     ->  Seq Scan on test2 @"lt#0"  (cost=0.00..170.00 rows=10000 width=0) 
(actual time=0.007..0.007 rows=1 loops=2)
           Filter: (test1.key1 = key1)
   SubPlan 2
     ->  Seq Scan on test2 test2_1 @"lt#0"  (cost=0.00..145.00 rows=10000 w
idth=4) (never executed)
 Planning Time: 0.216 ms
 Execution Time: 0.072 ms
(9 rows)

Time: 1.056 ms