Partitioned table query still scanning all partitions

濮丁雷

2023-03-14

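Question:

I have a table with over a billion records. To improve performance, I split it into 30 partitions. The most frequent queries have (id = ...) in their WHERE clause, so I decided to partition the table on the id column.

Basically, the partitions were created like this: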

CREATE TABLE foo_0 (CHECK (id % 30 = 0)) INHERITS (foo);
CREATE TABLE foo_1 (CHECK (id % 30 = 1)) INHERITS (foo);
CREATE TABLE foo_2 (CHECK (id % 30 = 2)) INHERITS (foo);
CREATE TABLE foo_3 (CHECK (id % 30 = 3)) INHERITS (foo);
.
.
.
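For readers who want to reproduce the setup, all 30 child tables could be generated in one go rather than typed out. This is only a sketch: it assumes a PostgreSQL version that supports DO blocks and format() (9.1 or later) and that the parent table foo already exists. It uses the same foo_N naming and CHECK expression as the statements above.

```sql
-- Sketch: create foo_0 .. foo_29, each with a modulo CHECK constraint.
-- Assumes the parent table foo already exists.
DO $$
BEGIN
    FOR i IN 0..29 LOOP
        -- %% produces a literal % inside format(); %s is replaced by i.
        EXECUTE format(
            'CREATE TABLE foo_%s (CHECK (id %% 30 = %s)) INHERITS (foo)',
            i, i
        );
    END LOOP;
END
$$;
```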

I ran ANALYZE on the entire database. In particular, I made it collect extra statistics for this table's id column by running:

ALTER TABLE foo ALTER COLUMN id SET STATISTICS 10000;

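However, when I run queries that filter on the id column, the planner shows that it is still scanning all the partitions. constraint_exclusion is set to partition, so that is not the issue.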

EXPLAIN ANALYZE SELECT * FROM foo WHERE (id = 2);


                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=0.00..8106617.40 rows=3620981 width=54) (actual time=30.544..215.540 rows=171477 loops=1)
   ->  Append  (cost=0.00..8106617.40 rows=3620981 width=54) (actual time=30.539..106.446 rows=171477 loops=1)
         ->  Seq Scan on foo  (cost=0.00..0.00 rows=1 width=203) (actual time=0.002..0.002 rows=0 loops=1)
               Filter: (id = 2)
         ->  Bitmap Heap Scan on foo_0 foo  (cost=3293.44..281055.75 rows=122479 width=52) (actual time=0.020..0.020 rows=0 loops=1)
               Recheck Cond: (id = 2)
               ->  Bitmap Index Scan on foo_0_idx_1  (cost=0.00..3262.82 rows=122479 width=0) (actual time=0.018..0.018 rows=0 loops=1)
                     Index Cond: (id = 2)
         ->  Bitmap Heap Scan on foo_1 foo  (cost=3312.59..274769.09 rows=122968 width=56) (actual time=0.012..0.012 rows=0 loops=1)
               Recheck Cond: (id = 2)
               ->  Bitmap Index Scan on foo_1_idx_1  (cost=0.00..3281.85 rows=122968 width=0) (actual time=0.010..0.010 rows=0 loops=1)
                     Index Cond: (id = 2)
         ->  Bitmap Heap Scan on foo_2 foo  (cost=3280.30..272541.10 rows=121903 width=56) (actual time=30.504..77.033 rows=171477 loops=1)
               Recheck Cond: (id = 2)
               ->  Bitmap Index Scan on foo_2_idx_1  (cost=0.00..3249.82 rows=121903 width=0) (actual time=29.825..29.825 rows=171477 loops=1)
                     Index Cond: (id = 2)
.
.
.

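What can I do to make the planner choose a better plan? Do I also need to run ALTER TABLE foo ALTER COLUMN id SET STATISTICS 10000; for every partition?

EDIT

After changing the query as Erwin suggested, the planner scans only the correct partition, but the execution time is actually worse than with the full scan (at least of the indexes).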

EXPLAIN ANALYZE select * from foo where (id = 2);
                                                                         QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=0.00..8106617.40 rows=3620981 width=54) (actual time=32.611..224.934 rows=171477 loops=1)
   ->  Append  (cost=0.00..8106617.40 rows=3620981 width=54) (actual time=32.606..116.565 rows=171477 loops=1)
         ->  Seq Scan on foo  (cost=0.00..0.00 rows=1 width=203) (actual time=0.002..0.002 rows=0 loops=1)
               Filter: (id = 2)
         ->  Bitmap Heap Scan on foo_0 foo  (cost=3293.44..281055.75 rows=122479 width=52) (actual time=0.046..0.046 rows=0 loops=1)
               Recheck Cond: (id = 2)
               ->  Bitmap Index Scan on foo_0_idx_1  (cost=0.00..3262.82 rows=122479 width=0) (actual time=0.044..0.044 rows=0 loops=1)
                     Index Cond: (id = 2)
         ->  Bitmap Heap Scan on foo_1 foo  (cost=3312.59..274769.09 rows=122968 width=56) (actual time=0.021..0.021 rows=0 loops=1)
               Recheck Cond: (id = 2)
               ->  Bitmap Index Scan on foo_1_idx_1  (cost=0.00..3281.85 rows=122968 width=0) (actual time=0.020..0.020 rows=0 loops=1)
                     Index Cond: (id = 2)
         ->  Bitmap Heap Scan on foo_2 foo  (cost=3280.30..272541.10 rows=121903 width=56) (actual time=32.536..86.730 rows=171477 loops=1)
               Recheck Cond: (id = 2)
               ->  Bitmap Index Scan on foo_2_idx_1  (cost=0.00..3249.82 rows=121903 width=0) (actual time=31.842..31.842 rows=171477 loops=1)
                     Index Cond: (id = 2)
         ->  Bitmap Heap Scan on foo_3 foo  (cost=3475.87..285574.05 rows=129032 width=52) (actual time=0.035..0.035 rows=0 loops=1)
               Recheck Cond: (id = 2)
               ->  Bitmap Index Scan on foo_3_idx_1  (cost=0.00..3443.61 rows=129032 width=0) (actual time=0.031..0.031 rows=0 loops=1)
.
.
.
         ->  Bitmap Heap Scan on foo_29 foo  (cost=3401.84..276569.90 rows=126245 width=56) (actual time=0.019..0.019 rows=0 loops=1)
               Recheck Cond: (id = 2)
               ->  Bitmap Index Scan on foo_29_idx_1  (cost=0.00..3370.28 rows=126245 width=0) (actual time=0.018..0.018 rows=0 loops=1)
                     Index Cond: (id = 2)
 Total runtime: 238.790 ms

vs:

EXPLAIN ANALYZE select * from foo where (id % 30 = 2) and (id = 2);
                                                                            QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=0.00..273120.30 rows=611 width=56) (actual time=31.519..257.051 rows=171477 loops=1)
   ->  Append  (cost=0.00..273120.30 rows=611 width=56) (actual time=31.516..153.356 rows=171477 loops=1)
         ->  Seq Scan on foo  (cost=0.00..0.00 rows=1 width=203) (actual time=0.002..0.002 rows=0 loops=1)
               Filter: ((id = 2) AND ((id % 30) = 2))
         ->  Bitmap Heap Scan on foo_2 foo  (cost=3249.97..273120.30 rows=610 width=56) (actual time=31.512..124.177 rows=171477 loops=1)
               Recheck Cond: (id = 2)
               Filter: ((id % 30) = 2)
               ->  Bitmap Index Scan on foo_2_idx_1  (cost=0.00..3249.82 rows=121903 width=0) (actual time=30.816..30.816 rows=171477 loops=1)
                     Index Cond: (id = 2)
 Total runtime: 270.384 ms

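Answer:

For non-trivial expressions you have to repeat the condition more or less verbatim in the query, so that the Postgres query planner understands it can rely on the CHECK constraint. Even if it seems redundant!

Per the documentation:

With constraint exclusion enabled, the planner will examine the constraints of each partition and try to prove that the partition need not be scanned because it could not contain any rows meeting the query's WHERE clause.
**When the planner can prove this**, it excludes the partition from the query plan.

Bold emphasis mine. The planner does not understand complex expressions. Of course, this has to be met, too:

Ensure that the constraint_exclusion configuration parameter is not disabled in postgresql.conf. If it is, queries will not be optimized as desired.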

Instead of:

~~SELECT * FROM foo WHERE (id = 2);~~

Try:

SELECT * FROM foo WHERE **id % 30 = 2 AND** id = 2;

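And:

The default (and recommended) setting of constraint_exclusion is actually neither on nor off, but an intermediate setting called partition, which causes the technique to be applied only to queries that are likely to be working on partitioned tables. The on setting causes the planner to examine CHECK constraints in all queries, even simple ones that are unlikely to benefit.

You can experiment with constraint_exclusion = on to see whether the planner catches on without the redundant verbatim condition. But you have to weigh the cost and benefit of this setting.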
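A minimal sketch of that experiment, limited to the current session so that nothing in postgresql.conf changes. Whether the plan actually improves is exactly what the test is meant to find out, so no expected output is shown here.

```sql
-- Check the current value first (expected to be 'partition' per the question).
SHOW constraint_exclusion;

-- Change it for this session only.
SET constraint_exclusion = on;

-- Re-run the original query without the redundant modulo condition
-- and check how many child tables appear under the Append node.
EXPLAIN SELECT * FROM foo WHERE id = 2;

-- Restore the configured value afterwards.
RESET constraint_exclusion;
```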

The alternative would be simpler conditions for your partitions, as already outlined by @harmic.
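One common form of "simpler condition" is a plain range test on id, which the PostgreSQL documentation recommends for constraint exclusion because the planner can compare a constant directly against the bounds. The sketch below is only an illustration: the table names foo_r0 and foo_r1 and the boundary values are hypothetical, and it may differ from what @harmic proposed.

```sql
-- Hypothetical range partitions; pick boundaries that fit the real id distribution.
CREATE TABLE foo_r0 (CHECK (id >= 0         AND id < 100000000)) INHERITS (foo);
CREATE TABLE foo_r1 (CHECK (id >= 100000000 AND id < 200000000)) INHERITS (foo);

-- With range constraints, a plain equality filter should be enough for the
-- planner to rule out partitions whose range cannot contain the value.
EXPLAIN SELECT * FROM foo WHERE id = 2;
```

The trade-off is that range partitions do not spread consecutive ids evenly across child tables the way id % 30 does, so this only fits if that distribution is acceptable.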

And no, increasing the STATISTICS target will not help in this case. Only the CHECK constraints and the WHERE conditions in your query matter.
