PostgreSql 索引

屠建本

2023-12-01

一、概述

索引主要被用来提升数据库性能，不当的使用会导致性能变差。 PostgreSQL 提供了多种索引类型： B-tree、Hash、GiST、SP-GiST 、GIN 和 BRIN。每一种索引类型使用了一种不同的算法来适应不同类型的查询。默认情况下，CREATE INDEX 命令创建适合于大部分情况的 B-tree 索引。

B-树（默认）：B-树是一个自平衡树（self-balancing tree），按照顺序存储数据，支持对数时间复杂度（O(logN)）的搜索、插入、删除和顺序访问。
哈希：哈希索引（Hash index）只能用于简单的等值查找（=），也就是说索引字段被用于等号条件判断。因为对数据进行哈希运算之后不再保留原来的大小关系。
GiST：GiST 代表通用搜索树（Generalized Search Tree）， GiST 索引单个索引类型，而是一种支持不同索引策略的框架。 GiST 索引常见的用途包括几何数据的索引和全文搜索。
SP-GiST：SP-GiST 代表空间分区 GiST，主要用于 GIS、多媒体、电话路由以及 IP 路由等数据的索引。与 GiST 类似， SP-GiST 也支持“最近邻”搜索。
GIN：GIN 代表广义倒排索引（generalized inverted indexes），主要用于单个字段中包含多个值的数据，例如 hstore、 array、 jsonb 以及 range 数据类型。一个倒排索引为每个元素值都创建一个单独的索引项，可以有效地查询某个特定元素值是否存在。 Google、百度这种搜索引擎利用的就是倒排索引。
BRIN：BRIN 代表块区间索引（block range indexes），存储了连续物理范围区间内的数据摘要信息。BRIN 也相比 B-树索引要小很多，维护也更容易。对于不进行水平分区就无法使用 B-树索引的超大型表，可以考虑 BRIN。

在使用上，除了常见的单列索引，还有多列索引、唯一索引、表达式索引、部分索引、覆盖索引等。

多列索引：目前，只有 B-tree、GiST、GIN 和 BRIN 索引类型支持多列索引，最多可以指定32个列（该限制可以在源代码文件 pg_config_manual.h 中修改，但是修改后需要重新编译PostgreSQL）。
唯一索引：目前，只有 B-tree 能够被声明为唯一。
表达式索引：从表的一列或多列计算而来的一个函数或者标量表达式。索引表达式的维护代价较为昂贵，因为在每一个行被插入或更新时都得为它重新计算相应的表达式。然而，索引表达式在进行索引搜索时却不需要重新计算，因为它们的结果已经被存储在索引中了。
部分索引：一个部分索引是建立在表的一个子集上，而该子集则由一个条件表达式（被称为部分索引的谓词）定义。而索引中只包含那些符合该谓词的表行的项。使用部分索引的一个主要原因是避免索引公值（查询结果行在一个表中占比超过一定百分比的值不会使用索引）。
覆盖索引：目前，B-树索引总是支持只用索引的扫描。GiST 和 SP-GiST 索引只对某些操作符类支持只用索引的扫描。其他索引类型不支持这种扫描。仅访问索引就可获取查询所需的全部数据，无需回表（Index-Only Scan）。

二、语法

CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] name ] ON [ ONLY ] table_name [ USING method ]
    ( { column_name | ( expression ) } [ COLLATE collation ] [ opclass [ ( opclass_parameter = value [, ... ] ) ] ] [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] )
    [ INCLUDE ( column_name [, ...] ) ]
    [ WITH ( storage_parameter [= value] [, ... ] ) ]
    [ TABLESPACE tablespace_name ]
    [ WHERE predicate ]

UNIQUE：唯一索引，在索引被创建时（如果数据已经存在）或者加入数据时检查重复值。
CONCURRENTLY：在构建索引时不会取得任何会阻止该表上并发插入、更新或者删除的锁。而标准的索引构建将会把表锁住以阻止对表的写（但不阻塞读），这种锁定会持续到索引创建完毕。
IF NOT EXISTS：如果一个同名关系已经存在则不要抛出错误。
INCLUDE：指定一个列的列表，其中的列将被包括在索引中作为非键列。不能作为索引扫描的条件，主要作用是相关数据索存储在索引中，访问时无需访问该索引的基表。当前，有B-树和GiST索引访问方法支持这一特性。
name：要创建的索引名称。这里不能包括模式名，因为索引总是被创建在其基表所在的模式中。如果索引名称被省略，PostgreSQL 将基于基表名称和被索引列名称选择一个合适的名称。
ONLY：如果该表是分区表，指示不要在分区上递归创建索引。默认会递归创建索引。
table_name：要被索引的表的名称（可以被模式限定）。
method：要使用的索引方法的名称。可以选择 btree、hash、 gist、spgist、 gin 以及brin。默认方法是 btree。
column_name：一个表列的名称。
expression：一个基于一个或者更多个表列的表达式。如语法中所示，表达式通常必须被写在圆括号中。不过，如果该表达式是一个函数调用的形式，圆括号可以被省略。
collation：要用于该索引的排序规则的名称。
opclass：一个操作符类的名称。
opclass_parameter：运算符类参数的名称。
ASC：指定上升排序（默认）。
DESC：指定下降排序。
NULLS FIRST：指定把空值排序在非空值前面。在指定DESC时，这是默认行为。
NULLS LAST：指定把空值排序在非空值后面。在没有指定DESC时，这是默认行为。
storage_parameter：索引方法相关的存储参数的名称。可选的WITH子句为索引指定存储参数。每一种索引方法都有自己的存储参数集合。B-树、哈希、GiST以及SP-GiST索引方法都接受这个参数：

fillfactor (integer) :索引的填充因子是一个百分数，它决定索引方法将尝试填充索引页面的充满程度。对于B-树，在初始的索引构建过程中，叶子页面会被填充至该百分数，当在索引右端扩展索引（增加新的最大键值）时也会这样处理。如果页面后来被完全填满，它们就会被分裂，导致索引的效率逐渐退化。B-树使用了默认的填充因子 90，但是也可以选择为 10 到 100 的任何整数值。如果表是静态的，那么填充因子 100 是最好的，因为它可以让索引的物理尺寸最小化。但是对于更新负荷很重的表，较小的填充因子有利于最小化对页面分裂的需求。其他索引方法以不同但是大致类似的方式使用填充因子，不同方法的默认填充因子也不相同。
deduplicate_items (boolean)：B 树重复数据删除技术的使用。设置为 ON 或 OFF 以启用或禁用优化。默认值为ON。
vacuum_cleanup_index_scale_factor：指定在以前的统计信息收集过程中计数到的堆元组总数的一个分数，插入不超过这一数量所代表的元组不会导致VACUUM清理阶段的索引扫描。这个设置当前仅适用于B-树索引。
buffering (enum)：适用于 GiST 索引，决定是否用缓冲构建技术来构建索引。OFF 会禁用它，ON 则启用该特性，如果设置为 AUTO 则初始会禁用它，但是一旦索引尺寸到达 effective_cache_size 就会随时打开。默认值是 AUTO。
fastupdate (boolean)：适用于 GIN 索引，这个设置控制快速更新技术的使用。它是一个布尔参数：ON 启用快速更新，OFF 禁用。默认是 ON。
gin_pending_list_limit (integer)：适用于 GIN 索引，设置 fastupdate 被启用时可以使用的 GIN 索引的待处理列表的最大尺寸。如果该列表增长到超过这个最大尺寸，会通过批量将其中的项移入索引的主 GIN 数据结构来清理列表。如果指定值时没有单位，则以千字节为单位。默认值是四兆字节（4MB）。可以通过更改索引的存储参数来为个别 GIN 索引覆盖这个设置。
pages_per_range (integer)：使用于 BRIN 索引，定义用于每一个BRIN索引项的块范围由多少个表块组成。默认是128。
autosummarize (boolean)：定义是否只要在下一个页面上检测到插入就为前面的页面范围运行概要操作。

三、示例

1）B-tree 索引

postgres=# create index idx_test_id on test(id);
CREATE INDEX
postgres=# \d test
                Table "public.test"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 id     | integer |           |          |
 name   | text    |           |          |
Indexes:
    "idx_test_id" btree (id)

postgres=# explain analyze select * from test where id between 100 and 200;
                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_test_id on test  (cost=0.43..10.49 rows=103 width=15) (actual time=0.006..0.058 rows=101 loops=1)
   Index Cond: ((id >= 100) AND (id <= 200))
 Planning Time: 0.408 ms
 Execution Time: 0.072 ms
(4 rows)

2）Hash 索引

postgres=# create index idx_test_id on test using hash(id);
CREATE INDEX
postgres=# \d test
                Table "public.test"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 id     | integer |           |          |
 name   | text    |           |          |
Indexes:
    "idx_test_id" hash (id)

postgres=# explain analyze select * from test where id = 100;
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_test_id on test  (cost=0.00..8.02 rows=1 width=15) (actual time=0.014..0.015 rows=1 loops=1)
   Index Cond: (id = 100)
 Planning Time: 0.142 ms
 Execution Time: 0.029 ms
(4 rows)

3）GiST 索引

postgres=# create index idx_t_gist_pos on t_gist using gist(pos);
CREATE INDEX
postgres=# \d t_gist
               Table "public.t_gist"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 id     | integer |           |          |
 pos    | point   |           |          |
Indexes:
    "idx_t_gist_pos" gist (pos)

postgres=# explain analyze select * from t_gist where circle '((100,100) 10)'  @> pos;
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on t_gist  (cost=5.06..271.70 rows=100 width=20) (actual time=0.048..0.092 rows=29 loops=1)
   Recheck Cond: ('<(100,100),10>'::circle @> pos)
   Heap Blocks: exact=29
   ->  Bitmap Index Scan on idx_t_gist_pos  (cost=0.00..5.03 rows=100 width=0) (actual time=0.027..0.027 rows=29 loops=1)
         Index Cond: (pos <@ '<(100,100),10>'::circle)
 Planning Time: 0.092 ms
 Execution Time: 0.136 ms
(7 rows)

4）SP-GiST 索引

postgres=# create index idx_t_spgist_p on t_spgist using spgist(p);
CREATE INDEX
postgres=# \d t_spgist
              Table "public.t_spgist"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 id     | integer |           |          |
 p      | point   |           |          |
Indexes:
    "idx_t_spgist_p" spgist (p)

postgres=# explain analyze select * from t_spgist where p >^ point '(2,7)';
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on t_spgist  (cost=305.78..1067.78 rows=10000 width=20) (actual time=7.387..15.011 rows=99245 loops=1)
   Recheck Cond: (p >^ '(2,7)'::point)
   Heap Blocks: exact=637
   ->  Bitmap Index Scan on idx_t_spgist_p  (cost=0.00..303.28 rows=10000 width=0) (actual time=7.302..7.303 rows=99245 loops=1)
         Index Cond: (p >^ '(2,7)'::point)
 Planning Time: 0.124 ms
 Execution Time: 17.611 ms
(7 rows)

5）GIN 索引

postgres=# create index idx_ts_doc_tsv on ts using gin(doc_tsv);
CREATE INDEX
postgres=# \d ts
                  Table "public.ts"
 Column  |   Type   | Collation | Nullable | Default
---------+----------+-----------+----------+---------
 doc     | text     |           |          |
 doc_tsv | tsvector |           |          |
Indexes:
    "idx_ts_doc_tsv" gin (doc_tsv)
postgres=# set enable_seqscan = off;
SET
postgres=# explain analyze select * from ts where doc_tsv @@ to_tsquery('many & slitter');
                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on ts  (cost=12.25..16.51 rows=1 width=64) (actual time=0.046..0.047 rows=1 loops=1)
   Recheck Cond: (doc_tsv @@ to_tsquery('many & slitter'::text))
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on idx_ts_doc_tsv  (cost=0.00..12.25 rows=1 width=0) (actual time=0.042..0.042 rows=1 loops=1)
         Index Cond: (doc_tsv @@ to_tsquery('many & slitter'::text))
 Planning Time: 0.176 ms
 Execution Time: 0.071 ms
(7 rows)

6）BRIN 索引

postgres=# create index idx_test_id on test using brin(id);
CREATE INDEX
postgres=# \d test
                Table "public.test"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 id     | integer |           |          |
 name   | text    |           |          |
Indexes:
    "idx_test_id" brin (id)

postgres=# explain analyze select * from test where id between 100 and 200;
                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on test  (cost=12.03..27651.36 rows=1 width=15) (actual time=0.079..4.006 rows=101 loops=1)
   Recheck Cond: ((id >= 100) AND (id <= 200))
   Rows Removed by Index Recheck: 23579
   Heap Blocks: lossy=128
   ->  Bitmap Index Scan on idx_test_id  (cost=0.00..12.03 rows=23585 width=0) (actual time=0.067..0.067 rows=1280 loops=1)
         Index Cond: ((id >= 100) AND (id <= 200))
 Planning Time: 0.240 ms
 Execution Time: 4.045 ms
(8 rows)

7）其他用法

--多列索引
postgres=# create index idx_test_dl on test(id,name);
CREATE INDEX
postgres=# \d test
                Table "public.test"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 id     | integer |           |          |
 name   | text    |           |          |
Indexes:
    "idx_test_dl" btree (id, name)

postgres=# explain analyze select * from test where id = 100 and name = 'val:100';
                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using idx_test_dl on test  (cost=0.43..8.45 rows=1 width=15) (actual time=0.086..0.087 rows=1 loops=1)
   Index Cond: ((id = 100) AND (name = 'val:100'::text))
   Heap Fetches: 1
 Planning Time: 0.224 ms
 Execution Time: 0.104 ms
(5 rows)

--唯一索引
postgres=# create unique index idx_test_wy  on test(id);
CREATE INDEX
postgres=# \d test
                Table "public.test"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 id     | integer |           |          |
 name   | text    |           |          |
Indexes:
    "idx_test_wy" UNIQUE, btree (id)

postgres=# explain analyze select * from test where id = 100;
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_test_wy on test  (cost=0.43..8.45 rows=1 width=15) (actual time=0.099..0.101 rows=1 loops=1)
   Index Cond: (id = 100)
 Planning Time: 0.350 ms
 Execution Time: 0.119 ms
(4 rows)

--表达式索引
postgres=# create index idx_test_bds on test((id+1000));
CREATE INDEX
postgres=# \d test
                Table "public.test"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 id     | integer |           |          |
 name   | text    |           |          |
Indexes:
    "idx_test_bds" btree ((id + 1000))

postgres=# explain analyze select * from test where id+1000 > 10000;
                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on test  (cost=31205.10..83233.11 rows=1666667 width=15) (actual time=578.175..1501.361 rows=4991000 loops=1)
   Recheck Cond: ((id + 1000) > 10000)
   Heap Blocks: exact=26980
   ->  Bitmap Index Scan on idx_test_bds  (cost=0.00..30788.43 rows=1666667 width=0) (actual time=574.862..574.863 rows=4991000 loops=1)
         Index Cond: ((id + 1000) > 10000)
 Planning Time: 0.110 ms
 Execution Time: 1633.012 ms
(7 rows)

--部分索引
postgres=# create index idx_test_bf on test(id) where (id between 1000 and 2000);
CREATE INDEX
postgres=# \d test
                Table "public.test"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 id     | integer |           |          |
 name   | text    |           |          |
Indexes:
    "idx_test_bf" btree (id) WHERE id >= 1000 AND id <= 2000

postgres=# explain analyze select * from test where id between 1500 and 2000;
                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_test_bf on test  (cost=0.28..27.29 rows=515 width=15) (actual time=0.035..0.090 rows=501 loops=1)
   Index Cond: (id >= 1500)
 Planning Time: 0.265 ms
 Execution Time: 0.110 ms
(4 rows)

postgres=# explain analyze select * from test where id between 500 and 1000;
                                                       QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
 Seq Scan on test  (cost=10000000000.00..10000102028.00 rows=272 width=15) (actual time=0.035..427.097 rows=501 loops=1)
   Filter: ((id >= 500) AND (id <= 1000))
   Rows Removed by Filter: 4999499
 Planning Time: 0.107 ms
 Execution Time: 427.120 ms
(5 rows)

--覆盖索引
postgres=# create index idx_test_fg on test(id) include(name);
CREATE INDEX
postgres=# \d test
                Table "public.test"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 id     | integer |           |          |
 name   | text    |           |          |
Indexes:
    "idx_test_fg" btree (id) INCLUDE (name)

postgres=# explain analyze select * from test where id = 100;
                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using idx_test_fg on test  (cost=0.43..8.45 rows=1 width=15) (actual time=0.031..0.032 rows=1 loops=1)
   Index Cond: (id = 100)
   Heap Fetches: 1
 Planning Time: 0.264 ms
 Execution Time: 0.047 ms
(5 rows)

四、常用管理操作

--查询
postgres=# select * from pg_indexes where tablename = 'test';
 schemaname | tablename |  indexname  | tablespace |                                indexdef
------------+-----------+-------------+------------+-------------------------------------------------------------------------
 public     | test      | idx_test_fg |            | CREATE INDEX idx_test_fg ON public.test USING btree (id) INCLUDE (name)
(1 row)

--重建
postgres=# reindex index idx_test_fg;
REINDEX

--重命名
postgres=# alter index idx_test_fg rename to idx_test_id;
ALTER INDEX

--修改表空间
postgres=# alter index idx_test_id set tablespace tab1;
ALTER INDEX

--删除
postgres=# drop index idx_test_id;
DROP INDEX