前面测试了PostgreSQL 行存储引擎和MonetDB列存储的性能差别. 包括导入, 查询, 关联查询等.
本文将测试一下PostgreSQL使用cstore_fdw插件, 对比MonetDB的性能.
测试方法和性能数据见 :
汇总3种测试结果的对比
5000万 60个int型字段 宽表 导入速度 :
MonetDB | PostgreSQL行引擎 | PostgreSQL列引擎 (开启压缩) |
320秒 | 1088秒 | 1344秒 |
存储空间 :
MonetDB | PostgreSQL行引擎 | PostgreSQL列引擎 |
12G | 13G | 12G |
1. select count(distinct c1) from t1;
50000000
MonetDB | PostgreSQL行引擎 | PostgreSQL列引擎 |
1.8秒 | 60秒 | 42秒 |
2. select count(distinct c1) from t2;
50000000
MonetDB | PostgreSQL行引擎 | PostgreSQL列引擎 |
1.8秒 | 59秒 | 42秒 |
3. select count(*) from (select c1,c11,c21,c31,c41,c51,c60 from t1 group by c1,c11,c21,c31,c41,c51,c60) as t;
50000000
MonetDB | PostgreSQL行引擎 | PostgreSQL列引擎 |
2.3秒 | 113秒 | 73秒 |
4. select count(*) from (select t1.c1 from t1 join t2 on t1.c2=t2.c3) as t;
49996801
MonetDB | PostgreSQL行引擎 | PostgreSQL列引擎 |
28秒 | 328秒 | 279秒 |
5. select count(*) from (select t1.c1 from t1 join t2 on t1.c2=t2.c3 join t3 on t1.c3=t3.c5) as t;
49995545
MonetDB | PostgreSQL行引擎 | PostgreSQL列引擎 |
72秒 | 736秒 | 551秒 |
6. select count(*) from (select t1.c1 from t1 join t2 on t1.c2=t2.c3 join t3 on t1.c3=t3.c5 join t4 on t3.c5=t4.c6) as t;
50001556
MonetDB | PostgreSQL行引擎 | PostgreSQL列引擎 |
103秒 | 923秒 | 706秒 |
7. count(*) from (select t1.c1 from t1 join t2 on t1.c2=t2.c3 join t3 on t2.c3=t3.c4 join t4 on t3.c4=t4.c5 join t5 on t4.c5=t5.c6) as t;
50024539
MonetDB | PostgreSQL行引擎 | PostgreSQL列引擎 |
125秒 | 981秒 | 737秒 |
8. select count(*) from (
select t1.c1 from t1
join t2 on t1.c2=t2.c3
join t3 on t2.c3=t3.c4
join t4 on t3.c4=t4.c5
join t5 on t4.c5=t5.c6
where t1.c60 < 100000
) as t;
98744
MonetDB | PostgreSQL行引擎 | PostgreSQL列引擎 |
13秒 | 212秒 | 233秒 |
9. select count(*) from (
select t1.c1 from t1
join t2 on t1.c2=t2.c3
join t3 on t2.c3=t3.c4
join t4 on t3.c4=t4.c5
join t5 on t4.c5=t5.c6
join t6 on t5.c5=t6.c6
join t7 on t6.c5=t7.c6
join t8 on t7.c5=t8.c6
join t9 on t8.c5=t9.c6
join t10 on t9.c5=t10.c6
join t11 on t10.c5=t11.c6
join t12 on t11.c5=t12.c6
join t13 on t12.c5=t13.c6
join t14 on t13.c5=t14.c6
join t15 on t14.c5=t15.c6
join t16 on t15.c5=t16.c6
join t17 on t16.c5=t17.c6
) as t;
50304982
MonetDB | PostgreSQL行引擎 | PostgreSQL列引擎 |
310秒 | 5306秒 (16表) | 4059秒 |
10. select count(*) from (
select t1.c1 from t1
join t2 on t1.c2=t2.c3
join t3 on t2.c3=t3.c4
join t4 on t3.c4=t4.c5
join t5 on t4.c5=t5.c6
join t6 on t5.c5=t6.c6
join t7 on t6.c5=t7.c6
join t8 on t7.c5=t8.c6
join t9 on t8.c5=t9.c6
join t10 on t9.c5=t10.c6
join t11 on t10.c5=t11.c6
join t12 on t11.c5=t12.c6
join t13 on t12.c5=t13.c6
join t14 on t13.c5=t14.c6
join t15 on t14.c5=t15.c6
join t16 on t15.c5=t16.c6
join t17 on t16.c5=t17.c6
where t1.c60 < 100000
) as t;
93771
MonetDB | PostgreSQL行引擎 | PostgreSQL列引擎 |
46秒 | 2199秒 (16表) | 2872秒 |
11. select count(*) from (
select t1.c1 from t1
join t2 on t1.c1=t2.c3
join t3 on t2.c3=t3.c4
join t4 on t3.c4=t4.c5
join t5 on t4.c5=t5.c6
join t6 on t5.c5=t6.c6
join t7 on t6.c5=t7.c6
join t8 on t7.c5=t8.c6
join t9 on t8.c5=t9.c6
join t10 on t9.c5=t10.c6
join t11 on t10.c5=t11.c6
join t12 on t11.c5=t12.c6
join t13 on t12.c5=t13.c6
join t14 on t13.c5=t14.c6
join t15 on t14.c5=t15.c6
join t16 on t15.c5=t16.c6
join t17 on t16.c5=t17.c6
) as t;
50183000
MonetDB | PostgreSQL行引擎 | PostgreSQL列引擎 |
262秒 | 3980秒 (16表) | 3562秒 |
12.
select count(*) from (
select t1.c1 from t1
join t2 on t1.c2=t2.c3
join t3 on t2.c3=t3.c4
join t4 on t3.c4=t4.c5
join t5 on t4.c5=t5.c6
join t6 on t5.c6=t6.c7
join t7 on t6.c7=t7.c8
join t8 on t7.c8=t8.c9
join t9 on t8.c9=t9.c10
join t10 on t9.c10=t10.c11
join t11 on t10.c11=t11.c12
join t12 on t11.c12=t12.c13
join t13 on t12.c13=t13.c14
join t14 on t13.c14=t14.c15
join t15 on t14.c15=t15.c16
join t16 on t15.c16=t16.c17
join t17 on t16.c17=t17.c18
where t1.c1<10
) as t;
0
MonetDB | PostgreSQL行引擎 | PostgreSQL列引擎 |
5秒 | 640秒 (16表) | 1403秒 |
以下是PostgreSQL列引擎详细的测试过程和结果 :
创建fdw handler和server
digoal=# CREATE EXTENSION cstore_fdw;
CREATE EXTENSION
digoal=# CREATE SERVER cstore_server FOREIGN DATA WRAPPER cstore_fdw;
CREATE SERVER
创建外部表存储目录
[root@db-172-16-3-221 cstore_fdw]# mkdir /data01/cstore_dir
[root@db-172-16-3-221 cstore_fdw]# chown postgres:postgres /data01/cstore_dir
接下来创建外部表(注意, 外部表目前不支持PK约束, 所以不创建PK约束) :
create foreign table ft1 (c1 int,
c2 int,
c3 int,
c4 int,
c5 int,
c6 int,
c7 int,
c8 int,
c9 int,
c10 int,
c11 int,
c12 int,
c13 int,
c14 int,
c15 int,
c16 int,
c17 int,
c18 int,
c19 int,
c20 int,
c21 int,
c22 int,
c23 int,
c24 int,
c25 int,
c26 int,
c27 int,
c28 int,
c29 int,
c30 int,
c31 int,
c32 int,
c33 int,
c34 int,
c35 int,
c36 int,
c37 int,
c38 int,
c39 int,
c40 int,
c41 int,
c42 int,
c43 int,
c44 int,
c45 int,
c46 int,
c47 int,
c48 int,
c49 int,
c50 int,
c51 int,
c52 int,
c53 int,
c54 int,
c55 int,
c56 int,
c57 int,
c58 int,
c59 int,
c60 int) server cstore_server
options (filename '/data01/cstore_dir/ft1.cstore', compression 'pglz');
导入测试数据, 测试数据生成方法见
http://blog.163.com/digoal@126/blog/static/1638770402014715113449394/
[root@db-172-16-3-221 testsql2]# ll
total 25716276
-rw-r--r-- 1 root root 1646532347 Aug 15 15:54 10.sql
-rw-r--r-- 1 root root 1646537517 Aug 15 15:55 11.sql
-rw-r--r-- 1 root root 1646522483 Aug 15 15:55 12.sql
-rw-r--r-- 1 root root 1646524530 Aug 15 15:55 13.sql
-rw-r--r-- 1 root root 1646534753 Aug 15 15:56 14.sql
-rw-r--r-- 1 root root 1646527810 Aug 15 15:56 15.sql
-rw-r--r-- 1 root root 1646535257 Aug 15 15:57 16.sql
-rw-r--r-- 1 root root 1642296488 Aug 15 15:57 1.sql
-rw-r--r-- 1 root root 1643393480 Aug 15 15:57 2.sql
-rw-r--r-- 1 root root 1643415513 Aug 15 15:58 3.sql
-rw-r--r-- 1 root root 1645910749 Aug 15 15:58 4.sql
-rw-r--r-- 1 root root 1646517090 Aug 15 15:59 5.sql
-rw-r--r-- 1 root root 1646526742 Aug 15 15:59 6.sql
-rw-r--r-- 1 root root 1646523866 Aug 15 15:59 7.sql
-rw-r--r-- 1 root root 1646531161 Aug 15 16:00 8.sql
-rw-r--r-- 1 root root 1646529088 Aug 15 16:00 9.sql
[root@db-172-16-3-221 testsql2]# less 1.sql
1|33323037|7973589|30570956|1817394|19184567|44691789|34510510|33979891|13926060|34242436|19034544|16183332|43653270|41510685|1826245|39039193|45247300|13238287|32402570|22203231|27234075|11087825|6221607|13725964|23752712|10179671|9717751|43518783|29456195|22325774|3996169|12779233|30299364|34567126|14596627|49483931|29258916|49107138|33463823|43184976|33349574|2498367|9368309|27002845|44009052|11194555|16042038|39256353|24432842|48444609|11459584|1666918|9532434|17681192|15392882|33285146|27860863|25110633|26803929
2|7317059|47436408|30800099|20096292|27735772|15367226|34692920|27219704|44626142|33800058|10683527|37811119|17149632|13181894|47179428|44152477|7190947|8373984|10194516|46447300|32806827|8639125|7906885|34473745|18171559|25588077|49866628|1456706|3448940|24977262|28260636|10766000|22413670|9060735|30862292|149442|24427961|15555213|27369147|19054104|49355271|38052674|6865223|16504903|1234569|4044652|10657381|8425516|12418636|20851897|4872817|45225464|29491023|12779702|29699209|47662583|38367779|29565837|49119289
.......
导入脚本 :
#!/bin/bash
psql <<EOF
\timing
copy ft1 from '/opt/testsql2/1.sql' with csv delimiter '|';
copy ft1 from '/opt/testsql2/2.sql' with csv delimiter '|';
copy ft1 from '/opt/testsql2/3.sql' with csv delimiter '|';
copy ft1 from '/opt/testsql2/4.sql' with csv delimiter '|';
copy ft1 from '/opt/testsql2/5.sql' with csv delimiter '|';
copy ft1 from '/opt/testsql2/6.sql' with csv delimiter '|';
copy ft1 from '/opt/testsql2/7.sql' with csv delimiter '|';
copy ft1 from '/opt/testsql2/8.sql' with csv delimiter '|';
copy ft1 from '/opt/testsql2/9.sql' with csv delimiter '|';
copy ft1 from '/opt/testsql2/10.sql' with csv delimiter '|';
copy ft1 from '/opt/testsql2/11.sql' with csv delimiter '|';
copy ft1 from '/opt/testsql2/12.sql' with csv delimiter '|';
copy ft1 from '/opt/testsql2/13.sql' with csv delimiter '|';
copy ft1 from '/opt/testsql2/14.sql' with csv delimiter '|';
copy ft1 from '/opt/testsql2/15.sql' with csv delimiter '|';
copy ft1 from '/opt/testsql2/16.sql' with csv delimiter '|';
\q
EOF
导入效率 :
COPY 3125000
Time: 87673.278 ms
COPY 3125000
Time: 86130.378 ms
COPY 3125000
Time: 85473.444 ms
COPY 3125000
Time: 83736.891 ms
COPY 3125000
Time: 84052.830 ms
COPY 3125000
Time: 84130.955 ms
COPY 3125000
Time: 84063.729 ms
COPY 3125000
Time: 84073.529 ms
COPY 3125000
Time: 84470.295 ms
COPY 3125000
Time: 85204.964 ms
COPY 3125000
Time: 87938.947 ms
COPY 3125000
Time: 85290.653 ms
COPY 3125000
Time: 84120.752 ms
COPY 3125000
Time: 84354.485 ms
COPY 3125000
Time: 83958.781 ms
COPY 3125000
Time: 84603.499 ms
空间占用 :
[root@db-172-16-3-221 cstore_dir]# ll -h
total 20G
-rw------- 1 postgres postgres 12G Aug 16 09:56 ft1.cstore
-rw------- 1 postgres postgres 6.6K Aug 16 09:56 ft1.cstore.footer
再创建16张同样的测试表 :
因为这里只是测试, 所以用复制文件的方法创建16个一样的表.
其实用软连接或者改options的方法更快.而且省空间.
http://blog.163.com/digoal@126/blog/static/163877040201471662253548/
create foreign table ft2 (c1 int,
c2 int,
c3 int,
c4 int,
c5 int,
c6 int,
c7 int,
c8 int,
c9 int,
c10 int,
c11 int,
c12 int,
c13 int,
c14 int,
c15 int,
c16 int,
c17 int,
c18 int,
c19 int,
c20 int,
c21 int,
c22 int,
c23 int,
c24 int,
c25 int,
c26 int,
c27 int,
c28 int,
c29 int,
c30 int,
c31 int,
c32 int,
c33 int,
c34 int,
c35 int,
c36 int,
c37 int,
c38 int,
c39 int,
c40 int,
c41 int,
c42 int,
c43 int,
c44 int,
c45 int,
c46 int,
c47 int,
c48 int,
c49 int,
c50 int,
c51 int,
c52 int,
c53 int,
c54 int,
c55 int,
c56 int,
c57 int,
c58 int,
c59 int,
c60 int) server cstore_server
options (filename '/data01/cstore_dir/ft2.cstore', compression 'pglz');
postgres@db-172-16-3-221-> cd /data01/cstore_dir/
postgres@db-172-16-3-221-> ll
total 20G
-rw------- 1 postgres postgres 12G Aug 16 09:56 ft1.cstore
-rw------- 1 postgres postgres 6.6K Aug 16 09:56 ft1.cstore.footer
-rw------- 1 postgres postgres 0 Aug 16 09:58 ft2.cstore
-rw------- 1 postgres postgres 26 Aug 16 09:58 ft2.cstore.footer
postgres@db-172-16-3-221-> mv ft2.cstore ft2.cstore.bak
postgres@db-172-16-3-221-> mv ft2.cstore.footer ft2.cstore.footer.bak
postgres@db-172-16-3-221-> cp ft1.cstore ft2.cstore
postgres@db-172-16-3-221-> cp ft1.cstore.footer ft2.cstore.footer
其他同上, 创建其他15个表, 并复制数据文件, 略.
digoal=# \det+
List of foreign tables
Schema | Table | Server | FDW Options | Description
--------+-------+---------------+-----------------------------------------------------------------+-------------
public | ft1 | cstore_server | (filename '/data01/cstore_dir/ft1.cstore', compression 'pglz') |
public | ft10 | cstore_server | (filename '/data01/cstore_dir/ft10.cstore', compression 'pglz') |
public | ft11 | cstore_server | (filename '/data01/cstore_dir/ft11.cstore', compression 'pglz') |
public | ft12 | cstore_server | (filename '/data01/cstore_dir/ft12.cstore', compression 'pglz') |
public | ft13 | cstore_server | (filename '/data01/cstore_dir/ft13.cstore', compression 'pglz') |
public | ft14 | cstore_server | (filename '/data01/cstore_dir/ft14.cstore', compression 'pglz') |
public | ft15 | cstore_server | (filename '/data01/cstore_dir/ft15.cstore', compression 'pglz') |
public | ft16 | cstore_server | (filename '/data01/cstore_dir/ft16.cstore', compression 'pglz') |
public | ft17 | cstore_server | (filename '/data01/cstore_dir/ft17.cstore', compression 'pglz') |
public | ft2 | cstore_server | (filename '/data01/cstore_dir/ft2.cstore', compression 'pglz') |
public | ft3 | cstore_server | (filename '/data01/cstore_dir/ft3.cstore', compression 'pglz') |
public | ft4 | cstore_server | (filename '/data01/cstore_dir/ft4.cstore', compression 'pglz') |
public | ft5 | cstore_server | (filename '/data01/cstore_dir/ft5.cstore', compression 'pglz') |
public | ft6 | cstore_server | (filename '/data01/cstore_dir/ft6.cstore', compression 'pglz') |
public | ft7 | cstore_server | (filename '/data01/cstore_dir/ft7.cstore', compression 'pglz') |
public | ft8 | cstore_server | (filename '/data01/cstore_dir/ft8.cstore', compression 'pglz') |
public | ft9 | cstore_server | (filename '/data01/cstore_dir/ft9.cstore', compression 'pglz') |
(17 rows)
分析所有外部表 :
digoal=# analyze ft1;
ANALYZE
Time: 148885.149 ms
...
digoal=# analyze ft17;
ANALYZE
查询测试 :
digoal=# select count(distinct c1) from ft1;
count
----------
50000000
(1 row)
Time: 41536.070 ms
digoal=# select count(*) from (select c1,c11,c21,c31,c41,c51,c60 from ft1 group by c1,c11,c21,c31,c41,c51,c60) as t;
count
----------
50000000
(1 row)
Time: 73486.646 ms
digoal=# select count(*) from (select ft1.c1 from ft1 join ft2 on ft1.c2=ft2.c3) as t;
count
----------
49996801
(1 row)
Time: 279239.085 ms
digoal=# select count(*) from (select ft1.c1 from ft1 join ft2 on ft1.c2=ft2.c3 join ft3 on ft1.c3=ft3.c5) as t;
count
----------
49995545
(1 row)
Time: 551466.035 ms
digoal=# select count(*) from (select ft1.c1 from ft1 join ft2 on ft1.c2=ft2.c3 join ft3 on ft1.c3=ft3.c5 join ft4 on ft3.c5=ft4.c6) as t;
count
----------
50001556
(1 row)
Time: 706267.651 ms
digoal=# select count(*) from (select ft1.c1 from ft1 join ft2 on ft1.c2=ft2.c3 join ft3 on ft2.c3=ft3.c4 join ft4 on ft3.c4=ft4.c5 join ft5 on ft4.c5=ft5.c6) as t;
count
----------
50024539
(1 row)
Time: 736699.900 ms
digoal=# select count(*) from (
select ft1.c1 from ft1
join ft2 on ft1.c2=ft2.c3
join ft3 on ft2.c3=ft3.c4
join ft4 on ft3.c4=ft4.c5
join ft5 on ft4.c5=ft5.c6
where ft1.c60 < 100000
) as t;
count
-------
98744
(1 row)
Time: 232810.249 ms
digoal=# select count(*) from (
select ft1.c1 from ft1
join ft2 on ft1.c2=ft2.c3
join ft3 on ft2.c3=ft3.c4
join ft4 on ft3.c4=ft4.c5
join ft5 on ft4.c5=ft5.c6
join ft6 on ft5.c5=ft6.c6
join ft7 on ft6.c5=ft7.c6
join ft8 on ft7.c5=ft8.c6
join ft9 on ft8.c5=ft9.c6
join ft10 on ft9.c5=ft10.c6
join ft11 on ft10.c5=ft11.c6
join ft12 on ft11.c5=ft12.c6
join ft13 on ft12.c5=ft13.c6
join ft14 on ft13.c5=ft14.c6
join ft15 on ft14.c5=ft15.c6
join ft16 on ft15.c5=ft16.c6
join ft17 on ft16.c5=ft17.c6
) as t;
count
----------
50338741
(1 row)
Time: 4059635.466 ms
digoal=# select count(*) from (
select ft1.c1 from ft1
join ft2 on ft1.c2=ft2.c3
join ft3 on ft2.c3=ft3.c4
join ft4 on ft3.c4=ft4.c5
join ft5 on ft4.c5=ft5.c6
join ft6 on ft5.c5=ft6.c6
join ft7 on ft6.c5=ft7.c6
join ft8 on ft7.c5=ft8.c6
join ft9 on ft8.c5=ft9.c6
join ft10 on ft9.c5=ft10.c6
join ft11 on ft10.c5=ft11.c6
join ft12 on ft11.c5=ft12.c6
join ft13 on ft12.c5=ft13.c6
join ft14 on ft13.c5=ft14.c6
join ft15 on ft14.c5=ft15.c6
join ft16 on ft15.c5=ft16.c6
join ft17 on ft16.c5=ft17.c6
where ft1.c60 < 100000
) as t;
count
-------
94635
(1 row)
Time: 2872032.780 ms
digoal=# select count(*) from (
select ft1.c1 from ft1
join ft2 on ft1.c1=ft2.c3
join ft3 on ft2.c3=ft3.c4
join ft4 on ft3.c4=ft4.c5
join ft5 on ft4.c5=ft5.c6
join ft6 on ft5.c5=ft6.c6
join ft7 on ft6.c5=ft7.c6
join ft8 on ft7.c5=ft8.c6
join ft9 on ft8.c5=ft9.c6
join ft10 on ft9.c5=ft10.c6
join ft11 on ft10.c5=ft11.c6
join ft12 on ft11.c5=ft12.c6
join ft13 on ft12.c5=ft13.c6
join ft14 on ft13.c5=ft14.c6
join ft15 on ft14.c5=ft15.c6
join ft16 on ft15.c5=ft16.c6
join ft17 on ft16.c5=ft17.c6
) as t;
count
----------
50222475
(1 row)
Time: 3562364.934 ms
[小结]
1. cstore存储性能好在于它的block index, 在使用where条件的情况下, 可以减少块的扫描.
显然对于全表扫没有作用.
block_row_count (optional): Number of rows per column block. The default is 10000. cstore_fdw compresses, creates skip indexes, and reads from disk at the block granularity. Increasing this value helps with compression and results in fewer reads from disk. However, higher values also reduce the probability of skipping over unrelated row blocks.
Using Skip Indexes
cstore_fdw partitions each column into multiple blocks. Skip indexes store minimum and maximum values for each of these blocks. While scanning the table, if min/max values of the block contradict the WHERE clause, then the block is completely skipped. This way, the query processes less data and hence finishes faster.
To use skip indexes more efficiently, you should load the data after sorting it on a column that is commonly used in the WHERE clause. This ensures that there is a minimum overlap between blocks and the chance of them being skipped is higher.
In practice, the data generally has an inherent dimension (for example a time field) on which it is naturally sorted. Usually, the queries also have a filter clause on that column (for example you want to query only the last week's data), and hence you don't need to sort the data in such cases.
[参考]