When a Greenplum database handles heavy data updates, it generates large amounts of pg_xlog (WAL), eating disk space; after the system has run for a long time, the log files under pg_log also keep growing. The sound approach is to mount the log directory and the data directory on different disks, or to expand the disk. If they share one path, you can back the log files up to another location to free some space, or clean up pg_xlog.
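To make the "back up logs to another path" option concrete, here is a minimal sketch that moves pg_log files older than 7 days into a backup directory. All paths and file names are illustrative, and the demo runs against throwaway temp directories so it cannot touch a real cluster; point `logdir` at your actual pg_log in practice (GNU `touch -d` / `find -mtime` assumed).

```shell
# Sketch: relocate aged server logs to free space on the data mount.
# Illustrative stand-ins; in real use, logdir would be .../gpseg-1/pg_log
# and backup a directory on a different disk.
logdir=$(mktemp -d)
backup=$(mktemp -d)
touch "$logdir/gpdb-2022-06-01.csv"
touch -d '8 days ago' "$logdir/gpdb-2022-06-01.csv"   # age it past the cutoff
# move anything older than 7 days out of the log directory
find "$logdir" -name '*.csv' -mtime +7 -exec mv {} "$backup"/ \;
ls "$backup"
```

Run during a quiet window and verify the backup copy before deleting anything.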
[Note]: first check the space usage of the data directory:
[gpadmin@gpmaster ~]$ cd /home/gpadmin/gpdata/gpmaster/gpseg-1/
[gpadmin@gpmaster gpseg-1]$ du -sh *
99M base
4.1M global
20K gpperfmon
4.0K gpsegconfig_dump
4.0K gpssh.conf
4.0K internal.auto.conf
36K pg_clog
36K pg_distributedlog
4.0K pg_dynshmem
8.0K pg_hba.conf
4.0K pg_ident.conf
28K pg_log
12K pg_logical
76K pg_multixact
36K pg_notify
4.0K pg_replslot
4.0K pg_serial
4.0K pg_snapshots
4.0K pg_stat
4.0K pg_stat_tmp
36K pg_subtrans
4.0K pg_tblspc
4.0K pg_twophase
4.0K pg_utilitymodedtmredo
4.0K PG_VERSION
193M pg_xlog
4.0K postgresql.auto.conf
24K postgresql.conf
4.0K postmaster.opts
4.0K postmaster.pid
As shown above, pg_xlog under the current data directory takes 193M.
In fact, every data storage node in the cluster carries its own pg_xlog footprint.
[gpadmin@gpmaster gpdata]$ gpstate -s | grep -E "Master data directory|Datadir"|awk -F"=" '{print $2}'
/home/gpadmin/gpdata/gpmaster/gpseg-1
/home/gpadmin/gpdata/gpdatap1/gpseg0
/home/gpadmin/gpdata/gpdatap2/gpseg1
/home/gpadmin/gpdata/gpdatap1/gpseg2
/home/gpadmin/gpdata/gpdatap2/gpseg3
/home/gpadmin/gpdata/gpdatap1/gpseg4
/home/gpadmin/gpdata/gpdatap2/gpseg5
The path list obtained here covers the data directories of every node in the cluster; the gpstandby path is identical to gpmaster's.
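Since every node must be inspected, the directory list above can drive a small loop. This is a hedged sketch that only prints the ssh/du commands (a dry run) so you can review them before executing anything; the host-to-directory pairs are taken from this cluster's gpstate output and will differ in yours.

```shell
# Dry run: print one du command per segment pg_xlog (review, then execute).
while read -r host dir; do
    printf 'ssh %s du -sh %s/pg_xlog\n' "$host" "$dir"
done <<'EOF'
segment01 /home/gpadmin/gpdata/gpdatap1/gpseg0
segment01 /home/gpadmin/gpdata/gpdatap2/gpseg1
segment02 /home/gpadmin/gpdata/gpdatap1/gpseg2
segment02 /home/gpadmin/gpdata/gpdatap2/gpseg3
segment03 /home/gpadmin/gpdata/gpdatap1/gpseg4
segment03 /home/gpadmin/gpdata/gpdatap2/gpseg5
EOF
```

Piping the printed lines through `sh` (or using gpssh with a hostfile) would execute them.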
[gpadmin@gpmaster gpdata]$ gpstop
20220630:16:19:37:005122 gpstop:gpmaster:gpadmin-[INFO]:---------------------------------------------
20220630:16:19:37:005122 gpstop:gpmaster:gpadmin-[INFO]:-Segment instances that will be shutdown:
20220630:16:19:37:005122 gpstop:gpmaster:gpadmin-[INFO]:---------------------------------------------
20220630:16:19:37:005122 gpstop:gpmaster:gpadmin-[INFO]:- Host Datadir Port Status
20220630:16:19:37:005122 gpstop:gpmaster:gpadmin-[INFO]:- segment01 /home/gpadmin/gpdata/gpdatap1/gpseg0 6000 u
20220630:16:19:37:005122 gpstop:gpmaster:gpadmin-[INFO]:- segment01 /home/gpadmin/gpdata/gpdatap2/gpseg1 6001 u
20220630:16:19:37:005122 gpstop:gpmaster:gpadmin-[INFO]:- segment02 /home/gpadmin/gpdata/gpdatap1/gpseg2 6000 u
20220630:16:19:37:005122 gpstop:gpmaster:gpadmin-[INFO]:- segment02 /home/gpadmin/gpdata/gpdatap2/gpseg3 6001 u
20220630:16:19:37:005122 gpstop:gpmaster:gpadmin-[INFO]:- segment03 /home/gpadmin/gpdata/gpdatap1/gpseg4 6000 u
20220630:16:19:37:005122 gpstop:gpmaster:gpadmin-[INFO]:- segment03 /home/gpadmin/gpdata/gpdatap2/gpseg5 6001 u
Continue with Greenplum instance shutdown Yy|Nn (default=N):
> y
20220630:16:19:39:005122 gpstop:gpmaster:gpadmin-[INFO]:-Commencing Master instance shutdown with mode='smart'
20220630:16:19:39:005122 gpstop:gpmaster:gpadmin-[INFO]:-Master segment instance directory=/home/gpadmin/gpdata/gpmaster/gpseg-1
20220630:16:19:39:005122 gpstop:gpmaster:gpadmin-[INFO]:-Stopping master segment and waiting for user connections to finish ...
server shutting down
20220630:16:19:40:005122 gpstop:gpmaster:gpadmin-[INFO]:-Cleaning up leftover gpsmon processes
20220630:16:19:40:005122 gpstop:gpmaster:gpadmin-[INFO]:-No leftover gpsmon processes on some hosts. not attempting forceful termination on these hosts
20220630:16:19:40:005122 gpstop:gpmaster:gpadmin-[INFO]:-Cleaning up leftover shared memory
[gpadmin@gpmaster gpdata]$
# Note: all of the following is performed as the gpadmin administrative user; ordinary Linux users cannot run these commands
# pg_controldata reads the metadata stored in the pg_control file of the given data directory
[gpadmin@gpmaster gpseg-1]$ pg_controldata /home/gpadmin/gpdata/gpmaster/gpseg-1 | grep -E "Latest checkpoint's NextXID|Latest checkpoint's NextOID"
Latest checkpoint's NextXID: 0/735
Latest checkpoint's NextOID: 16395
# -o takes the NextOID value queried above, -x takes the NextXID value, -f takes the corresponding data directory
[gpadmin@gpmaster gpseg-1]$ pg_resetxlog -o 16395 -x 735 -f /home/gpadmin/gpdata/gpmaster/gpseg-1
WARNING: Do not use this on Greenplum. pg_resetxlog might cause data loss
and render system irrecoverable. Do you wish to proceed? [yes/no] yes
Transaction log reset
[gpadmin@gpmaster gpseg-1]$
[gpadmin@gpmaster gpseg-1]$ du -sh *
99M base
4.1M global
20K gpperfmon
4.0K gpsegconfig_dump
4.0K gpssh.conf
4.0K internal.auto.conf
36K pg_clog
36K pg_distributedlog
4.0K pg_dynshmem
8.0K pg_hba.conf
4.0K pg_ident.conf
36K pg_log
12K pg_logical
76K pg_multixact
36K pg_notify
4.0K pg_replslot
4.0K pg_serial
4.0K pg_snapshots
52K pg_stat
4.0K pg_stat_tmp
36K pg_subtrans
4.0K pg_tblspc
4.0K pg_twophase
4.0K pg_utilitymodedtmredo
4.0K PG_VERSION
65M pg_xlog
4.0K postgresql.auto.conf
24K postgresql.conf
4.0K postmaster.opts
As shown above, pg_xlog under the current data directory is now 65M; it has been cleaned up.
[Note]: the same reset steps must be repeated on the standby master and on every segment host:
# [Operations on gpstandby]
[gpadmin@gpstandby gpdata]$ pg_controldata /home/gpadmin/gpdata/gpmaster/gpseg-1 | grep -E "Latest checkpoint's NextXID|Latest checkpoint's NextOID"
Latest checkpoint's NextXID: 0/735
Latest checkpoint's NextOID: 16395
[gpadmin@gpstandby gpdata]$
[gpadmin@gpstandby gpdata]$ pg_resetxlog -o 16395 -x 735 -f /home/gpadmin/gpdata/gpmaster/gpseg-1
WARNING: Do not use this on Greenplum. pg_resetxlog might cause data loss
and render system irrecoverable. Do you wish to proceed? [yes/no] yes
Transaction log reset
[gpadmin@gpstandby gpdata]$
# [Operations on segment01]
[gpadmin@segment01 gpseg0]$ pg_controldata /home/gpadmin/gpdata/gpdatap1/gpseg0 | grep -E "Latest checkpoint's NextXID|Latest checkpoint's NextOID"
Latest checkpoint's NextXID: 0/726
Latest checkpoint's NextOID: 16392
[gpadmin@segment01 gpseg0]$ pg_controldata /home/gpadmin/gpdata/gpdatap2/gpseg1 | grep -E "Latest checkpoint's NextXID|Latest checkpoint's NextOID"
Latest checkpoint's NextXID: 0/726
Latest checkpoint's NextOID: 16392
[gpadmin@segment01 gpseg0]$
[gpadmin@segment01 gpseg0]$
[gpadmin@segment01 gpseg0]$ pg_resetxlog -o 16392 -x 726 -f /home/gpadmin/gpdata/gpdatap1/gpseg0
WARNING: Do not use this on Greenplum. pg_resetxlog might cause data loss
and render system irrecoverable. Do you wish to proceed? [yes/no] yes
Transaction log reset
[gpadmin@segment01 gpseg0]$ pg_resetxlog -o 16392 -x 726 -f /home/gpadmin/gpdata/gpdatap2/gpseg1
WARNING: Do not use this on Greenplum. pg_resetxlog might cause data loss
and render system irrecoverable. Do you wish to proceed? [yes/no] yes
Transaction log reset
# [Operations on segment02]
[gpadmin@segment02 gpdata]$ pg_controldata /home/gpadmin/gpdata/gpdatap1/gpseg2 | grep -E "Latest checkpoint's NextXID|Latest checkpoint's NextOID"
Latest checkpoint's NextXID: 0/726
Latest checkpoint's NextOID: 16392
[gpadmin@segment02 gpdata]$ pg_controldata /home/gpadmin/gpdata/gpdatap2/gpseg3 | grep -E "Latest checkpoint's NextXID|Latest checkpoint's NextOID"
Latest checkpoint's NextXID: 0/726
Latest checkpoint's NextOID: 16392
[gpadmin@segment02 gpdata]$
[gpadmin@segment02 gpdata]$ pg_resetxlog -o 16392 -x 726 -f /home/gpadmin/gpdata/gpdatap1/gpseg2
WARNING: Do not use this on Greenplum. pg_resetxlog might cause data loss
and render system irrecoverable. Do you wish to proceed? [yes/no] yes
Transaction log reset
[gpadmin@segment02 gpdata]$ pg_resetxlog -o 16392 -x 726 -f /home/gpadmin/gpdata/gpdatap2/gpseg3
WARNING: Do not use this on Greenplum. pg_resetxlog might cause data loss
and render system irrecoverable. Do you wish to proceed? [yes/no] yes
Transaction log reset
# [Operations on segment03]
[gpadmin@segment03 ~]$ pg_controldata /home/gpadmin/gpdata/gpdatap1/gpseg4 | grep -E "Latest checkpoint's NextXID|Latest checkpoint's NextOID"
Latest checkpoint's NextXID: 0/726
Latest checkpoint's NextOID: 16392
[gpadmin@segment03 ~]$ pg_controldata /home/gpadmin/gpdata/gpdatap2/gpseg5 | grep -E "Latest checkpoint's NextXID|Latest checkpoint's NextOID"
Latest checkpoint's NextXID: 0/726
Latest checkpoint's NextOID: 16392
[gpadmin@segment03 ~]$
[gpadmin@segment03 ~]$ pg_resetxlog -o 16392 -x 726 -f /home/gpadmin/gpdata/gpdatap1/gpseg4
WARNING: Do not use this on Greenplum. pg_resetxlog might cause data loss
and render system irrecoverable. Do you wish to proceed? [yes/no] yes
Transaction log reset
[gpadmin@segment03 ~]$ pg_resetxlog -o 16392 -x 726 -f /home/gpadmin/gpdata/gpdatap2/gpseg5
WARNING: Do not use this on Greenplum. pg_resetxlog might cause data loss
and render system irrecoverable. Do you wish to proceed? [yes/no] yes
Transaction log reset
At this point, pg_xlog has been cleaned up in the data directories of every node in the cluster.
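The per-node steps above are mechanical: read NextXID/NextOID with pg_controldata, then feed them to pg_resetxlog. The sketch below factors out that glue, shown against a captured pg_controldata sample rather than a live cluster, and it only prints the command instead of running it, since pg_resetxlog is destructive.

```shell
# Extract the values pg_resetxlog needs from pg_controldata output.
# "sample" stands in for: pg_controldata <datadir> | grep -E "NextXID|NextOID"
sample="Latest checkpoint's NextXID:          0/735
Latest checkpoint's NextOID:          16395"

# NextXID is printed as epoch/xid; pg_resetxlog -x wants only the xid part.
next_xid=$(printf '%s\n' "$sample" | awk '/NextXID/ {split($NF, a, "/"); print a[2]}')
next_oid=$(printf '%s\n' "$sample" | awk '/NextOID/ {print $NF}')

# Compose (but do not run) the matching reset command for a data directory.
printf 'pg_resetxlog -o %s -x %s -f %s\n' \
    "$next_oid" "$next_xid" /home/gpadmin/gpdata/gpmaster/gpseg-1
# prints: pg_resetxlog -o 16395 -x 735 -f /home/gpadmin/gpdata/gpmaster/gpseg-1
```

Always pair each data directory with the NextXID/NextOID read from that same directory, exactly as the transcripts above do.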
# Start the Greenplum cluster
[gpadmin@gpmaster gpdata]$ gpstart
20220630:16:48:34:006792 gpstart:gpmaster:gpadmin-[INFO]:-Starting gpstart with args:
20220630:16:48:34:006792 gpstart:gpmaster:gpadmin-[INFO]:-Gathering information and validating the environment...
20220630:16:48:35:006792 gpstart:gpmaster:gpadmin-[INFO]:- Host Datadir Port
20220630:16:48:35:006792 gpstart:gpmaster:gpadmin-[INFO]:- segment01 /home/gpadmin/gpdata/gpdatap1/gpseg0 6000
20220630:16:48:35:006792 gpstart:gpmaster:gpadmin-[INFO]:- segment01 /home/gpadmin/gpdata/gpdatap2/gpseg1 6001
20220630:16:48:35:006792 gpstart:gpmaster:gpadmin-[INFO]:- segment02 /home/gpadmin/gpdata/gpdatap1/gpseg2 6000
20220630:16:48:35:006792 gpstart:gpmaster:gpadmin-[INFO]:- segment02 /home/gpadmin/gpdata/gpdatap2/gpseg3 6001
20220630:16:48:35:006792 gpstart:gpmaster:gpadmin-[INFO]:- segment03 /home/gpadmin/gpdata/gpdatap1/gpseg4 6000
20220630:16:48:35:006792 gpstart:gpmaster:gpadmin-[INFO]:- segment03 /home/gpadmin/gpdata/gpdatap2/gpseg5 6001
Continue with Greenplum instance startup Yy|Nn (default=N):
> y
20220630:16:48:37:006792 gpstart:gpmaster:gpadmin-[INFO]:-Commencing parallel segment instance startup, please wait...
20220630:16:48:37:006792 gpstart:gpmaster:gpadmin-[INFO]:-Checking if standby master is running on host: gpstandby in directory: /home/gpadmin/gpdata/gpmaster/gpseg-1
20220630:16:48:38:006792 gpstart:gpmaster:gpadmin-[INFO]:-Database successfully started
# Check the cluster's running status
[gpadmin@gpmaster gpseg-1]$ gpstate
20220630:17:20:19:007703 gpstate:gpmaster:gpadmin-[INFO]:-Starting gpstate with args:
20220630:17:20:19:007703 gpstate:gpmaster:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 6.19.0 build commit:c88801925e5d51e4be57f54ff2158ed74a264b8e Open Source'
20220630:17:20:19:007703 gpstate:gpmaster:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 9.4.26 (Greenplum Database 6.19.0 build commit:c88801925e5d51e4be57f54ff2158ed74a264b8e Open Source) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 6.4.0, 64-bit compiled on Dec 8 2021 23:08:44'
20220630:17:20:19:007703 gpstate:gpmaster:gpadmin-[INFO]:-Obtaining Segment details from master...
20220630:17:20:19:007703 gpstate:gpmaster:gpadmin-[INFO]:-Gathering data from segments...
20220630:17:20:19:007703 gpstate:gpmaster:gpadmin-[INFO]:-Greenplum instance status summary
20220630:17:20:19:007703 gpstate:gpmaster:gpadmin-[INFO]:-----------------------------------------------------
20220630:17:20:19:007703 gpstate:gpmaster:gpadmin-[INFO]:- Master instance = Active
20220630:17:20:19:007703 gpstate:gpmaster:gpadmin-[INFO]:- Master standby = gpstandby
20220630:17:20:19:007703 gpstate:gpmaster:gpadmin-[INFO]:- Standby master state = Standby host passive
20220630:17:20:19:007703 gpstate:gpmaster:gpadmin-[INFO]:- Total segment instance count from metadata = 6
20220630:17:20:19:007703 gpstate:gpmaster:gpadmin-[INFO]:-----------------------------------------------------
20220630:17:20:19:007703 gpstate:gpmaster:gpadmin-[INFO]:- Primary Segment Status
20220630:17:20:19:007703 gpstate:gpmaster:gpadmin-[INFO]:-----------------------------------------------------
20220630:17:20:19:007703 gpstate:gpmaster:gpadmin-[INFO]:- Total primary segments = 6
20220630:17:20:19:007703 gpstate:gpmaster:gpadmin-[INFO]:- Total primary segment valid (at master) = 6
20220630:17:20:19:007703 gpstate:gpmaster:gpadmin-[INFO]:- Total primary segment failures (at master) = 0
20220630:17:20:19:007703 gpstate:gpmaster:gpadmin-[INFO]:- Total number of postmaster.pid files missing = 0
20220630:17:20:19:007703 gpstate:gpmaster:gpadmin-[INFO]:- Total number of postmaster.pid files found = 6
20220630:17:20:19:007703 gpstate:gpmaster:gpadmin-[INFO]:- Total number of postmaster.pid PIDs missing = 0
20220630:17:20:19:007703 gpstate:gpmaster:gpadmin-[INFO]:- Total number of postmaster.pid PIDs found = 6
20220630:17:20:19:007703 gpstate:gpmaster:gpadmin-[INFO]:- Total number of /tmp lock files missing = 0
20220630:17:20:19:007703 gpstate:gpmaster:gpadmin-[INFO]:- Total number of /tmp lock files found = 6
20220630:17:20:19:007703 gpstate:gpmaster:gpadmin-[INFO]:- Total number postmaster processes missing = 0
20220630:17:20:19:007703 gpstate:gpmaster:gpadmin-[INFO]:- Total number postmaster processes found = 6
20220630:17:20:19:007703 gpstate:gpmaster:gpadmin-[INFO]:-----------------------------------------------------
20220630:17:20:19:007703 gpstate:gpmaster:gpadmin-[INFO]:- Mirror Segment Status
20220630:17:20:19:007703 gpstate:gpmaster:gpadmin-[INFO]:-----------------------------------------------------
20220630:17:20:19:007703 gpstate:gpmaster:gpadmin-[INFO]:- Mirrors not configured on this array
20220630:17:20:19:007703 gpstate:gpmaster:gpadmin-[INFO]:-----------------------------------------------------
# Log in to the database
[gpadmin@gpmaster gpseg-1]$ psql
psql (9.4.26)
Type "help" for help.
testdb=#
testdb=# \d
List of relations
Schema | Name | Type | Owner | Storage
--------+----------------------+-------+---------+---------
public | amr_n042_company_inv | table | gpadmin | heap
(1 row)
testdb=#
testdb=# select * from amr_n042_company_inv limit 1;
rec_id | eid | inv | pid | eid_inv | inv_type | blic_type | blic_no | country | currency | sub_conam | acc
_conam | sub_conam_usd | acc_conam_usd | con_prop | con_form | con_date | created | update_time | delete_flg
----------------------------------+----------------------------------+------------------+----------------------------------+---------+----------+-----------+---------+---------+----------+-----------+----------
-------------+----------------------+----------------------+------------------------+----------+------------+---------------------+---------------------+------------
95e6834d0a3d99e9ea8811855ae9229d | f1a385984a2b3860624906202b84cfe1 | 测试数据测试数据 | 67f87ebe99ad9e6c21abeae9482ab52d | | 20 | | | 156 | 156 | 50.000000 | 50.000000
000000000000 | 7.370000000000000000 | 7.370000000000000000 | 100.000000000000000000 | | 2017-05-31 | 2020-11-23 15:14:59 | 2022-02-12 08:07:48 | 0
(1 row)
The cluster is running normally, and the cleanup procedure is complete. However, warnings like the following soon start to appear once data is written:
WARNING: database with OID 0 must be vacuumed within 147483647 transactions (seg1 192.168.0.243:6001 pid=7269)
testdb=# \timing
Timing is on.
testdb=# \i amr_n042_company_modify.sql
SET
Time: 5.479 ms
set_config
------------
(1 row)
Time: 9.566 ms
SET
Time: 0.337 ms
SET
Time: 0.277 ms
SET
Time: 0.260 ms
SET
Time: 0.257 ms
SET
Time: 0.246 ms
psql:amr_n042_company_modify.sql:36: WARNING: database with OID 0 must be vacuumed within 147483647 transactions
HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.
psql:amr_n042_company_modify.sql:36: WARNING: database with OID 0 must be vacuumed within 147483647 transactions (seg2 192.168.0.98:6000 pid=7305)
HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.
psql:amr_n042_company_modify.sql:36: WARNING: database with OID 0 must be vacuumed within 147483647 transactions (seg3 192.168.0.98:6001 pid=7306)
HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.
psql:amr_n042_company_modify.sql:36: WARNING: database with OID 0 must be vacuumed within 147483647 transactions (seg0 192.168.0.243:6000 pid=7270)
HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.
psql:amr_n042_company_modify.sql:36: WARNING: database with OID 0 must be vacuumed within 147483647 transactions (seg1 192.168.0.243:6001 pid=7269)
HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.
psql:amr_n042_company_modify.sql:36: WARNING: database with OID 0 must be vacuumed within 147483647 transactions (seg5 192.168.0.156:6001 pid=12038)
HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.
psql:amr_n042_company_modify.sql:36: WARNING: database with OID 0 must be vacuumed within 147483647 transactions (seg4 192.168.0.156:6000 pid=12037)
HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.
CREATE TABLE
Time: 85.925 ms
psql:amr_n042_company_modify.sql:39: WARNING: database with OID 0 must be vacuumed within 147483646 transactions
HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.
After this pg_xlog cleanup, writing data into the database raises a flood of WARNING messages:
WARNING: database with OID 0 must be vacuumed within 147483647 transactions (seg1 192.168.0.243:6001 pid=7269)
HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.
Remedy
[Note]: run VACUUM FREEZE against the database names that actually exist in your own environment
# Run the VACUUM FREEZE operation
[gpadmin@gpmaster ~]$ nohup psql -c " VACUUM FREEZE " testdb 2>&1 &
[gpadmin@gpmaster ~]$ nohup psql -c " VACUUM FREEZE " postgres 2>&1 &
# Wait patiently for VACUUM FREEZE to finish in each database you need to use
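To avoid missing a database, the two commands above can be generalized into a loop. A hedged sketch, printing the commands it would run (dry run); in practice you would feed the loop from `psql -At -c "SELECT datname FROM pg_database WHERE datallowconn;"` rather than the hard-coded list used here for illustration.

```shell
# Dry run: emit one VACUUM FREEZE command per database.
# The list is hard-coded for illustration; query pg_database for the real one.
for db in testdb postgres; do
    printf 'psql -d %s -c "VACUUM FREEZE;"\n' "$db"
done
```

Removing the `printf` wrapper (and adding nohup/backgrounding as in the transcript) would execute them.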
The vacuum freeze command forces a freeze operation on a table or database. Freezing guarantees that the gap between the oldest and the newest transaction in the database never exceeds roughly 2 billion, which prevents transaction ID wraparound.
In PostgreSQL, the XID is a 32-bit unsigned integer, so it overflows easily, and an overflow would scramble transaction visibility.
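The specific figure in the warning is consistent with this 32-bit arithmetic: 147483647 is exactly the signed 32-bit maximum minus 2,000,000,000, i.e. the XID space remaining after about two billion transactions are treated as already consumed. This is my reading of the number, not an official formula:

```shell
# 32-bit XID headroom behind the warning figure (an interpretation,
# not Greenplum's documented formula).
max_xid=$(( (1 << 31) - 1 ))    # 2147483647, signed 32-bit maximum
consumed=2000000000             # ~2 billion XIDs considered used
echo $(( max_xid - consumed ))  # prints 147483647, the number in the warning
```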
Logging in again and running \i amr_n042_company_modify.sql to import the data shows that the warnings are gone.