!!!在使用pg_rewind时,应备份目标实例。pg_rewind会直接覆盖目标库的文件,如果rewind失败,那么可能目标库无法启动。
主备切换后,老主库仍然运行,导致主备时间线不一致,老主库无法当做新主库的备库启动
拉起备库时,报时间线错误如下:
LOG: entering standby mode
FATAL: requested timeline 2 is not a child of this server's history
DETAIL: Latest checkpoint is at 0/6000028 on timeline 1, but in the history of the requested timeline, the server forked off from that timeline at 0/4000098.
LOG: startup process (PID 22321) exited with exit code 1
LOG: aborting startup due to startup process failure
LOG: database system is shut down
此时需要用rewind重新拉齐一次主备
1.配置当前主库的pg_hba
配置pg_rewind的登陆用户登陆源库许可,hba生效需要重启数据库
vi $source/pg_hba.conf
host all pg 172.17.100.150/32 trust
pg_rewind需要使用高权限用户,pg新版本可以授权,pg老版本最好用超级用户。
我当前环境的版本为pg9.6,直接使用OS超级用户
2.wal_log_hints = on参数配置
将wal_log_hints = on追加到目标库postgres.conf,重新启动并关闭一次目标库(此时主库是启动状态,备库是关闭状态)
ps:设置wal_log_hints这个参数使得能够记录特定提示位(hint-bit)的变化。这个参数只能在服务器启动的时候被设置。默认值为off。一些特定的工具会要求设置这个参数来正常的运行,所以将它设置为ON就好。
3.pg_rewind命令执行
[pg@lzl pg96data_sla]$ /pg/pg96/bin/pg_rewind
--target-pgdata /pg/pg96data_pri
--source-server='host=172.17.100.150 port=5433 user=pg password=oracle dbname=postgres'
servers diverged at WAL position 0/4000098 on timeline 1
rewinding from last common checkpoint at 0/4000028 on timeline 1
Done!
4.配置备库参数
更改postgres.conf和recovery.conf中的IP、端口、目录等配置,pg_rewind会把配置文件也cp过来
[pg@lzl pg96data_pri]$ mv recovery.done recovery.conf
[pg@lzl pg96data_pri]$ vi recovery.conf
[pg@lzl pg96data_pri]$ vi postgres.conf
5.启动备库
[pg@lzl pg96data_pri]$ /pg/pg96/bin/pg_ctl -D /pg/pg96data_sla -l /pg/pg96data_sla/server.log start
server starting
[pg@lzl pg96data_sla]$ psql -p5433 postgres
psql (9.6.17)
postgres=# \x
Expanded display is on.
postgres=# select * from pg_stat_replication ;
-[ RECORD 1 ]----+------------------------------
pid | 24766
usesysid | 16384
usename | lzl
application_name | walreceiver
client_addr | 172.17.100.150
client_hostname |
client_port | 47345
backend_start | 2021-07-30 07:44:05.582546+00
backend_xmin |
state | streaming
sent_location | 0/4033790
write_location | 0/4033790
flush_location | 0/4033790
replay_location | 0/4033790
sync_priority | 0
sync_state | async
pg_rewind命令报错一
could not fetch remote file "global/pg_control": ERROR: must be superuser to read files
Failure, exiting
解决办法:
使用高权限用户
pg_rewind命令报错二
could not connect to server: FATAL: no pg_hba.conf entry for host "172.17.100.150", user "rewind_user", database "postgres"
Failure, exiting
没有配置pg_hba.conf连接
pg_rewind命令报错三
[pg@lzl pg96data_sla]$ /pg/pg96/bin/pg_rewind --target-pgdata /pg/pg96data_pri --source-server='host=172.17.100.150 port=5433 user=pg password=oracle dbname=postgres'
target server needs to use either data checksums or "wal_log_hints = on"
问题原因:
1. full_page_writes (默认开启)
2. wal_log_hints 设置成 on 或者 PG 在初始化时开启 checksums 功能
解决办法:将wal_log_hints = on配置到目标库postgres.conf,启动再关闭一次目标库(目标库本来就是关闭的,必须启动再关闭一次,不然参数不会生效)
vi postgres.conf 加入目标库配置
参考文档:
https://www.postgresql.org/docs/9.6/app-pgrewind.html