
PostgreSQL high availability with pacemaker + corosync, part 6: add node

鱼宜
2023-12-01

os: ubuntu 16.04
db: postgresql 9.6.8
pacemaker: Pacemaker 1.1.14 Written by Andrew Beekhof
corosync: Corosync Cluster Engine, version '2.3.5'

The current cluster is as follows:

vip-mas 192.168.56.119 
vip-sla 192.168.56.120

node1 192.168.56.92
node2 192.168.56.90
node3 192.168.56.88

Now add a new node, node4:

node4 192.168.56.86

Cluster state before adding the node

root@node1:~# crm_mon -Afr -1

Last updated: Tue Feb 19 16:00:40 2019		Last change: Tue Feb 19 15:51:27 2019 by root via crm_attribute on node1
Stack: corosync
Current DC: node1 (version 1.1.14-70404b0) - partition with quorum
3 nodes and 7 resources configured

Online: [ node1 node2 node3 ]

Full list of resources:

 Master/Slave Set: msPostgresql [pgsql]
     Masters: [ node1 ]
     Slaves: [ node2 node3 ]
 Resource Group: master-group
     vip-mas	(ocf::heartbeat:IPaddr2):	Started node1
     vip-sla	(ocf::heartbeat:IPaddr2):	Started node1

Node Attributes:
* Node node1:
    + master-pgsql                    	: 1000      
    + pgsql-data-status               	: LATEST    
    + pgsql-master-baseline           	: 0000000006000098
    + pgsql-status                    	: PRI       
* Node node2:
    + master-pgsql                    	: -INFINITY 
    + pgsql-data-status               	: STREAMING|ASYNC
    + pgsql-status                    	: HS:async  
* Node node3:
    + master-pgsql                    	: -INFINITY 
    + pgsql-data-status               	: STREAMING|ASYNC
    + pgsql-status                    	: HS:async  

Migration Summary:
* Node node1:
* Node node3:
* Node node2:

node1 currently holds the master role.

root@node1:~# su - postgres
postgres@node1:~$ psql -c "select * from pg_stat_replication;"

  pid  | usesysid | usename | application_name |  client_addr  | client_hostname | client_port |         backend_start         | backend_xmin |   state   | sent_location | write_location | flush_location | replay_location | sync_priority | sync_state 
-------+----------+---------+------------------+---------------+-----------------+-------------+-------------------------------+--------------+-----------+---------------+----------------+----------------+-----------------+---------------+------------
 14728 |    16384 | repl    | node2            | 192.168.56.90 |                 |       52404 | 2019-02-19 15:51:22.288195+08 |          588 | streaming | 0/8000060     | 0/8000060      | 0/8000060      | 0/8000060       |             0 | async
 15093 |    16384 | repl    | node3            | 192.168.56.88 |                 |       47440 | 2019-02-19 15:51:26.645476+08 |          588 | streaming | 0/8000060     | 0/8000060      | 0/8000060      | 0/8000060       |             0 | async
(2 rows)

OS settings on node4

# iptables -F

# systemctl stop ufw
# systemctl disable ufw

Disable SELinux. If /etc/selinux/config exists, edit it; if not, skip this step (SELinux depends on policycoreutils).

# vi /etc/selinux/config 

SELINUX=disabled

# vi /etc/hosts

192.168.56.92 node1
192.168.56.90 node2
192.168.56.88 node3
192.168.56.86 node4
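
The /etc/hosts edit above has to be repeated on every node, so it can help to script it so that re-running it does not create duplicate lines. The following is a minimal sketch, not the author's method; add_host_entry is a hypothetical helper name introduced here for illustration.

```shell
#!/bin/sh
# Sketch only: add_host_entry is a hypothetical helper, not part of any package.
# Usage: add_host_entry FILE IP NAME
# Appends "IP NAME" to FILE unless NAME already appears as a hostname
# on a non-comment line of FILE.
add_host_entry() {
    file=$1
    ip=$2
    name=$3
    if awk -v n="$name" '
        !/^[ \t]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }' "$file" 2>/dev/null
    then
        return 0    # entry already present, nothing to do
    fi
    printf '%s %s\n' "$ip" "$name" >> "$file"
}

# Example (as root), using the addresses from this post:
#   add_host_entry /etc/hosts 192.168.56.92 node1
#   add_host_entry /etc/hosts 192.168.56.90 node2
#   add_host_entry /etc/hosts 192.168.56.88 node3
#   add_host_entry /etc/hosts 192.168.56.86 node4
```

The helper only checks for the hostname, so an existing entry that maps the name to a different address is left untouched and should be reviewed by hand.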

Set up SSH trust

# ssh-keygen -t rsa
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node1
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node2
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node3
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node4

Also run the following on node1, node2, and node3:

# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node4;

PostgreSQL streaming replication on node4

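Install and configure async streaming replication (1 master, 2 slaves) so that node4 can join as a standby.
The detailed steps are covered in a separate blog post. Note that PostgreSQL must not start automatically at boot; pacemaker + corosync manage PostgreSQL instead.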

# systemctl disable postgresql

Install pacemaker, corosync, and pcs on node4

Check whether port 2224 (the pcsd port) is already in use:

# netstat -lntp |grep -i 2224
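
The grep above also matches any line that merely contains the digits 2224, such as a PID or a longer port number. A stricter check can be sketched as below; port_listed is a hypothetical helper name, and the field layout assumed is that of `netstat -lnt` (local address in the fourth column).

```shell
#!/bin/sh
# Sketch only: port_listed is a hypothetical helper.
# Reads `netstat -lnt` style output on stdin and succeeds (exit 0) only if
# some TCP listener's local address ends in ":PORT".
port_listed() {
    awk -v p="$1" '
        $1 ~ /^tcp/ && $4 ~ (":" p "$") { found = 1 }
        END { exit !found }'
}

# Example:
#   if netstat -lnt | port_listed 2224; then
#       echo "port 2224 is already in use"
#   fi
```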

Install the required packages on node4:

# apt install -y pacemaker corosync corosync-dev pcs psmisc fence-agents crmsh

# dpkg -l |grep -Ei "pacemaker|corosync|pcs|psmisc|fence-agents|crmsh"

The corresponding command for a complete uninstall:

# apt-get -y remove --purge corosync corosync-dev libcorosync-common-dev libcorosync-common4 pacemaker pacemaker-cli-utils pacemaker-common pacemaker-resource-agents pcs psmisc fence-agents crmsh

Set the hacluster user's password on node4

# passwd hacluster

Start pacemaker, corosync, and pcsd

On node4, check the services and enable them at boot:

# systemctl status pacemaker corosync pcsd
# systemctl enable pacemaker corosync pcsd

# ls -l /lib/systemd/system/corosync.service
# ls -l /lib/systemd/system/pacemaker.service
# ls -l /lib/systemd/system/pcsd.service

On node4, back up the corosync configuration file /etc/corosync/corosync.conf by moving it aside

# mv /etc/corosync/corosync.conf /etc/corosync/corosync.conf.bak

On node4, delete the existing pacemaker CIB data

# ls -l /var/lib/pacemaker/cib/cib*
# rm -f /var/lib/pacemaker/cib/cib*

On node4, stop pacemaker and corosync, then restart pcsd

# systemctl stop pacemaker corosync
# systemctl restart pcsd

# systemctl status pacemaker corosync pcsd

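Modify the cluster configuration

Update the PostgreSQL cluster to include the new node. This causes a brief interruption.
Run the following on node1: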

# pcs cluster auth -u hacluster -p rootroot 192.168.56.92 192.168.56.90 192.168.56.88 192.168.56.86
# pcs cluster node add 192.168.56.86 --start

# pcs resource update msPostgresql pgsql master-max=1 master-node-max=1 clone-max=5 clone-node-max=1 notify=true
# pcs resource update pgsql pgsql node_list="node1 node2 node3 node4"

# pcs cluster enable --all
192.168.56.92: Cluster Enabled
192.168.56.90: Cluster Enabled
192.168.56.88: Cluster Enabled
192.168.56.86: Cluster Enabled

Restart corosync, pacemaker, and pcsd on node4

# systemctl restart pacemaker corosync pcsd

# pcs status

Cluster name: pgcluster
WARNING: corosync and pacemaker node names do not match (IPs used in setup?)
Last updated: Tue Feb 19 17:04:43 2019		Last change: Tue Feb 19 17:02:17 2019 by root via crm_attribute on node1
Stack: corosync
Current DC: node1 (version 1.1.14-70404b0) - partition with quorum
4 nodes and 7 resources configured

Online: [ node1 node2 node3 node4 ]

Full list of resources:

 Master/Slave Set: msPostgresql [pgsql]
     Masters: [ node1 ]
     Slaves: [ node4 ]
     Stopped: [ node2 node3 ]
 Resource Group: master-group
     vip-mas	(ocf::heartbeat:IPaddr2):	Started node1
     vip-sla	(ocf::heartbeat:IPaddr2):	Started node1

Failed Actions:
* pgsql_start_0 on node3 'unknown error' (1): call=46, status=complete, exitreason='My data may be inconsistent. You have to remove /var/lib/pgsql/tmp/PGSQL.lock file to force start.',
    last-rc-change='Tue Feb 19 17:01:45 2019', queued=0ms, exec=191ms
* pgsql_monitor_4000 on node2 'not running' (7): call=62, status=complete, exitreason='none',
    last-rc-change='Tue Feb 19 17:01:47 2019', queued=0ms, exec=91ms


PCSD Status:
  node1 (192.168.56.92): Online
  node2 (192.168.56.90): Online
  node3 (192.168.56.88): Online
  node4 (192.168.56.86): Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

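Note the line Stopped: [ node2 node3 ]. I do not know why these two nodes stopped. The Failed Actions output does show node3 refusing to start because of a leftover PGSQL.lock file, so remove the lock file and clean up the resource: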

# rm /var/lib/pgsql/tmp/PGSQL.lock
# pcs resource cleanup msPostgresql
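
To spot this situation without reading the whole status report, the Stopped line can be pulled out of the `crm_mon -1` or `pcs status` output. The following is a minimal sketch; stopped_nodes is a hypothetical helper name, and it assumes the "Stopped: [ ... ]" line format shown in the output above.

```shell
#!/bin/sh
# Sketch only: stopped_nodes is a hypothetical helper.
# Reads crm_mon / pcs status output on stdin and prints the node names
# found on "Stopped: [ ... ]" lines, separated by spaces.
stopped_nodes() {
    sed -n 's/^[[:space:]]*Stopped: \[ \(.*\) \][[:space:]]*$/\1/p'
}

# Example: list stopped nodes, then check each one for the lock file
# named in the Failed Actions message.
#   for n in $(crm_mon -1 | stopped_nodes); do
#       ssh "root@$n" 'ls -l /var/lib/pgsql/tmp/PGSQL.lock'
#   done
```

Whether it is safe to delete the lock file depends on whether that node's data is still consistent with the current master, so the loop only lists the file and leaves the removal as a manual decision.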

Final result

# crm_mon -Afr -1

Last updated: Tue Feb 19 17:09:21 2019		Last change: Tue Feb 19 17:09:07 2019 by root via crm_attribute on node1
Stack: corosync
Current DC: node1 (version 1.1.14-70404b0) - partition with quorum
4 nodes and 7 resources configured

Online: [ node1 node2 node3 node4 ]

Full list of resources:

 Master/Slave Set: msPostgresql [pgsql]
     Masters: [ node1 ]
     Slaves: [ node2 node3 node4 ]
 Resource Group: master-group
     vip-mas	(ocf::heartbeat:IPaddr2):	Started node1
     vip-sla	(ocf::heartbeat:IPaddr2):	Started node1

Node Attributes:
* Node node1:
    + master-pgsql                    	: 1000      
    + pgsql-data-status               	: LATEST    
    + pgsql-master-baseline           	: 000000000D000098
    + pgsql-status                    	: PRI       
* Node node2:
    + master-pgsql                    	: -INFINITY 
    + pgsql-data-status               	: STREAMING|ASYNC
    + pgsql-status                    	: HS:async  
    + pgsql-xlog-loc                  	: 000000000D000140
* Node node3:
    + master-pgsql                    	: -INFINITY 
    + pgsql-data-status               	: STREAMING|ASYNC
    + pgsql-status                    	: HS:async  
    + pgsql-xlog-loc                  	: 000000000D000140
* Node node4:
    + master-pgsql                    	: -INFINITY 
    + pgsql-data-status               	: STREAMING|ASYNC
    + pgsql-status                    	: HS:async  
    + pgsql-xlog-loc                  	: 000000000D000140

Migration Summary:
* Node node1:
* Node node3:
* Node node2:
* Node node4:

postgres@node1:~$ psql -c "select * from pg_stat_replication;"

  pid  | usesysid | usename | application_name |  client_addr  | client_hostname | client_port |         backend_start         | backend_xmin |   state   | sent_location | write_location | flush_location | replay_location | sync_priority | sync_state 
-------+----------+---------+------------------+---------------+-----------------+-------------+-------------------------------+--------------+-----------+---------------+----------------+----------------+-----------------+---------------+------------
 29873 |    16384 | repl    | node4            | 192.168.56.86 |                 |       50746 | 2019-02-19 17:02:15.923324+08 |          588 | streaming | 0/D000140     | 0/D000140      | 0/D000140      | 0/D000140       |             0 | async
 27221 |    16384 | repl    | node2            | 192.168.56.90 |                 |       52994 | 2019-02-19 17:09:05.656079+08 |          588 | streaming | 0/D000140     | 0/D000140      | 0/D000140      | 0/D000140       |             0 | async
 27222 |    16384 | repl    | node3            | 192.168.56.88 |                 |       48038 | 2019-02-19 17:09:05.672641+08 |          588 | streaming | 0/D000140     | 0/D000140      | 0/D000140      | 0/D000140       |             0 | async
(3 rows)
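
The same check can be scripted by comparing the standbys that are actually streaming against the expected list. This is a sketch under assumptions: missing_standbys is a hypothetical helper name, and the expected names are the application_name values shown in the query output above.

```shell
#!/bin/sh
# Sketch only: missing_standbys is a hypothetical helper.
# Usage: <one application_name per line on stdin> | missing_standbys "node2 node3 node4"
# Prints each expected standby that is absent from stdin, one per line.
missing_standbys() {
    actual=$(cat)
    for n in $1; do
        printf '%s\n' "$actual" | grep -qx "$n" || printf '%s\n' "$n"
    done
}

# Example (as the postgres user on the master):
#   psql -Atc "select application_name from pg_stat_replication where state = 'streaming'" \
#       | missing_standbys "node2 node3 node4"
```

An empty result means every expected standby is streaming; any printed name is a standby to investigate.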

Reference:
http://www.fibrevillage.com/sysadmin/317-pacemaker-and-pcs-on-linux-example-cluster-creation-add-a-node-to-cluster-remove-a-node-from-a-cluster-desctroy-a-cluster
