Preparation: set up two CentOS 7 hosts, 192.168.51.101 (ceph1) and 192.168.51.102 (ceph2).
Software versions: corosync-2.4.5-7, pacemaker-1.1.23, crmsh-3.0.0.
# 192.168.51.101: configure /etc/hosts
192.168.51.101 ceph1
192.168.51.102 ceph2
# 192.168.51.102: configure /etc/hosts
192.168.51.101 ceph1
192.168.51.102 ceph2
# 192.168.51.101
[root@ceph1 ~]# ssh-keygen
[root@ceph1 ~]# ssh-copy-id -i /root/.ssh/id_rsa root@ceph2
# 192.168.51.102
[root@ceph2 ~]# ssh-keygen
[root@ceph2 ~]# ssh-copy-id -i /root/.ssh/id_rsa root@ceph1
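A quick sanity check that key-based login works in both directions (hostnames resolve via the /etc/hosts entries above):
[root@ceph1 ~]# ssh ceph2 hostname    # should print ceph2 without a password prompt
[root@ceph2 ~]# ssh ceph1 hostname    # should print ceph1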
# 192.168.51.101: sync the system clock from the hardware clock
[root@ceph1 ~]# hwclock -s
# 192.168.51.102: sync the system clock from the hardware clock
[root@ceph2 ~]# hwclock -s
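Note that hwclock -s is only a one-shot sync from the hardware clock, and cluster membership is sensitive to clock drift. For continuous timekeeping, a sketch using chrony (available in the CentOS 7 base repository) on both nodes:
[root@ceph1 ~]# yum install -y chrony
[root@ceph1 ~]# systemctl enable chronyd && systemctl start chronyd
[root@ceph1 ~]# chronyc tracking    # confirm the system clock is being disciplined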
1. Install corosync and pacemaker
# 192.168.51.101
[root@ceph1 ~]# yum install corosync pacemaker -y
# 192.168.51.102
[root@ceph2 ~]# yum install corosync pacemaker -y
2. Configure corosync and pacemaker
# 192.168.51.101
[root@ceph1 ~]# cp /etc/corosync/corosync.conf.example /etc/corosync/corosync.conf
The contents of /etc/corosync/corosync.conf are as follows:
# Please read the corosync.conf.5 manual page
totem {
    version: 2

    # crypto_cipher and crypto_hash: Used for mutual node authentication.
    # If you choose to enable this, then do remember to create a shared
    # secret with "corosync-keygen".
    # enabling crypto_cipher, requires also enabling of crypto_hash.
    crypto_cipher: none
    crypto_hash: none

    # interface: define at least one interface to communicate
    # over. If you define more than one interface stanza, you must
    # also set rrp_mode.
    interface {
        # Rings must be consecutively numbered, starting at 0.
        ringnumber: 0
        # This is normally the *network* address of the
        # interface to bind to. This ensures that you can use
        # identical instances of this configuration file
        # across all your cluster nodes, without having to
        # modify this option.
        bindnetaddr: 192.168.51.0
        # However, if you have multiple physical network
        # interfaces configured for the same subnet, then the
        # network address alone is not sufficient to identify
        # the interface Corosync should bind to. In that case,
        # configure the *host* address of the interface
        # instead:
        # bindnetaddr: 192.168.1.1
        # When selecting a multicast address, consider RFC
        # 2365 (which, among other things, specifies that
        # 239.255.x.x addresses are left to the discretion of
        # the network administrator). Do not reuse multicast
        # addresses across multiple Corosync clusters sharing
        # the same network.
        mcastaddr: 239.255.1.1
        # Corosync uses the port you specify here for UDP
        # messaging, and also the immediately preceding
        # port. Thus if you set this to 5405, Corosync sends
        # messages over UDP ports 5405 and 5404.
        mcastport: 5405
        # Time-to-live for cluster communication packets. The
        # number of hops (routers) that this ring will allow
        # itself to pass. Note that multicast routing must be
        # specifically enabled on most network routers.
        ttl: 1
    }
}

logging {
    # Log the source file and line where messages are being
    # generated. When in doubt, leave off. Potentially useful for
    # debugging.
    fileline: off
    # Log to standard error. When in doubt, set to no. Useful when
    # running in the foreground (when invoking "corosync -f")
    to_stderr: no
    # Log to a log file. When set to "no", the "logfile" option
    # must not be set.
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    # Log to the system log daemon. When in doubt, set to yes.
    to_syslog: yes
    # Log debug messages (very verbose). When in doubt, leave off.
    debug: off
    # Log messages with time stamps. When in doubt, set to on
    # (unless you are only logging to syslog, where double
    # timestamps can be annoying).
    timestamp: on
    logger_subsys {
        subsys: QUORUM
        debug: off
    }
}

quorum {
    # Enable and configure quorum subsystem (default: off)
    # see also corosync.conf.5 and votequorum.5
    provider: corosync_votequorum
}
# Manually added: define the cluster nodes
nodelist {
    node {
        ring0_addr: ceph1
        nodeid: 1
    }
    node {
        ring0_addr: ceph2
        nodeid: 2
    }
}
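If the network blocks multicast (common in virtualized or cloud environments), corosync 2.x can run over unicast UDP instead. A minimal sketch of the totem changes, assuming the nodelist above supplies the member addresses:
totem {
    version: 2
    transport: udpu    # unicast UDP; member addresses come from the nodelist
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.51.0
        mcastport: 5405    # with udpu this is just the base UDP port
    }
}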
[Note] In /etc/corosync/corosync.conf you can enable node authentication and traffic encryption by setting crypto_cipher (the cipher algorithm) and crypto_hash (the hash algorithm), for example:
crypto_cipher: aes256
crypto_hash: sha1
With encryption enabled, a shared key file must be generated:
[root@ceph1 ~]# corosync-keygen
[root@ceph1 ~]# ll /etc/corosync/authkey
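On a freshly installed, idle VM, corosync-keygen can appear to hang because it reads /dev/random and waits for entropy; typing in another session or generating disk I/O speeds it up. Corosync also offers a faster but less secure alternative:
[root@ceph1 ~]# corosync-keygen -l    # reads /dev/urandom instead of /dev/random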
3. Sync the configuration to ceph2
[root@ceph1 ~]# scp -p /etc/corosync/authkey /etc/corosync/corosync.conf ceph2:/etc/corosync/
4. Install crmsh and disable STONITH (no fence devices in this test setup)
# 192.168.51.101
[root@ceph1 ~]# wget -P /etc/yum.repos.d/ http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-7/network:ha-clustering:Stable.repo
[root@ceph1 ~]# yum install -y crmsh
[root@ceph1 ~]# crm configure property stonith-enabled=false
ERROR: Warnings found during check: config may not be valid
Do you still want to commit (y/n)? y
# 192.168.51.102
[root@ceph2 ~]# wget -P /etc/yum.repos.d/ http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-7/network:ha-clustering:Stable.repo
[root@ceph2 ~]# yum install -y crmsh
[root@ceph2 ~]# crm configure property stonith-enabled=false
ERROR: Warnings found during check: config may not be valid
Do you still want to commit (y/n)? y
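If firewalld is running on either node, corosync's UDP traffic (ports 5404/5405 with the mcastport configured above) will be dropped. firewalld ships a predefined high-availability service that opens the cluster ports; run on both nodes before starting the services:
[root@ceph1 ~]# firewall-cmd --permanent --add-service=high-availability
[root@ceph1 ~]# firewall-cmd --reload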
5. Start corosync and pacemaker
# 192.168.51.101
[root@ceph1 ~]# systemctl start corosync.service
[root@ceph1 ~]# systemctl start pacemaker.service
# 192.168.51.102
[root@ceph2 ~]# systemctl start corosync.service
[root@ceph2 ~]# systemctl start pacemaker.service
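Optionally enable both services at boot on each node so the cluster survives a reboot:
[root@ceph1 ~]# systemctl enable corosync.service pacemaker.service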
Verify the cluster:
1. Check the membership notifications
# 192.168.51.101
[root@ceph1 ~]# grep TOTEM /var/log/cluster/corosync.log
Mar 22 11:31:50 [22805] ceph1 corosync notice [TOTEM ] Initializing transport (UDP/IP Multicast).
Mar 22 11:31:50 [22805] ceph1 corosync notice [TOTEM ] Initializing transmit/receive security (NSS) crypto: none hash: none
Mar 22 11:31:50 [22805] ceph1 corosync notice [TOTEM ] The network interface [192.168.51.101] is now up.
Mar 22 11:31:50 [22805] ceph1 corosync notice [TOTEM ] A new membership (192.168.51.101:312) was formed. Members joined: 1
Mar 22 11:31:50 [22805] ceph1 corosync notice [TOTEM ] A new membership (192.168.51.101:316) was formed. Members joined: 2
# 192.168.51.102
[root@ceph2 ~]# grep TOTEM /var/log/cluster/corosync.log
Mar 22 11:32:51 [20797] ceph2 corosync notice [TOTEM ] Initializing transport (UDP/IP Multicast).
Mar 22 11:32:51 [20797] ceph2 corosync notice [TOTEM ] Initializing transmit/receive security (NSS) crypto: none hash: none
Mar 22 11:32:51 [20797] ceph2 corosync notice [TOTEM ] The network interface [192.168.51.102] is now up.
Mar 22 11:32:51 [20797] ceph2 corosync notice [TOTEM ] A new membership (192.168.51.102:321) was formed. Members joined: 2
Mar 22 11:32:51 [20797] ceph2 corosync notice [TOTEM ] A new membership (192.168.51.101:325) was formed. Members joined: 1
2. Check the node ring initialization status
# 192.168.51.101
[root@ceph1 ~]# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
id = 192.168.51.101
status = ring 0 active with no faults
# 192.168.51.102
[root@ceph2 ~]# corosync-cfgtool -s
Printing ring status.
Local node ID 2
RING ID 0
id = 192.168.51.102
status = ring 0 active with no faults
3. Check cluster membership and the quorum API
# 192.168.51.101
[root@ceph1 ~]# corosync-cmapctl | grep members
runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(192.168.51.101)
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(192.168.51.102)
runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 2
runtime.totem.pg.mrp.srp.members.2.status (str) = joined
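corosync-quorumtool gives the same membership list together with vote and quorum accounting, which is handy in a two-node setup:
[root@ceph1 ~]# corosync-quorumtool -s    # shows expected/total votes and whether the partition is quorate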
4. Check which node is the DC and the overall cluster status
# 192.168.51.101
[root@ceph1 ~]# crm_mon -1
Stack: corosync
Current DC: ceph1 (version 1.1.23-1.el7_9.1-9acf116022) - partition with quorum
Last updated: Tue Mar 22 11:39:34 2022
Last change: Mon Mar 21 18:19:27 2022 by root via crm_attribute on ceph1
2 nodes configured
0 resource instances configured
Online: [ ceph1 ceph2 ]
No active resources
Finally, configure a highly available httpd service to test failover:
# 192.168.51.101
[root@ceph1 ~]# yum install -y httpd
[root@ceph1 ~]# systemctl start httpd
[root@ceph1 ~]# echo "<h1>corosync pacemaker on the openstack</h1>" >/var/www/html/index.html
# 192.168.51.102
[root@ceph2 ~]# yum install -y httpd
[root@ceph2 ~]# systemctl start httpd
[root@ceph2 ~]# echo "<h1>corosync pacemaker on the openstack</h1>" >/var/www/html/index.html
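Because Pacemaker will control httpd through the systemd resource class, httpd must not also be enabled or left running under systemd itself, or the init system and the cluster will fight over the service. Once the test page is in place, stop it on both nodes:
[root@ceph1 ~]# systemctl stop httpd
[root@ceph1 ~]# systemctl disable httpd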
# 192.168.51.101
[root@ceph1 ~]# crm
crm(live)# status    ## make sure all nodes are online before running the commands that follow
crm(live)# ra
crm(live)ra# list systemd
httpd
crm(live)ra# cd
crm(live)# configure
crm(live)configure# property no-quorum-policy=ignore    // in a two-node cluster, losing one node also loses quorum; "ignore" keeps resources running instead of stopping them (the default is stop)
crm(live)configure# property default-resource-stickiness=INFINITY    // resource stickiness: after a failed node recovers, do not move resources back to it
Add the resources:
crm(live)configure# primitive webip ocf:heartbeat:IPaddr params ip="192.168.51.110" nic="ens160" cidr_netmask="" broadcast="192.168.51.255"    // define the webip (VIP) resource
crm(live)configure# primitive webserver systemd:httpd op start timeout=100s op stop timeout=100s    // define the webserver resource
crm(live)configure# group webservice webip webserver    # resources are spread across nodes by default; a group keeps them on one node. Order matters: webserver runs wherever the IP runs
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# show
node 1: ceph1 \
    attributes standby=off
node 2: ceph2 \
    attributes standby=off
primitive webip IPaddr \
    params ip=192.168.51.110 nic=ens160 cidr_netmask="" broadcast=192.168.51.255
primitive webserver systemd:httpd \
    op start timeout=100s interval=0 \
    op stop timeout=100s interval=0
group webservice webip webserver
property cib-bootstrap-options: \
    stonith-enabled=false \
    have-watchdog=false \
    dc-version=1.1.23-1.el7_9.1-9acf116022 \
    cluster-infrastructure=corosync \
    no-quorum-policy=ignore \
    default-resource-stickiness=INFINITY
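Note that the primitives above define only start and stop operations, so Pacemaker will not notice if httpd dies on its own. A sketch of adding monitor operations with crmsh (the intervals and timeouts here are arbitrary choices, not from the original setup):
crm(live)configure# monitor webip 30s
crm(live)configure# monitor webserver 30s:100s
crm(live)configure# commit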
[root@ceph1 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens160: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:50:56:8c:57:cf brd ff:ff:ff:ff:ff:ff
inet 192.168.51.101/24 brd 192.168.51.255 scope global ens160
valid_lft forever preferred_lft forever
inet 192.168.51.110/24 brd 192.168.51.255 scope global secondary ens160
valid_lft forever preferred_lft forever
inet6 fe80::250:56ff:fe8c:57cf/64 scope link
valid_lft forever preferred_lft forever
[root@ceph1 ~]# curl 192.168.51.110
<h1>corosync pacemaker on the openstack</h1>
Simulate a failure by putting ceph1 into standby; the VIP and httpd should move to ceph2:
[root@ceph1 ~]# crm node standby
[root@ceph1 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens160: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:50:56:8c:57:cf brd ff:ff:ff:ff:ff:ff
inet 192.168.51.101/24 brd 192.168.51.255 scope global ens160
valid_lft forever preferred_lft forever
inet6 fe80::250:56ff:fe8c:57cf/64 scope link
valid_lft forever preferred_lft forever
[root@ceph1 ~]# service httpd status
Redirecting to /bin/systemctl status httpd.service
● httpd.service - The Apache HTTP Server
Loaded: loaded (/usr/lib/systemd/system/httpd.service; disabled; vendor preset: disabled)
Active: inactive (dead)
Docs: man:httpd(8)
man:apachectl(8)
Mar 22 11:32:48 ceph1 systemd[1]: Starting Cluster Controlled httpd...
Mar 22 11:32:48 ceph1 httpd[22958]: AH00558: httpd: Could not reliably determine the server's fully qualified domain name, using 192.168.51.101. Set the 'ServerName' directive globally to suppress this message
Mar 22 11:32:48 ceph1 systemd[1]: Started Cluster Controlled httpd.
Mar 22 13:51:43 ceph1 systemd[1]: Stopping The Apache HTTP Server...
Mar 22 13:51:46 ceph1 systemd[1]: Stopped The Apache HTTP Server.
Mar 22 13:54:09 ceph1 systemd[1]: Starting The Apache HTTP Server...
Mar 22 13:54:09 ceph1 httpd[23150]: AH00558: httpd: Could not reliably determine the server's fully qualified domain name, using 192.168.51.101. Set the 'ServerName' directive globally to suppress this message
Mar 22 13:54:09 ceph1 systemd[1]: Started The Apache HTTP Server.
Mar 22 13:54:14 ceph1 systemd[1]: Stopping The Apache HTTP Server...
Mar 22 13:54:15 ceph1 systemd[1]: Stopped The Apache HTTP Server.
[root@ceph1 ~]# curl 192.168.51.110
<h1>corosync pacemaker on the openstack</h1>
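To end the test, bring ceph1 back online; because default-resource-stickiness=INFINITY was set earlier, the webservice group stays on ceph2 instead of failing back:
[root@ceph1 ~]# crm node online
[root@ceph1 ~]# crm_mon -1    # webservice remains on ceph2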
Pacemaker has two command-line front ends: pcs and crmsh. crmsh works through the crm command, which is what this walkthrough uses throughout, so familiarity with the crmsh-based deployment shown here is sufficient.