Corosync:
corosync: votes
corosync: votequorum
cman+corosync
cman+rgmanager, cman+pacemaker
corosync+pacemaker
Prerequisites
1) This setup uses two test nodes, hadoop1.abc.com and hadoop2.abc.com, with IP addresses 192.168.1.3 and 192.168.1.4 respectively;
2) The cluster service is Apache's httpd service;
3) The address providing the web service (the VIP) is 192.168.1.12;
4) The OS is CentOS 6.4, 64-bit.
1. Preparation
To configure a Linux host as an HA cluster node, the following preparation is usually required:
1) Hostname resolution must work for all nodes, and each node's hostname must match the output of "uname -n". Therefore, make sure /etc/hosts on both nodes contains the following:
192.168.1.3 hadoop1.abc.com hadoop1
192.168.1.4 hadoop2.abc.com hadoop2
So that the hostnames persist across reboots, also run commands similar to the following on each node:
Node1:
# sed -i 's@\(HOSTNAME=\).*@\1hadoop1.abc.com@g' /etc/sysconfig/network
# hostname hadoop1.abc.com
Node2:
# sed -i 's@\(HOSTNAME=\).*@\1hadoop2.abc.com@g' /etc/sysconfig/network
# hostname hadoop2.abc.com
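The sed expression above rewrites only the HOSTNAME= line in /etc/sysconfig/network. A quick way to sanity-check the pattern without touching the real file (the sample file content below is made up for illustration):

```shell
# Feed sample file content through the same sed expression used above;
# only the HOSTNAME= line should change.
printf 'NETWORKING=yes\nHOSTNAME=localhost.localdomain\n' \
  | sed 's@\(HOSTNAME=\).*@\1hadoop1.abc.com@g'
# NETWORKING=yes
# HOSTNAME=hadoop1.abc.com
```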
2. Install pacemaker
[root@hadoop1 corosync]# yum install pacemaker
[root@hadoop2 corosync]# yum install pacemaker
3. Configure corosync
[root@hadoop1 ~]# yum install corosync
[root@hadoop1 ~]# cd /etc/corosync/
[root@hadoop1 corosync]# ll
total 16
-rw-r--r--. 1 root root 2663 Oct 15  2014 corosync.conf.example
-rw-r--r--. 1 root root 1073 Oct 15  2014 corosync.conf.example.udpu
drwxr-xr-x. 2 root root 4096 Oct 15  2014 service.d
drwxr-xr-x. 2 root root 4096 Oct 15  2014 uidgid.d
[root@hadoop1 corosync]# cp corosync.conf.example corosync.conf
[root@hadoop1 corosync]# vim corosync.conf
Next, edit corosync.conf and add the following, which makes corosync start pacemaker automatically:

service {
    ver: 0
    name: pacemaker
    # use_mgmtd: yes
}
aisexec {
    user: root
    group: root
}

Also set the bindnetaddr value in this file to the network address of the network your NIC is on. Our two nodes are on the 192.168.1.0/24 network, so it is set as follows:

bindnetaddr: 192.168.1.0
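bindnetaddr must be the network address, not a host address. A small helper that derives it from an interface IP and prefix length (a hypothetical illustration in POSIX shell, not part of corosync):

```shell
# network_addr IP PREFIX — print the IPv4 network address for bindnetaddr.
network_addr() {
  prefix=$2
  oldIFS=$IFS; IFS=.
  set -- $1            # split the dotted quad into $1..$4
  IFS=$oldIFS
  # Pack the octets into one integer and mask off the host bits.
  n=$(( (($1<<24) | ($2<<16) | ($3<<8) | $4) & (0xFFFFFFFF << (32-prefix)) ))
  echo "$(( (n>>24)&255 )).$(( (n>>16)&255 )).$(( (n>>8)&255 )).$(( n&255 ))"
}

network_addr 192.168.1.3 24   # -> 192.168.1.0
```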
4. Install crmsh
Starting with 6.4, RHEL no longer ships the command-line cluster configuration tool crmsh and uses pcs instead. If you are used to the crm command, you can download the relevant packages and install them yourself. crmsh depends on pssh, so download that as well.
[root@hadoop1 ~]# cd /etc/yum.repos.d/
[root@hadoop1 yum.repos.d]# wget http://download.opensuse.org/repositories/network:ha-clustering:Stable/RedHat_RHEL-6/network:ha-clustering:Stable.repo
[root@hadoop1 yum.repos.d]# yum install crmsh
[root@hadoop1 yum.repos.d]# yum install pssh
Check the corosync directory again (the authkey file here is normally generated with corosync-keygen):

[root@hadoop1 corosync]# ll
total 28
-r--------. 1 root root  128 Jul 14 19:30 authkey        # the authkey file has been generated
-rw-r--r--. 1 root root 2811 Jul 14 19:15 corosync.conf
-rw-r--r--. 1 root root 2663 Oct 15  2014 corosync.conf.example
-rw-r--r--. 1 root root 1073 Oct 15  2014 corosync.conf.example.udpu
drwxr-xr-x. 2 root root 4096 Oct 15  2014 service.d
drwxr-xr-x. 2 root root 4096 Oct 15  2014 uidgid.d
Copy corosync.conf and authkey to hadoop2:
[root@hadoop1 corosync]# scp -p authkey corosync.conf hadoop2:/etc/corosync/
authkey                                  100%  128     0.1KB/s   00:00
corosync.conf                            100% 2811     2.8KB/s   00:00
5. Start corosync
[root@hadoop1 corosync]# service corosync start
Starting Corosync Cluster Engine (corosync):               [  OK  ]
[root@hadoop1 corosync]# ssh hadoop2 'service corosync start'
Starting Corosync Cluster Engine (corosync):               [  OK  ]
Check whether the corosync engine started properly:
[root@hadoop1 cluster]# grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/cluster/corosync.log
Jul 14 19:36:33 corosync [MAIN  ] Corosync Cluster Engine ('1.4.7'): started and ready to provide service.
Jul 14 19:36:33 corosync [MAIN  ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
Check whether the initial membership notifications were sent properly:
[root@hadoop1 cluster]# grep TOTEM /var/log/cluster/corosync.log
Jul 14 19:36:33 corosync [TOTEM ] Initializing transport (UDP/IP Multicast).
Jul 14 19:36:33 corosync [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Jul 14 19:36:33 corosync [TOTEM ] The network interface [192.168.1.3] is now up.
Jul 14 19:36:33 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Check whether any errors occurred during startup. The error messages below say that pacemaker will soon no longer run as a corosync plugin and recommend using CMAN as the cluster infrastructure instead; they can be safely ignored here.
[root@hadoop1 cluster]# grep ERROR: /var/log/cluster/corosync.log | grep -v unpack_resources
Jul 14 19:36:33 corosync [pcmk  ] ERROR: process_ais_conf: You have configured a cluster using the Pacemaker plugin for Corosync. The plugin is not supported in this environment and will be removed very soon.
Jul 14 19:36:33 corosync [pcmk  ] ERROR: process_ais_conf: Please see Chapter 8 of 'Clusters from Scratch' (http://www.clusterlabs.org/doc) for details on using Pacemaker with CMAN
Check whether pacemaker started properly:
[root@hadoop1 cluster]# grep pcmk_startup /var/log/cluster/corosync.log
Jul 14 19:36:33 corosync [pcmk  ] info: pcmk_startup: CRM: Initialized
Jul 14 19:36:33 corosync [pcmk  ] Logging: Initialized pcmk_startup
Jul 14 19:36:33 corosync [pcmk  ] info: pcmk_startup: Maximum core file size is: 18446744073709551615
Jul 14 19:36:33 corosync [pcmk  ] info: pcmk_startup: Service: 9
Jul 14 19:36:33 corosync [pcmk  ] info: pcmk_startup: Local hostname: hadoop1.abc.com
If all of the above commands ran without problems, you can then start corosync on hadoop2 with the following command:
[root@hadoop1 ~]# ssh hadoop2 -- /etc/init.d/corosync start
Starting Corosync Cluster Engine (corosync):               [  OK  ]
Note: hadoop2 must be started from hadoop1 with the command above; do not start corosync directly on the hadoop2 node. Below are the relevant logs on node1.
[root@hadoop1 ~]# tail /var/log/cluster/corosync.log
Jul 15 15:44:28 [1771] hadoop1.abc.com pengine: info: determine_online_status: Node hadoop2.abc.com is online
Jul 15 15:44:28 [1771] hadoop1.abc.com pengine: notice: stage6: Delaying fencing operations until there are resources to manage
Jul 15 15:44:28 [1772] hadoop1.abc.com crmd: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Jul 15 15:44:28 [1772] hadoop1.abc.com crmd: info: do_te_invoke: Processing graph 6 (ref=pe_calc-dc-1436946268-37) derived from /var/lib/pacemaker/pengine/pe-input-14.bz2
Jul 15 15:44:28 [1772] hadoop1.abc.com crmd: notice: run_graph: Transition 6 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-14.bz2): Complete
Jul 15 15:44:28 [1772] hadoop1.abc.com crmd: info: do_log: FSA: Input I_TE_SUCCESS from notify_crmd() received in state S_TRANSITION_ENGINE
Jul 15 15:44:28 [1772] hadoop1.abc.com crmd: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Jul 15 15:44:28 [1771] hadoop1.abc.com pengine: notice: process_pe_message: Calculated Transition 6: /var/lib/pacemaker/pengine/pe-input-14.bz2
Jul 15 15:44:28 [1771] hadoop1.abc.com pengine: notice: process_pe_message: Configuration ERRORs found during PE processing. Please run "crm_verify -L" to identify issues.
Jul 15 15:44:33 [1767] hadoop1.abc.com cib: info: cib_process_ping: Reporting our current digest to hadoop1.abc.com: 24973b4c6ef4c32f7c580bdd07cc1753 for 0.5.28 (0x277e390 0)
If crmsh is installed, you can check the startup state of the cluster nodes with:
[root@hadoop1 ~]# crm status
Last updated: Wed Jul 15 15:49:09 2015
Last change: Wed Jul 15 15:37:07 2015
Stack: classic openais (with plugin)
Current DC: hadoop1.abc.com - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
0 Resources configured

Online: [ hadoop1.abc.com hadoop2.abc.com ]
6. Configure cluster properties: disable STONITH
corosync enables STONITH by default, but this cluster has no STONITH devices, so the default configuration is not yet usable. This can be verified with the following command:
[root@hadoop1 ~]# crm_verify -L -V
error: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
error: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
error: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid
STONITH can first be disabled with the following commands:
[root@hadoop1 ~]# crm configure
crm(live)configure# property stonith-enabled=false
View the current configuration with:
[root@hadoop1 ~]# crm configure show
node hadoop1.abc.com
node hadoop2.abc.com
property cib-bootstrap-options: \
    dc-version=1.1.11-97629de \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes=2 \
    stonith-enabled=false
7. Add cluster resources
The cluster supports heartbeat, LSB, OCF, and other resource agent classes; LSB and OCF are the most commonly used, while the stonith class is used specifically for configuring STONITH devices.
The classes supported by the current cluster can be viewed with:
[root@hadoop1 ~]# crm ra
crm(live)ra# help
cd          Navigate the level structure
classes     List classes and providers
help        Show help (help topics for list of topics)
info        Show meta data for a RA
list        List RA for a class (and provider)
ls          List levels and commands
providers   Show providers for a RA and a class
quit        Exit the interactive shell
up          Go back to previous level
List the classes:
crm(live)ra# classes
lsb
ocf / heartbeat pacemaker
service
stonith
To list all resource agents in a given class, use commands like the following:
# crm ra list lsb
# crm ra list ocf heartbeat
# crm ra list ocf pacemaker
# crm ra list stonith
crm(live)ra# list ocf
CTDB ClusterMon Delay Dummy Filesystem HealthCPU HealthSMART IPaddr IPaddr2
IPsrcaddr LVM MailTo Route SendArp Squid Stateful SysInfo SystemHealth
VirtualDomain Xinetd apache conntrackd controld db2 dhcpd ethmonitor exportfs
iSCSILogicalUnit mysql named nfsnotify nfsserver pgsql ping pingd postfix
remote rsyncd symlink tomcat
# crm ra info [class:[provider:]]resource_agent
For example:
crm(live)ra# info ocf:heartbeat:IPaddr
8. Next, create an IP address resource for the web cluster, to be used when providing the web service through the cluster. This can be done as follows:
Syntax:
primitive <rsc> [<class>:[<provider>:]]<type>
[params attr_list]
[operations id_spec]
[op op_type [<attribute>=<value>...] ...]
op_type :: start | stop | monitor
Example:
primitive apcfence stonith:apcsmart \
    params ttydev=/dev/ttyS0 hostlist="node1 node2" \
    op start timeout=60s \
    op monitor interval=30m timeout=60s
Applied here:
crm(live)configure# primitive webip ocf:heartbeat:IPaddr params ip=192.168.1.12
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# show
node hadoop1.abc.com
node hadoop2.abc.com
primitive webip IPaddr \
    params ip=192.168.1.12
property cib-bootstrap-options: \
    dc-version=1.1.11-97629de \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes=2 \
    stonith-enabled=false
[root@hadoop1 ~]# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:50:3b:a4 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.3/24 brd 192.168.1.255 scope global eth0
    inet 192.168.1.12/24 brd 192.168.1.255 scope global secondary eth0
    inet6 fe80::20c:29ff:fe50:3ba4/64 scope link
       valid_lft forever preferred_lft forever
[root@hadoop2 ~]# ssh hadoop1 '/etc/init.d/corosync stop'
Signaling Corosync Cluster Engine (corosync) to terminate: [  OK  ]
Waiting for corosync services to unload:.                  [  OK  ]
[root@hadoop2 ~]# crm status
Last updated: Wed Jul 15 23:07:07 2015
Last change: Wed Jul 15 21:53:01 2015
Stack: classic openais (with plugin)
Current DC: hadoop1.abc.com - partition WITHOUT quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
1 Resources configured

Online: [ hadoop2.abc.com ]
OFFLINE: [ hadoop1.abc.com ]
The output above shows that hadoop1.abc.com is offline, yet the resource webip did not start on hadoop2.abc.com. This is because the cluster state is now "WITHOUT quorum": quorum has been lost, so the cluster no longer satisfies the conditions for normal operation, which is unreasonable for a two-node cluster. We can therefore tell the cluster to ignore the quorum check with the following command:
[root@hadoop2 ~]# crm
crm(live)# configure
crm(live)configure# property no-quorum-policy=ignore
After hadoop1.abc.com is started again, the cluster resource webip will very likely migrate back from hadoop2.abc.com to hadoop1.abc.com. Each such move between nodes makes the resource unavailable for some time, so after a resource has failed over to another node we sometimes want to prevent it from moving back even when the original node recovers. This can be achieved by defining resource stickiness, either when the resource is created or afterwards.
Resource stickiness values and their effects:
0: the default. The resource is placed in the optimal location in the cluster, which means it moves when a "better" or less-loaded node becomes available. This is essentially automatic failback, except the resource may move to a node other than the one it was previously active on;
greater than 0: the resource prefers to stay where it is, but will move if a more suitable node becomes available. Higher values mean a stronger preference for staying put;
less than 0: the resource prefers to move away from its current location. Higher absolute values mean a stronger preference for leaving;
INFINITY: the resource always stays where it is unless forced off (node shutdown, node standby, migration-threshold reached, or configuration change). This essentially disables automatic failback;
-INFINITY: the resource always moves away from its current location.
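The effect of stickiness can be pictured as simple score arithmetic (a toy illustration, not pacemaker's actual placement code): a resource migrates only when another node's location score exceeds the current node's score plus resource-stickiness.

```shell
# should_move CUR_SCORE STICKINESS OTHER_SCORE — exit 0 if the resource
# would migrate under this simplified model.
should_move() {
  [ "$3" -gt $(( $1 + $2 )) ]
}

should_move 0 100 200 && echo "migrate"   # 200 > 0 + 100, so it moves
should_move 0 100 50  || echo "stay"      # 50 <= 0 + 100, so it stays put
```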
9. Using the IP address resource configured above, turn this cluster into an active/passive web (httpd) service cluster
To enable this cluster as a web (httpd) server cluster, first install httpd on each node and configure it to serve a local test page.
[root@hadoop1 ~]# echo "<h1>hadoop1</h1>">/var/www/html/index.html
[root@hadoop1 ~]# service httpd stop
Stopping httpd:                                            [FAILED]
[root@hadoop1 ~]# service httpd start
Starting httpd:                                            [  OK  ]
[root@hadoop1 ~]# service httpd stop
Stopping httpd:                                            [  OK  ]
[root@hadoop1 ~]# chkconfig httpd off
crm(live)configure# primitive webserver lsb:httpd
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# show
node hadoop1.abc.com
node hadoop2.abc.com
primitive webip IPaddr \
    params ip=192.168.1.12
primitive webserver lsb:httpd
property cib-bootstrap-options: \
    dc-version=1.1.11-97629de \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes=2 \
    stonith-enabled=false
Next we add the httpd service as a cluster resource. Two resource agent classes are available for httpd: lsb and ocf:heartbeat. For simplicity, we use the lsb class here.
First, view the metadata of the lsb httpd resource agent with:
crm(live)# ra info lsb:httpd
start and stop Apache HTTP Server (lsb:httpd)

The Apache HTTP Server is an efficient and extensible server implementing the current HTTP standards.

Operations' defaults (advisory minimum):

    start         timeout=15
    stop          timeout=15
    status        timeout=15
    restart       timeout=15
    force-reload  timeout=15
    monitor       timeout=15 interval=15
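For lsb:httpd to be usable as a cluster resource, the init script must honor the LSB contract: it must support start/stop/status, and status must exit 0 when the service is running and 3 when it is stopped. A minimal sketch of that contract (illustrative only, using a hypothetical pid file rather than the real httpd script):

```shell
# demo_lsb ACTION — a stand-in "init script" obeying the LSB status codes
# pacemaker relies on (status: 0 = running, 3 = stopped).
DEMO_PID=/tmp/demo_lsb.pid
demo_lsb() {
  case "$1" in
    start)  : > "$DEMO_PID" ;;                              # pretend to start
    stop)   rm -f "$DEMO_PID" ;;                            # pretend to stop
    status) if [ -e "$DEMO_PID" ]; then return 0; else return 3; fi ;;
    *)      return 2 ;;                                     # invalid argument
  esac
}

demo_lsb start
demo_lsb status && echo "running (rc 0)"
demo_lsb stop
demo_lsb status || echo "stopped (rc $?)"
```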
Next, create the resource WebSever:
crm(live)# configure primitive WebSever lsb:httpd
crm(live)# configure
crm(live)configure# verify
crm(live)configure# commit
INFO: apparently there is nothing to commit
INFO: try changing something first
crm(live)configure# show
node hadoop1.abc.com
node hadoop2.abc.com
primitive WebSever lsb:httpd
primitive webip IPaddr \
    params ip=192.168.1.12
primitive webserver lsb:httpd
property cib-bootstrap-options: \
    dc-version=1.1.11-97629de \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes=2 \
    stonith-enabled=false
Stop the duplicate resource first, then delete it:
[root@hadoop1 ~]# crm status
Last updated: Thu Jul 16 01:15:18 2015
Last change: Thu Jul 16 01:11:16 2015
Stack: classic openais (with plugin)
Current DC: hadoop2.abc.com - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
3 Resources configured

Online: [ hadoop1.abc.com hadoop2.abc.com ]

 webip      (ocf::heartbeat:IPaddr): Started hadoop1.abc.com
 webserver  (lsb:httpd): Started hadoop2.abc.com
 WebSever   (lsb:httpd): Started hadoop2.abc.com
crm(live)resource# stop WebSever
crm(live)resource# status WebSever
resource WebSever is NOT running
Verify:
[root@hadoop1 ~]# crm status
Last updated: Thu Jul 16 05:11:27 2015
Last change: Thu Jul 16 05:08:50 2015
Stack: classic openais (with plugin)
Current DC: hadoop2.abc.com - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
3 Resources configured

Online: [ hadoop1.abc.com hadoop2.abc.com ]

 webip      (ocf::heartbeat:IPaddr): Started hadoop1.abc.com
 webserver  (lsb:httpd): Started hadoop2.abc.com
Put hadoop1 into standby:
crm(live)node# standby hadoop1.abc.com
crm(live)node# cd ..
crm(live)# status
Last updated: Thu Jul 16 05:28:46 2015
Last change: Thu Jul 16 05:28:32 2015
Stack: classic openais (with plugin)
Current DC: hadoop2.abc.com - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
2 Resources configured

Node hadoop1.abc.com: standby
Online: [ hadoop2.abc.com ]

 webip      (ocf::heartbeat:IPaddr): Started hadoop2.abc.com
 webserver  (lsb:httpd): Started hadoop2.abc.com
At this point, browsing to 192.168.1.12 shows the page served by hadoop2.
10. Define a colocation constraint
A colocation score of -inf means that webserver may never run on the same node as webip:
crm(live)configure# colocation webserver_with_webip -inf: webserver webip
crm(live)configure# show xml
<?xml version="1.0" ?>
<cib num_updates="2" dc-uuid="hadoop2.abc.com" crm_feature_set="3.0.9" validate-with="pacemaker-2.0" epoch="20" admin_epoch="0" cib-last-written="Thu Jul 16 05:42:29 2015" have-quorum="1">
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.11-97629de"/>
        <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="classic openais (with plugin)"/>
        <nvpair id="cib-bootstrap-options-expected-quorum-votes" name="expected-quorum-votes" value="2"/>
        <nvpair name="stonith-enabled" value="false" id="cib-bootstrap-options-stonith-enabled"/>
      </cluster_property_set>
    </crm_config>
    <nodes>
      <node id="hadoop2.abc.com" uname="hadoop2.abc.com">
        <instance_attributes id="nodes-hadoop2.abc.com">
          <nvpair id="nodes-hadoop2.abc.com-standby" name="standby" value="off"/>
        </instance_attributes>
      </node>
      <node id="hadoop1.abc.com" uname="hadoop1.abc.com">
        <instance_attributes id="nodes-hadoop1.abc.com">
          <nvpair id="nodes-hadoop1.abc.com-standby" name="standby" value="on"/>
        </instance_attributes>
      </node>
    </nodes>
    <resources>
      <primitive id="webip" class="ocf" provider="heartbeat" type="IPaddr">
        <instance_attributes id="webip-instance_attributes">
          <nvpair name="ip" value="192.168.1.12" id="webip-instance_attributes-ip"/>
        </instance_attributes>
      </primitive>
      <primitive id="webserver" class="lsb" type="httpd"/>
    </resources>
    <constraints>
      <!-- the line below is the colocation constraint just defined -->
      <rsc_colocation id="webserver_with_webip" score="-INFINITY" rsc="webserver" with-rsc="webip"/>
    </constraints>
  </configuration>
</cib>
11. Define an order constraint
Start webip first, then webserver:
crm(live)configure# order webip_befor_webserver Mandatory: webip:start webserver
crm(live)configure# show xml
<?xml version="1.0" ?>
<cib num_updates="1" dc-uuid="hadoop2.abc.com" crm_feature_set="3.0.9" validate-with="pacemaker-2.0" epoch="22" admin_epoch="0" cib-last-written="Thu Jul 16 06:03:57 2015" have-quorum="1">
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.11-97629de"/>
        <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="classic openais (with plugin)"/>
        <nvpair id="cib-bootstrap-options-expected-quorum-votes" name="expected-quorum-votes" value="2"/>
        <nvpair name="stonith-enabled" value="false" id="cib-bootstrap-options-stonith-enabled"/>
      </cluster_property_set>
    </crm_config>
    <nodes>
      <node id="hadoop2.abc.com" uname="hadoop2.abc.com">
        <instance_attributes id="nodes-hadoop2.abc.com">
          <nvpair id="nodes-hadoop2.abc.com-standby" name="standby" value="off"/>
        </instance_attributes>
      </node>
      <node id="hadoop1.abc.com" uname="hadoop1.abc.com">
        <instance_attributes id="nodes-hadoop1.abc.com">
          <nvpair id="nodes-hadoop1.abc.com-standby" name="standby" value="on"/>
        </instance_attributes>
      </node>
    </nodes>
    <resources>
      <primitive id="webip" class="ocf" provider="heartbeat" type="IPaddr">
        <instance_attributes id="webip-instance_attributes">
          <nvpair name="ip" value="192.168.1.12" id="webip-instance_attributes-ip"/>
        </instance_attributes>
      </primitive>
      <primitive id="webserver" class="lsb" type="httpd"/>
    </resources>
    <constraints>
      <rsc_order id="webip_befor_webserver" kind="Mandatory" first="webip" first-action="start" then="webserver"/>
      <rsc_colocation id="webserver_with_webip" score="-INFINITY" rsc="webserver" with-rsc="webip"/>
    </constraints>
  </configuration>
</cib>
12. Prefer running on the hadoop2 node
crm(live)configure# location webserver_hadoop2 webserver 200: hadoop2.abc.com
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# cd ..
13. Resource default attributes can also be defined
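The original leaves this section without an example. In crmsh, resource defaults such as a default stickiness are typically set with rsc_defaults, e.g. (a sketch, not applied on this cluster):

```
crm(live)configure# rsc_defaults resource-stickiness=100
crm(live)configure# verify
crm(live)configure# commit
```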
14. Define monitoring
crm(live)resource# stop webip
crm(live)resource# stop webserver
crm(live)resource# status
 webip      (ocf::heartbeat:IPaddr): Stopped
 webserver  (lsb:httpd): Stopped
The resources were stopped improperly, so finish with a cleanup:
crm(live)resource# cleanup webip
Cleaning up webip on hadoop1.abc.com
Cleaning up webip on hadoop2.abc.com
Waiting for 2 replies from the CRMd.. OK
crm(live)resource# cleanup webserver
Cleaning up webserver on hadoop1.abc.com
Cleaning up webserver on hadoop2.abc.com
Waiting for 2 replies from the CRMd.. OK
crm(live)resource# cd ..
crm(live)# configure
crm(live)configure# help monitor
crm(live)configure# monitor webserver 20s:10s
crm(live)configure# verify
WARNING: webserver: specified timeout 10s for monitor is smaller than the advised 15
crm(live)configure# edit
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# show
crm(live)configure# cd ..
crm(live)# status
Last updated: Thu Jul 16 23:13:42 2015
Last change: Thu Jul 16 22:43:40 2015
Stack: classic openais (with plugin)
Current DC: hadoop2.abc.com - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
2 Resources configured

Online: [ hadoop1.abc.com hadoop2.abc.com ]
crm(live)# resource
crm(live)resource# start webip
crm(live)resource# start webserver
[root@hadoop2 ~]# ss -tnl | grep 80
LISTEN     0      128                      :::80                      :::*
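The grep above matches any line containing "80" (including, say, port 8080). A slightly stricter filter over the ss output (a hypothetical helper, shown against canned input rather than a live socket table):

```shell
# port_listening PORT — read `ss -tnl` output on stdin and succeed only if
# some socket is in LISTEN state on exactly that port.
port_listening() {
  awk -v p=":$1" '$1 == "LISTEN" && $4 ~ p"$" { found = 1 } END { exit !found }'
}

# Canned sample of `ss -tnl` output, for illustration:
sample='LISTEN 0 128 :::80 :::*
LISTEN 0 128 127.0.0.1:631 *:*'

printf '%s\n' "$sample" | port_listening 80   && echo "port 80: up"
printf '%s\n' "$sample" | port_listening 8080 || echo "port 8080: down"
```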
[root@hadoop2 ~]# service httpd stop
Stopping httpd:                                            [  OK  ]
[root@hadoop2 ~]# tail -f /var/log/cluster/corosync.log
Jul 16 23:23:03 [7736] hadoop2.abc.com cib: info: cib_perform_op: Diff: --- 0.61.25 2
Jul 16 23:23:03 [7736] hadoop2.abc.com cib: info: cib_perform_op: Diff: +++ 0.61.26 (null)
Jul 16 23:23:03 [7736] hadoop2.abc.com cib: info: cib_perform_op: + /cib: @num_updates=26
Jul 16 23:23:03 [7736] hadoop2.abc.com cib: info: cib_perform_op: + /cib/status/node_state[@id='hadoop2.abc.com']/lrm[@id='hadoop2.abc.com']/lrm_resources/lrm_resource[@id='webserver']/lrm_rsc_op[@id='webserver_monitor_30000']: @transition-key=1:45:0:5c125c03-7d52-4d11-b5ee-ec4bc424ed07, @transition-magic=0:0;1:45:0:5c125c03-7d52-4d11-b5ee-ec4bc424ed07, @call-id=68, @last-rc-change=1437060183, @exec-time=31
Jul 16 23:23:03 [7736] hadoop2.abc.com cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=hadoop2.abc.com/crmd/278, version=0.61.26)
Jul 16 23:23:03 [7741] hadoop2.abc.com crmd: info: match_graph_event: Action webserver_monitor_30000 (1) confirmed on hadoop2.abc.com (rc=0)
Jul 16 23:23:03 [7741] hadoop2.abc.com crmd: notice: run_graph: Transition 45 (Complete=4, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-95.bz2): Complete
Jul 16 23:23:03 [7741] hadoop2.abc.com crmd: info: do_log: FSA: Input I_TE_SUCCESS from notify_crmd() received in state S_TRANSITION_ENGINE
Jul 16 23:23:03 [7741] hadoop2.abc.com crmd: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Jul 16 23:23:08 [7736] hadoop2.abc.com cib: info: cib_process_ping: Reporting our current digest to hadoop2.abc.com: ae8ef3d1bb7af4518c2c6ce7c4db1f08 for 0.61.26 (0x1b35dc0 0)
[root@hadoop2 ~]# ss -tnl
State      Recv-Q Send-Q     Local Address:Port       Peer Address:Port
LISTEN     0      128                   :::34476                :::*
LISTEN     0      128                   :::111                  :::*
LISTEN     0      128                    *:111                   *:*
LISTEN     0      128                   :::80                   :::*
LISTEN     0      128                   :::22                   :::*
LISTEN     0      128                    *:22                    *:*
LISTEN     0      128            127.0.0.1:631                   *:*
LISTEN     0      128                  ::1:631                  :::*
LISTEN     0      100                  ::1:25                   :::*
LISTEN     0      100            127.0.0.1:25                    *:*
LISTEN     0      128                    *:42907
Note that port 80 is listening again: the monitor operation detected the failure and the cluster restarted httpd.
Define a second VIP address
crm(live)# configure
Tab completion shows the available IP-related resource agents:
crm(live)configure# primitive vip ocf:heartbeat:IP<Tab>
ocf:heartbeat:IPaddr    ocf:heartbeat:IPaddr2    ocf:heartbeat:IPsrcaddr
crm(live)configure# primitive vip ocf:heartbeat:IPaddr params ip=192.168.1.13 monitor interval=30s timeout=15s
ERROR: syntax in primitive: Unknown arguments: monitor interval=30s timeout=15s near <monitor> parsing 'primitive vip ocf:heartbeat:IPaddr params ip=192.168.1.13 monitor interval=30s timeout=15s'
crm(live)configure# primitive vip ocf:heartbeat:IPaddr params ip=192.168.1.13 op monitor interval=30s timeout=15s
crm(live)configure# verify
WARNING: vip: specified timeout 15s for monitor is smaller than the advised 20s
crm(live)configure# delete vip
crm(live)configure# primitive vip ocf:heartbeat:IPaddr params ip=192.168.1.13 op monitor interval=30s timeout=20s
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# show
node hadoop1.abc.com \
    attributes standby=off
node hadoop2.abc.com \
    attributes standby=off
primitive vip IPaddr \
    params ip=192.168.1.13 \
    op monitor interval=30s timeout=20s
primitive webip IPaddr \
    params ip=192.168.1.12 \
    meta target-role=Started
primitive webserver lsb:httpd \
    meta target-role=Started \
    op monitor interval=30s timeout=15s
location webip_on_hadoop2 webip 200: hadoop2.abc.com
location webserver_on_hadoop2 webserver 200: hadoop2.abc.com
order webip_befor_webserver Mandatory: webip:start webserver
property cib-bootstrap-options: \
    dc-version=1.1.11-97629de \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes=2 \
    stonith-enabled=false \
    no-quorum-policy=ignore \
    last-lrm-refresh=1437057604
Reposted from: https://blog.51cto.com/zouqingyun/1674207