Goal: build a highly available Web cluster backed by shared storage over iSCSI (IP-SAN). To keep the data consistent across nodes, the shared disk uses the OCFS2 cluster file system.
Features: no dedicated heartbeat link is required between the HA nodes. Password prompts between the nodes (ssh/scp) must be removed so that the nodes can communicate without interaction.
---------------------------------------------
Address plan:
*HA servers*
node1.a.com    eth0-ip:    eth1:
node2.a.com    eth0-ip:    eth1:
Vip:
Note: eth0 is bridged, eth1 is Host-Only
---------------------------------------------------------
*Target server*
eth0-ip:
Note: eth0 is Host-Only
--------------------------------------------------------
***Configuration steps***
--------------------------------------------------------
Step1: Preparation
--------------------
① Configure a static IP address on each node (service network restart).
② Synchronize the clocks of the nodes (hwclock / date -s "2013-06-14 **:**:**").
③ Set the hostname of each HA node so that the nodes can resolve each other by name.
vim /etc/sysconfig/network
NETWORKING=yes
NETWORKING_IPV6=yes
HOSTNAME=node1.a.com        (node2.a.com on node2)
vim /etc/hosts
 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
 node1.a.com node1
 node2.a.com node2
hostname node1.a.com
④ Set up password-less communication between the nodes (no root password prompt for the peer).
node1:
ssh-keygen -t rsa                    //generate the ssh key pair for node1
cd /root/.ssh/
ssh-copy-id -i id_rsa.pub node2      //copy node1's public key to node2
Enter node2's root password when prompted: 123456
node2:
ssh-keygen -t rsa                    //generate the ssh key pair for node2
cd /root/.ssh/
ssh-copy-id -i id_rsa.pub node1      //copy node2's public key to node1
Enter node1's root password when prompted: 123456
Test from node1: scp /etc/fstab node2:/tmp/   (the root password is no longer requested)
⑤ On node1 and node2, configure a local yum repository, mount the installation disc, and install the Corosync packages.
yum localinstall cluster-glue-1.0.6-1.6.el5.i386.rpm \
    cluster-glue-libs-1.0.6-1.6.el5.i386.rpm \
    corosync-1.2.7-1.1.el5.i386.rpm \
    corosynclib-1.2.7-1.1.el5.i386.rpm \
    heartbeat-3.0.3-2.3.el5.i386.rpm \
    heartbeat-libs-3.0.3-2.3.el5.i386.rpm \
    libesmtp-1.0.4-5.el5.i386.rpm \
    pacemaker-1.1.5-1.1.el5.i386.rpm \
    pacemaker-libs-1.1.5-1.1.el5.i386.rpm \
    perl-TimeDate-1.16-5.el5.noarch.rpm \
    resource-agents-1.0.4-1.1.el5.i386.rpm --nogpgcheck
rpm -ivh openais-1.1.3-1.6.el5.i386.rpm
rpm -ivh openaislib-1.1.3-1.6.el5.i386.rpm
---------------------------------------
Step2: Configure Corosync
---------------------------------------
① Create the configuration file from the example and edit it.
cd /etc/corosync/
cp -p corosync.conf.example corosync.conf
vim corosync.conf
# Please read
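# the corosync.conf.5 manual page
# whitetank: stay compatible with corosync 0.86 and other older releases;
# some newer features may be unavailable in this mode
compatibility: whitetank
# totem: the protocol the nodes use to exchange heartbeat and membership messages
totem {
    # version of the configuration format
    version: 2
    # whether messages are authenticated and encrypted
    secauth: off
    # number of threads used to encrypt and send multicast messages;
    # 0 means no threaded sending
    threads: 0
    interface {
        ringnumber: 0
        # network address used for cluster traffic (a host address also works)
        bindnetaddr:
        mcastaddr:
        mcastport: 5405
    }
}
# logging options
logging {
    # whether to print the source file and line number in log messages
    fileline: off
    # whether to send messages to standard error (the screen)
    to_stderr: no
    # write messages to the log file
    to_logfile: yes
    # also write messages to syslog (one of the two is enough; enabling both costs extra resources)
    to_syslog: yes
    # *** the log directory must be created by hand, otherwise the service will not start ***
    logfile: /var/log/cluster/corosync.log
    # enable when troubleshooting
    debug: off
    # whether to timestamp log entries
    timestamp: on
    # the following belongs to OpenAIS and can stay disabled
    logger_subsys {
        subsys: AMF
        debug: off
    }
}
amf {
    mode: disabled
}
# added by hand: the sections above only configure the messaging layer;
# this one starts Pacemaker on top of it
service {
    ver: 0
    name: pacemaker
}
# OpenAIS itself is not used, but some of its sub-options are
aisexec {
    user: root
    group: root
}
② Nodes must authenticate to join the cluster, so generate an authkey.
corosync-keygen
[root@node1 corosync]# ll
total 28
-rw-r--r-- 1 root root 5384 Jul 28 2010 amf.conf.example
-r-------- 1 root root  128 May  7 16:16 authkey
-rw-r--r-- 1 root root  513 May  7 16:14 corosync.conf
-rw-r--r-- 1 root root  436 Jul 28 2010 corosync.conf.example
drwxr-xr-x 2 root root 4096 Jul 28 2010 service.d
drwxr-xr-x 2 root root 4096 Jul 28 2010 uidgid.d
③ Create the directory for the log file.
mkdir /var/log/cluster
④ Synchronize the configuration to the other node.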
[root@node1 corosync]# scp -p authkey corosync.conf node2:/etc/corosync/
authkey                 100%  128   0.1KB/s   00:00
corosync.conf           100%  513   0.5KB/s   00:00
[root@node1 corosync]# ssh node2 'mkdir /var/log/cluster'
⑤ Start the service on both nodes.
service corosync start
ssh node2 '/etc/init.d/corosync start'
⑥ Check that the Corosync engine started.
grep -i -e "corosync cluster engine" -e "configuration file" /var/log/messages
⑦ Check that the initial membership notifications were sent.
grep -i totem /var/log/messages
⑧ Check for errors during startup.
grep -i error: /var/log/messages | grep -v unpack_resources
⑨ Check that Pacemaker has started.
grep -i pcmk_startup /var/log/messages
⑩ On either node, check the cluster membership status.
[root@node1 ~]# crm status
============
Last updated: Fri Jun 14 22:06:21 2013
Stack: openais
Current DC: node1.a.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
0 Resources configured.
============
Online: [ node1.a.com node2.a.com ]
-------------------------------------------------------------------
Step3: Provide highly available services
---------------------------------
With Corosync, services can be defined through two interfaces:
1: GUI (hb_gui), a graphical tool from Heartbeat. It requires the Heartbeat packages:
yum localinstall heartbeat-2.1.4-9.el5.i386.rpm \
    heartbeat-gui-2.1.4-9.el5.i386.rpm \
    heartbeat-pils-2.1.4-10.el5.i386.rpm \
    heartbeat-stonith-2.1.4-10.el5.i386.rpm \
    libnet-1.1.4-3.el5.i386.rpm \
    perl-MailTools-1.77-1.el5.noarch.rpm --nogpgcheck
After installation, run hb_gui to configure the cluster graphically.
2: crm (a shell provided by Pacemaker)
① Show the current configuration.
crm configure show
② Check the configuration for errors.
crm_verify -L
[root@node1 corosync]# crm_verify -L
crm_verify[878]: 2013/06/14_17:29:33 ERROR: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
crm_verify[878]: 2013/06/14_17:29:33 ERROR: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
crm_verify[878]: 2013/06/14_17:29:33 ERROR: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid
  -V may provide more details
The errors are about STONITH: as long as STONITH is enabled but no STONITH resource is defined, the cluster refuses to start any resource. In this lab setup, STONITH can be disabled as follows:
[root@node1 corosync]# crm                               //enter the crm shell
crm(live)# configure                                     //enter configuration mode
crm(live)configure# property stonith-enabled=false       //disable STONITH
crm(live)configure# commit                               //commit and save the configuration
crm(live)configure# show                                 //show the current configuration
crm(live)configure# exit
Run the check again: crm_verify -L no longer reports errors.
③ The four cluster resource types
[root@node1 corosync]# crm
crm(live)# configure
crm(live)configure# help
primitive    a basic resource (runs on only one node at a time)
group        puts several resources into one group so they can be managed together
clone        a resource that must run on several nodes at once, with no primary/secondary roles (e.g. ocfs2, stonith)
master       a resource with primary and secondary roles (e.g. drbd)
.....
.....
④ Configure services through resource agents; list the agent classes.
[root@node1 corosync]# crm
crm(live)# ra
crm(live)ra# classes
heartbeat
lsb
ocf / heartbeat pacemaker
stonith
⑤ List the scripts available in a resource agent class.
[root@node1 corosync]# crm
crm(live)# ra
crm(live)ra# list lsb
NetworkManager acpid anacron apmd atd auditd autofs avahi-daemon
avahi-dnsconfd bluetooth capi conman corosync cpuspeed crond cups
cups-config-daemon dnsmasq drbd dund firstboot functions gpm haldaemon
halt heartbeat hidd hplip httpd ip6tables ipmi iptables
irda irqbalance iscsi iscsid isdn kdump killall krb524
kudzu lm_sensors logd lvm2-monitor mcstrans mdmonitor mdmpd messagebus
microcode_ctl multipathd netconsole netfs netplugd network nfs nfslock
nscd ntpd o2cb ocfs2 openais openibd pacemaker pand
pcscd portmap psacct rawdevices rdisc readahead_early readahead_later restorecond
rhnsd rpcgssd rpcidmapd rpcsvcgssd saslauthd sendmail setroubleshoot single
smartd sshd syslog vncserver wdaemon winbind wpa_supplicant xfs
xinetd ypbind yum-updatesd
List the heartbeat provider of the ocf class:
crm(live)ra# list ocf heartbeat
⑥ Use info or meta to show the details of a resource agent.
crm(live)ra# meta ocf:heartbeat:IPaddr
⑦ Configure the resources (IP address: the VIP; Web service: httpd)
[root@node1 ~]# crm
crm(live)# configure
crm(live)configure# primitive webip ocf:heartbeat:IPaddr params ip=
crm(live)configure# show                                 //review
node node1.a.com
node node2.a.com
primitive webip ocf:heartbeat:IPaddr \
    params ip=""
property $id="cib-bootstrap-options" \
    dc-version="1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
    cluster-infrastructure="openais" \
    expected-quorum-votes="2" \
    stonith-enabled="false"
crm(live)configure#
commit                                                   //commit the change
crm(live)# status                                        //check the status
============
Last updated: Mon May 7 19:39:37 2013
Stack: openais
Current DC: node1.a.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Online: [ node1.a.com node2.a.com ]
 webip (ocf::heartbeat:IPaddr): Started node1.a.com
The resource was started on node1. Confirm with ifconfig on node1:
[root@node1 ~]# ifconfig
eth0:0    Link encap:Ethernet  HWaddr 00:0C:29:25:D2:BC
          inet addr:  Bcast:  Mask:
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:67 Base address:0x2000
Define the httpd resource
Install httpd on node1 and node2. It must not start at boot, because the cluster will start it.
yum install httpd
chkconfig httpd off
Find the resource agent for httpd in the lsb class:
[root@node1 corosync]# crm
crm(live)# ra
crm(live)ra# list lsb
Show the parameters of httpd:
crm(live)ra# meta lsb:httpd
Define the httpd resource:
crm(live)configure# primitive webserver lsb:httpd
crm(live)configure# show
node node1.a.com
node node2.a.com
primitive webip ocf:heartbeat:IPaddr \
    params ip=""
primitive webserver lsb:httpd
property $id="cib-bootstrap-options" \
    dc-version="1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
    cluster-infrastructure="openais" \
    expected-quorum-votes="2" \
    stonith-enabled="false"
crm(live)# status
============
Last updated: Mon May 7 20:01:12 2013
Stack: openais
Current DC: node1.a.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
2 Resources configured.
============
Online: [ node1.a.com node2.a.com ]
 webip (ocf::heartbeat:IPaddr): Started node1.a.com
 webserver (lsb:httpd): Started node2.a.com
httpd has started, but on node2. (As a cluster manages more resources, it spreads them across the nodes to balance the load.)
The IP address and httpd must run on the same node, so define them as a group.
⑧ Define a resource group to bind the resources together.
crm(live)# configure
crm(live)configure# help group
The `group` command creates a group of resources.
Usage:
...............
group <name> <rsc> [<rsc>...]
  [meta attr_list]
  [params attr_list]
attr_list :: [$id=<id>] <attr>=<val> [<attr>=<val>...] | $id-ref=<id>
...............
Example:
...............
group internal_www disk0 fs0 internal_ip apache \
    meta target_role=stopped
...............
Define the group to bind the resources:
crm(live)configure# group web-res webip webserver
crm(live)configure# show
node node1.a.com
node node2.a.com
primitive webip ocf:heartbeat:IPaddr \
    params ip=""
primitive webserver lsb:httpd
group web-res webip webserver
property $id="cib-bootstrap-options" \
    dc-version="1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
    cluster-infrastructure="openais" \
    expected-quorum-votes="2" \
    stonith-enabled="false"
Check the cluster status:
crm(live)# status
============
Last updated: Mon May 7 20:09:06 2013
Stack: openais
Current DC: node1.a.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Online: [ node1.a.com node2.a.com ]
 Resource Group: web-res
     webip (ocf::heartbeat:IPaddr): Started node1.a.com
     webserver (lsb:httpd): Started node1.a.com
(The IP address and httpd now both run on node1.)
------------------------------------------------------------
Step4: Test failover between the nodes
---------------------------------------
On node1, stop the corosync service, then watch from node2.
service corosync stop
[root@node2 corosync]# crm status
============
Last updated: Mon May 7 20:16:58 2013
Stack: openais
Current DC: node2.a.com - partition WITHOUT quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Online: [ node2.a.com ]
OFFLINE: [ node1.a.com ]
As shown, node2 alone holds only 1 of the 2 expected votes, so its partition has no quorum and the resources are not taken over.
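The "partition WITHOUT quorum" state follows from simple majority arithmetic: a partition has quorum only when it holds more than half of the expected votes. The following is a minimal sketch of that rule, assuming one vote per node; the function name has_quorum is hypothetical and is not a Pacemaker or Corosync command.

```shell
#!/bin/sh
# has_quorum EXPECTED_VOTES LIVE_VOTES
# Hypothetical helper (not part of Pacemaker): prints "yes" when the
# partition holds a strict majority of the expected votes, else "no".
has_quorum() {
    expected=$1
    live=$2
    # Strict majority: live > expected / 2, written without division.
    if [ $((live * 2)) -gt "$expected" ]; then
        echo yes
    else
        echo no
    fi
}

has_quorum 2 2   # both nodes online
has_quorum 2 1   # one node left in a two-node cluster
```

With two expected votes, a single surviving node can never reach a strict majority, which is why a two-node cluster needs an explicit policy for the loss of quorum.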
Solution: tell the cluster to ignore the loss of quorum. The no-quorum-policy property accepts:
ignore     carry on as if quorum were present
freeze     resources that are already running keep running; resources that are not running cannot be started
stop       stop all resources (default)
suicide    fence all nodes in the partition that lost quorum
On node1:
service corosync start
[root@node1 corosync]# crm
crm(live)# configure
crm(live)configure# property no-quorum-policy=ignore
crm(live)configure# commit
crm(live)configure# show                                 (check the quorum property again)
node node1.a.com
node node2.a.com
primitive webip ocf:heartbeat:IPaddr \
    params ip=""
primitive webserver lsb:httpd
group web-res webip webserver
property $id="cib-bootstrap-options" \
    dc-version="1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
    cluster-infrastructure="openais" \
    expected-quorum-votes="2" \
    stonith-enabled="false" \
    no-quorum-policy="ignore"                            (quorum is now ignored)
Repeat the failover test: the resources now move between the nodes as expected.
------------------------------------------------------
Step5: Common cluster management commands
--------------------------
① crm_attribute   change cluster-wide attributes
② crm_resource    manage resources
③ crm_node        manage nodes
   crm_node -e    show the epoch (how many times the configuration has been changed)
   crm_node -q    show whether the current partition has quorum (1 means yes)
④ cibadmin        tool for working with the cluster configuration (CIB)
   -u, --upgrade  Upgrade the configuration to the latest syntax
   -Q, --query    Query the contents of the CIB
   -E, --erase    Erase the contents of the whole CIB
   -B, --bump     Increase the CIB's epoch value by 1
   If a resource was defined incorrectly, this tool can delete it:
   -D, --delete   Delete the first object matching the supplied criteria, Eg.
Resources can also be deleted from the crm shell:
crm(live)configure# delete
usage: delete <id> [<id>...]
You can also run edit in this mode. When you are done, run commit.
--------------------------------------------------------------------
Step6: iSCSI (IP-SAN) storage configuration
------------------------------------------------
Part 1: target (the backing storage)
① Add a new disk (or partition).
fdisk -l
Partition: fdisk /dev/sda (n, p, 4, +2g, w), which adds the partition referred to below as sda6
Re-read the partition table without rebooting (verify with cat /proc/partitions):
partprobe /dev/sda
② Install the packages the target needs and start the service.
cd /mnt/cdrom/ClusterStorage
rpm -ivh perl-Config-General-2.40-1.el5.noarch.rpm
rpm -ivh scsi-target-utils-0.0-5.20080917snap.el5.i386.rpm
service tgtd start
③ Add a new iSCSI target.
Create the target:
tgtadm --lld iscsi --op new --mode target --tid=1 --targetname iqn.2013-06.com.a.target:disk
Show the targets:
tgtadm --lld iscsi --op show --mode target
Add the storage (LUN):
tgtadm --lld iscsi --op new --mode=logicalunit --tid=1 --lun=1 --backing-store /dev/sda6
Syntax: --lld [driver] --op new --mode=logicalunit --tid=[id] --lun=[lun] --backing-store [path]
Allow an initiator to access the target (ACL):
tgtadm --lld iscsi --op bind --mode=target --tid=1 --initiator-address=
Syntax: tgtadm --lld [driver] --op bind --mode=target --tid=[id] --initiator-address=[address]
④ Add the configuration to the configuration file so that it is loaded automatically at boot.
vim /etc/tgt/targets.conf
<target iqn.2013-06.com.a.target:disk>
    backing-store /dev/sda6
    initiator-address
</target>
Part 2: initiator (node1 and node2)
cd /mnt/cdrom/Server
rpm -ivh iscsi-initiator-utils-
service iscsi start
Discover the target:
iscsiadm --mode discovery --type sendtargets --portal
Log in:
iscsiadm --mode node --targetname iqn.2013-06.com.a.target:disk --portal --login
⑤ On the target, show the connected initiators.
tgt-admin -s
Target 1: iqn.2013-06.com.a.target:disk
    System information:
        Driver: iscsi
        State: ready
    I_T nexus information:
        I_T nexus: 1
            Initiator: iqn.2013-06.com.a.realserver2
            Connection: 0
                IP Address:
        I_T nexus: 2
            Initiator: iqn.2013-06.com.a.realserver1
            Connection: 0
                IP Address:
    LUN information:
        LUN: 0
            Type: controller
            SCSI ID: deadbeaf1:0
            SCSI SN: beaf10
            Size: 0 MB
            Online: Yes
            Removable media: No
            Backing store: No backing store
        LUN: 1
            Type: disk
            SCSI ID: deadbeaf1:1
            SCSI SN: beaf11
            Size: 4178 MB
            Online: Yes
            Removable media: No
            Backing store: /dev/sda6
    Account information:
    ACL information:
⑥ On node1 and node2, list the local disks; the iSCSI disk appears as /dev/sdb.
fdisk -l
Disk /dev/sdb: 4178 MB, 4178409984 bytes
129 heads, 62 sectors/track, 1020 cylinders
Units = cylinders of 7998 * 512 = 4094976 bytes
Disk /dev/sdb doesn't contain a valid partition table
-------------------------------------------------------------
Step7: Format the new disk sdb with the OCFS2 cluster file system
-------------------------------------------------------------
① Install the required packages on both nodes.
yum localinstall ocfs2-2.6.18-164.el5-1.4.7-1.el5.i686.rpm \
    ocfs2-tools-1.4.4-1.el5.i386.rpm \
    ocfs2console-1.4.4-1.el5.i386.rpm
② Edit the main configuration file.
Method 1: create the main configuration file by hand
mkdir /etc/ocfs2/
vim /etc/ocfs2/cluster.conf
(Each stanza header starts in the first column; the parameter lines below it must be indented with a tab.)
node:
	ip_port = 7777
	ip_address =
	number = 0
	name = node1.a.com
	cluster = ocfs2

node:
	ip_port = 7777
	ip_address =
	number = 1
	name = node2.a.com
	cluster = ocfs2

cluster:
	node_count = 2
	name = ocfs2
Synchronize the configuration to the other node:
scp -r /etc/ocfs2 node2:/etc/
Method 2: configure it in the GUI
ocfs2console
③ On both nodes, load the o2cb modules and start the services.
/etc/init.d/o2cb load
Loading module "configfs": OK
Mounting configfs filesystem at /config: OK
Loading module "ocfs2_nodemanager": OK
Loading module "ocfs2_dlm": OK
Loading module "ocfs2_dlmfs": OK
/etc/init.d/ocfs2 start
chkconfig ocfs2 on
/etc/init.d/o2cb online ocfs2
/etc/init.d/o2cb configure
Configuring the O2CB driver.
This will configure the on-boot properties of the O2CB driver.
The following questions will determine whether the driver is loaded on
boot. The current values will be shown in brackets ('[]'). Hitting
<ENTER> without typing an answer will keep that current value. Ctrl-C
will abort.
Load O2CB driver on boot (y/n) [n]: y
Cluster to start on boot (Enter "none" to clear) [ocfs2]: ocfs2
Writing O2CB configuration: OK
Loading module "configfs": OK
Mounting configfs filesystem at /config: OK
Loading module "ocfs2_nodemanager": OK
Loading module "ocfs2_dlm": OK
Loading module "ocfs2_dlmfs": OK
Mounting ocfs2_dlmfs filesystem at /dlm: OK
Starting cluster ocfs2: OK
/etc/init.d/o2cb status
Driver for "configfs": Loaded
Filesystem "configfs": Mounted
Driver for "ocfs2_dlmfs": Loaded
Filesystem "ocfs2_dlmfs": Mounted
Checking O2CB cluster ocfs2: Online
  Heartbeat dead threshold = 31
  Network idle timeout: 30000
  Network keepalive delay: 2000
  Network reconnect delay: 2000
Checking O2CB heartbeat: Active
④ On node1 only, create the OCFS2 file system.
mkfs -t ocfs2 /dev/sdb
⑤ Mount it on both nodes.
mount -t ocfs2 /dev/sdb /var/www/html
mount
/dev/sdb on /var/www/html type ocfs2 (rw,_netdev,heartbeat=local)
cd /var/www/html
echo "Welcome" > index.html
⑥ On both nodes, mount it automatically at boot. The _netdev option delays the mount until the network (and therefore iSCSI) is up.
vim /etc/fstab
/dev/sdb    /var/www/html    ocfs2    _netdev,defaults    0 0
-------------------------------------------------------------------
Step8: Access test
--------------------
Open the VIP in a browser from a client. The page stored on the shared disk is displayed:
Welcome
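The access test can also be scripted from a client machine. The sketch below assumes the test page contains exactly the text "Welcome" written to index.html in Step7. The VIP environment variable and the check_body helper are hypothetical names introduced for illustration; set VIP to the cluster's virtual IP before running.

```shell
#!/bin/sh
# check_body BODY
# Hypothetical helper: succeeds when BODY is exactly the test page content.
check_body() {
    [ "$1" = "Welcome" ]
}

# VIP is a placeholder for the cluster's virtual IP address.
# When it is unset, the script only defines the helper and does nothing else.
if [ -n "${VIP:-}" ]; then
    # -s hides the progress meter; --max-time bounds the wait in seconds.
    body=$(curl -s --max-time 5 "http://$VIP/")
    if check_body "$body"; then
        echo "OK: $VIP serves the shared page"
    else
        echo "FAIL: unexpected response from $VIP"
    fi
fi
```

Running the script once before and once after stopping corosync on the active node gives a quick check that the VIP, httpd and the OCFS2 mount all moved to the surviving node.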