Goal:
Build a highly available web cluster backed by shared storage over iSCSI (IP-SAN);
use the OCFS2 cluster filesystem so the nodes stay synchronized.
Characteristics:
No dedicated heartbeat link is required between the HA nodes.
Passwordless communication (for scp) must be set up between the nodes.
---------------------------------------------
Address plan:
*HA cluster nodes*
node1.a.com  eth0: 192.168.102.101  eth1: 192.168.1.100
node2.a.com  eth0: 192.168.102.102  eth1: 192.168.1.200
VIP: 192.168.102.200
Note: eth0 is bridged; eth1 is Host-Only.
---------------------------------------------------------
*Target server*
eth0: 192.168.1.10
Note: eth0 is Host-Only.
--------------------------------------------------------
***Configuration steps***
————————————————————————————
Step1: Preparation
--------------------
①Configure a static IP address on each node (service network restart).
②Synchronize the clocks between the nodes (hwclock / date -s "2013-06-14 **:**:**").
③Change the hostnames of the HA nodes so they can resolve each other by name.
vim /etc/sysconfig/network
 1 NETWORKING=yes
 2 NETWORKING_IPV6=yes
 3 HOSTNAME=node1.a.com   (node2.a.com on node2)
vim /etc/hosts
 3 127.0.0.1       localhost.localdomain localhost
 4 ::1             localhost6.localdomain6 localhost6
 5 192.168.102.101 node1.a.com node1
 6 192.168.102.102 node2.a.com node2
hostname node1.a.com
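The /etc/hosts additions above can be scripted so the same step is safe to repeat on both nodes. A minimal sketch; the file path is a parameter (not part of the original procedure) so it can be rehearsed on a scratch file before touching /etc/hosts:

```shell
#!/bin/sh
# add_cluster_hosts FILE: append the cluster name entries to a hosts file,
# skipping any entry that is already present (idempotent, safe to re-run).
add_cluster_hosts() {
    hosts_file="$1"
    while read -r entry; do
        grep -qF "$entry" "$hosts_file" || echo "$entry" >> "$hosts_file"
    done <<'EOF'
192.168.102.101 node1.a.com node1
192.168.102.102 node2.a.com node2
EOF
}
```

Usage on each node: `add_cluster_hosts /etc/hosts`.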
④Set up passwordless communication between the nodes (no root password prompt when connecting).
node1:
ssh-keygen -t rsa //generate the SSH public/private key pair on node1
cd /root/.ssh/
ssh-copy-id -i id_rsa.pub  node2  //copy node1's public key to node2
Enter node2's root password: 123456
node2:
ssh-keygen -t rsa //generate the SSH public/private key pair on node2
cd /root/.ssh/
ssh-copy-id -i id_rsa.pub  node1  //copy node2's public key to node1
Enter node1's root password: 123456
Passwordless test from node1: scp /etc/fstab node2 (no root password needed anymore)
⑤On node1 (and node2), set up a local yum repository, mount the CD, and install the Corosync-related packages.
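The whole key exchange for one node can be collected into a small sketch. The DRY_RUN switch and the scp-to-/tmp check are illustrative additions, not part of the original procedure; run it once on node1 with the peer set to node2 and once on node2 with the peer set to node1:

```shell
#!/bin/sh
# exchange_keys PEER: generate this node's key pair and push the public key
# to PEER. With DRY_RUN=1 the commands are only printed, not executed.
exchange_keys() {
    peer="$1"
    run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }
    # -N '' uses an empty passphrase; -f skips the interactive path prompt
    run ssh-keygen -t rsa -N '' -f /root/.ssh/id_rsa
    run ssh-copy-id -i /root/.ssh/id_rsa.pub "$peer"
    # a copy that no longer asks for a password proves the setup worked
    run scp /etc/fstab "$peer":/tmp/
}
```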
yum localinstall  cluster-glue-1.0.6-1.6.el5.i386.rpm  \
                 cluster-glue-libs-1.0.6-1.6.el5.i386.rpm \
                 corosync-1.2.7-1.1.el5.i386.rpm \
                 corosynclib-1.2.7-1.1.el5.i386.rpm  \
                 heartbeat-3.0.3-2.3.el5.i386.rpm  \
                 heartbeat-libs-3.0.3-2.3.el5.i386.rpm \
                 libesmtp-1.0.4-5.el5.i386.rpm \
                 pacemaker-1.1.5-1.1.el5.i386.rpm \
                 pacemaker-libs-1.1.5-1.1.el5.i386.rpm  \
                 perl-TimeDate-1.16-5.el5.noarch.rpm  \
                 resource-agents-1.0.4-1.1.el5.i386.rpm  --nogpgcheck
rpm -ivh openais-1.1.3-1.6.el5.i386.rpm
rpm -ivh openaislib-1.1.3-1.6.el5.i386.rpm
---------------------------------------
Step2: Configure Corosync
---------------------------------------
①Copy the sample configuration file into place and edit it.
cd /etc/corosync/
cp -p corosync.conf.example corosync.conf
vim  corosync.conf
# Please read the corosync.conf.5 manual page
compatibility: whitetank (backward compatibility with the old OpenAIS "whitetank" releases; some newer features may be unavailable)
totem {                (totem protocol settings: how the nodes exchange heartbeat/membership messages)
       version: 2      protocol version
       secauth: off    whether to enable secure authentication
       threads: 0      number of worker threads; 0 means no limit
       interface {
               ringnumber: 0
               bindnetaddr: 192.168.102.0 (network address used for cluster communication; a host address also works)
               mcastaddr: 226.94.1.1
               mcastport: 5405
       }
}
logging {               (logging options)
       fileline: off    whether to print the source file and line in log messages
       to_stderr: no    whether to send errors to standard error (the screen)
       to_logfile: yes  write messages to the log file
       to_syslog: yes   also write messages to syslog (using only one of the two saves resources)
       logfile: /var/log/cluster/corosync.log (***the log directory must be created manually; otherwise the service will not start***)
       debug: off       whether to enable debug output; useful when troubleshooting
       timestamp: on    whether to timestamp log entries
                        (the following are OpenAIS options and can be left disabled)
       logger_subsys {
               subsys: AMF
               debug: off
       }
}
amf {
       mode: disabled
}
service {                (extra section: the above only starts the messaging layer, and pacemaker must run on top of it)
       ver: 0
       name: pacemaker
}
aisexec {                (OpenAIS itself is not used, but these sub-options still apply)
       user: root
       group: root
}
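The edits above boil down to a handful of changed options, so the file can also be generated. A sketch covering only the options this walkthrough relies on; the bind network and output path are parameters (my addition) so the result can be inspected before copying it to /etc/corosync/corosync.conf:

```shell
#!/bin/sh
# write_corosync_conf BINDNET OUTFILE: emit a minimal corosync.conf with the
# pacemaker service section and the logging choices used in this setup.
write_corosync_conf() {
    bindnet="$1"; out="$2"
    cat > "$out" <<EOF
compatibility: whitetank
totem {
        version: 2
        secauth: off
        threads: 0
        interface {
                ringnumber: 0
                bindnetaddr: $bindnet
                mcastaddr: 226.94.1.1
                mcastport: 5405
        }
}
logging {
        to_stderr: no
        to_logfile: yes
        to_syslog: yes
        logfile: /var/log/cluster/corosync.log
        timestamp: on
}
service {
        ver: 0
        name: pacemaker
}
aisexec {
        user: root
        group: root
}
EOF
}
```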
②Generate an authkey so that hosts must authenticate to join the cluster.
corosync-keygen
[root@node1 corosync]# ll
total 28
-rw-r--r-- 1 root root 5384 Jul 28  2010 amf.conf.example
-r-------- 1 root root  128 May  7 16:16 authkey
-rw-r--r-- 1 root root  513 May  7 16:14 corosync.conf
-rw-r--r-- 1 root root  436 Jul 28  2010 corosync.conf.example
drwxr-xr-x 2 root root 4096 Jul 28  2010 service.d
drwxr-xr-x 2 root root 4096 Jul 28  2010 uidgid.d
③Create the log directory.
mkdir /var/log/cluster
④Synchronize the configuration to the other node.
[root@node1 corosync]# scp -p authkey  corosync.conf node2:/etc/corosync/
authkey                                          100%  128     0.1KB/s   00:00   
corosync.conf                                    100%  513     0.5KB/s   00:00
[root@node1 corosync]# ssh node2 'mkdir /var/log/cluster'
⑤Start the service.
service corosync start
ssh node2 '/etc/init.d/corosync start'
⑥Check whether the corosync engine started.
grep -i  -e "corosync cluster engine" -e "configuration file" /var/log/messages 
⑦Check whether the initial membership notifications went out.
grep -i totem /var/log/messages
⑧Check whether any errors occurred during startup.
grep -i error:  /var/log/messages  |grep -v unpack_resources
⑨Check whether pacemaker started.
grep -i pcmk_startup /var/log/messages
⑩Check the cluster membership from either node.
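Checks ⑥ through ⑨ can be bundled into one script. A sketch; the log path is a parameter (my addition) so the checks can be rehearsed against a saved copy of /var/log/messages:

```shell
#!/bin/sh
# check_corosync_log LOGFILE: run the four post-start checks against one log.
# Prints what failed; prints "all checks passed" and returns 0 when clean.
check_corosync_log() {
    log="$1"; rc=0
    grep -qi "corosync cluster engine" "$log" || { echo "engine not started"; rc=1; }
    grep -qi "totem" "$log" || { echo "no TOTEM membership messages"; rc=1; }
    # unpack_resources errors are expected until STONITH is configured
    if grep -i "error:" "$log" | grep -qv unpack_resources; then
        echo "errors found in log"; rc=1
    fi
    grep -qi "pcmk_startup" "$log" || { echo "pacemaker did not start"; rc=1; }
    [ $rc -eq 0 ] && echo "all checks passed"
    return $rc
}
```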
[root@node1 ~]# crm status
============
Last updated: Fri Jun 14 22:06:21 2013
Stack: openais
Current DC: node1.a.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
0 Resources configured.
============
Online: [ node1.a.com node2.a.com ]
-------------------------------------------------------------------
Step3: Provide the highly available service
---------------------------------
In Corosync, services can be defined through two interfaces:
1: GUI (hb_gui), a Heartbeat graphical tool; it requires the Heartbeat packages:
yum localinstall heartbeat-2.1.4-9.el5.i386.rpm \
                heartbeat-gui-2.1.4-9.el5.i386.rpm \
                heartbeat-pils-2.1.4-10.el5.i386.rpm \
                heartbeat-stonith-2.1.4-10.el5.i386.rpm \
                libnet-1.1.4-3.el5.i386.rpm \
                perl-MailTools-1.77-1.el5.noarch.rpm --nogpgcheck
After installation, run hb_gui to configure the cluster graphically.
2: crm (a shell provided by pacemaker)
①Show the current configuration: crm configure show
②Check the configuration syntax: crm_verify -L
[root@node1 corosync]# crm_verify  -L
crm_verify[878]: 2013/06/14_17:29:33 ERROR: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
crm_verify[878]: 2013/06/14_17:29:33 ERROR: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
crm_verify[878]: 2013/06/14_17:29:33 ERROR: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid
-V may provide more details
The STONITH errors appear because an HA environment refuses to start any resources until fencing is configured.
STONITH can simply be disabled:
Method:
[root@node1 corosync]# crm  //enter the crm shell
crm(live)# configure        //enter global configuration mode
crm(live)configure# property stonith-enabled=false  //disable the STONITH mechanism
crm(live)configure# commit    //commit and save the configuration
crm(live)configure# show      //show the current configuration
crm(live)configure# exit
Run the syntax check again (crm_verify -L); it no longer reports errors.
③The four cluster resource types
[root@node1 corosync]# crm
crm(live)# configure 
crm(live)configure# help
primitive   a basic resource (runs on only one node at a time)
group       puts multiple resources into a single group for easier management
clone       runs on several nodes at once (e.g. ocfs2, stonith; no master/slave roles)
master      has master/slave roles, e.g. drbd
...
④Use the resource agents to configure services
[root@node1 corosync]# crm
crm(live)# ra 
crm(live)ra# classes
heartbeat
lsb
ocf / heartbeat pacemaker
stonith
⑤List the resource agent scripts
[root@node1 corosync]# crm
crm(live)# ra
crm(live)ra# list lsb
NetworkManager      acpid               anacron             apmd
atd                 auditd              autofs              avahi-daemon
avahi-dnsconfd      bluetooth           capi                conman
corosync            cpuspeed            crond               cups
cups-config-daemon  dnsmasq             drbd                dund
firstboot           functions           gpm                 haldaemon
halt                heartbeat           hidd                hplip
httpd               ip6tables           ipmi                iptables
irda                irqbalance          iscsi               iscsid
isdn                kdump               killall             krb524
kudzu               lm_sensors          logd                lvm2-monitor
mcstrans            mdmonitor           mdmpd               messagebus
microcode_ctl       multipathd          netconsole          netfs
netplugd            network             nfs                 nfslock
nscd                ntpd                o2cb                ocfs2
openais             openibd             pacemaker           pand
pcscd               portmap             psacct              rawdevices
rdisc               readahead_early     readahead_later     restorecond
rhnsd               rpcgssd             rpcidmapd           rpcsvcgssd
saslauthd           sendmail            setroubleshoot      single
smartd              sshd                syslog              vncserver
wdaemon             winbind             wpa_supplicant      xfs
xinetd              ypbind              yum-updatesd     
List the ocf heartbeat agents:
crm(live)ra# list ocf  heartbeat
⑥Use info or meta to show a resource agent's details
meta ocf:heartbeat:IPaddr
⑦Configure the resources (IP address: VIP 192.168.102.200; web service: httpd)
[root@node1 ~]# crm
crm(live)# configure
crm(live)configure# primitive webip ocf:heartbeat:IPaddr  params ip=192.168.102.200
crm(live)configure# show   //view the configuration
node node1.a.com
node node2.a.com
primitive webip ocf:heartbeat:IPaddr \
params ip="192.168.102.200"
property $id="cib-bootstrap-options" \
dc-version="1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false"
crm(live)configure# commit  //commit
crm(live)# status           //query the status
============
Last updated: Mon May  7 19:39:37 2013
Stack: openais
Current DC: node1.a.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Online: [ node1.a.com node2.a.com ]
webip(ocf::heartbeat:IPaddr):Started node1.a.com
The resource has started on node1.
Verify on node1 with ifconfig:
[root@node1 ~]# ifconfig
eth0:0    Link encap:Ethernet  HWaddr 00:0C:29:25:D2:BC 
         inet addr:192.168.102.200  Bcast:192.168.102.255  Mask:255.255.255.0
         UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
         Interrupt:67 Base address:0x2000
Define the httpd resource
Install the httpd service on node1 and node2; do not enable it at boot.
yum  install httpd
chkconfig httpd off
The resource agent class for the httpd service is lsb:
[root@node1 corosync]# crm
crm(live)# ra
crm(live)ra# list lsb
View the parameters of httpd
crm(live)ra# meta lsb:httpd
Define the httpd resource
crm(live)configure# primitive webserver lsb:httpd
crm(live)configure# show
node node1.a.com
node node2.a.com
primitive webip ocf:heartbeat:IPaddr \
params ip="192.168.102.200"
primitive webserver lsb:httpd
property $id="cib-bootstrap-options" \
dc-version="1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false"
crm(live)# status
============
Last updated: Mon May  7 20:01:12 2013
Stack: openais
Current DC: node1.a.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
2 Resources configured.
============
Online: [ node1.a.com node2.a.com ]
webip(ocf::heartbeat:IPaddr):Started node1.a.com
webserver(lsb:httpd):Started node2.a.com
httpd has started, but on node2.
(As a cluster grows, its resources are spread across different nodes to balance the load.)
To constrain both resources to the same node, define them as a group.
⑧Define a resource group to bind the resources together
crm(live)# configure
crm(live)configure# help group
The `group` command creates a group of resources.
Usage:
...............
       group <name> <rsc> [<rsc>...]
         [meta attr_list]
         [params attr_list]
       attr_list :: [$id=<id>] <attr>=<val> [<attr>=<val>...] | $id-ref=<id>
...............
Example:
...............
       group internal_www disk0 fs0 internal_ip apache \
         meta target_role=stopped
...............
Define the group to bind the resources:
crm(live)configure# group web-res  webip  webserver
crm(live)configure# show
node node1.a.com
node node2.a.com
primitive webip ocf:heartbeat:IPaddr \
params ip="192.168.102.200"
primitive webserver lsb:httpd
group web-res webip webserver
property $id="cib-bootstrap-options" \
dc-version="1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false"
Check the cluster status
crm(live)# status
============
Last updated: Mon May  7 20:09:06 2013
Stack: openais
Current DC: node1.a.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Online: [ node1.a.com node2.a.com ]
Resource Group: web-res
webip(ocf::heartbeat:IPaddr):Started node1.a.com
webserver(lsb:httpd):Started node1.a.com
(The IP address and httpd are now both on node1.)
------------------------------------------------------------
Step4: Test failover between the nodes
---------------------------------------
node1: stop the corosync service, then observe from node2.
service corosync stop
[root@node2 corosync]# crm  status
============
Last updated: Mon May  7 20:16:58 2013
Stack: openais
Current DC: node2.a.com - partition WITHOUT quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Online: [ node2.a.com ]
OFFLINE: [ node1.a.com ]
Because node2 no longer has a quorum of votes, the resources cannot fail over normally.
Solution: adjust the no-quorum-policy option.
Possible values are:
ignore  (ignore the loss of quorum)
freeze  (resources that are already running keep running, but stopped resources cannot start)
stop    (the default)
suicide (kill all resources)
Back on node1:
service corosync start
[root@node1 corosync]# crm
crm(live)# configure
crm(live)configure# property no-quorum-policy=ignore
crm(live)configure# commit
crm(live)configure# show   (check the quorum property again)
node node1.a.com
node node2.a.com
primitive webip ocf:heartbeat:IPaddr \
params ip="192.168.102.200"
primitive webserver lsb:httpd
group web-res webip webserver
property $id="cib-bootstrap-options" \
dc-version="1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false" \
no-quorum-policy="ignore"   (now set)
Run the failover test again; the resources now move between the nodes correctly.
------------------------------------------------------
Step5: Common corosync commands
--------------------------
①crm_attribute    modify the cluster's global properties
②crm_resource     modify resources
③crm_node         manage nodes
crm_node   -e      show the node's epoch (how many times the configuration has changed)
crm_node   -q      show the current node's vote count
1
④cibadmin         cluster configuration tool
-u, --upgrade    Upgrade the configuration to the latest syntax
-Q, --query    Query the contents of the CIB
-E, --erase    Erase the contents of the whole CIB
-B, --bump    Increase the CIB's epoch value by 1
If a resource was defined incorrectly, this tool can delete it:
-D, --delete    Delete the first object matching the supplied criteria, Eg.
The same can be done from the crm shell:
crm(live)configure# delete <id>
usage: delete <id> [<id>...]
You can also run edit in this mode.
When finished, commit the changes.
--------------------------------------------------------------------
Step6: iSCSI (IP-SAN) storage configuration
------------------------------------------------
Part 1: target (the back-end storage)
①Add a new disk (or partition).
fdisk -l
Partition: fdisk /dev/sda (n, p, 4, +2g, w) to add a new partition, sda6
Check the partition table: cat /proc/partitions
partprobe  /dev/sda (re-read the partition table without rebooting)
②Install the packages needed by the target and start the service.
cd /mnt/cdrom/ClusterStorage
rpm -ivh perl-Config-General-2.40-1.el5.noarch.rpm
rpm -ivh scsi-target-utils-0.0-5.20080917snap.el5.i386.rpm
service tgtd start
③Create the new iSCSI target.
Create:  tgtadm --lld iscsi --op new --mode target --tid=1 --targetname iqn.2013-06.com.a.target:disk
Show:    tgtadm --lld iscsi --op show --mode target
Add LUN: tgtadm --lld iscsi --op new --mode=logicalunit --tid=1  --lun=1 --backing-store /dev/sda6
             --lld [driver] --op new --mode=logicalunit --tid=[id] --lun=[lun] --backing-store [path]
Bind ACL: tgtadm --lld iscsi --op bind --mode=target --tid=1 --initiator-address=192.168.1.0/24
      tgtadm --lld [driver] --op bind --mode=target --tid=[id] --initiator-address=[address]
④Add the settings to the configuration file so they load automatically at boot.
vim /etc/tgt/targets.conf
<target iqn.2013-06.com.a.target:disk>
       backing-store /dev/sda6
       initiator-address 192.168.1.0/24
</target>
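The persistent target definition is regular enough to generate. A sketch; the IQN, backing device, allowed network, and output path are all parameters (my addition) so the fragment can be reviewed before installing it as /etc/tgt/targets.conf:

```shell
#!/bin/sh
# write_target_conf IQN DEVICE NETWORK OUTFILE: emit a tgtd target stanza
# that exports DEVICE as a LUN of IQN, restricted to NETWORK.
write_target_conf() {
    iqn="$1"; dev="$2"; net="$3"; out="$4"
    cat > "$out" <<EOF
<target $iqn>
    backing-store $dev
    initiator-address $net
</target>
EOF
}
```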
Part 2: initiator (node1 and node2)
cd /mnt/cdrom/Server
rpm -ivh iscsi-initiator-utils-6.2.0.871-0.10.el5.i386.rpm
service iscsi start
Discover: iscsiadm --mode discovery --type sendtargets --portal 192.168.1.10
Log in:   iscsiadm --mode node --targetname iqn.2013-06.com.a.target:disk --portal 192.168.1.10:3260 --login
⑤On the target, show the connected initiators.
tgt-admin -s
Target 1: iqn.2013-06.com.a.target:disk
   System information:
       Driver: iscsi
       State: ready
   I_T nexus information:
       I_T nexus: 1
           Initiator: iqn.2013-06.com.a.realserver2
           Connection: 0
               IP Address: 192.168.1.200
       I_T nexus: 2
           Initiator: iqn.2013-06.com.a.realserver1
           Connection: 0
               IP Address: 192.168.1.100
   LUN information:
       LUN: 0
           Type: controller
           SCSI ID: deadbeaf1:0
           SCSI SN: beaf10
           Size: 0 MB
           Online: Yes
           Removable media: No
           Backing store: No backing store
       LUN: 1
           Type: disk
           SCSI ID: deadbeaf1:1
           SCSI SN: beaf11
           Size: 4178 MB
           Online: Yes
           Removable media: No
           Backing store: /dev/sda6
   Account information:
   ACL information:
       192.168.1.0/24
⑥List the local disks on node1 and node2.
fdisk -l
Disk /dev/sdb: 4178 MB, 4178409984 bytes
129 heads, 62 sectors/track, 1020 cylinders
Units = cylinders of 7998 * 512 = 4094976 bytes
Disk /dev/sdb doesn't contain a valid partition table
-------------------------------------------------------------
Step7: Format the new disk sdb with the OCFS2 cluster filesystem
-------------------------------------------------------------
①Install the required packages on both nodes.
yum localinstall ocfs2-2.6.18-164.el5-1.4.7-1.el5.i686.rpm \
                ocfs2-tools-1.4.4-1.el5.i386.rpm \
                ocfs2console-1.4.4-1.el5.i386.rpm
②Set up the main configuration file.
Method 1: create the configuration file by hand
mkdir  /etc/ocfs2/
vim /etc/ocfs2/cluster.conf
node:
       ip_port = 7777
       ip_address = 192.168.102.101
       number = 0
       name = node1.a.com
       cluster = ocfs2
node:
       ip_port = 7777
       ip_address = 192.168.102.102
       number = 1
       name = node2.a.com
       cluster = ocfs2
cluster:
       node_count = 2
       name = ocfs2
Synchronize the configuration between the nodes:
scp -r /etc/ocfs2  node2:/etc/
Method 2: configure through the GUI
ocfs2console
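cluster.conf above is regular enough to generate from the node list. A sketch that takes name:ip pairs and numbers the nodes in order; the output path is a parameter (my addition) so it can be tested outside /etc/ocfs2:

```shell
#!/bin/sh
# write_ocfs2_conf OUTFILE NAME:IP [NAME:IP ...]: emit a cluster.conf for
# the fixed cluster name "ocfs2", one node stanza per name:ip pair.
write_ocfs2_conf() {
    out="$1"; shift
    : > "$out"
    n=0
    for pair in "$@"; do
        name=${pair%%:*}; ip=${pair##*:}
        cat >> "$out" <<EOF
node:
        ip_port = 7777
        ip_address = $ip
        number = $n
        name = $name
        cluster = ocfs2
EOF
        n=$((n + 1))
    done
    cat >> "$out" <<EOF
cluster:
        node_count = $#
        name = ocfs2
EOF
}
```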
③Load the o2cb module and start the services on both nodes.
/etc/init.d/o2cb load
  Loading module "configfs":OK
  Mounting configfs filesystem at /config:OK
  Loading module "ocfs2_nodemanager":OK
  Loading module "ocfs2_dlm":OK
  Loading module "ocfs2_dlmfs":OK
/etc/init.d/ocfs2 start
chkconfig ocfs2 on
/etc/init.d/o2cb online ocfs2
/etc/init.d/o2cb configure
Configuring the O2CB driver.
This will configure the on-boot properties of the O2CB driver. The following questions will determine whether the driver is loaded on boot. The current values will be shown in brackets ("[]"). Hitting <ENTER> without typing an answer will keep that current value. Ctrl-C will abort.
Load O2CB driver on boot (y/n) [n]:y
Cluster to start on boot (Enter "none" to clear) [ocfs2]:ocfs2
Writing O2CB configuration:OK
Loading module "configfs":OK
Mounting configfs filesystem at /config:OK
Loading module "ocfs2_nodemanager":OK
Loading module "ocfs2_dlm":OK
Loading module "ocfs2_dlmfs":OK
Mounting ocfs2_dlmfs filesystem at /dlm:OK
Starting cluster ocfs2:OK
/etc/init.d/o2cb status
  Driver for "configfs": Loaded
  Filesystem "configfs": Mounted
  Driver for "ocfs2_dlmfs": Loaded
  Filesystem "ocfs2_dlmfs": Mounted
  Checking O2CB cluster ocfs2: Online
  Heartbeat dead threshold = 31
   Network idle timeout: 30000
   Network keepalive delay: 2000
   Network reconnect delay: 2000
  Checking O2CB heartbeat: Active
④Format the OCFS2 filesystem on node1.
mkfs -t ocfs2 /dev/sdb 
⑤Mount it on both nodes.
mount -t ocfs2  /dev/sdb  /var/www/html
mount
/dev/sdb on /var/www/html type ocfs2 (rw,_netdev,heartbeat=local)
cd /var/www/html
echo "Welcome" >index.html
⑥Configure automatic mounting at boot on both nodes.
vim  /etc/fstab
/dev/sdb                /var/www/html           ocfs2   defaults        0 0
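The fstab step can be made idempotent so re-running it on either node never duplicates the line. A sketch; the fstab path is a parameter (my addition) for testing on a scratch file:

```shell
#!/bin/sh
# add_fstab_entry FILE: append the OCFS2 mount line unless an /dev/sdb
# ocfs2 entry is already present (safe to repeat on both nodes).
add_fstab_entry() {
    fstab="$1"
    line='/dev/sdb                /var/www/html           ocfs2   defaults        0 0'
    grep -q '^/dev/sdb[[:space:]].*ocfs2' "$fstab" || echo "$line" >> "$fstab"
}
```

Usage on each node: `add_fstab_entry /etc/fstab`.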
-------------------------------------------------------------------
Step8: Access test
--------------------
http://192.168.102.200
Welcome