
Building a Simple Two-Node Apache Cluster on RHEL5




Test environment
Host: Red Hat Enterprise Linux 5 Update 4 with the Xen kernel
Virtual machines: two Red Hat Enterprise Linux 5 Update 4 guests
(iptables and SELinux are disabled on all of them)
# xm list
Name                                      ID Mem(MiB) VCPUs State   Time(s)
Domain-0                                   0      482     2 r-----   2883.5
node1                                     24      127     1 -b----     23.5
node2                                     22      127     1 -b----     52.0

Environment topology diagram: (image not reproduced here)

Setting up the basic environment
Dom-0 (fenced.jaylin.org) and the two virtual machines (node1.jaylin.org and node2.jaylin.org) use an identical /etc/hosts file:
# cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1                localhost.localdomain localhost
::1                localhost6.localdomain6 localhost6

192.168.10.1         fenced.jaylin.org fenced
192.168.10.100        node1.jaylin.org node1
192.168.10.101        node2.jaylin.org node2

Configuring the Dom-0 network
Edit /etc/xen/xend-config.sxp and change the original line:
(network-script network-bridge)
to:
(network-script network-custom)
In other words, when Xen networking is brought up, the network-custom script is run. Its location, permissions and contents are as follows:
# ll /etc/xen/scripts/network-custom
-rwxr-xr-x 1 root root 256 Mar 12 00:08 /etc/xen/scripts/network-custom

# cat /etc/xen/scripts/network-custom
#!/bin/bash
. /etc/xen/scripts/network-bridge
brctl addbr bridge1
ifconfig bridge1 up
ifconfig bridge1 192.168.10.1
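
Once the script has been run (it is executed when Xen networking is brought up; see the restart further below), the new bridge can be checked with the standard bridge-utils command (output omitted):
# brctl show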

So that this bridge can be selected and used for bridging in virt-manager, the following changes are also needed:
# ll /etc/libvirt/qemu/networks/ -R
/etc/libvirt/qemu/networks/:
total 28
drwx------ 2 root root 4096 Mar 12 00:00 autostart
-rw-r--r-- 1 root root  200 Mar 12 00:04 bridge1.xml
-rw-r--r-- 1 root root  282 Mar  5 18:01 default.xml

/etc/libvirt/qemu/networks/autostart:
total 4
lrwxrwxrwx 1 root root 14 Mar 12 00:00 bridge1.xml -> ../bridge1.xml
lrwxrwxrwx 1 root root 14 Mar  5 18:01 default.xml -> ../default.xml

# cat /etc/libvirt/qemu/networks/bridge1.xml
<network>
  <name>bridge1</name>
  <uuid>f07378c8-7918-45b8-9c19-2d622744e671</uuid>
  <bridge name="bridge1" />
  <forward />
  <ip address="192.168.10.1" netmask="255.255.255.0">
  </ip>
</network>

The default default.xml file is left unchanged:
# cat /etc/libvirt/qemu/networks/default.xml
<network>
  <name>default</name>
  <uuid>6c7870a1-695d-4873-a113-ee366549aae7</uuid>
  <bridge name="virbr0" />
  <forward/>
  <ip address="192.168.122.1" netmask="255.255.255.0">
    <dhcp>
      <range start="192.168.122.2" end="192.168.122.254" />
    </dhcp>
  </ip>
</network>

Then restart libvirtd and the network service.
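A minimal sketch of the restart, using the standard RHEL5 service commands:
# service libvirtd restart
# service network restart
Afterwards ifconfig shows the new bridge1 interface alongside eth0, peth0 and xenbr0: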
bridge1   Link encap:Ethernet  HWaddr FE:FF:FF:FF:FF:FF  
          inet addr:192.168.10.1  Bcast:192.168.10.255  Mask:255.255.255.0
          inet6 addr: fe80::200:ff:fe00:0/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:69567 errors:0 dropped:0 overruns:0 frame:0
          TX packets:76618 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:58728469 (56.0 MiB)  TX bytes:38353285 (36.5 MiB)

eth0      Link encap:Ethernet  HWaddr 00:14:22:36:EE:A8  
          inet addr:10.66.129.40  Bcast:10.66.129.255  Mask:255.255.254.0
          inet6 addr: fe80::214:22ff:fe36:eea8/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:300 errors:0 dropped:0 overruns:0 frame:0
          TX packets:262 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:25048 (24.4 KiB)  TX bytes:45629 (44.5 KiB)
        ......
peth0     Link encap:Ethernet  HWaddr FE:FF:FF:FF:FF:FF  
          inet6 addr: fe80::fcff:ffff:feff:ffff/64 Scope:Link
          UP BROADCAST RUNNING NOARP  MTU:1500  Metric:1
          RX packets:37448 errors:0 dropped:0 overruns:0 frame:0
          TX packets:9521 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:5514109 (5.2 MiB)  TX bytes:1383158 (1.3 MiB)
          Interrupt:16 Memory:fe8f0000-fe900000

xenbr0    Link encap:Ethernet  HWaddr FE:FF:FF:FF:FF:FF  
          UP BROADCAST RUNNING NOARP  MTU:1500  Metric:1
          RX packets:20015 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:1513384 (1.4 MiB)  TX bytes:0 (0.0 b)

Both virtual machines are bridged to bridge1.
(Both virtual machines have the Clustering and Cluster Storage package groups installed.)
# cat /etc/xen/node1
name = "node1"
uuid = "e8d9d264-8c7d-e66c-8492-ddc3ce93e2c8"
maxmem = 128
memory = 128
vcpus = 1
bootloader = "/usr/bin/pygrub"
on_poweroff = "destroy"
on_reboot = "restart"
on_crash = "restart"
vfb = [ "type=vnc,vncunused=1,keymap=en-us" ]
disk = [ "tap:aio:/home/XEN/RHEL5u4_node1.img,xvda,w" ]
vif = [ "mac=00:16:36:57:d0:1d,bridge=bridge1" ]

# cat /etc/xen/node2
name = "node2"
uuid = "35bf49db-7d27-fd6e-ee1c-6009c44cdcf4"
maxmem = 128
memory = 128
vcpus = 1
bootloader = "/usr/bin/pygrub"
on_poweroff = "destroy"
on_reboot = "restart"
on_crash = "restart"
vfb = [ "type=vnc,vncunused=1,keymap=en-us" ]
disk = [ "tap:aio:/home/XEN/RHEL5u4_node2.img,xvda,w" ]
vif = [ "mac=00:16:36:49:18:7d,bridge=bridge1" ]


Configuring the cluster
Dom-0 configuration
Install the Clustering and Cluster Storage package groups on Dom-0, then install the iSCSI target software:
# yum install scsi-target-utils -y

Create the backing file for the shared storage (placed under /scsi/ to match the target configuration below):
# dd if=/dev/zero of=/scsi/sharedevice bs=1G count=1

Configure the shared device:
# chkconfig --level 35 tgtd on
# service tgtd start
# tgt-setup-lun -d /scsi/sharedevice -n sharedevice 192.168.10.100 192.168.10.101
# tgt-admin --dump > /etc/tgt/targets.conf
# cat /etc/tgt/targets.conf
default-driver iscsi

<target iqn.2001-04.com.dhcp-129-220-sharedevice>
        backing-store /scsi/sharedevice
        initiator-address 192.168.10.100
        initiator-address 192.168.10.101
</target>
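
To confirm that tgtd is exporting the target and its LUN, the current configuration can also be dumped with tgtadm (output omitted here):
# tgtadm --lld iscsi --mode target --op show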

Create the fence key:
# dd if=/dev/urandom of=/etc/cluster/fence_xvm.key bs=1 count=4096
Copy the key to both node virtual machines:
# scp /etc/cluster/fence_xvm.key 192.168.10.100:/etc/cluster/
# scp /etc/cluster/fence_xvm.key 192.168.10.101:/etc/cluster/
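
A quick, purely illustrative way to confirm that the copies on the nodes match the original is to compare checksums:
# md5sum /etc/cluster/fence_xvm.key
# ssh node1 md5sum /etc/cluster/fence_xvm.key
# ssh node2 md5sum /etc/cluster/fence_xvm.key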

Adjust the fence_xvmd startup options by editing the /etc/init.d/cman init script:
directly below the line ". /etc/init.d/functions", add the following line:
FENCE_XVMD_OPTS="-I bridge1"
(Note: the option letter is an uppercase "I", not a lowercase "l".)
Here bridge1 is the bridge the cluster nodes are attached to.
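After the edit, that part of /etc/init.d/cman reads roughly as follows (only the second line is new; the first is the existing line referenced above):
. /etc/init.d/functions
FENCE_XVMD_OPTS="-I bridge1"    # added for fence_xvmd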

Add a multicast route (this step appears to be optional on RHEL5).
Create the file /etc/sysconfig/static-routes with the following content:
any net 224.0.0.0 netmask 240.0.0.0 gw 192.168.10.1
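
Alternatively, the equivalent route can be added immediately by hand; this is roughly what the network init script does with the file above, though it does not persist across reboots on its own:
# route add -net 224.0.0.0 netmask 240.0.0.0 gw 192.168.10.1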

Restart the network so that the route takes effect; route -n then shows:
# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.10.0    0.0.0.0         255.255.255.0   U     0      0        0 bridge1
192.168.122.0   0.0.0.0         255.255.255.0   U     0      0        0 virbr0
10.66.128.0     0.0.0.0         255.255.254.0   U     0      0        0 eth0
169.254.0.0     0.0.0.0         255.255.0.0     U     0      0        0 eth0
224.0.0.0       192.168.10.1    240.0.0.0       UG    0      0        0 bridge1
0.0.0.0         10.66.129.254   0.0.0.0         UG    0      0        0 eth0

Configure the fencing cluster on Dom-0:
# system-config-cluster
Name the cluster fenced.
Click "Cluster" in the left pane, then "Edit Cluster Properties" on the right, and tick "Run XVM Daemon".
Click "Cluster" in the left pane, then "Add a Cluster Node" to create a cluster node, also named fenced.
Press Ctrl+S to save the configuration file, then start the cluster services.
# cat /etc/cluster/cluster.conf
<?xml version="1.0" ?>
<cluster alias="fenced" config_version="2" name="fenced">
        <fence_daemon post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="fenced" nodeid="1" votes="1">
                        <fence/>
                </clusternode>
        </clusternodes>
        <cman/>
        <fencedevices/>
        <rm>
                <failoverdomains/>
                <resources/>
        </rm>
        <fence_xvmd/>
</cluster>

# chkconfig cman on
# /etc/init.d/cman start
Starting cluster:
   Enabling workaround for Xend bridged networking... done
   Loading modules... done
   Mounting configfs... done
   Starting ccsd... done
   Starting cman... done
   Starting daemons... done
   Starting fencing... done
   Starting virtual machine fencing host... done
                                                           [  OK  ]

Testing the fence function from Dom-0:
# fence_xvm -H node1
Remote: Operation was successful
The virtual machine named node1 is powered off and then started again.
(Note: node1 here is the virtual machine name, i.e. what xm list shows, not the node's hostname.)

This completes the configuration on Dom-0.


Configuring the two nodes (node1 and node2)
On one node, for example node1, create the cluster:
# system-config-cluster
(1) Name the cluster haclusterabc.
(2) Click "Cluster -> Cluster Nodes -> Add a Cluster Node" and create the two cluster nodes node1 and node2.
(3) Click "Cluster -> Fence Devices -> Add a Fence Device", choose "Virtual Machine Fencing", and give the device any name; here it is fence_xvmd.
(4) For each of the two nodes under "Cluster -> Cluster Nodes", click "Manage Fencing For This Node" and assign the fence device: first add a fence level ("Add a New Fence Level"), then select "Fence-Level-1" and click "Add a New Fence to this Level", choosing the fence_xvmd device created above. In the "Domain" field, enter the name of the corresponding virtual machine (not its hostname).
(5) Click "Cluster -> Managed Resources -> Resources -> Create a Resource" and add a virtual IP resource and a script resource. The "IP Address" is 192.168.10.10, the virtual IP that clients will use to reach the cluster service. The "Script" is named apache (any name will do) and its path is /etc/init.d/httpd.
(6) Click "Cluster -> Managed Resources -> Services -> Create a Service" and add the apache service. In the Service Management dialog, enable "Autostart This Service" and set the recovery policy to "Relocate"; then click "Add a Shared Resource to this service" and attach the resources created above, i.e. the IP address and the script.
(7) Save the file and exit.
# cat /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster config_version="16" name="haclusterabc">
        <fence_daemon post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="node1" nodeid="1" votes="1">
                        <fence>
                                <method name="1">
                                        <device domain="node1" name="fence_xvmd"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="node2" nodeid="2" votes="1">
                        <fence>
                                <method name="1">
                                        <device domain="node2" name="fence_xvmd"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
        <fencedevices>
                <fencedevice agent="fence_xvm" name="fence_xvmd"/>
        </fencedevices>
        <rm>
                <failoverdomains/>
                <resources>
                        <ip address="192.168.10.10" monitor_link="1"/>
                        <script file="/etc/init.d/httpd" name="apache"/>
                </resources>
                <service autostart="1" name="apache" recovery="relocate">
                        <ip ref="192.168.10.10"/>
                        <script ref="apache"/>
                </service>
        </rm>
</cluster>
Copy the cluster.conf file to node2 with scp:
# scp /etc/cluster/cluster.conf node2:/etc/cluster

Add the iSCSI shared device on node1:
# yum install iscsi-initiator-utils -y
# chkconfig iscsi on
# service iscsi start
# iscsiadm --mode discovery --type sendtargets --portal 192.168.10.1
192.168.10.1:3260,1 iqn.2001-04.com.dhcp-129-220-sharedevice
# iscsiadm --mode node --targetname iqn.2001-04.com.dhcp-129-220-sharedevice --portal 192.168.10.1:3260 --login
# fdisk -l
A new /dev/sda device now appears (the virtual machines' own disks show up as /dev/xvda).
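
The active iSCSI session can also be checked directly (output omitted):
# iscsiadm --mode session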

Partition /dev/sda and create a GFS filesystem on it:
# yum install kmod-gfs-xen -y
Partition the disk, creating a single primary partition that spans the whole device, then re-read the partition table and check the result:
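The partitioning itself can be done interactively with fdisk; a non-interactive sketch (fdisk prompt answers piped in: new primary partition 1 using the whole disk; adjust if your layout differs):
# echo -e "n\np\n1\n\n\nw" | fdisk /dev/sda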
# partprobe
# fdisk -l /dev/sda

Disk /dev/sda: 1073 MB, 1073741824 bytes
34 heads, 61 sectors/track, 1011 cylinders
Units = cylinders of 2074 * 512 = 1061888 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1        1011     1048376+  83  Linux

Format the partition as a GFS filesystem:
# gfs_mkfs -j 2 -p lock_dlm -t haclusterabc:sharedevice /dev/sda1
This will destroy any data on /dev/sda1.
  It appears to contain a gfs filesystem.

Are you sure you want to proceed? y

Device:                    /dev/sda1
Blocksize:                 4096
Filesystem Size:           196504
Journals:                  2
Resource Groups:           8
Locking Protocol:          lock_dlm
Lock Table:                haclusterabc:sharedevice

Syncing...
All Done
Note: the GFS filesystem only needs to be created on one node.

Configure the shared storage to be mounted automatically at boot:
# mkdir /share
# chkconfig gfs on
Add the following line to /etc/fstab:
/dev/sda1                /share                        gfs        _netdev                0 0
These three steps are identical on node2.
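
Once cman is running on a node (see the cluster test section below), the filesystem can also be mounted and unmounted by hand rather than through the gfs service; a quick illustrative check:
# mount -t gfs /dev/sda1 /share
# umount /share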

Install the httpd package on both nodes.
On each node, edit /etc/httpd/conf/httpd.conf and change the document root to the /share directory:
DocumentRoot "/share"
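
Depending on the stock httpd.conf, the corresponding <Directory> block may also need to point at the new root; a rough Apache 2.2-style example (adjust to the actual default configuration):
<Directory "/share">
    Options Indexes FollowSymLinks
    AllowOverride None
    Order allow,deny
    Allow from all
</Directory>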

In other words, the /etc/cluster/cluster.conf, /etc/fstab and /etc/httpd/conf/httpd.conf files are identical on node1 and node2.

Make sure the gfs, iscsi, cman and rgmanager services start automatically at boot on both nodes, and that the acpid service does not (so that a fence operation powers a node off immediately instead of triggering a clean ACPI shutdown).
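
A minimal sketch of the corresponding chkconfig calls, run on both nodes:
# chkconfig gfs on
# chkconfig iscsi on
# chkconfig cman on
# chkconfig rgmanager on
# chkconfig acpid off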

Run the following commands on both nodes.
# service gfs start
This mounts the GFS shared device:
/dev/sda1 on /share type gfs (rw,hostdata=jid=1:id=131074:first=0)
Create a file index.html in the /share directory:
# cat /share/index.html
This is a test page~
# service gfs stop
This unmounts the GFS shared device again.

Testing the cluster
Start the cman service on both nodes at the same time:
# /etc/init.d/cman start
Starting cluster:
   Loading modules... done
   Mounting configfs... done
   Starting ccsd... done
   Starting cman... done
   Starting daemons... done
   Starting fencing... done
                                                           [  OK  ]

Here node2 was started first; its log shows:
Mar 31 19:23:13 node2 ccsd: Starting ccsd 2.0.115:
Mar 31 19:23:13 node2 ccsd:  Built: Aug  5 2009 08:24:44
Mar 31 19:23:13 node2 ccsd:  Copyright (C) Red Hat, Inc.  2004  All rights reserved.
Mar 31 19:23:13 node2 ccsd: cluster.conf (cluster name = haclusterabc, version = 16) found.
Mar 31 19:23:13 node2 ccsd: Remote copy of cluster.conf is from quorate node.
Mar 31 19:23:13 node2 ccsd:  Local version # : 16
Mar 31 19:23:13 node2 ccsd:  Remote version #: 16
Mar 31 19:23:14 node2 ccsd: Remote copy of cluster.conf is from quorate node.
Mar 31 19:23:14 node2 ccsd:  Local version # : 16
Mar 31 19:23:14 node2 ccsd:  Remote version #: 16
Mar 31 19:23:14 node2 ccsd: Remote copy of cluster.conf is from quorate node.
Mar 31 19:23:14 node2 ccsd:  Local version # : 16
Mar 31 19:23:14 node2 ccsd:  Remote version #: 16
Mar 31 19:23:14 node2 ccsd: Remote copy of cluster.conf is from quorate node.
Mar 31 19:23:14 node2 ccsd:  Local version # : 16
Mar 31 19:23:14 node2 ccsd:  Remote version #: 16
Mar 31 19:23:14 node2 openais: AIS Executive Service RELEASE 'subrev 1887 version 0.80.6'
Mar 31 19:23:14 node2 openais: Copyright (C) 2002-2006 MontaVista Software, Inc and contributors.
Mar 31 19:23:14 node2 openais: Copyright (C) 2006 Red Hat, Inc.
Mar 31 19:23:14 node2 openais: AIS Executive Service: started and ready to provide service.
Mar 31 19:23:14 node2 openais: Using default multicast address of 239.192.104.179
Mar 31 19:23:14 node2 openais: Token Timeout (10000 ms) retransmit timeout (495 ms)
Mar 31 19:23:14 node2 openais: token hold (386 ms) retransmits before loss (20 retrans)
Mar 31 19:23:14 node2 openais: join (60 ms) send_join (0 ms) consensus (4800 ms) merge (200 ms)
Mar 31 19:23:14 node2 openais: downcheck (1000 ms) fail to recv const (50 msgs)
Mar 31 19:23:14 node2 openais: seqno unchanged const (30 rotations) Maximum network MTU 1500
Mar 31 19:23:14 node2 openais: window size per rotation (50 messages) maximum messages per rotation (17 messages)
Mar 31 19:23:14 node2 openais: send threads (0 threads)
Mar 31 19:23:14 node2 openais: RRP token expired timeout (495 ms)
Mar 31 19:23:14 node2 openais: RRP token problem counter (2000 ms)
Mar 31 19:23:14 node2 openais: RRP threshold (10 problem count)
Mar 31 19:23:14 node2 openais: RRP mode set to none.
Mar 31 19:23:14 node2 openais: heartbeat_failures_allowed (0)
Mar 31 19:23:14 node2 openais: max_network_delay (50 ms)
Mar 31 19:23:14 node2 openais: HeartBeat is Disabled. To enable set heartbeat_failures_allowed > 0
Mar 31 19:23:14 node2 openais: Receive multicast socket recv buffer size (288000 bytes).
Mar 31 19:23:14 node2 openais: Transmit multicast socket send buffer size (221184 bytes).
Mar 31 19:23:14 node2 openais: The network interface is now up.
Mar 31 19:23:14 node2 openais: Created or loaded sequence id 188.192.168.10.101 for this ring.
Mar 31 19:23:14 node2 openais: entering GATHER state from 15.
Mar 31 19:23:14 node2 openais: CMAN 2.0.115 (built Aug  5 2009 08:24:48) started
Mar 31 19:23:14 node2 openais: Service initialized 'openais CMAN membership service 2.01'
Mar 31 19:23:14 node2 openais: Service initialized 'openais extended virtual synchrony service'
Mar 31 19:23:14 node2 openais: Service initialized 'openais cluster membership service B.01.01'
Mar 31 19:23:14 node2 openais: Service initialized 'openais availability management framework B.01.01'
Mar 31 19:23:14 node2 openais: Service initialized 'openais checkpoint service B.01.01'
Mar 31 19:23:14 node2 openais: Service initialized 'openais event service B.01.01'
Mar 31 19:23:14 node2 openais: Service initialized 'openais distributed locking service B.01.01'
Mar 31 19:23:14 node2 openais: Service initialized 'openais message service B.01.01'
Mar 31 19:23:14 node2 openais: Service initialized 'openais configuration service'
Mar 31 19:23:14 node2 openais: Service initialized 'openais cluster closed process group service v1.01'
Mar 31 19:23:14 node2 openais: Service initialized 'openais cluster config database access v1.01'
Mar 31 19:23:14 node2 openais: Not using a virtual synchrony filter.
Mar 31 19:23:14 node2 openais: Creating commit token because I am the rep.
Mar 31 19:23:14 node2 openais: Saving state aru 0 high seq received 0
Mar 31 19:23:14 node2 openais: Storing new sequence id for ring c0
Mar 31 19:23:14 node2 openais: entering COMMIT state.
Mar 31 19:23:14 node2 openais: entering RECOVERY state.
Mar 31 19:23:14 node2 openais: position member 192.168.10.101:
Mar 31 19:23:14 node2 openais: previous ring seq 188 rep 192.168.10.101
Mar 31 19:23:14 node2 openais: aru 0 high delivered 0 received flag 1
Mar 31 19:23:14 node2 openais: Did not need to originate any messages in recovery.
Mar 31 19:23:14 node2 openais: Sending initial ORF token
Mar 31 19:23:14 node2 openais: CLM CONFIGURATION CHANGE
Mar 31 19:23:14 node2 openais: New Configuration:
Mar 31 19:23:14 node2 openais: Members Left:
Mar 31 19:23:14 node2 openais: Members Joined:
Mar 31 19:23:14 node2 openais: CLM CONFIGURATION CHANGE
Mar 31 19:23:14 node2 openais: New Configuration:
Mar 31 19:23:14 node2 openais:         r(0) ip(192.168.10.101)  
Mar 31 19:23:14 node2 openais: Members Left:
Mar 31 19:23:14 node2 openais: Members Joined:
Mar 31 19:23:14 node2 openais:         r(0) ip(192.168.10.101)  
Mar 31 19:23:14 node2 openais: This node is within the primary component and will provide service.
Mar 31 19:23:14 node2 openais: entering OPERATIONAL state.
Mar 31 19:23:14 node2 openais: quorum regained, resuming activity
Mar 31 19:23:14 node2 openais: got nodejoin message 192.168.10.101
Mar 31 19:23:14 node2 openais: entering GATHER state from 11.
Mar 31 19:23:14 node2 openais: Saving state aru a high seq received a
Mar 31 19:23:14 node2 openais: Storing new sequence id for ring c4
Mar 31 19:23:14 node2 openais: entering COMMIT state.
Mar 31 19:23:14 node2 openais: entering RECOVERY state.
Mar 31 19:23:14 node2 openais: position member 192.168.10.100:
Mar 31 19:23:14 node2 openais: previous ring seq 192 rep 192.168.10.100
Mar 31 19:23:14 node2 openais: aru 19 high delivered 19 received flag 1
Mar 31 19:23:14 node2 openais: position member 192.168.10.101:
Mar 31 19:23:14 node2 openais: previous ring seq 192 rep 192.168.10.101
Mar 31 19:23:14 node2 openais: aru a high delivered a received flag 1
Mar 31 19:23:14 node2 openais: Did not need to originate any messages in recovery.
Mar 31 19:23:14 node2 openais: CLM CONFIGURATION CHANGE
Mar 31 19:23:14 node2 openais: New Configuration:
Mar 31 19:23:14 node2 openais:         r(0) ip(192.168.10.101)  
Mar 31 19:23:14 node2 openais: Members Left:
Mar 31 19:23:14 node2 openais: Members Joined:
Mar 31 19:23:14 node2 openais: CLM CONFIGURATION CHANGE
Mar 31 19:23:14 node2 openais: New Configuration:
Mar 31 19:23:14 node2 openais:         r(0) ip(192.168.10.100)  
Mar 31 19:23:14 node2 openais:         r(0) ip(192.168.10.101)  
Mar 31 19:23:14 node2 openais: Members Left:
Mar 31 19:23:14 node2 openais: Members Joined:
Mar 31 19:23:14 node2 openais:         r(0) ip(192.168.10.100)  
Mar 31 19:23:14 node2 openais: This node is within the primary component and will provide service.
Mar 31 19:23:14 node2 openais: entering OPERATIONAL state.
Mar 31 19:23:14 node2 openais: got nodejoin message 192.168.10.100
Mar 31 19:23:14 node2 openais: got nodejoin message 192.168.10.101
Mar 31 19:23:14 node2 openais: got joinlist message from node 1
Mar 31 19:23:14 node2 ccsd: Initial status:: Quorate

node1, which started later, logged the following:
Mar 31 19:23:14 node1 openais: entering GATHER state from 11.
Mar 31 19:23:14 node1 openais: Creating commit token because I am the rep.
Mar 31 19:23:14 node1 openais: Saving state aru 19 high seq received 19
Mar 31 19:23:14 node1 openais: Storing new sequence id for ring c4
Mar 31 19:23:14 node1 openais: entering COMMIT state.
Mar 31 19:23:14 node1 openais: entering RECOVERY state.
Mar 31 19:23:14 node1 openais: position member 192.168.10.100:
Mar 31 19:23:14 node1 openais: previous ring seq 192 rep 192.168.10.100
Mar 31 19:23:14 node1 openais: aru 19 high delivered 19 received flag 1
Mar 31 19:23:14 node1 openais: position member 192.168.10.101:
Mar 31 19:23:14 node1 openais: previous ring seq 192 rep 192.168.10.101
Mar 31 19:23:14 node1 openais: aru a high delivered a received flag 1
Mar 31 19:23:14 node1 openais: Did not need to originate any messages in recovery.
Mar 31 19:23:14 node1 openais: Sending initial ORF token
Mar 31 19:23:14 node1 openais: CLM CONFIGURATION CHANGE
Mar 31 19:23:14 node1 openais: New Configuration:
Mar 31 19:23:14 node1 openais:         r(0) ip(192.168.10.100)  
Mar 31 19:23:14 node1 openais: Members Left:
Mar 31 19:23:14 node1 openais: Members Joined:
Mar 31 19:23:14 node1 openais: CLM CONFIGURATION CHANGE
Mar 31 19:23:14 node1 openais: New Configuration:
Mar 31 19:23:14 node1 openais:         r(0) ip(192.168.10.100)  
Mar 31 19:23:14 node1 openais:         r(0) ip(192.168.10.101)  
Mar 31 19:23:14 node1 openais: Members Left:
Mar 31 19:23:14 node1 openais: Members Joined:
Mar 31 19:23:14 node1 openais:         r(0) ip(192.168.10.101)  
Mar 31 19:23:14 node1 openais: This node is within the primary component and will provide service.
Mar 31 19:23:14 node1 openais: entering OPERATIONAL state.
Mar 31 19:23:14 node1 openais: got nodejoin message 192.168.10.100
Mar 31 19:23:14 node1 openais: got nodejoin message 192.168.10.101
Mar 31 19:23:14 node1 openais: got joinlist message from node 1


Run mount -a on both nodes to mount the GFS device.
node2 mounted first; its log shows:
Mar 31 19:25:34 node2 kernel: GFS 0.1.34-2.el5 (built Jul 23 2009 12:49:48) installed
Mar 31 19:25:34 node2 kernel: Lock_DLM (built Jul 23 2009 12:49:45) installed
Mar 31 19:25:34 node2 kernel: Lock_Nolock (built Jul 23 2009 12:49:44) installed
Mar 31 19:25:34 node2 kernel: Trying to join cluster "lock_dlm", "haclusterabc:sharedevice"
Mar 31 19:25:34 node2 kernel: dlm: Using TCP for communications
Mar 31 19:25:34 node2 kernel: dlm: got connection from 1
Mar 31 19:25:34 node2 kernel: Joined cluster. Now mounting FS...
Mar 31 19:25:34 node2 kernel: GFS: fsid=haclusterabc:sharedevice.1: jid=1: Trying to acquire journal lock...
Mar 31 19:25:34 node2 kernel: GFS: fsid=haclusterabc:sharedevice.1: jid=1: Looking at journal...
Mar 31 19:25:34 node2 kernel: GFS: fsid=haclusterabc:sharedevice.1: jid=1: Done

node1, which mounted later, logged the following:
Mar 31 19:25:20 node1 kernel: GFS 0.1.34-2.el5 (built Jul 23 2009 12:49:48) installed
Mar 31 19:25:20 node1 kernel: Lock_DLM (built Jul 23 2009 12:49:45) installed
Mar 31 19:25:20 node1 kernel: Lock_Nolock (built Jul 23 2009 12:49:44) installed
Mar 31 19:25:20 node1 kernel: Trying to join cluster "lock_dlm", "haclusterabc:sharedevice"
Mar 31 19:25:20 node1 kernel: dlm: Using TCP for communications
Mar 31 19:25:20 node1 kernel: Joined cluster. Now mounting FS...
Mar 31 19:25:20 node1 kernel: GFS: fsid=haclusterabc:sharedevice.0: jid=0: Trying to acquire journal lock...
Mar 31 19:25:20 node1 kernel: GFS: fsid=haclusterabc:sharedevice.0: jid=0: Looking at journal...
Mar 31 19:25:20 node1 kernel: GFS: fsid=haclusterabc:sharedevice.0: jid=0: Done
Mar 31 19:25:20 node1 kernel: GFS: fsid=haclusterabc:sharedevice.0: jid=1: Trying to acquire journal lock...
Mar 31 19:25:20 node1 kernel: GFS: fsid=haclusterabc:sharedevice.0: jid=1: Looking at journal...
Mar 31 19:25:20 node1 kernel: GFS: fsid=haclusterabc:sharedevice.0: jid=1: Done
Mar 31 19:25:34 node1 kernel: dlm: connecting to 2


Start the rgmanager service; clustat on either node shows that node2 is currently providing the service:
# clustat
Cluster Status for haclusterabc @ Wed Mar 31 19:30:32 2010
Member Status: Quorate

Member Name                             ID   Status
------ ----                             ---- ------
node1                                       1 Online
node2                                       2 Online, Local, rgmanager

Service Name                   Owner (Last)                   State         
------- ----                   ----- ------                   -----         
service:apache                 node2                          started  

# clustat
Cluster Status for haclusterabc @ Wed Mar 31 19:31:09 2010
Member Status: Quorate

Member Name                             ID   Status
------ ----                             ---- ------
node1                                       1 Online, Local, rgmanager
node2                                       2 Online, rgmanager

Service Name                   Owner (Last)                   State         
------- ----                   ----- ------                   -----         
service:apache                 node2                          started     


Testing from Dom-0, the page created earlier is returned:
# elinks --dump http://192.168.10.10
This is a test page~

The request is logged in /var/log/httpd/access_log on node2:
# tail /var/log/httpd/access_log
192.168.10.1 - - "GET / HTTP/1.1" 200 21 "-" "ELinks/0.11.1 (textmode; Linux; -)"
node1 has no corresponding entry, which shows that Dom-0 fetched the page from node2.

On node1, run the following command to fence node2:
# fence_node node2
node2 is powered off and rebooted.
After a short while, Dom-0 can reach the Apache page again:
# elinks --dump http://192.168.10.10
   This is a test page~
This time the request shows up in node1's log:
# tail /var/log/httpd/access_log
192.168.10.1 - - "GET / HTTP/1.1" 200 21 "-" "ELinks/0.11.1 (textmode; Linux; -)"
which shows that the page was served by node1 this time.

The cluster service is now provided by node1:
# clustat
Cluster Status for haclusterabc @ Wed Mar 31 19:38:59 2010
Member Status: Quorate

Member Name                          ID   Status
------ ----                          ---- ------
node1                                    1 Online, rgmanager
node2                                    2 Online, Local, rgmanager

Service Name                Owner (Last)                State         
------- ----                ----- ------                -----         
service:apache              node1                       started     
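
As an aside, the service can also be moved between nodes without fencing anything, using rgmanager's admin tool (service and member names as configured above):
# clusvcadm -r service:apache -m node2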


Now have node2 fence node1 in turn:
# fence_node node1
Watching node2's log:
Mar 31 19:40:42 node2 fence_node: Fence of "node1" was successful
Mar 31 19:40:50 node2 openais: The token was lost in the OPERATIONAL state.
Mar 31 19:40:50 node2 openais: Receive multicast socket recv buffer size (288000 bytes).
Mar 31 19:40:50 node2 openais: Transmit multicast socket send buffer size (221184 bytes).
Mar 31 19:40:50 node2 openais: entering GATHER state from 2.
Mar 31 19:40:54 node2 openais: entering GATHER state from 0.
Mar 31 19:40:54 node2 openais: Creating commit token because I am the rep.
Mar 31 19:40:54 node2 openais: Saving state aru 3e high seq received 3e
Mar 31 19:40:54 node2 openais: Storing new sequence id for ring d0
Mar 31 19:40:55 node2 openais: entering COMMIT state.
Mar 31 19:40:55 node2 openais: entering RECOVERY state.
Mar 31 19:40:55 node2 openais: position member 192.168.10.101:
Mar 31 19:40:55 node2 openais: previous ring seq 204 rep 192.168.10.100
Mar 31 19:40:55 node2 openais: aru 3e high delivered 3e received flag 1
Mar 31 19:40:55 node2 openais: Did not need to originate any messages in recovery.
Mar 31 19:40:55 node2 openais: Sending initial ORF token
Mar 31 19:40:55 node2 openais: CLM CONFIGURATION CHANGE
Mar 31 19:40:55 node2 openais: New Configuration:
Mar 31 19:40:55 node2 fenced: node1 not a cluster member after 0 sec post_fail_delay
Mar 31 19:40:55 node2 kernel: dlm: closing connection to node 1
Mar 31 19:40:55 node2 openais:         r(0) ip(192.168.10.101)  
Mar 31 19:40:55 node2 kernel: GFS: fsid=haclusterabc:sharedevice.1: jid=0: Trying to acquire journal lock...
Mar 31 19:40:55 node2 openais: Members Left:
Mar 31 19:40:55 node2 kernel: GFS: fsid=haclusterabc:sharedevice.1: jid=0: Looking at journal...
Mar 31 19:40:55 node2 openais:         r(0) ip(192.168.10.100)  
Mar 31 19:40:55 node2 kernel: GFS: fsid=haclusterabc:sharedevice.1: jid=0: Done
Mar 31 19:40:55 node2 openais: Members Joined:
Mar 31 19:40:55 node2 openais: CLM CONFIGURATION CHANGE
Mar 31 19:40:55 node2 openais: New Configuration:
Mar 31 19:40:55 node2 openais:         r(0) ip(192.168.10.101)  
Mar 31 19:40:55 node2 openais: Members Left:
Mar 31 19:40:55 node2 openais: Members Joined:
Mar 31 19:40:55 node2 openais: This node is within the primary component and will provide service.
Mar 31 19:40:55 node2 openais: entering OPERATIONAL state.
Mar 31 19:40:55 node2 openais: got nodejoin message 192.168.10.101
Mar 31 19:40:55 node2 openais: got joinlist message from node 2
Mar 31 19:41:00 node2 clurgmgrd: <notice> Taking over service service:apache from down member node1
Mar 31 19:41:00 node2 avahi-daemon: Registering new address record for 192.168.10.10 on eth0.
Mar 31 19:41:02 node2 clurgmgrd: <notice> Service service:apache started
Mar 31 19:41:21 node2 openais: entering GATHER state from 11.
Mar 31 19:41:21 node2 openais: Saving state aru 24 high seq received 24
Mar 31 19:41:21 node2 openais: Storing new sequence id for ring d4
Mar 31 19:41:21 node2 openais: entering COMMIT state.
Mar 31 19:41:21 node2 openais: entering RECOVERY state.
Mar 31 19:41:21 node2 openais: position member 192.168.10.100:
Mar 31 19:41:21 node2 openais: previous ring seq 208 rep 192.168.10.100
Mar 31 19:41:21 node2 openais: aru a high delivered a received flag 1
Mar 31 19:41:21 node2 openais: position member 192.168.10.101:
Mar 31 19:41:21 node2 openais: previous ring seq 208 rep 192.168.10.101
Mar 31 19:41:21 node2 openais: aru 24 high delivered 24 received flag 1
Mar 31 19:41:21 node2 openais: Did not need to originate any messages in recovery.
Mar 31 19:41:21 node2 openais: CLM CONFIGURATION CHANGE
Mar 31 19:41:21 node2 openais: New Configuration:
Mar 31 19:41:21 node2 openais:         r(0) ip(192.168.10.101)  
Mar 31 19:41:21 node2 openais: Members Left:
Mar 31 19:41:21 node2 openais: Members Joined:
Mar 31 19:41:21 node2 openais: CLM CONFIGURATION CHANGE
Mar 31 19:41:21 node2 openais: New Configuration:
Mar 31 19:41:21 node2 openais:         r(0) ip(192.168.10.100)  
Mar 31 19:41:21 node2 openais:         r(0) ip(192.168.10.101)  
Mar 31 19:41:21 node2 openais: Members Left:
Mar 31 19:41:21 node2 openais: Members Joined:
Mar 31 19:41:21 node2 openais:         r(0) ip(192.168.10.100)  
Mar 31 19:41:21 node2 openais: This node is within the primary component and will provide service.
Mar 31 19:41:21 node2 openais: entering OPERATIONAL state.
Mar 31 19:41:21 node2 openais: got nodejoin message 192.168.10.100
Mar 31 19:41:21 node2 openais: got nodejoin message 192.168.10.101
Mar 31 19:41:21 node2 openais: got joinlist message from node 2
Mar 31 19:41:26 node2 kernel: dlm: connecting to 1


Summary:
When the cluster configuration is modified through system-config-cluster and the node is already a cluster member, the new configuration can be pushed to all other nodes with the "Send to Cluster" button in the upper right. If a cluster-managed service was changed, rgmanager applies the change as soon as the configuration has been sent.
Mounting the GFS filesystem at boot was handled by adding an entry to /etc/fstab. Running mount -a starts the gfs service automatically; likewise, starting the gfs service by hand mounts the GFS entries listed in /etc/fstab, and stopping the gfs service unmounts them again.
Remember that iptables and SELinux must be turned off, otherwise the two nodes may be unable to communicate (see the commands after this summary).
After a node is fenced, it takes roughly ten-odd seconds for the service to fail over to the other node. Watching the cluster with "clustat -i 1", the "Owner" first switches to the surviving node while the "State" changes to "starting"; once the state becomes "started", the failover is complete.
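
For reference, a sketch of the commands used to keep iptables and SELinux out of the way on RHEL5 (setenforce 0 only switches SELinux to permissive for the running system; a permanent change also requires SELINUX=disabled in /etc/selinux/config):
# service iptables stop
# chkconfig iptables off
# setenforce 0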
