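RHCS HA on Red Hat AS 4: the service does not fail over when I unplug the first machine's network cable!
Both machines run Red Hat AS 4 U4.
Cluster software: RHCS
The relevant configuration on the two machines is as follows:
# more /etc/hosts (identical on both machines)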
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 localhost
192.168.0.201 vm001
192.168.0.202 vm002
After both machines have booted normally:
# clustat -i 3
Member Status: Quorate
Member Name Status
------ ---- ------
vm001 Online, rgmanager
vm002 Online, Local, rgmanager
Service Name Owner (Last) State
------- ---- ----- ------ -----
ftpservice vm001 started
But after I unplugged the first network cable and waited about a minute, I got:
# clustat -i 3
Member Status: Quorate
Member Name Status
------ ---- ------
vm001 Offline
vm002 Online, Local, rgmanager
Service Name Owner (Last) State
------- ---- ----- ------ -----
ftpservice unknown started
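A quick way to spot this state from a script, as a minimal sketch: it assumes the flattened clustat layout pasted in this thread (real clustat output may be column-aligned or bracket the last owner differently), and the helper names parse_clustat and stuck_services are made up for illustration.

```python
# Sample text copied from the clustat output shown in this thread.
CLUSTAT_OUTPUT = """\
Member Status: Quorate
Member Name Status
------ ---- ------
vm001 Offline
vm002 Online, Local, rgmanager
Service Name Owner (Last) State
------- ---- ----- ------ -----
ftpservice unknown started
"""


def parse_clustat(text):
    """Return (members, services) parsed from clustat-style text."""
    members, services = {}, {}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and the dashed separator rows.
        if not line or set(line) <= {"-", " "}:
            continue
        if line.startswith("Member Status"):
            continue
        if line.startswith("Member Name"):
            section = "members"
            continue
        if line.startswith("Service Name"):
            section = "services"
            continue
        if section == "members":
            name, _, flags = line.partition(" ")
            members[name] = [f.strip() for f in flags.split(",") if f.strip()]
        elif section == "services":
            parts = line.split()
            if len(parts) >= 3:
                services[parts[0]] = {"owner": parts[1], "state": parts[-1]}
    return members, services


def stuck_services(text):
    """Names of services whose owner is not an online member."""
    members, services = parse_clustat(text)
    online = {name for name, flags in members.items() if "Online" in flags}
    return sorted(n for n, s in services.items() if s["owner"] not in online)


print(stuck_services(CLUSTAT_OUTPUT))
```

A service that is still "started" while its owner is not an online member is the symptom described here: the cluster has noticed the node is gone, but recovery of the service has not happened yet.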
My cluster configuration file is:
# more /etc/cluster/cluster.conf
<?xml version="1.0" ?>
<cluster alias="zcbcluster" config_version="33" name="alpha_cluster">
<fence_daemon post_fail_delay="0" post_join_delay="3"/>
<clusternodes>
<clusternode name="vm001" votes="1">
<fence>
<method name="1">
<device name="clusterfence" nodename="vm001"/>
</method>
</fence>
</clusternode>
<clusternode name="vm002" votes="1">
<fence>
<method name="1">
<device name="clusterfence" nodename="vm002"/>
</method>
</fence>
</clusternode>
</clusternodes>
<cman expected_votes="1" two_node="1"/>
<fencedevices>
<fencedevice agent="fence_manual" name="clusterfence"/>
</fencedevices>
<rm>
<failoverdomains>
<failoverdomain name="ftp-domain" ordered="1" restricted="1">
<failoverdomainnode name="vm001" priority="1"/>
<failoverdomainnode name="vm002" priority="2"/>
</failoverdomain>
</failoverdomains>
<resources>
<ip address="192.168.0.203" monitor_link="1"/>
<script file="/etc/rc.d/init.d/vsftpdHA.sh" name="ftpHA"/>
<fs device="/dev/sdb1" force_fsck="0" force_unmount="1" fsid="61663" fstype="ext3" mountpoint="/ftp" name="ftpcontent" options="rw" self_fence="0"/>
</resources>
<service autostart="1" domain="ftp-domain" name="ftpservice" recovery="relocate">
<ip ref="192.168.0.203">
<fs ref="ftpcontent"/>
<script ref="ftpHA"/>
</ip>
</service>
</rm>
</cluster>
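One thing worth checking in a config like this is which fence agent each node relies on. The sketch below is only an illustration: it uses a trimmed copy of the fencing parts of the file above, and the function name manual_fenced_nodes is invented for this example. It lists nodes whose only fence devices use the fence_manual agent, since that agent needs an operator to confirm the reset before fencing completes.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the cluster.conf posted above (fencing-related parts only).
CLUSTER_CONF = """<?xml version="1.0" ?>
<cluster alias="zcbcluster" config_version="33" name="alpha_cluster">
  <clusternodes>
    <clusternode name="vm001" votes="1">
      <fence>
        <method name="1">
          <device name="clusterfence" nodename="vm001"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="vm002" votes="1">
      <fence>
        <method name="1">
          <device name="clusterfence" nodename="vm002"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_manual" name="clusterfence"/>
  </fencedevices>
</cluster>
"""


def manual_fenced_nodes(conf_text):
    """Return node names whose fence devices all use the fence_manual agent."""
    root = ET.fromstring(conf_text)
    # Map each fence device name to the agent that implements it.
    agents = {d.get("name"): d.get("agent") for d in root.iter("fencedevice")}
    nodes = []
    for node in root.iter("clusternode"):
        used = [agents.get(dev.get("name")) for dev in node.iter("device")]
        if used and all(agent == "fence_manual" for agent in used):
            nodes.append(node.get("name"))
    return nodes


for name in manual_fenced_nodes(CLUSTER_CONF):
    print("%s can only be fenced manually" % name)
```

On a real system the same function could be fed the contents of /etc/cluster/cluster.conf instead of the embedded sample.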
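Is there any way to get the service started on the standby node when the network cable is unplugged? (To be clear, the service can already be relocated between the two machines in both directions.)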
《Solution》
Following this to learn.
I am about to set up a cluster here as well. Could you share your installation document?
Many thanks!
《Solution》
rh-cs-en-4.pdf; it can be downloaded online!
《Solution》
Please post the following:
vsftpdHA.sh = ?
/var/log/messages = ?
ifconfig -a = ?
Also, starting with 4.4 the quorum partition from release 3 has been added back.
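《Solution》
I also set up two machines. When I stopped rgmanager on one of them, the whole cluster went down. My impression is that this thing can provide high availability for the services running on it, but it has little high availability of its own.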
《Solution》
I'd like to know what hardware the original poster's two-node setup consists of.
Two hosts + shared disk + power switch + clustering software?
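《Solution》
The OS is Redhat AS 4.
The clustering software is rhel-4-u4-rhcs-i386-disc1.iso.
There is no power switch. Each machine has two NICs, and the second NIC carries the heartbeat.
The shared disk is a Dell CX300.
I just can't figure out why RHCS does not fail over when the network cable is unplugged. It has puzzled me the whole time: why does this happen?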
《Solution》
Take a careful look at /var/log/messages; you will find the answer there.
《Solution》
Normal state:
# clustat -i 3
Member Status: Quorate
Member Name Status
------ ---- ------
vm001 Online, rgmanager
vm002 Online, Local, rgmanager
Service Name Owner (Last) State
------- ---- ----- ------ -----
ftpservice vm001 started
After disconnecting the first NIC:
# clustat -i 3
Member Status: Quorate
Member Name Status
------ ---- ------
vm001 Offline
vm002 Online, Local, rgmanager
Service Name Owner (Last) State
------- ---- ----- ------ -----
ftpservice unknown started
It stays like this and never fails over!
Here is the log:
# tail -30 /var/log/messages
Dec 28 07:28:50 vm002 gpm: gpm startup succeeded
Dec 28 07:28:50 vm002 iiim: htt startup succeeded
Dec 28 07:28:50 vm002 crond: crond startup succeeded
Dec 28 07:28:50 vm002 htt_server: started.
Dec 28 07:28:52 vm002 xfs: xfs startup succeeded
Dec 28 07:28:52 vm002 anacron: anacron startup succeeded
Dec 28 07:28:52 vm002 atd: atd startup succeeded
Dec 28 07:28:53 vm002 messagebus: messagebus startup succeeded
Dec 28 07:28:53 vm002 cups-config-daemon: cups-config-daemon startup succeeded
Dec 28 07:28:53 vm002 haldaemon: haldaemon startup succeeded
Dec 28 07:28:53 vm002 rgmanager: clurgmgrd startup succeeded
Dec 28 07:28:53 vm002 fstab-sync: removed all generated mount points
Dec 28 07:28:54 vm002 clurgmgrd: <notice> Resource Group Manager Starting
Dec 28 07:28:54 vm002 clurgmgrd: <info> Loading Service Data
Dec 28 07:28:59 vm002 clurgmgrd: <info> Initializing Services
Dec 28 07:29:00 vm002 clurgmgrd: : <info> /dev/sdb1 is not mounted
Dec 28 07:29:00 vm002 fstab-sync: added mount point /media/cdrecorder for /dev/hdc
Dec 28 07:29:01 vm002 fstab-sync: added mount point /media/floppy for /dev/fd0
Dec 28 07:29:05 vm002 clurgmgrd: : <info> Executing /etc/rc.d/init.d/vsftpdHA.sh stop
Dec 28 07:29:05 vm002 vsftpdHA.sh: vsftpd shutdown failed
Dec 28 07:29:05 vm002 clurgmgrd: <info> Services Initialized
Dec 28 07:29:07 vm002 clurgmgrd: <info> Logged in SG "usrm::manager"
Dec 28 07:29:07 vm002 clurgmgrd: <info> Magma Event: Membership Change
Dec 28 07:29:07 vm002 clurgmgrd: <info> State change: Local UP
Dec 28 07:29:07 vm002 clurgmgrd: <info> State change: vm001 UP
Dec 28 07:33:02 vm002 sshd(pam_unix): session opened for user root by root(uid=0)
Dec 28 07:34:48 vm002 kernel: CMAN: removing node vm001 from the cluster : Missed too many heartbeats
Dec 28 07:34:48 vm002 fenced: vm001 not a cluster member after 0 sec post_fail_delay
Dec 28 07:34:48 vm002 fenced: fencing node "vm001"
Dec 28 07:34:48 vm002 fence_manual: Node vm001 needs to be reset before recovery can procede. Waiting for vm001 to rejoin the cluster or for manual acknowledgement that it has been reset (i.e. fence_ack_manual -n vm001)
《Solution》
Dec 28 07:34:48 vm002 kernel: CMAN: removing node vm001 from the cluster : Missed too many heartbeats
Dec 28 07:34:48 vm002 fenced: vm001 not a cluster member after 0 sec post_fail_delay
Dec 28 07:34:48 vm002 fenced: fencing node "vm001"
Dec 28 07:34:48 vm002 fence_manual: Node vm001 needs to be reset before recovery can procede. Waiting for vm001 to rejoin the cluster or for manual acknowledgement that it has been reset (i.e. fence_ack_manual -n vm001)
Take a careful look at the part in red.
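The quoted log lines show fenced handing vm001 to fence_manual, which then waits for the node to rejoin or for someone to acknowledge the reset. As a rough sketch (the regular expression and the function name pending_manual_fences are assumptions written for this thread, not part of RHCS), a script could scan /var/log/messages for that wait message and echo back the acknowledgement command the log itself suggests:

```python
import re

# Log lines copied verbatim from this thread (including the original spelling).
LOG_LINES = [
    "Dec 28 07:34:48 vm002 kernel: CMAN: removing node vm001 from the cluster : Missed too many heartbeats",
    "Dec 28 07:34:48 vm002 fenced: vm001 not a cluster member after 0 sec post_fail_delay",
    'Dec 28 07:34:48 vm002 fenced: fencing node "vm001"',
    "Dec 28 07:34:48 vm002 fence_manual: Node vm001 needs to be reset before recovery can procede. "
    "Waiting for vm001 to rejoin the cluster or for manual acknowledgement that it has been reset "
    "(i.e. fence_ack_manual -n vm001)",
]

# Matches the fence_manual wait message and captures the node name.
PENDING = re.compile(r"fence_manual: Node (\S+) needs to be reset")


def pending_manual_fences(lines):
    """Return node names that fence_manual reported as waiting, in order seen."""
    nodes = []
    for line in lines:
        match = PENDING.search(line)
        if match and match.group(1) not in nodes:
            nodes.append(match.group(1))
    return nodes


for node in pending_manual_fences(LOG_LINES):
    print("fencing of %s is waiting; after resetting it, run: fence_ack_manual -n %s" % (node, node))
```

The acknowledgement should only be given once the failed node really has been powered off or reset. Otherwise both nodes could end up mounting /dev/sdb1 at the same time, and ext3 is not safe to mount from two hosts at once.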