
A question about RHCS (Red Hat Enterprise 5)



I ran `service cman start` on one of the machines (LINUX1); before that, cman was not running on either machine.
The log then showed the following:
Jun 23 18:49:13 LINUX1 openais: CMAN 2.0.60 (built Jan 23 2007 12:42:29) started
Jun 23 18:49:13 LINUX1 openais: Not using a virtual synchrony filter.
Jun 23 18:49:13 LINUX1 openais: Creating commit token because I am the rep.
Jun 23 18:49:13 LINUX1 openais: Saving state aru 0 high seq received 0
Jun 23 18:49:13 LINUX1 openais: entering COMMIT state.
Jun 23 18:49:13 LINUX1 openais: entering RECOVERY state.
Jun 23 18:49:13 LINUX1 openais: position member 192.168.10.31:
Jun 23 18:49:13 LINUX1 openais: previous ring seq 0 rep 192.168.10.31
Jun 23 18:49:13 LINUX1 openais: aru 0 high delivered 0 received flag 0
Jun 23 18:49:13 LINUX1 openais: Did not need to originate any messages in recovery.
Jun 23 18:49:13 LINUX1 openais: Storing new sequence id for ring 4
Jun 23 18:49:13 LINUX1 openais: Sending initial ORF token
Jun 23 18:49:13 LINUX1 openais: CLM CONFIGURATION CHANGE
Jun 23 18:49:13 LINUX1 openais: New Configuration:
Jun 23 18:49:13 LINUX1 openais: Members Left:
Jun 23 18:49:13 LINUX1 openais: Members Joined:
Jun 23 18:49:13 LINUX1 openais: This node is within the primary component and will provide service.
Jun 23 18:49:13 LINUX1 openais: CLM CONFIGURATION CHANGE
Jun 23 18:49:13 LINUX1 openais: New Configuration:
Jun 23 18:49:13 LINUX1 openais:        r(0) ip(192.168.10.31)  
Jun 23 18:49:13 LINUX1 openais: Members Left:
Jun 23 18:49:13 LINUX1 openais: Members Joined:
Jun 23 18:49:13 LINUX1 openais:        r(0) ip(192.168.10.31)  
Jun 23 18:49:13 LINUX1 openais: This node is within the primary component and will provide service.
Jun 23 18:49:13 LINUX1 openais: entering OPERATIONAL state.
Jun 23 18:49:13 LINUX1 openais: quorum regained, resuming activity
Jun 23 18:49:13 LINUX1 openais: got nodejoin message 192.168.10.31
Jun 23 18:49:13 LINUX1 ccsd: Initial status:: Quorate
Jun 23 18:49:19 LINUX1 fenced: LINUX2 not a cluster member after 3 sec post_join_delay
Jun 23 18:49:19 LINUX1 fenced: fencing node "LINUX2"
Jun 23 18:49:19 LINUX1 fence_manual: Node LINUX2 needs to be reset before recovery can procede.  Waiting for LINUX2 to rejoin the cluster or for manual acknowledgement that it has been reset (i.e. fence_ack_manual -n LINUX2)

I then ran `fence_ack_manual -n LINUX2`.
The log showed:
Jun 23 18:50:43 LINUX1 fenced: fence "LINUX2" success
Jun 23 18:50:48 LINUX1 ccsd: Attempt to close an unopened CCS descriptor (180).
Jun 23 18:50:48 LINUX1 ccsd: Error while processing disconnect: Invalid request descriptor

If I start cman on the other machine instead, the same log entries appear, except with LINUX2 replaced by LINUX1.

After cman is up on both machines, each machine reports the following:
On LINUX1:
# clustat -l
Member Status: Quorate
  Member Name                        ID   Status
  ------ ----                        ---- ------
  LINUX1                             1 Online, Local
  LINUX2                             2 Offline

On LINUX2:
# clustat -l
Member Status: Quorate
  Member Name                        ID   Status
  ------ ----                        ---- ------
  LINUX1                             1 Offline
  LINUX2                             2 Online, Local


Could the experts here tell me what this problem is and how to fix it?

[ Last edited by txl829 on 2008-6-28 14:32 ]
《Solution》

Post your topology, together with the
config files /etc/cluster/cluster.conf and /etc/hosts, plus /var/log/messages, and let us take a look.

My guess is that it is a configuration-file error.
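As a first sanity check, it is worth confirming the file is at least well-formed XML before debugging cluster behaviour. A minimal sketch (assumptions: `xmllint` is available, as it ships with libxml2 on RHEL 5; the copy in /tmp is only there to make the example self-contained, in practice you would point it at /etc/cluster/cluster.conf):

```shell
# Minimal sketch: confirm a cluster.conf-style file parses as well-formed XML.
# Write a stripped-down sample to /tmp so the check is self-contained;
# on a real node you would run: xmllint --noout /etc/cluster/cluster.conf
cat > /tmp/cluster.conf.sample <<'EOF'
<?xml version="1.0"?>
<cluster config_version="2" name="_cluster">
        <cman expected_votes="1" two_node="1"/>
</cluster>
EOF
xmllint --noout /tmp/cluster.conf.sample && echo "well-formed"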
《Solution》

# cat cluster.conf
<?xml version="1.0" ?>
<cluster config_version="2" name="_cluster">
        <fence_daemon post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="LINUX1" nodeid="1" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="svr_ip" nodename="LINUX1"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="LINUX2" nodeid="2" votes="1">
                        <fence>
                                <method name="1">
                                        <device name="svr_ip" nodename="LINUX2"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
        <fencedevices>
                <fencedevice agent="fence_manual" name="svr_ip"/>
        </fencedevices>
        <rm>
                <failoverdomains>
                        <failoverdomain name="" ordered="0" restricted="0">
                                <failoverdomainnode name="LINUX1" priority="1"/>
                                <failoverdomainnode name="LINUX2" priority="1"/>
                        </failoverdomain>
                </failoverdomains>
                <resources/>
                <service autostart="1" domain="" name="serv_ip" recovery="relocate">
                        <ip address="192.168.10.32" monitor_link="1"/>
                </service>
        </rm>
</cluster>


The hosts file:
# cat hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1       localhost.localdomain localhost
::1     localhost6.localdomain6 localhost6
192.168.10.11   1   
192.168.10.13   2   
192.168.10.31   LINUX1
192.168.10.33   LINUX2
192.168.10.32   svr_ip



The log:

Jun 23 22:24:42 LINUX2 ccsd: Starting ccsd 2.0.60:
Jun 23 22:24:42 LINUX2 ccsd:  Built: Jan 23 2007 12:42:25
Jun 23 22:24:42 LINUX2 ccsd:  Copyright (C) Red Hat, Inc.  2004  All rights reserved.
Jun 23 22:24:42 LINUX2 ccsd: cluster.conf (cluster name = _cluster, version = 2) found.
Jun 23 22:24:45 LINUX2 openais: AIS Executive Service RELEASE 'subrev 1324 version 0.80.2'
Jun 23 22:24:45 LINUX2 openais: Copyright (C) 2002-2006 MontaVista Software, Inc and contributors.
Jun 23 22:24:45 LINUX2 openais: Copyright (C) 2006 Red Hat, Inc.
Jun 23 22:24:45 LINUX2 openais: AIS Executive Service: started and ready to provide service.
Jun 23 22:24:45 LINUX2 openais: Using default multicast address of 239.192.88.13
Jun 23 22:24:45 LINUX2 openais: openais component openais_cpg loaded.
Jun 23 22:24:45 LINUX2 openais: Registering service handler 'openais cluster closed process group service v1.01'
Jun 23 22:24:45 LINUX2 openais: openais component openais_cfg loaded.
Jun 23 22:24:45 LINUX2 openais: Registering service handler 'openais configuration service'
Jun 23 22:24:45 LINUX2 openais: openais component openais_msg loaded.
Jun 23 22:24:45 LINUX2 openais: Registering service handler 'openais message service B.01.01'
Jun 23 22:24:45 LINUX2 openais: openais component openais_lck loaded.
Jun 23 22:24:45 LINUX2 openais: Registering service handler 'openais distributed locking service B.01.01'
Jun 23 22:24:45 LINUX2 openais: openais component openais_evt loaded.
Jun 23 22:24:45 LINUX2 openais: Registering service handler 'openais event service B.01.01'
Jun 23 22:24:45 LINUX2 openais: openais component openais_ckpt loaded.
Jun 23 22:24:45 LINUX2 openais: Registering service handler 'openais checkpoint service B.01.01'
Jun 23 22:24:45 LINUX2 openais: openais component openais_amf loaded.
Jun 23 22:24:45 LINUX2 openais: Registering service handler 'openais availability management framework B.01.01'
Jun 23 22:24:45 LINUX2 openais: openais component openais_clm loaded.
Jun 23 22:24:45 LINUX2 openais: Registering service handler 'openais cluster membership service B.01.01'
Jun 23 22:24:45 LINUX2 openais: openais component openais_evs loaded.
Jun 23 22:24:45 LINUX2 openais: Registering service handler 'openais extended virtual synchrony service'
Jun 23 22:24:45 LINUX2 openais: openais component openais_cman loaded.
Jun 23 22:24:45 LINUX2 openais: Registering service handler 'openais CMAN membership service 2.01'
Jun 23 22:24:45 LINUX2 openais: Token Timeout (10000 ms) retransmit timeout (495 ms)
Jun 23 22:24:45 LINUX2 openais: token hold (386 ms) retransmits before loss (20 retrans)
Jun 23 22:24:45 LINUX2 openais: join (60 ms) send_join (0 ms) consensus (4800 ms) merge (200 ms)
Jun 23 22:24:45 LINUX2 openais: downcheck (1000 ms) fail to recv const (50 msgs)
Jun 23 22:24:45 LINUX2 openais: seqno unchanged const (30 rotations) Maximum network MTU 1500
Jun 23 22:24:45 LINUX2 openais: window size per rotation (50 messages) maximum messages per rotation (17 messages)
Jun 23 22:24:45 LINUX2 openais: send threads (0 threads)
Jun 23 22:24:45 LINUX2 openais: RRP token expired timeout (495 ms)
Jun 23 22:24:45 LINUX2 openais: RRP token problem counter (2000 ms)
Jun 23 22:24:45 LINUX2 openais: RRP threshold (10 problem count)
Jun 23 22:24:45 LINUX2 openais: RRP mode set to none.
Jun 23 22:24:45 LINUX2 openais: heartbeat_failures_allowed (0)
Jun 23 22:24:45 LINUX2 openais: max_network_delay (50 ms)
Jun 23 22:24:45 LINUX2 openais: HeartBeat is Disabled. To enable set heartbeat_failures_allowed > 0
Jun 23 22:24:45 LINUX2 openais: Receive multicast socket recv buffer size (262142 bytes).
Jun 23 22:24:45 LINUX2 openais: Transmit multicast socket send buffer size (262142 bytes).
Jun 23 22:24:45 LINUX2 openais: The network interface is now up.
Jun 23 22:24:45 LINUX2 openais: Created or loaded sequence id 0.192.168.10.33 for this ring.
Jun 23 22:24:45 LINUX2 openais: entering GATHER state from 15.
Jun 23 22:24:45 LINUX2 openais: Initialising service handler 'openais extended virtual synchrony service'
Jun 23 22:24:45 LINUX2 openais: Initialising service handler 'openais cluster membership service B.01.01'
Jun 23 22:24:45 LINUX2 openais: Initialising service handler 'openais availability management framework B.01.01'
Jun 23 22:24:45 LINUX2 openais: Initialising service handler 'openais checkpoint service B.01.01'
Jun 23 22:24:45 LINUX2 openais: Initialising service handler 'openais event service B.01.01'
Jun 23 22:24:45 LINUX2 openais: Initialising service handler 'openais distributed locking service B.01.01'
Jun 23 22:24:45 LINUX2 openais: Initialising service handler 'openais message service B.01.01'
Jun 23 22:24:45 LINUX2 openais: Initialising service handler 'openais configuration service'
Jun 23 22:24:45 LINUX2 openais: Initialising service handler 'openais cluster closed process group service v1.01'
Jun 23 22:24:45 LINUX2 openais: Initialising service handler 'openais CMAN membership service 2.01'
Jun 23 22:24:45 LINUX2 openais: CMAN 2.0.60 (built Jan 23 2007 12:42:29) started
Jun 23 22:24:45 LINUX2 openais: Not using a virtual synchrony filter.
Jun 23 22:24:45 LINUX2 openais: Creating commit token because I am the rep.
Jun 23 22:24:45 LINUX2 openais: Saving state aru 0 high seq received 0
Jun 23 22:24:45 LINUX2 openais: entering COMMIT state.
Jun 23 22:24:45 LINUX2 openais: entering RECOVERY state.
Jun 23 22:24:45 LINUX2 openais: position member 192.168.10.33:
Jun 23 22:24:45 LINUX2 openais: previous ring seq 0 rep 192.168.10.33
Jun 23 22:24:45 LINUX2 openais: aru 0 high delivered 0 received flag 0
Jun 23 22:24:45 LINUX2 openais: Did not need to originate any messages in recovery.
Jun 23 22:24:45 LINUX2 openais: Storing new sequence id for ring 4
Jun 23 22:24:45 LINUX2 openais: Sending initial ORF token
Jun 23 22:24:45 LINUX2 openais: CLM CONFIGURATION CHANGE
Jun 23 22:24:45 LINUX2 openais: New Configuration:
Jun 23 22:24:45 LINUX2 openais: Members Left:
Jun 23 22:24:45 LINUX2 openais: Members Joined:
Jun 23 22:24:45 LINUX2 openais: This node is within the primary component and will provide service.
Jun 23 22:24:45 LINUX2 openais: CLM CONFIGURATION CHANGE
Jun 23 22:24:45 LINUX2 openais: New Configuration:
Jun 23 22:24:45 LINUX2 openais:        r(0) ip(192.168.10.33)  
Jun 23 22:24:45 LINUX2 openais: Members Left:
Jun 23 22:24:45 LINUX2 openais: Members Joined:
Jun 23 22:24:45 LINUX2 openais:        r(0) ip(192.168.10.33)  
Jun 23 22:24:45 LINUX2 openais: This node is within the primary component and will provide service.
Jun 23 22:24:45 LINUX2 openais: entering OPERATIONAL state.
Jun 23 22:24:45 LINUX2 openais: quorum regained, resuming activity
Jun 23 22:24:45 LINUX2 openais: got nodejoin message 192.168.10.33
Jun 23 22:24:45 LINUX2 ccsd: Initial status:: Quorate
Jun 23 22:24:50 LINUX2 fenced: LINUX1 not a cluster member after 3 sec post_join_delay
Jun 23 22:24:50 LINUX2 fenced: fencing node "LINUX1"
Jun 23 22:24:50 LINUX2 fence_manual: Node LINUX1 needs to be reset before recovery can procede.  Waiting for LINUX1 to rejoin the cluster or for manual acknowledgement that it has been reset (i.e. fence_ack_manual -n LINUX1)
Jun 23 22:25:42 LINUX2 fenced: fence "LINUX1" success
Jun 23 22:25:47 LINUX2 ccsd: Attempt to close an unopened CCS descriptor (180).
Jun 23 22:25:47 LINUX2 ccsd: Error while processing disconnect: Invalid request descriptor

[ Last edited by txl829 on 2008-6-28 14:33 ]
《Solution》

1. Did you configure the cluster with Conga or with system-config-cluster?

Check that every cluster component installed successfully, and that the key services started properly.

2. The clustat -l output shows that each host only sees itself while the other host is not recognized, and that neither node is under rgmanager's management. It looks like heartbeat name resolution is broken. Judging from the hosts file, I would suggest switching to a different IP segment and using fully qualified names.
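Following that advice, the relevant hosts entries might look something like the fragment below. The `example.com` domain is purely illustrative (substitute your real domain); the key point is that the names used in cluster.conf must resolve to the addresses on the heartbeat link.

```
192.168.10.31   LINUX1.example.com   LINUX1
192.168.10.33   LINUX2.example.com   LINUX2
```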
《Solution》

It is basically the problem described above: when the second host starts it cannot see the first host's heartbeat, and of course the same is true the other way round.
The logs never show any host other than the local one joining, and in that situation the cluster cannot actually reach quorum.

So:
First, what does your hardware topology look like? Physically, the link carrying the heartbeat traffic, i.e. between 10.31 and 10.33, must be up; also check whether a firewall is interfering with the heartbeat traffic.
Second, since this is a RHEL 5 cluster, which kernel are you running? If it is the xen kernel, switch to the regular kernel; xend interferes with the cluster's network configuration.
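A quick way to check both points above (a hedged diagnostic sketch, not a definitive test): ping succeeding proves little here, because the openais heartbeat uses UDP multicast (239.192.88.13 in the logs above), which a firewall or xen bridging can break while ICMP still passes.

```shell
# Which kernel is running? A "xen" suffix in the release string
# means the xen kernel is in use.
uname -r

# Interfaces and the multicast groups they have joined; once openais
# is running, you would expect its group to appear in this list.
cat /proc/net/igmp
```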
《Solution》

Reply to post #1 by txl829

1. I configured it with system-config-cluster.
2. The 10.31 and 10.33 addresses on the two machines are reachable; they can ping each other, and the firewall is disabled on both.
3. Since the clustat -l output looked wrong, I have not started rgmanager.
4. The kernel should not be xen; I installed from the Red Hat media and never touched the kernel.
5. I later reconfigured everything with Conga and got exactly the same problem.
6. I have also run into this: with clustat -l on both machines, both nodes show Online, but Local appears on each node's own entry, that is, viewed from node1, Local is node1; viewed from node2, Local is node2.
《Solution》

6. I have also run into this: with clustat -l on both machines, both nodes show Online, but Local appears on each node's own entry, that is, viewed from node1, Local is node1; viewed from node2, Local is node2.

That part is normal. As for the other situations you mention above, from what you have described there should not be a problem.
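For comparison, once the two nodes can see each other's heartbeat, clustat on either node should report both members, along these lines (a sketch based on the output format shown earlier in the thread):

```
Member Status: Quorate
  Member Name                        ID   Status
  ------ ----                        ---- ------
  LINUX1                             1 Online, Local
  LINUX2                             2 Online
```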
