A problem encountered while building a cluster with RHCS 4.3

火星人 @ 2014-03-04, reply: 0

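OS: Red Hat Linux 4.3
Cluster: RHCS 4.3


During a manual switchover, one of the shared file systems cannot be unmounted on node B, so the takeover cannot proceed.

Why does this happen?

GFS is not in use.

The cluster.conf file: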

<?xml version="1.0"?>
<cluster config_version="10" name="alpha_cluster">
        <fence_daemon post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="mainserver1" votes="1">
                        <fence/>
                </clusternode>
                <clusternode name="mainserver2" votes="1">
                        <fence/>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1"/>
        <fencedevices>
                <fencedevice agent="fence_manual" name="test"/>
        </fencedevices>
        <rm>
                <failoverdomains>
                        <failoverdomain name="oracle" ordered="1" restricted="1">
                                <failoverdomainnode name="mainserver1" priority="1"/>
                                <failoverdomainnode name="mainserver2" priority="2"/>
                        </failoverdomain>
                </failoverdomains>
                <resources>
                        <ip address="138.148.221.3" monitor_link="1"/>
                        <fs device="/dev/mapper/VolGroupArray-lv_oracle_data" force_fsck="0" force_unmount="1" fsid="15674" fstype="ext3" mountpoint="/data/oradata" name="oradata" options="" self_fence="1"/>
                        <fs device="/dev/mapper/VolGroupArray-lv_oracle_log" force_fsck="0" force_unmount="1" fsid="36402" fstype="ext3" mountpoint="/data/oralog" name="oralog" options="" self_fence="0"/>
                        <fs device="/dev/mapper/VolGroupArray-lv_images" force_fsck="0" force_unmount="1" fsid="48746" fstype="ext3" mountpoint="/data/images" name="images" options="" self_fence="0"/>
                        <script file="/etc/init.d/hongsy.sh" name="orace"/>
                        <script file="/etc/init.d/cluster_svr" name="cluster_svr"/>
                </resources>
                <service autostart="1" domain="oracle" name="oracle">
                        <ip ref="138.148.221.3"/>
                        <fs ref="oradata"/>
                        <fs ref="oralog"/>
                        <fs ref="images"/>
                        <script ref="orace"/>
                </service>
        </rm>
</cluster>
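
Since the failure happens while the service is being stopped, it can help to list exactly which file systems the service will try to unmount and with which flags. Below is a minimal sketch in Python, assuming a configuration shaped like the one above; the function name `service_filesystems` is made up for illustration and is not part of RHCS.

```python
import xml.etree.ElementTree as ET


def service_filesystems(conf_xml, service_name):
    """Return the <fs> resources referenced by a service, in order."""
    root = ET.fromstring(conf_xml)
    # Index the shared <fs> definitions under <rm><resources> by name.
    defined = {fs.get("name"): fs for fs in root.findall("rm/resources/fs")}
    result = []
    for svc in root.findall("rm/service"):
        if svc.get("name") != service_name:
            continue
        for ref in svc.findall("fs"):
            fs = defined.get(ref.get("ref"))
            if fs is None:
                continue  # reference without a matching definition
            result.append({
                "name": fs.get("name"),
                "mountpoint": fs.get("mountpoint"),
                "force_unmount": fs.get("force_unmount") == "1",
                "self_fence": fs.get("self_fence") == "1",
            })
    return result
```

Called with the text of the file above and the service name "oracle", it should list oradata, oralog and images, all with force_unmount enabled and only oradata with self_fence enabled.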




Relevant system log:

Sep 28 03:24:46 mainserver1 ccsd: Starting ccsd 1.0.3:
Sep 28 03:24:46 mainserver1 ccsd:  Built: Jan 25 2006 16:54:55
Sep 28 03:24:46 mainserver1 ccsd:  Copyright (C) Red Hat, Inc.  2004  All rights reserved.
Sep 28 03:24:47 mainserver1 ccsd: startup succeeded
Sep 28 03:24:52 mainserver1 kernel: CMAN 2.6.9-43.8 (built Feb 26 2006 21:06:18) installed
Sep 28 03:24:52 mainserver1 kernel: NET: Registered protocol family 30
Sep 28 03:24:52 mainserver1 ccsd: cluster.conf (cluster name = alpha_cluster, version = 8) found.
Sep 28 03:24:52 mainserver1 ccsd: Remote copy of cluster.conf is from quorate node.
Sep 28 03:24:52 mainserver1 ccsd:  Local version # : 8
Sep 28 03:24:52 mainserver1 ccsd:  Remote version #: 9
Sep 28 03:24:52 mainserver1 ccsd: Switching to remote copy.
Sep 28 03:24:52 mainserver1 kernel: CMAN: Waiting to join or form a Linux-cluster
Sep 28 03:24:52 mainserver1 ccsd: Connected to cluster infrastruture via: CMAN/SM Plugin v1.1.5
Sep 28 03:24:52 mainserver1 ccsd: Initial status:: Inquorate
Sep 28 03:24:52 mainserver1 kernel: CMAN: sending membership request
Sep 28 03:24:52 mainserver1 kernel: CMAN: got node mainserver2
Sep 28 03:24:52 mainserver1 kernel: CMAN: quorum regained, resuming activity
Sep 28 03:24:52 mainserver1 ccsd: Cluster is quorate.  Allowing connections.
Sep 28 03:24:53 mainserver1 kernel: DLM 2.6.9-41.7 (built Feb 26 2006 21:30:10) installed
Sep 28 03:24:53 mainserver1 cman: startup succeeded
Sep 28 03:24:59 mainserver1 fenced: startup succeeded
Sep 28 03:25:13 mainserver1 lock_gulmd: no <gulm> section detected in /etc/cluster/cluster.conf succeeded
Sep 28 03:25:18 mainserver1 clurgmgrd: <notice> Resource Group Manager Starting
Sep 28 03:25:18 mainserver1 clurgmgrd: <info> Loading Service Data
Sep 28 03:25:18 mainserver1 rgmanager: clurgmgrd startup succeeded
Sep 28 03:25:19 mainserver1 clurgmgrd: <info> Initializing Services
Sep 28 03:25:19 mainserver1 clurgmgrd: : <info> Executing /etc/init.d/hongsy.sh stop
Sep 28 03:25:19 mainserver1 su(pam_unix): session opened for user oracle by (uid=0)
Sep 28 03:25:19 mainserver1 su(pam_unix): session closed for user oracle
Sep 28 03:25:19 mainserver1 su(pam_unix): session opened for user oracle by (uid=0)
Sep 28 03:25:22 mainserver1 su(pam_unix): session closed for user oracle
Sep 28 03:25:22 mainserver1 clurgmgrd: : <info> /dev/mapper/VolGroupArray-lv_oracle_data is not mounted
Sep 28 03:25:24 mainserver1 clurgmgrd: : <info> /dev/mapper/VolGroupArray-lv_oracle_log is not mounted
Sep 28 03:25:26 mainserver1 clurgmgrd: : <info> /dev/mapper/VolGroupArray-lv_images is not mounted
Sep 28 03:25:28 mainserver1 clurgmgrd: <info> Services Initialized
Sep 28 03:25:28 mainserver1 clurgmgrd: <info> Logged in SG "usrm::manager"
Sep 28 03:25:28 mainserver1 clurgmgrd: <info> Magma Event: Membership Change
Sep 28 03:25:28 mainserver1 clurgmgrd: <info> State change: Local UP
Sep 28 03:25:30 mainserver1 clurgmgrd: <notice> Starting stopped service oracle
Sep 28 03:25:30 mainserver1 clurgmgrd: : <info> mounting /dev/mapper/VolGroupArray-lv_oracle_data on /data/oradata
Sep 28 03:25:30 mainserver1 kernel: kjournald starting.  Commit interval 5 seconds
Sep 28 03:25:30 mainserver1 kernel: EXT3-fs warning: maximal mount count reached, running e2fsck is recommended
Sep 28 03:25:30 mainserver1 kernel: EXT3 FS on dm-2, internal journal
Sep 28 03:25:30 mainserver1 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Sep 28 03:25:30 mainserver1 clurgmgrd: : <info> mounting /dev/mapper/VolGroupArray-lv_oracle_log on /data/oralog
Sep 28 03:25:31 mainserver1 kernel: kjournald starting.  Commit interval 5 seconds
Sep 28 03:25:31 mainserver1 kernel: EXT3-fs warning: checktime reached, running e2fsck is recommended
Sep 28 03:25:31 mainserver1 kernel: EXT3 FS on dm-4, internal journal
Sep 28 03:25:31 mainserver1 kernel: EXT3-fs: recovery complete.
Sep 28 03:25:31 mainserver1 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Sep 28 03:25:31 mainserver1 clurgmgrd: : <info> mounting /dev/mapper/VolGroupArray-lv_images on /data/images
Sep 28 03:25:31 mainserver1 kernel: kjournald starting.  Commit interval 5 seconds
Sep 28 03:25:31 mainserver1 kernel: EXT3-fs warning: mounting unchecked fs, running e2fsck is recommended
Sep 28 03:25:31 mainserver1 kernel: EXT3 FS on dm-3, internal journal
Sep 28 03:25:31 mainserver1 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Sep 28 03:25:31 mainserver1 clurgmgrd: : <info> Adding IPv4 address 138.148.221.3 to eth2
Sep 28 03:25:32 mainserver1 clurgmgrd: : <info> Executing /etc/init.d/hongsy.sh start
Sep 28 03:25:32 mainserver1 su(pam_unix): session opened for user oracle by (uid=0)
Sep 28 03:25:42 mainserver1 clurgmgrd: <info> Magma Event: Membership Change
Sep 28 03:25:42 mainserver1 clurgmgrd: <info> State change: mainserver2 UP
Sep 28 03:27:16 mainserver1 kernel: oracle(7198): floating-point assist fault at ip 4000000009f5e0e2, isr 0000020000001001
Sep 28 03:27:16 mainserver1 last message repeated 3 times
Sep 28 03:27:17 mainserver1 su(pam_unix): session closed for user oracle
Sep 28 03:27:17 mainserver1 su(pam_unix): session opened for user oracle by (uid=0)
Sep 28 03:27:19 mainserver1 su(pam_unix): session closed for user oracle
Sep 28 03:27:26 mainserver1 clurgmgrd: <notice> Service oracle started
Sep 28 03:28:02 mainserver1 clurgmgrd: : <info> Executing /etc/init.d/hongsy.sh status
Sep 28 03:28:04 mainserver1 su(pam_unix): session opened for user oracle by root(uid=0)
Sep 28 03:28:25 mainserver1 su(pam_unix): session closed for user oracle
Sep 28 03:28:32 mainserver1 clurgmgrd: : <info> Executing /etc/init.d/hongsy.sh status
Sep 28 03:29:33 mainserver1 last message repeated 2 times
Sep 28 03:29:37 mainserver1 clurgmgrd: <notice> Stopping service oracle
Sep 28 03:29:37 mainserver1 clurgmgrd: : <info> Executing /etc/init.d/hongsy.sh stop
Sep 28 03:29:40 mainserver1 su(pam_unix): session opened for user oracle by (uid=0)
Sep 28 03:29:41 mainserver1 htt_server: status has not been enabled yet. (1, 1)
Sep 28 03:29:52 mainserver1 htt_server: status has not been enabled yet. (1, 2)
Sep 28 03:29:55 mainserver1 htt_server: status has not been enabled yet. (1, 1)
Sep 28 03:29:57 mainserver1 htt_server: status has not been enabled yet. (1, 2)
Sep 28 03:29:59 mainserver1 htt_server: status has not been enabled yet. (1, 1)
Sep 28 03:30:01 mainserver1 crond(pam_unix): session opened for user root by (uid=0)
Sep 28 03:30:01 mainserver1 crond(pam_unix): session opened for user root by (uid=0)
Sep 28 03:30:01 mainserver1 su(pam_unix): session opened for user oracle by (uid=0)
Sep 28 03:30:09 mainserver1 htt_server: status has not been enabled yet. (1, 2)
Sep 28 03:30:10 mainserver1 su(pam_unix): session closed for user oracle
Sep 28 03:30:10 mainserver1 su(pam_unix): session opened for user oracle by (uid=0)
Sep 28 03:30:10 mainserver1 htt_server: status has not been enabled yet. (1, 1)
Sep 28 03:30:11 mainserver1 htt_server: status has not been enabled yet. (1, 2)
Sep 28 03:30:13 mainserver1 su(pam_unix): session closed for user oracle
Sep 28 03:30:14 mainserver1 clurgmgrd: : <info> Removing IPv4 address 138.148.221.3 from eth2
Sep 28 03:30:15 mainserver1 htt_server: status has not been enabled yet. (1, 1)
Sep 28 03:30:22 mainserver1 htt_server: status has not been enabled yet. (1, 2)
Sep 28 03:30:23 mainserver1 htt_server: status has not been enabled yet. (1, 1)
Sep 28 03:30:26 mainserver1 clurgmgrd: : <info> unmounting /data/oradata
Sep 28 03:30:27 mainserver1 clurgmgrd: : <info> unmounting /data/oralog
Sep 28 03:30:27 mainserver1 clurgmgrd: : <notice> Forcefully unmounting /data/oralog
Sep 28 03:30:28 mainserver1 clurgmgrd: : <warning> killing process 4644 (root gam_serve /data/oralog)
Sep 28 03:30:29 mainserver1 clurgmgrd: : <crit> Could not clean up mountpoint /data/oralog
Sep 28 03:30:32 mainserver1 su(pam_unix): session closed for user oracle
Sep 28 03:30:34 mainserver1 clurgmgrd: : <info> unmounting /data/oralog
Sep 28 03:30:34 mainserver1 clurgmgrd: : <notice> Forcefully unmounting /data/oralog
Sep 28 03:30:34 mainserver1 clurgmgrd: : <warning> killing process 9203 (root gam_serve /data/oralog)
Sep 28 03:30:35 mainserver1 clurgmgrd: : <crit> Could not clean up mountpoint /data/oralog
Sep 28 03:30:35 mainserver1 clurgmgrd: : <err> 'umount /data/oralog' failed, error=0
Sep 28 03:30:35 mainserver1 clurgmgrd: <notice> stop on fs "oralog" returned 2 (invalid argument(s))
Sep 28 03:30:35 mainserver1 clurgmgrd: <crit> #12: RG oracle failed to stop; intervention required
Sep 28 03:30:35 mainserver1 clurgmgrd: <notice> Service oracle is failed
Sep 28 03:30:36 mainserver1 clurgmgrd: <warning> #70: Attempting to restart service oracle locally.
Sep 28 03:30:36 mainserver1 clurgmgrd: <err> #43: Service oracle has failed; can not start.
Sep 28 03:30:36 mainserver1 clurgmgrd: <alert> #2: Service oracle returned failure code.  Last Owner: mainserver1
Sep 28 03:30:36 mainserver1 clurgmgrd: <alert> #4: Administrator intervention required.
Sep 28 03:30:44 mainserver1 htt_server: status has not been enabled yet. (1, 2)
Sep 28 03:30:47 mainserver1 htt_server: status has not been enabled yet. (1, 1)
Sep 28 03:31:01 mainserver1 crond(pam_unix): session closed for user root
Sep 28 03:31:01 mainserver1 crond(pam_unix): session closed for user root
Sep 28 03:31:33 mainserver1 htt_server: status has not been enabled yet. (1, 1)
Sep 28 03:31:33 mainserver1 htt_server: status has not been enabled yet. (1, 2)
Sep 28 03:31:36 mainserver1 htt_server: status has not been enabled yet. (1, 1)
Sep 28 03:31:42 mainserver1 htt_server: status has not been enabled yet. (1, 2)
Sep 28 03:32:36 mainserver1 htt_server: status has not been enabled yet. (1, 1)
Sep 28 03:33:08 mainserver1 htt_server: status has not been enabled yet. (1, 2)
Sep 28 03:38:11 mainserver1 kernel: clurgmgrd(9343): unaligned access to 0x2000000001b2e904, ip=0x4000000000010091
Sep 28 03:38:11 mainserver1 kernel: clurgmgrd(9343): unaligned access to 0x2000000001b2e904, ip=0x40000000000100b0
Sep 28 03:38:11 mainserver1 kernel: clurgmgrd(9343): unaligned access to 0x2000000001b2e90c, ip=0x40000000000100f1
Sep 28 03:38:11 mainserver1 kernel: clurgmgrd(9343): unaligned access to 0x2000000001b2e90c, ip=0x4000000000010110


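Could you folks share your thoughts on this?
《Solution》

With Oracle, if the processes have not been shut down completely, that is, if some process that accesses the mount point is still running, umount will fail.
When umount fails, you can run lsof to check whether any process is still accessing the mount point.
I think the service script is usually what causes this. You could compare its behavior against the script shipped with the system to determine whether your custom script is at fault.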
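
As a rough illustration of what such an lsof check looks for, here is a minimal Python sketch that scans a Linux /proc tree for processes whose working directory, root, executable or open file descriptors point below a mount point. The function name `holders` and the `proc_root` parameter are made up for this example, memory-mapped files are not covered, and it needs root privileges to see other users' processes, so lsof or fuser remain the more complete tools.

```python
import os


def holders(mountpoint, proc_root="/proc"):
    """Map PID -> sorted paths under `mountpoint` that the process holds."""
    # A trailing separator keeps /data/oralog from matching /data/oralog2.
    prefix = os.path.join(os.path.realpath(mountpoint), "")
    found = {}
    for pid in os.listdir(proc_root):
        if not pid.isdigit():
            continue
        base = os.path.join(proc_root, pid)
        links = [os.path.join(base, name) for name in ("cwd", "root", "exe")]
        fd_dir = os.path.join(base, "fd")
        try:
            links += [os.path.join(fd_dir, fd) for fd in os.listdir(fd_dir)]
        except OSError:
            pass  # process exited or we lack permission
        for link in links:
            try:
                target = os.readlink(link)
            except OSError:
                continue
            if (target + "/").startswith(prefix):
                found.setdefault(int(pid), set()).add(target)
    return {pid: sorted(paths) for pid, paths in found.items()}
```

Running `holders("/data/oralog")` on the node right after a failed stop would show which PIDs still hold that file system.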
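《Solution》

In Red Hat's init 5 graphical mode there is a process called gam_server that continuously monitors file changes in order to update certain state in the desktop (for example, how the trash can is displayed).
However, GNOME has a bug: the nautilus process calls gam_server's interface to monitor all folders, so once a partition is mounted and contains files, gam_server keeps accessing it. The partition then cannot be unmounted, even if gam_server is killed, because nautilus quickly restarts it. As a result, the shared partition cannot be unmounted when HA switches the service over.

Workarounds:

1. Do not use GNOME as the desktop; use KDE instead.

2. Create /etc/gamin/gaminrc
and list the mount points of all shared partitions after notify,
for example
notify /oradata* /opt/oracle*

Then restart X Window.

Step 2 is not mandatory. Red Hat recommends it, but it had no effect in my testing, whereas using KDE does avoid the problem. Of course, it is best if you can persuade the users not to run the graphical interface day to day and to start it only when it is needed for administration.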
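《Solution》

Is this really related to the script?

If you look closely,

my service involves three mount points:

/data/oradata
/data/oralog
/data/images

Yet apart from /data/oralog, which cannot be unmounted cleanly, the other two unmount fine!

The problem is strange and hard to understand. Could you all give me some more guidance?

I tested this in a virtual machine and it worked fine, with no problems at all.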
《Solution》

to  ljhb


If the server runs in runlevel 3,
will the gam_serve process still exist?
This needs to be tested.
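
One way to run that test is to look for the process after switching runlevels. The following is a minimal Python sketch, assuming a Linux /proc where the first line of each `status` file is `Name:` followed by the command name; the helper name `pids_named` is made up for illustration. It matches by prefix because the log above reports the name as gam_serve.

```python
import os


def pids_named(prefix, proc_root="/proc"):
    """Return the PIDs whose command name starts with `prefix`."""
    matches = []
    for pid in os.listdir(proc_root):
        if not pid.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, pid, "status")) as handle:
                first = handle.readline()  # "Name:\t<command>"
        except OSError:
            continue  # process went away while scanning
        name = first.partition(":")[2].strip()
        if name.startswith(prefix):
            matches.append(int(pid))
    return sorted(matches)
```

An empty result from `pids_named("gam_serve")` in runlevel 3 would suggest the monitor is only started by the graphical session.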
《Solution》

To the original poster: please first confirm whether that .sh of yours really shuts Oracle down completely.
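《Solution》

Originally posted by 西方 on 2008-10-8 15:26
Is this really related to the script?

If you look closely,

my service involves three mount points:

/data/oradata
/data/oralog
/data/images

Yet apart from /data/oralog, which cannot be unmounted cleanly, the other two unmount fine!

The problem is strange, ...

Then you still need to find out what is accessing the directory that cannot be unmounted, don't you? At the very least you should run lsof!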
《Solution》

Sep 28 03:30:34 mainserver1 clurgmgrd: : <notice> Forcefully unmounting /data/oralog
Sep 28 03:30:34 mainserver1 clurgmgrd: : <warning> killing process 9203 (root gam_serve /data/oralog)
Sep 28 03:30:35 mainserver1 clurgmgrd: : <crit> Could not clean up mountpoint /data/oralog
Sep 28 03:30:35 mainserver1 clurgmgrd: : <err> 'umount /data/oralog' failed, error=0


This shows that gam_serve is the one causing the trouble.
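
To check this across the whole log rather than a four-line excerpt, the "killing process" warnings can be tallied per mount point. Here is a minimal Python sketch, assuming the message format seen in the log above (PID, then user, command and mount point in parentheses); the names `KILL_RE`, `killed_processes` and `blockers_by_mount` are made up for illustration.

```python
import re
from collections import Counter

# Matches e.g. "killing process 9203 (root gam_serve /data/oralog)"
KILL_RE = re.compile(r"killing process (\d+) \((\S+) (\S+) (\S+)\)")


def killed_processes(log_lines):
    """Yield (pid, user, command, mountpoint) for each forced kill."""
    for line in log_lines:
        match = KILL_RE.search(line)
        if match:
            pid, user, command, mountpoint = match.groups()
            yield int(pid), user, command, mountpoint


def blockers_by_mount(log_lines):
    """Count how often each command was killed, per mount point."""
    return Counter(
        (mountpoint, command)
        for _, _, command, mountpoint in killed_processes(log_lines)
    )
```

If only one command ever appears for /data/oralog, that supports the suspicion; if other commands show up as well, the service script deserves a second look.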
《Solution》

Originally posted by 西方 on 2008-10-13 13:52
Sep 28 03:30:34 mainserver1 clurgmgrd: : <notice> Forcefully unmounting /data/oralog
Sep 28 03:30:34 mainserver1 clurgmgrd: : <warning> killing process 9203 (root gam_serve /data/oralog)
Sep 28 03 ...


Not necessarily.


http://coctec.com/docs/service/show-post-7027.html