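Symptom: the database running on HP-UX MC/ServiceGuard frequently becomes unreachable without warning. Whenever I check the cluster status, LAN0 is always shown as down. Restarting the cluster with cmruncl -v brings everything back to normal, but the same fault then returns at irregular intervals, and only another cluster restart fixes it. Could anyone help me analyze this?
# cmviewcl -v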
Network_Parameters:
INTERFACE    STATUS                        PATH       NAME
PRIMARY      down (disabled) (IP only)     0/1/1/0    lan0
PRIMARY      up                            0/2/2/0    lan1
STANDBY      up                            0/2/2/1    lan3
STANDBY      up                            0/1/1/1    lan2
The syslog.log output is as follows:
Oct 19 08:07:20 syczora1 cmnetd: 10.20.90.8 failed.
Oct 19 08:07:20 syczora1 cmnetd: lan2 is down at the IP layer.
Oct 19 08:07:20 syczora1 cmnetd: lan2 failed.
Oct 19 08:06:47 syczora1 cmdisklockd: Still trying to inquire cluster lock disk /dev/dsk/c2t0d0
Oct 19 08:07:20 syczora1 above message repeats 8 times
Oct 19 08:07:20 syczora1 cmnetd: Subnet 10.20.90.0 down
Oct 19 08:07:20 syczora1 cmcld: Subnet 10.20.90.0 in package orapkg is down.
Oct 19 08:07:20 syczora1 cmcld: Failing package orapkg on node syczora1 due to subnet failure.
Oct 19 08:07:20 syczora1 cmcld: Request from node syczora1 to fail package orapkg on node syczora1.
Oct 19 08:07:20 syczora1 cmcld: Executing '/etc/cmcluster/orapkg/orapkg.cntl stop' for package orapkg, as service PKG*107009.
Oct 19 08:07:20 syczora1 cmserviced: Request to perform run service PKG*107009
Oct 19 08:07:20 syczora1 su: + tty?? root-oracle
Oct 19 08:07:30 syczora1 cmnetd: 10.20.90.8 recovered.
Oct 19 08:07:30 syczora1 cmnetd: Subnet 10.20.90.0 up
Oct 19 08:07:30 syczora1 cmnetd: lan2 is up at the IP layer.
Oct 19 08:07:29 syczora1 su: + tty?? root-oracle
Oct 19 08:07:30 syczora1 cmnetd: lan2 recovered.
Oct 19 08:08:10 syczora1 syslog: cmmodnet -r -i 10.20.90.7 10.20.90.0
Oct 19 08:08:11 syczora1 LVM: vgchange -a n vgdata
Oct 19 08:08:11 syczora1 LVM: vgchange -a n vgarch
Oct 19 08:08:11 syczora1 cmserviced: Service PKG*107009 terminated due to an exit(0).
Oct 19 08:08:11 syczora1 cmcld: Halted package orapkg on node syczora1.
Oct 19 08:08:11 syczora1 cmcld: Request from node syczora1 to start package orapkg on node syczora1.
Oct 19 08:08:11 syczora1 cmcld: Executing '/etc/cmcluster/orapkg/orapkg.cntl start' for package orapkg, as service PKG*107009.
Oct 19 08:08:11 syczora1 cmserviced: Request to perform run service PKG*107009
Oct 19 08:08:17 syczora1 cmdisklockd: Still trying to inquire cluster lock disk /dev/dsk/c2t0d0
Oct 19 08:08:26 syczora1 LVM: vgchange -a e vgdata
Oct 19 08:08:41 syczora1 LVM: vgchange -a e vgarch
Oct 19 08:08:42 syczora1 syslog: cmmodnet -a -i 10.20.90.7 10.20.90.0
Oct 19 08:08:42 syczora1 su: + tty?? root-oracle
Oct 19 08:09:16 syczora1 cmserviced: Service PKG*107009 terminated due to an exit(0).
Oct 19 08:09:16 syczora1 cmcld: Started package orapkg on node syczora1.
Oct 19 08:09:47 syczora1 cmdisklockd: Still trying to inquire cluster lock disk /dev/dsk/c2t0d0
Oct 19 08:08:55 syczora1 su: + tty?? root-oracle
Oct 19 08:13:00 syczora1 cmdisklockd: Still trying to inquire cluster lock disk /dev/dsk/c2t0d0
Oct 19 08:13:47 syczora1 above message repeats 3 times
Oct 19 08:14:30 syczora1 cmdisklockd: Still trying to inquire cluster lock disk /dev/dsk/c2t0d0
Oct 19 08:32:30 syczora1 cmdisklockd: Still trying to inquire cluster lock disk /dev/dsk/c2t0d0
Oct 19 08:33:48 syczora1 above message repeats 12 times
Oct 19 08:34:00 syczora1 cmdisklockd: Still trying to inquire cluster lock disk /dev/dsk/c2t0d0
Oct 19 08:53:30 syczora1 cmdisklockd: Still trying to inquire cluster lock disk /dev/dsk/c2t0d0
Oct 19 08:53:48 syczora1 above message repeats 13 times
Oct 19 08:55:00 syczora1 cmdisklockd: Still trying to inquire cluster lock disk /dev/dsk/c2t0d0
Oct 19 09:08:08 syczora1 sshd: SSH: Server;Ltype: Version;Remote: 10.20.90.127-1065;Protocol: 2.0;Client: SecureCRT_5.1.3 (build 281) SecureCRT
Oct 19 09:08:31 syczora1 sshd: Accepted password for root from 10.20.90.127 port 1065 ssh2
Oct 19 09:09:02 syczora1 syslog: cmruncl -v
Oct 19 09:09:07 syczora1 syslog: cmruncl: Failed to validate the network configuration but will try to start the cluster anyway.
Oct 19 09:10:04 syczora1 syslog: cmhaltcl -f -v
Oct 19 09:10:04 syczora1 cmcld: Request from root on node syczora1 to halt the cluster on this node
Oct 19 09:10:04 syczora1 cmcld: Request from node syczora1 to disable node switching for package orapkg on node syczora1.
Oct 19 09:10:00 syczora1 cmdisklockd: Still trying to inquire cluster lock disk /dev/dsk/c2t0d0
Oct 19 09:10:04 syczora1 above message repeats 10 times
Oct 19 09:10:04 syczora1 cmcld: Disabled package orapkg on node syczora1.
Oct 19 09:10:04 syczora1 cmcld: Disabled package orapkg on node syczora2.
Oct 19 09:10:04 syczora1 cmcld: Request from node syczora1 to disable global switching for package orapkg.
Oct 19 09:10:04 syczora1 cmcld: Disabled switching for package orapkg.
Oct 19 09:10:04 syczora1 cmserviced: Request to perform run service PKG*107009
Oct 19 09:10:04 syczora1 cmcld: Request from root on node syczora1 to halt the cluster on this node
Oct 19 09:10:04 syczora1 su: + tty?? root-oracle
Oct 19 09:10:04 syczora1 cmcld: Request from root on node syczora1 to halt the cluster on this node
Oct 19 09:10:04 syczora1 cmcld: Request from node syczora1 to begin the halting process for package orapkg on node syczora1.
Oct 19 09:10:04 syczora1 cmcld: Halting package orapkg on node syczora1 as requested by user.
Oct 19 09:10:04 syczora1 cmcld: Request from node syczora1 to halt package orapkg on node syczora1.
Oct 19 09:10:04 syczora1 cmcld: Executing '/etc/cmcluster/orapkg/orapkg.cntl stop' for package orapkg, as service PKG*107009.
Oct 19 09:10:11 syczora1 su: + tty?? root-oracle
Oct 19 09:10:43 syczora1 syslog: cmmodnet -r -i 10.20.90.7 10.20.90.0
Oct 19 09:10:44 syczora1 LVM: vgchange -a n vgdata
Oct 19 09:10:44 syczora1 LVM: vgchange -a n vgarch
Oct 19 09:10:44 syczora1 cmserviced: Service PKG*107009 terminated due to an exit(0).
Oct 19 09:10:44 syczora1 cmcld: Halted package orapkg on node syczora1.
Oct 19 09:10:44 syczora1 cmcld: Request from root on node syczora1 to halt the cluster on this node
Oct 19 09:10:44 syczora1 cmcld: Request from node syczora1 to enable global switching for package orapkg.
Oct 19 09:10:44 syczora1 cmcld: Enabled switching for package orapkg.
Oct 19 09:10:47 syczora1 cmcld: Member 2 is HALTING
Oct 19 09:10:47 syczora1 cmcld: Lost heartbeat to syczora2
Oct 19 09:10:47 syczora1 cmcld: Resolving quorum with members syczora1
Oct 19 09:10:47 syczora1 cmcld: Quorum satisfied
Oct 19 09:10:47 syczora1 cmserviced: Service cmlvmd terminated due to an exit(0).
Oct 19 09:10:47 syczora1 cmserviced: Service cmlockd terminated due to an exit(0).
Oct 19 09:10:47 syczora1 cmcld: Membership: membership at 1 is REFORMING (coordinator 1) includes: 1 excludes: 2
Oct 19 09:10:47 syczora1 cmcld: Membership: membership at 2 is FORMED (coordinator 1) includes: 1 excludes: 2
Oct 19 09:10:47 syczora1 cmcld: Closing route 192.168.100.2:5300 on fd 32 to syczora2: closing member
Oct 19 09:10:47 syczora1 cmcld: The following node(s) syczora2(id=2), left the cluster.
Oct 19 09:10:47 syczora1 cmcld: 1 nodes have formed a new cluster, sequence #2
Oct 19 09:10:47 syczora1 cmcld: The new active cluster membership is: syczora1(id=1)
Oct 19 09:10:47 syczora1 cmcld: Received clear reply in state clearing
Oct 19 09:10:47 syczora1 cmcld: Cluster CDB version 12 and node 1 CDB version 12
Oct 19 09:10:47 syczora1 cmcld: Package orapkg cannot run on this node because switching has been disabled for this node
Oct 19 09:10:50 syczora1 cmcld: Member syczora1 halting.
Oct 19 09:10:50 syczora1 cmcld: Membership: membership at 2 is HALTED (coordinator 1) includes: 1 excludes: 2
Oct 19 09:10:50 syczora1 cmnetd: Subnet 10.20.90.0 switching from lan2 to lan0
Oct 19 09:10:50 syczora1 cmnetd: Subnet 10.20.90.0 switched from lan2 to lan0
Oct 19 09:10:50 syczora1 cmnetd: lan2 switched to lan0
Oct 19 09:10:50 syczora1 cmserviced: Service cmnetd terminated due to an exit(0).
Oct 19 09:10:50 syczora1 cmserviced: Service cmfileassistd terminated due to an exit(0).
Oct 19 09:10:50 syczora1 cmserviced: Request to perform halt service cmlogd
Oct 19 09:10:55 syczora1 cmserviced: Service cmlogd terminated due to a signal(9).
Oct 19 09:10:55 syczora1 cmcld: This node (syczora1) has ceased cluster activities.
Oct 19 09:10:55 syczora1 cmcld: Daemon exiting
Oct 19 09:10:55 syczora1 cmdisklockd: cmdisklockd exiting
Oct 19 09:10:55 syczora1 cmproxyd: The cluster daemon aborted our connection (231).
Oct 19 09:10:55 syczora1 cmwbemd: The cluster daemon aborted our connection (231).
Oct 19 09:10:55 syczora1 cmclconfd: The cluster daemon aborted our connection (231).
Oct 19 09:10:55 syczora1 cmclconfd: The Serviceguard daemon, cmcld, exited normally.
Oct 19 09:10:56 syczora1 cmserviced: Service assistant daemon halted.
Oct 19 09:13:00 syczora1 sshd: SSH: Server;Ltype: Version;Remote: 10.20.90.127-1076;Protocol: 2.0;Client: SecureCRT_5.1.3 (build 281) SecureCRT
Oct 19 09:13:05 syczora1 sshd: Accepted password for root from 10.20.90.127 port 1076 ssh2
Oct 19 09:13:30 syczora1 syslog: cmhaltcl -f -v
Oct 19 09:13:48 syczora1 syslog: cmhaltcl -f -v
Oct 19 09:13:54 syczora1 syslog: cmruncl =v
Oct 19 09:14:11 syczora1 syslog: cmruncl -v
Oct 19 09:14:45 syczora1 cmclconfd: Request from root on node syczora1 to start the cluster on this node
Oct 19 09:14:46 syczora1 cmcld: Daemon Initialization - Maximum number of packages supported for this incarnation is 300.
Oct 19 09:14:46 syczora1 cmcld: Global Cluster Information:
Oct 19 09:14:46 syczora1 cmcld: Network Polling Interval is 2.00 seconds.
Oct 19 09:14:46 syczora1 cmcld: IO Timeout Extension is 0.00 seconds.
Oct 19 09:14:46 syczora1 cmcld: Auto Start Timeout is 600.00 seconds.
Oct 19 09:14:46 syczora1 cmcld: Failover Optimization is disabled.
Oct 19 09:14:46 syczora1 cmcld: Information Specific to node syczora1:
Oct 19 09:14:46 syczora1 cmcld: Cluster lock disk: /dev/dsk/c2t0d0.
Oct 19 09:14:46 syczora1 cmcld: lan3 0x002481773f9f 192.168.100.1 bridged net:1
Oct 19 09:14:46 syczora1 cmcld: lan0 0x0024817777c2 10.20.90.8 bridged net:2
Oct 19 09:14:46 syczora1 cmcld: lan1 0x0024817777c3 192.168.10.1 bridged net:3
Oct 19 09:14:46 syczora1 cmcld: lan2 0x002481773f9e standby bridged net:2
Oct 19 09:14:46 syczora1 cmcld: Heartbeat Subnet: 192.168.100.0
Oct 19 09:14:46 syczora1 cmcld: Configured quorum disk(s) /dev/dsk/c2t0d0
Oct 19 09:14:46 syczora1 cmcld: Member Timeout is 14.00 seconds.
Oct 19 09:14:46 syczora1 cmcld: Max reformation duration is 17.80 seconds.
Oct 19 09:14:46 syczora1 cmcld: The maximum # of concurrent local connections to the daemon that will be supported is 1024.
Oct 19 09:14:46 syczora1 cmdisklockd: Changed to working directory /var/adm/cmcluster/cmdisklockd.
Oct 19 09:14:46 syczora1 cmdisklockd: cmdisklockd started
Oct 19 09:14:46 syczora1 cmcld: Total allocated: 46085864 bytes, used: 3400688 bytes, unused 42685168 bytes
Oct 19 09:14:46 syczora1 cmserviced: Initializing
Oct 19 09:14:46 syczora1 cmserviced: Executing command: rm -f /var/adm/cmcluster/.cmserviced.*.socket
Oct 19 09:14:46 syczora1 cmserviced: Request to perform run service cmlogd
Oct 19 09:14:46 syczora1 cmserviced: Request to perform run service cmfileassistd
Oct 19 09:14:46 syczora1 cmserviced: Request to perform run service cmlockd
Oct 19 09:14:46 syczora1 cmfileassistd: Changed to working directory /var/adm/cmcluster/cmfileassistd.
Oct 19 09:14:46 syczora1 cmlockd: Changed to working directory /var/adm/cmcluster/cmlockd.
Oct 19 09:14:46 syczora1 cmlockd: Executing command: rm -f /var/adm/cmcluster/.cmlock.*.socket
Oct 19 09:14:46 syczora1 cmserviced: Request to perform run service cmnetd
Oct 19 09:14:46 syczora1 cmnetd: Changed to working directory /var/adm/cmcluster/cmnetd.
Oct 19 09:14:46 syczora1 cmnetd: Initializing
Oct 19 09:14:46 syczora1 cmnetd: Executing command: rm -f /var/adm/cmcluster/.cmnetd.*.socket
Oct 19 09:14:46 syczora1 cmnetd: Auto Failback is enabled.
Oct 19 09:14:46 syczora1 cmserviced: Request to perform run service cmlvmd
Oct 19 09:14:47 syczora1 cmcld: Membership: membership at 0 is REFORMING (coordinator 1) includes: 1 excludes: 2
Oct 19 09:14:47 syczora1 cmcld: Member syczora2 is joining the cluster.
Oct 19 09:14:47 syczora1 cmcld: Resolving quorum with members syczora1, syczora2
Oct 19 09:14:47 syczora1 cmcld: Quorum satisfied
Oct 19 09:14:47 syczora1 cmcld: Membership: membership at 1 is FORMED (coordinator 1) includes: 1 2 excludes:
Oct 19 09:14:47 syczora1 cmcld: 2 nodes have formed a new cluster, sequence #1
Oct 19 09:14:47 syczora1 cmcld: The new active cluster membership is: syczora1(id=1), syczora2(id=2)
Oct 19 09:14:47 syczora1 cmcld: Cluster CDB version 12 and node 1 CDB version 12
Oct 19 09:14:47 syczora1 cmcld: Cluster CDB version 12 and node 2 CDB version 12
Oct 19 09:14:47 syczora1 cmlvmd: Clvmd initialized successfully.
Oct 19 09:14:47 syczora1 cmcld: Request from node syczora1 to start package orapkg on node syczora1.
Oct 19 09:14:47 syczora1 cmcld: Executing '/etc/cmcluster/orapkg/orapkg.cntl start' for package orapkg, as service PKG*107009.
Oct 19 09:14:47 syczora1 cmserviced: Request to perform run service PKG*107009
Oct 19 09:14:58 syczora1 cmdisklockd: Still trying to inquire cluster lock disk /dev/dsk/c2t0d0
Oct 19 09:14:59 syczora1 cmdisklockd: added device: /dev/vglock:/dev/dsk/c2t0d0
Oct 19 09:14:59 syczora1 cmcld: Cluster lock disk /dev/vglock:/dev/dsk/c2t0d0 is good
Oct 19 09:14:59 syczora1 cmcld: Received clear reply in state clearing
Oct 19 09:15:02 syczora1 LVM: vgchange -a e vgdata
Oct 19 09:15:06 syczora1 sshd: SSH: Server;LType: Throughput;Remote: 10.20.90.127-1076;IN: 14672;OUT: 4532;Duration: 120.3;tPut_in: 121.9;tPut_out: 37.7
Oct 19 09:15:17 syczora1 LVM: vgchange -a e vgarch
Oct 19 09:15:18 syczora1 syslog: cmmodnet -a -i 10.20.90.7 10.20.90.0
Oct 19 09:15:18 syczora1 su: + tty?? root-oracle
Oct 19 09:15:52 syczora1 cmserviced: Service PKG*107009 terminated due to an exit(0).
Oct 19 09:15:52 syczora1 cmcld: Started package orapkg on node syczora1.
Oct 19 09:15:31 syczora1 su: + tty?? root-oracle
Oct 19 09:16:04 syczora1 telnetd: getpid: peer died: Error 0
Oct 19 09:16:26 syczora1 cmdisklockd: Still trying to inquire cluster lock disk /dev/dsk/c2t0d0
Oct 19 09:17:44 syczora1 telnetd: getpid: peer died: Error 0
Oct 19 09:32:56 syczora1 cmdisklockd: Still trying to inquire cluster lock disk /dev/dsk/c2t0d0
Oct 19 09:33:48 syczora1 above message repeats 11 times
Oct 19 09:34:26 syczora1 cmdisklockd: Still trying to inquire cluster lock disk /dev/dsk/c2t0d0
Oct 19 09:52:32 syczora1 cmdisklockd: Still trying to inquire cluster lock disk /dev/dsk/c2t0d0
Oct 19 09:53:48 syczora1 above message repeats 12 times
Oct 19 09:54:02 syczora1 cmdisklockd: Still trying to inquire cluster lock disk /dev/dsk/c2t0d0
Oct 19 10:13:42 syczora1 cmdisklockd: Still trying to inquire cluster lock disk /dev/dsk/c2t0d0
Oct 19 10:13:48 syczora1 above message repeats 14 times
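For anyone digging through a syslog.log of this size, a small script can help line up the interface failures with the package restarts. Below is only a rough sketch, not a Serviceguard tool: the function names (`parse_line`, `interface_outages`, `count_by_daemon`) are made up for illustration, the year is a placeholder because these syslog lines carry none, and month parsing assumes an English locale. It pairs each cmnetd "lanX failed." entry with the next "lanX recovered." entry and counts messages per daemon; the sample lines are copied from the log above.

```python
import re
from collections import Counter
from datetime import datetime

# Classic syslog layout: "Oct 19 08:07:20 syczora1 cmnetd: lan2 failed."
LINE_RE = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<tag>[^:\s]+): (?P<msg>.*)$"
)
IFACE_RE = re.compile(r"(lan\d+) (failed|recovered)\.")


def parse_line(line, year=2000):
    """Return (timestamp, daemon tag, message), or None for lines without a tag
    (for example 'above message repeats N times'). The year is an assumption."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
    return ts, m["tag"], m["msg"]


def count_by_daemon(lines):
    """Count tagged messages per daemon (cmnetd, cmcld, cmdisklockd, ...)."""
    parsed = (parse_line(line) for line in lines)
    return Counter(p[1] for p in parsed if p is not None)


def interface_outages(lines):
    """Pair each cmnetd '<lanX> failed.' with the next '<lanX> recovered.'.
    Returns a list of (interface, start time, duration in seconds)."""
    down_since = {}
    outages = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None or parsed[1] != "cmnetd":
            continue
        ts, _, msg = parsed
        m = IFACE_RE.fullmatch(msg)
        if m is None:
            continue
        iface, event = m.groups()
        if event == "failed":
            down_since.setdefault(iface, ts)
        elif iface in down_since:
            start = down_since.pop(iface)
            outages.append((iface, start, (ts - start).total_seconds()))
    return outages


SAMPLE = """\
Oct 19 08:07:20 syczora1 cmnetd: 10.20.90.8 failed.
Oct 19 08:07:20 syczora1 cmnetd: lan2 is down at the IP layer.
Oct 19 08:07:20 syczora1 cmnetd: lan2 failed.
Oct 19 08:06:47 syczora1 cmdisklockd: Still trying to inquire cluster lock disk /dev/dsk/c2t0d0
Oct 19 08:07:20 syczora1 above message repeats 8 times
Oct 19 08:07:30 syczora1 cmnetd: lan2 is up at the IP layer.
Oct 19 08:07:30 syczora1 cmnetd: lan2 recovered.
""".splitlines()

if __name__ == "__main__":
    for iface, start, secs in interface_outages(SAMPLE):
        print(f"{iface} down for {secs:.0f}s starting {start:%H:%M:%S}")
    print(count_by_daemon(SAMPLE).most_common())
```

To run it against the real file, replace `SAMPLE` with the lines read from syslog.log. The per-daemon count also makes it easy to see how often the cmdisklockd "Still trying to inquire cluster lock disk" message shows up compared with the network events.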