computing node suddenly lost network connection

火星人 @ 2014-03-04, reply: 0

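The compute nodes in our scientific computing cluster frequently and suddenly lose their network connection. Does anyone know what might cause this?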

The kernel reports:

kernel: bnx2: eth0 NIC Copper Link is Down

The /var/log/messages file after a cluster boot contains:

May 15 18:37:05 uranus mountd: Caught signal 15, un-registering and exiting.
May 15 18:37:17 uranus rpc.statd: Caught signal 15, un-registering and exiting.
May 15 18:39:47 uranus kernel: Memory for crash kernel (0x0 to 0x0) notwithin permissible range
May 15 18:39:54 uranus sshd: error: Bind to port 22 on 0.0.0.0 failed: Address already in use.
May 15 18:39:55 uranus xinetd: /etc/xinetd.d/RCS is not a regular file. It is being skipped.
May 15 18:40:00 uranus automount: lookup_mount: lookup(file): key "mysql" not found in map
May 15 18:40:00 uranus automount: lookup_mount: lookup(file): key "mysql" not found in map
May 15 18:40:12 uranus smartd: Problem creating device name scan list
May 15 18:50:34 compute-0-0.local rpc.statd: Caught signal 15, un-registering and exiting.
May 15 18:50:34 compute-0-2.local rpc.statd: Caught signal 15, un-registering and exiting.
May 15 18:50:34 compute-0-3.local rpc.statd: Caught signal 15, un-registering and exiting.
May 15 18:50:34 compute-0-1.local rpc.statd: Caught signal 15, un-registering and exiting.
May 15 18:53:50 compute-0-3.local kernel: Memory for crash kernel (0x0 to 0x0) notwithin permissible range
May 15 18:53:50 compute-0-3.local kernel: ata_piix 0000:00:1f.2: no available legacy port
May 15 18:53:50 compute-0-1.local kernel: Memory for crash kernel (0x0 to 0x0) notwithin permissible range
May 15 18:53:50 compute-0-2.local kernel: Memory for crash kernel (0x0 to 0x0) notwithin permissible range
May 15 18:53:51 compute-0-0.local kernel: Memory for crash kernel (0x0 to 0x0) notwithin permissible range
May 15 18:53:58 compute-0-3.local sshd: error: Bind to port 22 on 0.0.0.0 failed: Address already in use.
May 15 18:53:59 compute-0-2.local sshd: error: Bind to port 22 on 0.0.0.0 failed: Address already in use.
May 15 18:53:59 compute-0-1.local sshd: error: Bind to port 22 on 0.0.0.0 failed: Address already in use.
May 15 18:54:00 compute-0-0.local sshd: error: Bind to port 22 on 0.0.0.0 failed: Address already in use.
May 15 18:54:08 compute-0-3.local smartd: Problem creating device name scan list
May 15 18:54:09 compute-0-2.local smartd: Problem creating device name scan list
May 15 18:54:10 compute-0-1.local smartd: Problem creating device name scan list
May 15 18:54:11 compute-0-0.local smartd: Problem creating device name scan list
《Solution》

Mar 25 11:01:05 play8dz kernel: bnx2: eth0 NIC Copper Link is Down
Mar 25 11:35:02 compute-0-0.local rpc.statd: Caught signal 15, un-registering and exiting.
Mar 25 11:35:12 compute-0-1.local automount: umount_autofs_indirect: ask umount returned busy /home
Mar 25 11:35:17 compute-0-1.local rpc.statd: Caught signal 15, un-registering and exiting.
Mar 25 11:38:18 compute-0-0.local kernel: Memory for crash kernel (0x0 to 0x0) notwithin permissible range
Mar 25 11:38:23 compute-0-0.local sshd: error: Bind to port 22 on 0.0.0.0 failed: Address already in use.
Mar 25 11:38:25 compute-0-0.local smartd: Problem creating device name scan list
Mar 25 11:45:20 compute-0-1.local kernel: Memory for crash kernel (0x0 to 0x0) notwithin permissible range
Mar 25 11:45:24 compute-0-1.local sshd: error: Bind to port 22 on 0.0.0.0 failed: Address already in use.
Mar 25 11:45:26 compute-0-1.local smartd: Problem creating device name scan list
《Solution》

# more /var/log/messages
Mar 23 15:37:20 compute-0-19 kernel: Memory for crash kernel (0x0 to 0x0) notwithin permissible range
Mar 23 15:37:20 compute-0-19 kernel: bnx2: eth0 NIC Copper Link is Down
Mar 23 15:37:23 compute-0-19 sshd: error: Bind to port 22 on 0.0.0.0 failed: Address already in use.
Mar 23 15:37:28 compute-0-19 ntpdate: no server suitable for synchronization found
Mar 23 15:37:29 compute-0-19 smartd: Problem creating device name scan list
Mar 23 15:48:40 compute-0-19 rockscommand: unknown roll name "%"
Mar 23 15:48:53 compute-0-19 rockscommand: unknown roll name "%"
Mar 25 11:02:10 compute-0-19 kernel: bnx2: eth0 NIC Copper Link is Down
Mar 25 15:53:38 compute-0-19 kernel: bnx2: eth0 NIC Copper Link is Down
Mar 25 15:58:28 compute-0-19 syslogd: sendto: Network is unreachable
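
The excerpts above all use the classic syslog layout (timestamp, host, tag, message), so the link-down events can be extracted per host and compared in time. On Mar 25, for instance, play8dz logs the message at 11:01:05 and compute-0-19 at 11:02:10, which might point at something shared such as a switch or cabling rather than a single NIC, although the logs alone do not prove that. The following is only a minimal Python sketch for doing that grouping; the function name link_down_events and the regular expression are illustrative assumptions, not part of any existing tool.

```python
import re
from collections import defaultdict

# Assumed layout, based on the excerpts: "Mon DD HH:MM:SS host tag: message"
SYSLOG_LINE = re.compile(
    r"^(?P<stamp>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<tag>[^:]+): (?P<msg>.*)$"
)

# Message text exactly as the bnx2 driver prints it in the logs above
LINK_DOWN = "NIC Copper Link is Down"


def link_down_events(lines):
    """Map each host to the timestamps at which it logged a link-down message.

    `lines` is any iterable of syslog lines (hypothetical helper, for
    illustration only). Lines that do not match the layout are skipped.
    """
    events = defaultdict(list)
    for line in lines:
        match = SYSLOG_LINE.match(line.strip())
        if match and LINK_DOWN in match.group("msg"):
            events[match.group("host")].append(match.group("stamp"))
    return dict(events)


if __name__ == "__main__":
    # A few lines copied from the excerpts in this thread
    sample = [
        "Mar 25 11:01:05 play8dz kernel: bnx2: eth0 NIC Copper Link is Down",
        "Mar 23 15:37:20 compute-0-19 kernel: bnx2: eth0 NIC Copper Link is Down",
        "Mar 23 15:37:23 compute-0-19 sshd: error: Bind to port 22 on 0.0.0.0 failed: Address already in use.",
        "Mar 25 11:02:10 compute-0-19 kernel: bnx2: eth0 NIC Copper Link is Down",
        "Mar 25 15:53:38 compute-0-19 kernel: bnx2: eth0 NIC Copper Link is Down",
    ]
    for host, stamps in sorted(link_down_events(sample).items()):
        print(host, len(stamps), stamps)
```

To run it against a real log, pass an open file instead of the sample list, for example link_down_events(open("/var/log/messages")); reading that file usually requires root.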

http://coctec.com/docs/service/show-post-5781.html