A problem with RHCS floating IPs

火星人 @ 2014-03-04, reply: 0

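I have two HP DL580 servers, both running RHEL 5.1 with the RHCS RPMs that ship with it.
The resources I defined are:
floating1: ip: 10.82.114.4
floating2: ip: 10.30.7.185
script1: the stock vsftpd init script
script2: the my_sample script (it starts a simple Perl script that creates a socket, binds it to floating IP 1, and listens for connections; it exists only to demonstrate the problem)

In the cluster management UI I created two services:
service1: script1, floating IP 1, floating IP 2
service2: script2, floating IP 1
The two are never enabled at the same time. I enabled service1 first: vsftpd came up normally and I could connect through the floating IPs, so that test succeeded.
I then disabled service1 and enabled service2. The script could not bind to the floating IP and exited with an error.

Comparing the /var/log/messages output for the two services, I found that when the vsftpd service starts, avahi-daemon first registers the two floating IPs on the corresponding interfaces, but when the sample service starts this never happens. So when the my_sample script is invoked the floating IP does not exist on the host, and the bind that follows is certain to fail. I also tried mysqld and tomcat services, and they fail for the same reason: the floating IP is never registered.

My question: why does RHCS have avahi-daemon register the floating IPs only for the vsftpd service?
I have compared the entries in cluster.conf over and over and cannot see where they differ.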

《Solution》

Reply to post #1 by PinkOrient

Addendum 1:
/etc/cluster/cluster.conf

<?xml version="1.0"?>
<cluster config_version="32" name="eadb_cluster">
    <quorumd interval="2" label="quodisk" min_score="1" tko="10" votes="3">
        <heuristic interval="2" program="ping 10.82.114.1 -c1 -t1" score="1"/>
    </quorumd>
    <fence_daemon post_fail_delay="0" post_join_delay="3"/>
    <clusternodes>
        <clusternode name="eadb01.smartone.com" nodeid="1" votes="1">
            <fence>
                <method name="1">
                    <device name="eadb01_ilo"/>
                </method>
            </fence>
        </clusternode>
        <clusternode name="eadb02.smartone.com" nodeid="2" votes="1">
            <fence>
                <method name="1">
                    <device name="eadb02_ilo"/>
                </method>
            </fence>
        </clusternode>
    </clusternodes>
    <cman/>
    <fencedevices>
        <fencedevice agent="fence_ilo" hostname="192.168.0.10" login="Administrator" name="eadb01_ilo" passwd="password"/>
        <fencedevice agent="fence_ilo" hostname="192.168.0.11" login="Administrator" name="eadb02_ilo" passwd="password"/>
    </fencedevices>
    <rm>
        <failoverdomains>
            <failoverdomain name="eadb_fd" ordered="0" restricted="0">
                <failoverdomainnode name="eadb01.smartone.com" priority="1"/>
                <failoverdomainnode name="eadb02.smartone.com" priority="1"/>
            </failoverdomain>
            <failoverdomain name="eadb_service" ordered="0" restricted="0">
                <failoverdomainnode name="eadb01.smartone.com" priority="1"/>
                <failoverdomainnode name="eadb02.smartone.com" priority="1"/>
            </failoverdomain>
        </failoverdomains>
        <resources>
            <ip address="10.82.114.4" monitor_link="1"/>
            <script file="/etc/rc.d/init.d/vsftpd" name="vsftpd"/>
            <clusterfs device="/dev/VolGroup00/LogVol00" force_unmount="1" fsid="53287" fstype="gfs" mountpoint="/gfs1" name="gfs_db_space" options=""/>
            <clusterfs device="/dev/VolGroup01/LogVol01" force_unmount="1" fsid="1552" fstype="gfs" mountpoint="/gfs2" name="gfs_eadb_space" options=""/>
            <mysql config_file="/etc/my.cnf" listen_address="0.0.0.0" mysql_options="" name="mysql_eadb" shutdown_wait="0"/>
            <script file="/etc/init.d/tomcat_eadb" name="tomcat_eadb"/>
            <tomcat-5 catalina_base="eadbmgr" catalina_options="/gfs2/eadb/jakarta-tomcat-5.0.30/conf" config_file="eadb_tomcat" name="/gfs2/eadb/jakarta-tomcat-5.0.30/conf/Catalina" shutdown_wait="" tomcat_user=""/>
            <script file="/etc/init.d/tomcat5" name="tomcat"/>
            <ip address="10.30.7.185" monitor_link="1"/>
            <script file="/etc/init.d/my_sample" name="my_sample"/>
        </resources>
        <service autostart="1" domain="eadb_fd" name="vsftpd">
            <script ref="vsftpd"/>
            <ip ref="10.82.114.4"/>
            <ip ref="10.30.7.185"/>
        </service>
        <service autostart="1" domain="eadb_service" exclusive="1" name="eadb">
            <ip ref="10.82.114.4"/>
            <clusterfs ref="gfs_db_space"/>
            <clusterfs ref="gfs_eadb_space"/>
            <mysql ref="mysql_eadb"/>
            <script ref="tomcat_eadb"/>
            <ip ref="10.30.7.185"/>
        </service>
        <service autostart="1" domain="eadb_service" name="tc">
            <ip ref="10.82.114.4"/>
            <script ref="tomcat"/>
        </service>
        <service autostart="1" exclusive="1" name="mysql">
            <ip ref="10.82.114.4"/>
            <ip ref="10.30.7.185"/>
            <clusterfs ref="gfs_db_space"/>
            <mysql ref="mysql_eadb"/>
        </service>
        <service autostart="1" domain="eadb_fd" name="sample" recovery="restart">
            <ip ref="10.82.114.4"/>
            <script ref="my_sample"/>
        </service>
    </rm>
</cluster>
＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝
Addendum 2:
The Perl script invoked by the my_sample service script, eadb_sm.pl

#!/usr/bin/perl
use strict;
use warnings;
use Socket;

# address to bind to: floating IP 1 of the cluster
my $ip   = "10.82.114.4";
my $port = 9999;

# make the socket
socket(SERVER, PF_INET, SOCK_STREAM, getprotobyname('tcp'))
    or die "Couldn't create socket : $!\n";
# so we can restart our server quickly
setsockopt(SERVER, SOL_SOCKET, SO_REUSEADDR, 1);
# build up my socket address (INADDR_ANY would listen on all addresses instead)
#my $my_addr = sockaddr_in($port, INADDR_ANY);
my $my_addr = sockaddr_in($port, inet_aton($ip));
# bind fails if $ip is not configured on this host yet
bind(SERVER, $my_addr)
    or die "Couldn't bind to $ip port $port : $!\n";
# establish a queue for incoming connections
listen(SERVER, SOMAXCONN)
    or die "Couldn't listen on port $port : $!\n";
# accept and process connections, one request per connection
while (accept(CLIENT, SERVER)) {
    print STDERR "Someone connected\n";
    # sysread blocks until data arrives; 0 means EOF, undef means error
    my $buff = '';
    my $bs = sysread(CLIENT, $buff, 2048);
    if ($bs) {
        # message format: 16-bit big-endian length, then the payload
        my ($l, $s) = unpack("na*", $buff);
        print STDERR "Received: $l bytes, content:\n $s\n";

        my $tmp    = "Received!!!";
        my $tosend = pack("na*", length($tmp), $tmp);
        if (syswrite(CLIENT, $tosend)) {
            print STDERR "Sent: $tmp\n";
            print STDERR "=======================================\n";
        } else {
            print STDERR "failed to send: $!\n";
        }
    }
    close(CLIENT);
    print STDERR "Disconnected\n";
}
close(SERVER);
＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝
Addendum 3:
Output from /var/log/messages
The floating IPs are registered before vsftpd starts, but not when the other service starts.
=====================
Nov 20 17:26:52 eadb01 clurgmgrd: <notice> Starting disabled service service:vsftpd
Nov 20 17:26:52 eadb01 avahi-daemon: Registering new address record for 10.82.114.4 on bond0.
Nov 20 17:26:53 eadb01 avahi-daemon: Registering new address record for 10.30.7.185 on bond1.
Nov 20 17:26:55 eadb01 clurgmgrd: <notice> Service service:vsftpd started
Nov 20 17:26:57 eadb01 snmpd: Connection from UDP: :33631
Nov 20 17:27:28 eadb01 last message repeated 3 times
Nov 20 17:27:45 eadb01 last message repeated 2 times
Nov 20 17:28:00 eadb01 snmpd: Connection from UDP: :33632
Nov 20 17:28:00 eadb01 snmpd: Received SNMP packet(s) from UDP: :33632
Nov 20 17:28:10 eadb01 clurgmgrd: <notice> Stopping service service:vsftpd
Nov 20 17:28:11 eadb01 avahi-daemon: Withdrawing address record for 10.30.7.185 on bond1.
Nov 20 17:28:15 eadb01 snmpd: Connection from UDP: :33632
Nov 20 17:28:16 eadb01 snmpd: Connection from UDP: :33632
Nov 20 17:28:21 eadb01 avahi-daemon: Withdrawing address record for 10.82.114.4 on bond0.
Nov 20 17:28:31 eadb01 clurgmgrd: <notice> Service service:vsftpd is disabled
Nov 20 17:28:31 eadb01 snmpd: Connection from UDP: :33632
Nov 20 17:28:35 eadb01 clurgmgrd: <notice> Starting disabled service service:sample
Nov 20 17:28:35 eadb01 clurgmgrd: <notice> Service service:sample started
Nov 20 17:28:43 eadb01 clurgmgrd: : <err> script:my_sample: status of /etc/init.d/my_sample failed (returned 2)
Nov 20 17:28:43 eadb01 clurgmgrd: <notice> status on script "my_sample" returned 1 (generic error)
Nov 20 17:28:43 eadb01 clurgmgrd: <notice> Stopping service service:sample
Nov 20 17:28:43 eadb01 clurgmgrd: <notice> Service service:sample is recovering
Nov 20 17:28:44 eadb01 clurgmgrd: <notice> Service service:sample is now running on member 2
Nov 20 17:28:46 eadb01 snmpd: Connection from UDP: :33632
Nov 20 17:29:17 eadb01 last message repeated 3 times
Nov 20 17:29:33 eadb01 last message repeated 2 times
Nov 20 17:29:43 eadb01 clurgmgrd: <notice> Stopping service service:sample
Nov 20 17:29:43 eadb01 clurgmgrd: <notice> Service service:sample is disabled
Nov 20 17:29:48 eadb01 snmpd: Connection from UDP: :33632
Nov 20 17:29:49 eadb01 snmpd: Connection from UDP: :33632
Nov 20 17:30:04 eadb01 snmpd: Connection from UDP: :33634
Nov 20 17:30:04 eadb01 snmpd: Received SNMP packet(s) from UDP: :33634
Nov 20 17:30:05 eadb01 clurgmgrd: <notice> Starting disabled service service:sample
Nov 20 17:30:05 eadb01 clurgmgrd: <notice> Service service:sample started
Nov 20 17:30:13 eadb01 clurgmgrd: : <err> script:my_sample: status of /etc/init.d/my_sample failed (returned 2)
Nov 20 17:30:13 eadb01 clurgmgrd: <notice> status on script "my_sample" returned 1 (generic error)
Nov 20 17:30:13 eadb01 clurgmgrd: <notice> Stopping service service:sample
Nov 20 17:30:13 eadb01 clurgmgrd: <notice> Service service:sample is recovering
Nov 20 17:30:14 eadb01 clurgmgrd: <notice> Service service:sample is now running on member 2
Nov 20 17:30:19 eadb01 snmpd: Connection from UDP: :33634
Nov 20 17:30:20 eadb01 snmpd: Connection from UDP: :33634
Nov 20 17:30:27 eadb01 clurgmgrd: <notice> Stopping service service:sample
Nov 20 17:30:27 eadb01 clurgmgrd: <notice> Service service:sample is stopped
Nov 20 17:30:27 eadb01 clurgmgrd: <notice> Starting stopped service service:sample
Nov 20 17:30:27 eadb01 clurgmgrd: <notice> Service service:sample started
Nov 20 17:30:30 eadb01 clurgmgrd: <notice> Stopping service service:sample
Nov 20 17:30:30 eadb01 clurgmgrd: <notice> Service service:sample is disabled

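《Solution》

Reply to post #1 by PinkOrient

Given what you are trying to achieve, you need to think about the resource group.
status on script "my_sample" returned 1 (generic error)
This shows that the script itself has a problem and failed to start.
You should also check whether the host's firewall rules are blocking the packets.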

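《Solution》

Reply to post #3 by kns1024wh

Thanks for the reply. Please spare a little more time to look at my /var/log/messages.

Outside RHCS, the script tests fine with service my_sample start/stop/status; in that test, of course, the bind address is a static IP.
If I bind to 0.0.0.0 under RHCS it also runs without problems, but then I cannot connect to my sample service through the floating IP.

The error you point out occurs because I have the service bind to the floating IP, yet RHCS does not register the floating IP on the interface via avahi-daemon before it starts my script. Once the process is started, bind fails and it dies. RHCS then monitors the process through status, finds that it is not running, and reports this error.

What I want to know is why RHCS registers the floating IP via avahi-daemon only for the vsftpd service and skips that step for the other services, which makes the bind to the floating IP fail.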

《Solution》


Can anyone explain how floating IPs work in RHCS and how to configure them?

Still unsolved; bumping so the thread does not sink.

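《Solution》

Bumping again.

Yesterday I repeated the experiment on two VMware virtual machines and got the same result: only the service containing vsftpd registers the floating IP.
Then I tried a workaround: I added the script for the service I actually want, for example httpd, to the vsftpd service, and it worked.
Very strange. Does RHCS really only work with vsftpd along for the ride? I am baffled.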

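《Solution》

Solved this time. A colleague helped me find a tip that showrun posted earlier, and after some experiments the problem is fixed.

The lesson is that the resources added to a service should be arranged in a hierarchy. My mistake was treating the IP and script resources as being on the same level and adding all of them with "Add a Shared Resource to this service".

The correct approach is to add the script with that button first, then select the script and click "Attach a Shared/Private Resource to the selection" to attach the other resources the script needs to the script itself, forming a tree.

That way, when RHCS starts the service, it allocates or starts the resources beginning from the leaves.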

Nov 21 08:05:05 node0 clurgmgrd: <notice> Starting disabled service service:httpd
Nov 21 08:05:10 node0 avahi-daemon: Registering new address record for 192.168.0.80 on eth0.
Nov 21 08:05:12 node0 avahi-daemon: Registering new address record for 192.168.0.15 on eth0.
Nov 21 08:05:13 node0 clurgmgrd: <notice> Service service:httpd started
Nov 21 08:05:32 node0 clurgmgrd: <notice> Stopping service service:httpd
Nov 21 08:05:32 node0 avahi-daemon: Withdrawing address record for 192.168.0.15 on eth0.
Nov 21 08:05:43 node0 avahi-daemon: Withdrawing address record for 192.168.0.80 on eth0.
Nov 21 08:05:56 node0 clurgmgrd: <notice> Service service:httpd is disabled

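My understanding is the opposite of showrun's. In his example the floating IP is the first resource added to the service, and GFS and the script are then attached to it. But when I attached the script under the floating IP, RHCS did not register the IP; the other way around, it worked.

[ Last edited by PinkOrient on 2008-11-25 23:30 ]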

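《Solution》

Attached are the cluster.conf file and the test results.
I configured four services for the experiment:
httpd          works; RHCS registers the two floating IPs and then runs the script
httpd2         works
httpd3         does not work; the floating IP is not registered
httpd4         does not work; the floating IP is not registered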
=========================

<?xml version="1.0"?>
<cluster alias="new_cluster" config_version="40" name="new_cluster">
      <fence_daemon post_fail_delay="0" post_join_delay="3"/>
      <clusternodes>
            <clusternode name="node0" nodeid="1" votes="1">
                     <fence/>
            </clusternode>
            <clusternode name="node1" nodeid="2" votes="1">
                     <fence/>
            </clusternode>
      </clusternodes>
      <cman expected_votes="1" two_node="1"/>
      <fencedevices>
            <fencedevice agent="fence_manual" name="mf"/>
      </fencedevices>
      <rm>
            <failoverdomains>
                     <failoverdomain name="ftpd" ordered="1" restricted="0">
                              <failoverdomainnode name="node0" priority="1"/>
                              <failoverdomainnode name="node1" priority="2"/>
                     </failoverdomain>
            </failoverdomains>
            <resources>
                     <ip address="192.168.0.15" monitor_link="1"/>
                     <script file="/etc/init.d/httpd" name="httpd"/>
                     <ip address="192.168.0.80" monitor_link="1"/>
            </resources>
            <service autostart="1" domain="ftpd" exclusive="1" name="httpd" recovery="relocate">
                     <script ref="httpd">
                              <ip ref="192.168.0.80"/>
                              <ip ref="192.168.0.15"/>
                     </script>
            </service>
            <service autostart="1" domain="ftpd" name="httpd2" recovery="restart">
                     <script ref="httpd">
                              <ip address="192.168.0.200" monitor_link="1"/>
                     </script>
            </service>
            <service autostart="1" domain="ftpd" name="httpd3" recovery="restart">
                     <ip ref="192.168.0.80"/>
                     <script ref="httpd"/>
            </service>
            <service autostart="1" name="httpd4">
                     <ip ref="192.168.0.15">
                              <script ref="httpd"/>
                     </ip>
            </service>
      </rm>
</cluster>
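
Applied to the sample service from the first cluster.conf in this thread, the layout that worked in these tests (the script as the parent, with the IP attached beneath it, as in the httpd service above) might look like the sketch below. It is untested on the original cluster and only rearranges elements already shown in the thread; config_version in the <cluster> element would normally also need to be incremented when the file is edited.

```xml
<!-- Sketch only: the "sample" service with the floating IP nested under the script -->
<service autostart="1" domain="eadb_fd" name="sample" recovery="restart">
    <script ref="my_sample">
        <ip ref="10.82.114.4"/>
    </script>
</service>
```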

《Solution》

Your script has no status output.


Article URL: http://coctec.com/docs/service/show-post-5370.html
