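Monitoring servers with Nagios, with Python handling failure notifications (email and SMS)
Today I finally finished debugging a setup that uses Nagios to monitor system health, so I am posting my installation and configuration notes. I drew on several installation guides found online; thanks to their authors.
Python is very strict about indentation, so take care when copying the failure-notification (email and SMS) code. The Perl and Python code mentioned in this post is in the attachment.
For SMS notification, an ordinary GSM/GPRS modem is sufficient.
## Note: everything Nagios has to handle must run with the nagios user's permissions. This includes the serial port /dev/ttyS0 used by the SMS part, which must also be made accessible to nagios. This cost me a long time of debugging :(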
OS: CentOS-4
Nagios-2.9
1) Nagios installation
Remove old nagios installation:
rpm -qa|grep nagios; rpm -e ...
## create group & user for nagios
groupadd -g 50001 nagios
useradd -u 50001 -g 50001 -d /home/nagios -s /sbin/nologin nagios
# Requires:
gd && gd-devel
libpng && libpng-devel
libjpeg && libjpeg-devel
##### download the nagios source package
wget http://nchc.dl.sourceforge.net/sourceforge/nagios/nagios-2.9.tar.gz
tar zxvf nagios-2.9.tar.gz
cd nagios-2.9
./configure --prefix=/opt/nagios \
--with-cgiurl=/nagios/cgi-bin \
--with-htmurl=/nagios \
--with-nagios-user=nagios \
--with-nagios-group=nagios
make all
make install
make install-init
make install-commandmode
make install-config
chown -R nagios.nagios /opt/nagios
2) nagios-plugins installation
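##### download the nagios-plugins source package (quote the URL: an unquoted & would send wget to the background)
wget -O nagios-plugins-1.4.9.tar.gz "http://downloads.sourceforge.net/nagiosplug/nagios-plugins-1.4.9.tar.gz?modtime=1180952247&big_mirror=0"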
tar zxvf nagios-plugins-1.4.9.tar.gz
cd nagios-plugins-1.4.9
./configure --prefix=/opt/nagios \
--with-cgiurl=/nagios/cgi-bin \
--with-mysql=/opt/mysql/bin/mysql_config \
--enable-ssl \
--enable-command-args
make
make install
3) configure
cd /opt/nagios
rsync -av etc etc-orig
cd /opt/nagios/etc
# Copy each .cfg-sample file to the corresponding .cfg file
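Copying the samples by hand is tedious. A minimal Python 3 sketch of the same step, assuming the samples sit in /opt/nagios/etc with names ending in .cfg-sample (as the note above says); it skips any .cfg that already exists, so an already edited file is never overwritten:

```python
import shutil
from pathlib import Path


def install_sample_configs(etc_dir):
    """Copy every *.cfg-sample in etc_dir to the same name without "-sample".

    Existing .cfg files are left untouched, so re-running is safe.
    Returns the names of the files that were created.
    """
    created = []
    for sample in sorted(Path(etc_dir).glob("*.cfg-sample")):
        target = sample.with_name(sample.name[: -len("-sample")])
        if target.exists():
            continue  # keep the configuration that was already edited
        shutil.copy2(sample, target)
        created.append(target.name)
    return created


if __name__ == "__main__":
    print(install_sample_configs("/opt/nagios/etc"))
```

Run it as the nagios user (or chown afterwards) so the new files keep the ownership set by `chown -R nagios.nagios /opt/nagios`.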
4) configure the web server
## this example is based on nginx
----------------------------------------------------------
server {
    listen 192.168.0.220;
    server_name 192.168.0.220;
    access_log /var/log/nginx/nagios/nagios_access_log combined;
    error_log /var/log/nginx/nagios/nagios_error_log notice;

    location /nagios {
        alias /opt/nagios/share;
        auth_basic "Restricted";
        auth_basic_user_file /opt/nagios/etc/htpasswd.user;
    }

    location ~ \.cgi$ {
        root /opt/nagios/sbin;
        rewrite ^/nagios/cgi-bin/(.*)\.cgi /$1.cgi break;
        fastcgi_index index.cgi;
        auth_basic "Restricted";
        auth_basic_user_file /opt/nagios/etc/htpasswd.user;
        fastcgi_pass unix:/var/run/fcgi/nagios.sock;
        fastcgi_param SCRIPT_FILENAME /opt/nagios/sbin$fastcgi_script_name;
        fastcgi_param QUERY_STRING $query_string;
        fastcgi_param REMOTE_ADDR $remote_addr;
        fastcgi_param REMOTE_PORT $remote_port;
        fastcgi_param REQUEST_METHOD $request_method;
        fastcgi_param REQUEST_URI $request_uri;
        #fastcgi_param SCRIPT_NAME $fastcgi_script_name;
        fastcgi_param SERVER_ADDR $server_addr;
        fastcgi_param SERVER_NAME $server_name;
        fastcgi_param SERVER_PORT $server_port;
        fastcgi_param SERVER_PROTOCOL $server_protocol;
        fastcgi_param SERVER_SOFTWARE nginx;
        fastcgi_param CONTENT_LENGTH $content_length;
        fastcgi_param CONTENT_TYPE $content_type;
        fastcgi_param GATEWAY_INTERFACE CGI/1.1;
        fastcgi_param HTTP_ACCEPT_ENCODING gzip,deflate;
        fastcgi_param HTTP_ACCEPT_LANGUAGE zh-cn;
        #include conf/fastcgi_params;
    }
}
----------------------------------------------------------
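Both locations read /opt/nagios/etc/htpasswd.user, and the account in it must match contact_name in contacts.cfg. Apache's `htpasswd` tool is the usual way to create it. As an alternative, a minimal sketch that prints an entry in the RFC 2307 `{SSHA}` scheme; this assumes a newer nginx whose auth_basic_user_file accepts the `{scheme}data` syntax (the nginx of 2007 may only accept crypt() hashes), and salted SHA-1 is still a weak hash:

```python
import base64
import hashlib
import os


def ssha_entry(user, password, salt=None):
    """Return one 'user:{SSHA}...' line: base64(SHA1(password + salt) + salt)."""
    if salt is None:
        salt = os.urandom(8)  # random 8-byte salt
    digest = hashlib.sha1(password.encode("utf-8") + salt).digest()
    return "%s:{SSHA}%s" % (user, base64.b64encode(digest + salt).decode("ascii"))


def ssha_check(entry, password):
    """Verify a password against a line produced by ssha_entry()."""
    _user, hashed = entry.split(":", 1)
    raw = base64.b64decode(hashed[len("{SSHA}"):])
    digest, salt = raw[:20], raw[20:]  # a SHA-1 digest is 20 bytes
    return hashlib.sha1(password.encode("utf-8") + salt).digest() == digest


if __name__ == "__main__":
    import sys
    if len(sys.argv) == 3:
        # e.g. python ssha_htpasswd.py nagiosadmin 'secret' >> /opt/nagios/etc/htpasswd.user
        print(ssha_entry(sys.argv[1], sys.argv[2]))
```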
## If you want nginx to support CGI, you can use the script "perl-cgi.pl".
Because of the post length limit, the source of perl-cgi.pl is in the attachment; sorry about that.
----------------------------------------------------------
## Startup script; it creates /var/run/fcgi/nagios.sock
## Note: nagios.sock must be owned by web.web so that nginx can use it
----------------------------------------------------------
#!/bin/bash
## start_nginx_cgi.sh: start nginx cgi mode
## ljzhou, 2007.08.20
PERL="/usr/bin/perl"
NGINX_CGI_FILE="/opt/nagios/bin/perl-cgi.pl"
sockfiles="/var/run/fcgi/nagios.sock"
# Find an already running perl-cgi.pl; the [p] pattern keeps grep from matching itself
PID=`ps aux | grep '[p]erl-cgi' | awk '{print $2}'`
echo $PID
# Kill it only if one was found
if [ -n "$PID" ]; then
    kill -9 $PID
fi
$PERL $NGINX_CGI_FILE &
# Give perl-cgi.pl time to create the socket, then hand it to web.web
sleep 3
chown web.web $sockfiles
# EOF: start_nginx_cgi.sh
----------------------------------------------------------
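5)
## Based on how the system is actually used, the configuration files are organized as follows to ease future maintenance and management:
## Configuration file layout: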
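etc/
|-- cgi.cfg
|-- commands.cfg
|-- nagios.cfg
|-- resource.cfg
(the above are the main Nagios configuration files)
etc/servers/
|-- contacts.cfg       default definitions for contacts (administrators) and contact groups
|-- hostgroups.cfg     default definitions for host groups
|-- hosts.cfg          default definitions for hosts
|-- services.cfg       default definitions for monitored services
|-- servicegroups.cfg  default definitions for service groups
|-- timeperiod.cfg     default definitions for time periods
(the above are the monitoring-related configuration files, all split out of the original localhost.cfg, which makes them easier to understand and manage)
etc/servers/abc.com/
|-- 192.168.0.220.cfg
|-- 192.168.0.233.cfg
(under etc/servers/, create one directory per monitored domain to keep the domains apart; each monitored host gets its own configuration file containing its host and service definitions)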
1) ## Set up cgi.cfg:
authorized_for_system_information=nagiosadmin
authorized_for_configuration_information=nagiosadmin
authorized_for_system_commands=nagiosadmin
authorized_for_all_services=nagiosadmin
authorized_for_all_hosts=nagiosadmin
authorized_for_all_service_commands=nagiosadmin
authorized_for_all_host_commands=nagiosadmin
## The above makes nagiosadmin the top-level Nagios account, allowed to view the status of all hosts and services.
2) ## Set up nagios.cfg:
#cfg_file=/opt/nagios/etc/localhost.cfg
#cfg_file=/opt/nagios/etc/contactgroups.cfg
#cfg_file=/opt/nagios/etc/contacts.cfg
#cfg_file=/opt/nagios/etc/dependencies.cfg
#cfg_file=/opt/nagios/etc/escalations.cfg
#cfg_file=/opt/nagios/etc/hostgroups.cfg
#cfg_file=/opt/nagios/etc/hosts.cfg
#cfg_file=/opt/nagios/etc/services.cfg
#cfg_file=/opt/nagios/etc/timeperiods.cfg
## Comment out the lines above
cfg_dir=/opt/nagios/etc/servers
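## Enable this parameter so that all .cfg files under /opt/nagios/etc/servers (including subdirectories such as abc.com) are loaded into Nagios.
3) ## Configure commands.cfg to support email and SMS notification: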
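Because cfg_dir picks up files by extension, a stray backup or a misnamed file is easy to miss. A minimal sketch that lists what cfg_dir should load, assuming the rule described in the Nagios documentation (every file ending in .cfg, in the directory and its subdirectories); this is only a preview, not Nagios's own loader:

```python
import os


def cfg_dir_files(cfg_dir):
    """List the .cfg files under cfg_dir, recursively, relative to cfg_dir."""
    found = []
    for root, dirs, files in os.walk(cfg_dir):
        dirs.sort()  # deterministic order for display
        for name in sorted(files):
            if name.endswith(".cfg"):  # e.g. "x.cfg-sample" or "x.cfg.bak" is ignored
                found.append(os.path.relpath(os.path.join(root, name), cfg_dir))
    return found


if __name__ == "__main__":
    for f in cfg_dir_files("/opt/nagios/etc/servers"):
        print(f)
```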
----------------------------------------------------------
# 'host-notify-by-email' command definition
define command{
command_name host-notify-by-email
command_line /opt/nagios/bin/mail_send.sh "Host $HOSTSTATE$ alert for $HOSTNAME$!" "***** Nagios *****\n\nNotification Type:$NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress:$HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" $CONTACTEMAIL$
}
# 'notify-by-email' command definition
define command{
command_name notify-by-email
command_line /opt/nagios/bin/mail_send.sh "**$NOTIFICATIONTYPE$ alert - $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" "***** Nagios *****\n\nNotification Type:$NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost:$HOSTALIAS$\nAddress: $HOSTADDRESS$\nState:$SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditionalInfo:\n\n$SERVICEOUTPUT$" $CONTACTEMAIL$
}
# 'host-notify-by-sms' command definition
define command{
command_name host-notify-by-sms
command_line /opt/nagios/bin/sms_send.sh "Host $HOSTSTATE$ alert for $HOSTNAME$! on '$TIME$' " $CONTACTPAGER$
}
# 'service notify by sms' command definition
define command{
command_name notify-by-sms
command_line /opt/nagios/bin/sms_send.sh "$HOSTADDRESS$ $SERVICEDESC$ is $SERVICESTATE$ on $TIME$" $CONTACTPAGER$
}
----------------------------------------------------------
## Below are mail_send.sh and sms_send.sh. Because of the post length limit, the source of mail_send.py and sms_send.py is in the attachment; sorry about that.
(1) mail_send.sh
----------------------------------------------------------
#!/bin/bash
# Called by Nagios with 3 arguments: subject, message body, recipient address(es)
cd /opt/nagios/bin
if [ $# -eq 3 ]; then
    Subject="$1"
    AlertInfo="$2"
    Touser="$3"
    /usr/bin/python2 /opt/nagios/bin/mail_send.py "$Subject" "$AlertInfo" "$Touser"
fi
# EOF :mail_send.sh
----------------------------------------------------------
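Since mail_send.py itself is only in the attachment, here is a minimal Python 3 sketch of what such a script might look like; it is not the author's code. Assumptions: a local MTA on localhost:25 and the hypothetical sender address `nagios@localhost`. The `\n` in the command_line strings of commands.cfg reach the script as a literal backslash and `n` (the shell does not expand them inside double quotes), so the sketch turns them into real newlines, and `$CONTACTEMAIL$` may hold several comma-separated addresses:

```python
import smtplib
import sys
from email.mime.text import MIMEText

SMTP_HOST = "localhost"    # assumption: local MTA
SENDER = "nagios@localhost"  # hypothetical sender address


def build_message(subject, alert_info, to_users):
    """Build the alert mail and the list of recipients."""
    recipients = [a.strip() for a in to_users.split(",") if a.strip()]
    body = alert_info.replace("\\n", "\n")  # literal "\n" -> newline
    msg = MIMEText(body, "plain", "utf-8")
    msg["Subject"] = subject
    msg["From"] = SENDER
    msg["To"] = ", ".join(recipients)
    return msg, recipients


def send_alert(subject, alert_info, to_users):
    msg, recipients = build_message(subject, alert_info, to_users)
    server = smtplib.SMTP(SMTP_HOST)
    try:
        server.sendmail(SENDER, recipients, msg.as_string())
    finally:
        server.quit()


if __name__ == "__main__" and len(sys.argv) == 4:
    # Called by mail_send.sh as: mail_send.py "$Subject" "$AlertInfo" "$Touser"
    send_alert(sys.argv[1], sys.argv[2], sys.argv[3])
```

The wrapper above calls /usr/bin/python2; point it at a Python 3 interpreter if you use this sketch.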
(2) sms_send.sh
----------------------------------------------------------
#!/bin/bash
# Called by Nagios with 2 arguments: message text, pager number(s)
cd /opt/nagios/bin
if [ $# -eq 2 ]; then
    msg="$1"
    pcode="$2"
    #echo $msg
    #echo $pcode
    /usr/bin/python2 /opt/nagios/bin/sms_send.py "$msg" "$pcode"
fi
# EOF: sms_send.sh
----------------------------------------------------------
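sms_send.py is also only in the attachment. A minimal sketch of the usual approach, not the author's code: drive the GSM modem with the standard text-mode AT commands (`AT+CMGF=1`, then `AT+CMGS="<number>"`, then the text ended by Ctrl-Z). Assumptions: the modem is on /dev/ttyS0, the port speed is already set (e.g. `stty -F /dev/ttyS0 9600 raw`), and the script runs as a user that can open the port (see the permission note at the top). Text mode only suits ASCII; Chinese messages would need PDU mode with UCS2 encoding. Waiting a fixed pause instead of reading the modem's `OK`/`>` replies is a deliberate simplification:

```python
import sys
import time

MODEM = "/dev/ttyS0"  # assumption: serial port of the GSM modem
MAX_LEN = 160         # one single-part SMS in the GSM 7-bit alphabet


def at_commands(number, text):
    """Return the byte strings to write to the modem for one SMS."""
    text = text[:MAX_LEN]
    return [
        b"AT+CMGF=1\r",                               # switch to text mode
        ('AT+CMGS="%s"\r' % number).encode("ascii"),  # modem answers with "> "
        text.encode("ascii", "replace") + b"\x1a",    # Ctrl-Z ends the message
    ]


def send_sms(port, numbers, text, pause=2.0):
    """Send text to each comma-separated number through an open port object."""
    for number in [n.strip() for n in numbers.split(",") if n.strip()]:
        for chunk in at_commands(number, text):
            port.write(chunk)
            port.flush()
            time.sleep(pause)  # crude: wait instead of parsing the replies


if __name__ == "__main__" and len(sys.argv) == 3:
    # Called by sms_send.sh as: sms_send.py "$msg" "$pcode"
    with open(MODEM, "r+b", buffering=0) as port:
        send_sms(port, sys.argv[2], sys.argv[1])
```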
3) ## Points to watch in each configuration file
3.1 timeperiod.cfg
----------------------------------------------------------
define timeperiod{
timeperiod_name 24x7
......
}
----------------------------------------------------------
3.2 contacts.cfg
----------------------------------------------------------
define contact{
contact_name nagiosadmin
alias nagiosadmin
......
service_notification_commands notify-by-email,notify-by-sms
host_notification_commands host-notify-by-email,host-notify-by-sms
email [email protected],[email protected],[email protected]
pager 8613688888888,8613888888888
}
define contactgroup{
contactgroup_name nagios
alias nagios
members nagiosadmin
}
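## Defines the administrators and administrator groups, and how each administrator is contacted (mail or SMS). Note that contact_name here must match the account set in htpasswd.user.
## You can define several contact groups here to split administrators up; referencing them in hosts and services lets each group watch its own servers.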
----------------------------------------------------------
3.3 hosts.cfg
----------------------------------------------------------
define host{
name generic-host
......
}
## Defines the common default host attributes, referenced from each host's definition
----------------------------------------------------------
3.4 hostgroups.cfg
----------------------------------------------------------
define hostgroup{
hostgroup_name nginx
alias nginx
members 192.168.0.220,192.168.0.233
}
define hostgroup{
hostgroup_name apache
alias apache
members 192.168.0.220,192.168.0.233
}
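## Defines which group each host belongs to, for easier viewing while monitoring. hostgroup_name is the group name, alias is its alias, and members lists the members; each member is a host_name defined in a host's configuration file.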
----------------------------------------------------------
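Since every member must match a host_name exactly, a typo in members is a common mistake. A rough cross-check, as a sketch: the parser below is deliberately minimal (one directive per line, `;` and `#` comments) and is not a full Nagios config parser; `nagios -v` remains the authoritative check:

```python
import re

DEFINE_RE = re.compile(r"define\s+(\w+)\s*\{(.*?)\}", re.S)


def parse_objects(text):
    """Parse 'define <type> { key value ... }' blocks into (type, dict) pairs."""
    objects = []
    for obj_type, body in DEFINE_RE.findall(text):
        attrs = {}
        for line in body.splitlines():
            line = line.split(";", 1)[0].strip()  # drop ';' comments
            if not line or line.startswith("#"):
                continue
            parts = line.split(None, 1)  # key, then the rest as value
            attrs[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
        objects.append((obj_type, attrs))
    return objects


def undefined_members(text):
    """Return {hostgroup_name: [members with no matching host_name]}."""
    objects = parse_objects(text)
    hosts = {a["host_name"] for t, a in objects if t == "host" and "host_name" in a}
    missing = {}
    for t, a in objects:
        if t != "hostgroup":
            continue
        members = [m.strip() for m in a.get("members", "").split(",") if m.strip()]
        bad = [m for m in members if m not in hosts]
        if bad:
            missing[a.get("hostgroup_name", "?")] = bad
    return missing
```

Feed it the concatenated text of all files under etc/servers to check every group at once.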
3.5 services.cfg
----------------------------------------------------------
define service{
name generic-service
......
}
## Defines the common default service attributes, referenced from each service definition
----------------------------------------------------------
4) 192.168.0.220.cfg in the etc/servers/abc.com directory
----------------------------------------------------------
define host {
use generic-host ;the name defined in hosts.cfg
host_name 192.168.0.220 ;name of the monitored server
address 192.168.0.220 ;IP address of the monitored server
check_command check-host-alive
max_check_attempts 10
notification_interval 480
notification_period 24x7
notification_options d,u,r
contact_groups nagios
}
define service{
use generic-service ;the name defined in services.cfg
host_name 192.168.0.220 ;the host_name defined in the host above
service_description PING
is_volatile 0
check_period 24x7 ;the timeperiod_name defined in timeperiod.cfg
max_check_attempts 1
normal_check_interval 1
retry_check_interval 1
contact_groups nagios ;the contactgroup_name defined in contacts.cfg
notification_options w,u,c,r
notification_interval 240
notification_period 24x7
check_command check_ping!100.0,20%!500.0,60% ;a check command defined in commands.cfg
}
## This is the final configuration file for a monitored host: it contains the definition of host 192.168.0.220 and the services to monitor on it.
----------------------------------------------------------
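The `use generic-host` line works by template inheritance: anything the host does not set itself is taken from the template named in `use`, and `name`/`register` are not inherited. A sketch of that merge on plain dicts, assuming a single template per `use` as in this post; the generic-host values in the usage example are hypothetical, since the post elides them with "......":

```python
def resolve(obj, templates, _seen=()):
    """Merge an object with the template chain named by its 'use' directive.

    Values set on the object win; missing ones come from the template,
    and from that template's own 'use', recursively.
    """
    merged = {}
    parent = obj.get("use")
    if parent:
        if parent in _seen:
            raise ValueError("template loop at %s" % parent)
        merged.update(resolve(templates[parent], templates, _seen + (parent,)))
    merged.update({k: v for k, v in obj.items() if k not in ("use", "name", "register")})
    return merged


if __name__ == "__main__":
    # Hypothetical template values, for illustration only
    templates = {"generic-host": {"name": "generic-host", "notifications_enabled": "1",
                                  "max_check_attempts": "3", "register": "0"}}
    host = {"use": "generic-host", "host_name": "192.168.0.220", "max_check_attempts": "10"}
    print(resolve(host, templates))
```

Here the host keeps its own max_check_attempts 10 and inherits notifications_enabled from the template.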
## If more hosts need to be monitored, their configuration files are similar.
## At this point the basic Nagios configuration is complete.
/opt/nagios/bin/nagios -v /opt/nagios/etc/nagios.cfg
## Run this command to check that all configuration files are correct. If everything is correct, it prints:
Total Warnings: 0
Total Errors: 0
Things look okay - No serious problems were detected during the pre-flight check
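To run the pre-flight check from a deployment script before `service nagios start`, a sketch that parses the totals shown above; it also requires a zero exit status, on the assumption that `nagios -v` exits non-zero when it finds errors:

```python
import re
import subprocess

TOTAL_RE = re.compile(r"^Total (Warnings|Errors):\s*(\d+)", re.M)


def preflight_totals(output):
    """Extract {'Warnings': n, 'Errors': n} from `nagios -v` output."""
    return {kind: int(n) for kind, n in TOTAL_RE.findall(output)}


def config_ok(nagios="/opt/nagios/bin/nagios", cfg="/opt/nagios/etc/nagios.cfg"):
    """Run the pre-flight check; True only if it reports zero errors."""
    proc = subprocess.run([nagios, "-v", cfg], capture_output=True, text=True)
    totals = preflight_totals(proc.stdout)
    return proc.returncode == 0 and totals.get("Errors") == 0
```

For example, call `config_ok()` and only start or reload Nagios when it returns True.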
## Then start the Nagios monitoring service:
service nagios start
## Check the process
ps aux|grep nagios
6) ndoutils (for mysql)
1) Download and install ndoutils
wget http://nchc.dl.sourceforge.net/sourceforge/nagios/ndoutils-1.4b4.tar.gz
tar zxvf ndoutils-1.4b4.tar.gz
cd ndoutils-1.4b4
./configure --enable-mysql --with-mysql-lib=/opt/mysql/lib/mysql --with-mysql-inc=/opt/mysql/include
make
2) install the mysql db
(1). Create the database (e.g. 'nagios')
(2). Create a database user and grant it privileges on that database, e.g. SELECT, INSERT, UPDATE, DELETE
(3). Create the tables
cd db
./installdb -u user -p password -h localhost -d database
3) INSTALLING THE NDOMOD BROKER MODULE
cp src/ndomod-2x.o /opt/nagios/bin/ndomod.o
cp config/ndomod.cfg /opt/nagios/etc/
## edit the nagios.cfg
broker_module=/opt/nagios/bin/ndomod.o config_file=/opt/nagios/etc/ndomod.cfg
## make sure your nagios.cfg contains:
event_broker_options=-1
4) INSTALLING THE NDO2DB DAEMON
cp src/ndo2db-2x /opt/nagios/bin/ndo2db
cp config/ndo2db.cfg /opt/nagios/etc
### First edit the database user name and related settings in ndo2db.cfg, then start the daemon (an init script will be developed soon...):
/opt/nagios/bin/ndo2db -c /opt/nagios/etc/ndo2db.cfg
7) NRPE for the clients:
1. configure the server, assumed to be 192.168.0.220
## Edit commands.cfg
## Add below:
# 'check_nrpe' command definition
define command{
command_name check_nrpe
command_line /opt/nagios/libexec/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
## set in nagios.cfg:
check_external_commands=1
service nagios reload
2. install and configure the client, assumed to be 192.168.0.233
## add the nagios user and group (as on the server)
2.1) install and config the nrpe
cd /appstore/pkg/nagios
tar xzvf nrpe-2.8.1.tar.gz
cd nrpe-2.8.1
./configure --enable-ssl --enable-command-args
## configure ends with a summary like:
*** Configuration summary for nrpe 2.8.1 08-16-2007 ***:
General Options:
-------------------------
NRPE port: 5666
NRPE user: nagios
NRPE group: nagios
Review the options above for accuracy. If they look okay,
type 'make all' to compile the NRPE daemon and client.
make all
cp sample-config/nrpe.cfg /etc/
cp src/nrpe /usr/sbin/
2.2) install and config the nagios-plugins
tar xzvf nagios-plugins-1.4.9.tar.gz
cd nagios-plugins-1.4.9
./configure --prefix=/opt/nagios
make all
make install
cd /opt/nagios
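chown -R nagios /opt/nagios
## configure nrpe
## edit /etc/nrpe.cfg (each check is written as command[<name>]=<command line>; <name> is what the server calls via check_nrpe!<name>, e.g. check_http):
dont_blame_nrpe=1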
command=/opt/nagios/libexec/check_http -H 192.168.0.234 -p 8082
command=/opt/nagios/libexec/check_http -H 192.168.0.233 -p 8888
command=/opt/nagios/libexec/check_http -H 192.168.0.234
command=/opt/nagios/libexec/check_users -w 5 -c 10
command=/opt/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command=/opt/nagios/libexec/check_disk -w 20% -c 10% -p /
command=/opt/nagios/libexec/check_disk -w 20% -c 10% -p /var
command=/opt/nagios/libexec/check_disk -w 20% -c 10% -p /appstore
command=/opt/nagios/libexec/check_disk -w 20% -c 10% -p /webapp
command=/opt/nagios/libexec/check_disk -w 20% -c 10% -p /mysql
command=/opt/nagios/libexec/check_procs -w 5 -c 10 -s Z
command=/opt/nagios/libexec/check_procs -w 150 -c 200
command=/opt/nagios/libexec/check_mysql -H 192.168.0.233 -u yourname -p yourpasswd
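Every command listed here is just a program that follows the plugin convention: print one status line and exit with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). A sketch of a home-made plugin that alerts when a file has not been modified recently; the script name check_file_age.py and the nrpe.cfg line in the usage note are hypothetical:

```python
import os
import sys
import time

# Standard Nagios plugin exit codes
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_file_age(path, warn_seconds, crit_seconds, now=None):
    """Return (exit_code, status_line) for the time since path was modified."""
    if now is None:
        now = time.time()
    try:
        age = now - os.path.getmtime(path)
    except OSError as exc:
        return CRITICAL, "FILE_AGE CRITICAL - %s" % exc  # a missing file is treated as critical
    if age >= crit_seconds:
        code, label = CRITICAL, "CRITICAL"
    elif age >= warn_seconds:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"
    return code, "FILE_AGE %s - %s is %d seconds old" % (label, path, age)


if __name__ == "__main__" and len(sys.argv) == 4:
    # usage: check_file_age.py /path/to/file WARN_SECONDS CRIT_SECONDS
    code, line = check_file_age(sys.argv[1], int(sys.argv[2]), int(sys.argv[3]))
    print(line)  # this line becomes the status text in Nagios
    sys.exit(code)
```

In nrpe.cfg it would be registered like any other check, e.g. `command[check_stamp]=/usr/bin/python3 /opt/nagios/libexec/check_file_age.py /tmp/job.stamp 900 1800`, and called from the server with `check_nrpe!check_stamp`.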
2.3) run nrpe under xinetd
vi /etc/services
## add the line:
nrpe 5666/tcp
cd /etc/xinetd.d
vi nrpe
## add below
# default: on
# description: NRPE
service nrpe
{
        disable         = no
        flags           = REUSE
        socket_type     = stream
        wait            = no
        user            = nagios
        server          = /usr/sbin/nrpe
        server_args     = -c /etc/nrpe.cfg --inetd
        log_on_failure  += USERID
        # the Nagios server's IP (keep comments on their own line)
        only_from       = 192.168.0.220
}
## start nrpe
/etc/rc.d/init.d/xinetd restart
netstat -nlp|grep 5666
3. on the server 192.168.0.220, install the check_nrpe plugin:
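netstat only shows that xinetd is listening locally. To confirm from the Nagios server that port 5666 is reachable through xinetd's only_from restriction, a plain TCP connect is enough as a first test (once check_nrpe is installed, running `check_nrpe -H 192.168.0.233` without `-c` should print the NRPE version):

```python
import socket


def port_open(host, port=5666, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


if __name__ == "__main__":
    print(port_open("192.168.0.233"))
```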
cd /appstore/pkg/nagios
tar xzvf nrpe-2.8.1.tar.gz
cd nrpe-2.8.1
./configure --enable-ssl --enable-command-args
make all
cp sample-config/nrpe.cfg /etc/
cp src/nrpe /usr/sbin/
cp src/check_nrpe /opt/nagios/libexec
3.1) ## edit /etc/nrpe.cfg
allowed_hosts=127.0.0.1,192.168.0.233
3.2) ## now you can add 192.168.0.233.cfg under /opt/nagios/etc/servers/abc.com/, as below:
define host {
use generic-host ;the name defined in hosts.cfg
host_name 192.168.0.233 ;name of the monitored server
address 192.168.0.233 ;IP address of the monitored server
check_command check-host-alive
max_check_attempts 10
notification_interval 480
notification_period 24x7
notification_options d,u,r
contact_groups nagios
}
define service{
use generic-service
host_name 192.168.0.233
service_description HTTP
is_volatile 0
check_period 24x7
max_check_attempts 1
normal_check_interval 1
retry_check_interval 1
contact_groups nagios
notification_options w,u,c,r
notification_interval 240
notification_period 24x7
check_command check_nrpe!check_http
}
......
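## Note: everything Nagios has to handle must run with the nagios user's permissions, including the serial port /dev/ttyS0 used by the SMS part, which must be made accessible to nagios
Note: smilies should be disabled in technical posts
[ Last edited by llzqq on 2007-8-23 08:44 ]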
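《Solution》
Thanks to the original poster for sharing!!
《Solution》
Looks pretty deep. I wonder if there is a simpler way to monitor.
《Solution》
Maybe it looks long because I posted every single step; it's actually quite simple :lol:
《Solution》
Did I write it too messily? :(
Next time I'll keep it short and to the point :lol:
《Solution》
Very good, I've been looking for a post like this. Are these your own work notes?
Could I add you on MSN? I have some questions I'd like to ask.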
《Solution》
Thanks to llzqq for editing it for me; next time I'll remember to disable smilies when posting
:lol:
gunguymadman
I've replied to your message, my MSN:
[email protected] 《Solution》
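Thanks to the original poster for a great post.
May I ask which brand and model of GSM modem you use, and whether it connects via USB or serial?
Are these all the requirements for sending SMS alerts?
1. a GSM modem
2. a serial or USB cable
3. a SIM card from China Mobile or China Unicom
4. connect the modem to the server and give nagios access to /dev/ttyS0
5. your SMS Python sending program
Thanks!
《Solution》
Hello, my server runs some PHP files on a schedule via crontab. How can I use Nagios to monitor these crontab processes? They are very important on my server and must never go down, but since they only run once every 10 minutes, I am at a loss as to how to monitor them. Thanks!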