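Our company recently needed to bring a monitoring system online and deploy it on many machines whose environments and hardware mostly differ, so I wrote this installation guide for our operations staff to follow when they deploy monitoring.
My system is Red Hat 5.4, with iptables and SELinux disabled.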
1. Install yum (if yum is already available on this machine, skip this step and go to step 2)
- [root@localhost yum.repos.d]# wget http://packages.sw.be/rpmforge-release/rpmforge-release-0.5.1-1.el5.rf.i386.rpm
- [root@localhost yum.repos.d]# wget http://dag.wieers.com/rpm/packages/RPM-GPG-KEY.dag.txt
- [root@localhost yum.repos.d]# rpm -Uvh rpmforge-release-0.5.1-1.el5.rf.i386.rpm
- [root@localhost yum.repos.d]# rpm --import RPM-GPG-KEY.dag.txt
- [root@localhost yum.repos.d]# yum install yum-fastestmirror yum-presto
2. Install Apache (skip this if it was installed by default; otherwise install it with yum)
- [root@localhost ~]# yum -y install httpd
Nagios also needs a few basic supporting packages:
- [root@localhost etc]# yum -y install gd gd-devel glibc glibc-common gcc
3. Configure Apache to support Nagios
(1) Create the nagios user
- [root@localhost ~]# useradd nagios
- [root@localhost etc]# /usr/sbin/groupadd nagcmd    # add the nagcmd group, used to submit external commands through the web page
- [root@localhost etc]# /usr/sbin/usermod -a -G nagcmd nagios    # add the nagios user to the nagcmd group
- [root@localhost etc]# /usr/sbin/usermod -a -G nagcmd apache    # add the apache user to the nagcmd group
- [root@localhost etc]# /usr/sbin/usermod -a -G apache nagios    # add the nagios user to the apache group
- [root@localhost etc]# /usr/sbin/usermod -a -G nagios apache    # add the apache user to the nagios group
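To confirm that the group changes took effect, membership can be checked with `id`. This is only a minimal sketch; the helper name `in_group` is my own and is not part of Nagios or Apache.

```shell
#!/bin/sh
# in_group USER GROUP
# Succeed if USER belongs to GROUP. Hypothetical helper for illustration only.
in_group() {
    # "id -nG USER" prints the names of all groups the user belongs to
    for g in $(id -nG "$1" 2>/dev/null); do
        [ "$g" = "$2" ] && return 0
    done
    return 1
}

# Usage on the monitoring server, after the usermod commands:
#   in_group nagios nagcmd && echo "nagios is in nagcmd"
#   in_group apache nagcmd && echo "apache is in nagcmd"
```

A user who is already logged in only picks up a new group at the next login, so check in a fresh session.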
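(2) Change the user and group that Apache runs as. The default is daemon, and it needs to be changed to nagios so that Apache has permission to access the Nagios installation directory and run the CGI commands, for example shutting down Nagios or stopping alerts for a faulty object from the browser. (This step can be skipped: when I deployed Nagios I did not change Apache's user and group, and nothing went wrong.)
(3) Add the Nagios directories to Apache (Nagios is installed under /usr/local/nagios) and protect them with HTTP user authentication. Append the following to the end of httpd.conf: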
- ScriptAlias /nagios/cgi-bin /usr/local/nagios/sbin
- <Directory "/usr/local/nagios/sbin">
- Options ExecCGI
- AllowOverride None
- Order allow,deny
- Allow from all
- AuthName "Nagios Access"
- AuthType Basic
- AuthUserFile /usr/local/nagios/etc/htpasswd
- Require valid-user
- </Directory>
- Alias /nagios /usr/local/nagios/share
- <Directory "/usr/local/nagios/share">
- Options None
- AllowOverride None
- Order allow,deny
- Allow from all
- AuthName "Nagios Access"
- AuthType Basic
- AuthUserFile /usr/local/nagios/etc/htpasswd
- Require valid-user
- </Directory>
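Because the block is appended by hand, repeating the step would add it to httpd.conf twice. The sketch below guards against that. It assumes the block has first been saved to a separate file; the file name `/tmp/nagios-httpd.conf` and the helper `append_once` are hypothetical.

```shell
#!/bin/sh
# append_once TARGET SNIPPET MARKER
# Append the file SNIPPET to TARGET unless TARGET already contains the
# literal string MARKER. Hypothetical helper for illustration only.
append_once() {
    target=$1
    snippet=$2
    marker=$3
    if grep -qF -- "$marker" "$target" 2>/dev/null; then
        echo "already present in $target, nothing to do"
    else
        cat "$snippet" >> "$target"
        echo "appended $snippet to $target"
    fi
}

# Usage (paths are assumptions; back up httpd.conf first):
#   append_once /etc/httpd/conf/httpd.conf /tmp/nagios-httpd.conf 'ScriptAlias /nagios/cgi-bin'
```

Apache has to be restarted afterwards for the new directives to take effect.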
4. Install Nagios
- [root@localhost tmp]# tar zxvf nagios-3.3.1.tar.gz
- [root@localhost tmp]# cd nagios
- [root@localhost nagios]# ./configure --prefix=/usr/local/nagios --with-command-group=nagcmd
- [root@localhost nagios]# make all
- [root@localhost nagios]# make install
- [root@localhost nagios]# make install-init
- [root@localhost nagios]# make install-config
- [root@localhost nagios]# make install-commandmode
- [root@localhost nagios]# make install-webconf
5. Install the Nagios plugins (nagios-plugins)
- [root@localhost nagios]# cd /tmp
- [root@localhost tmp]# tar zxvf nagios-plugins-1.4.15.tar.gz
- [root@localhost tmp]# cd nagios-plugins-1.4.15
- [root@localhost nagios-plugins-1.4.15]# ./configure --with-nagios-user=nagios --with-nagios-group=nagios
- [root@localhost nagios-plugins-1.4.15]# make
- [root@localhost nagios-plugins-1.4.15]# make install
6. Configure Nagios
- [root@localhost nagios-plugins-1.4.15]# cd /usr/local/
- [root@localhost local]# chown -R nagios:nagios nagios/
- [root@localhost local]# chown -R nagios:nagios nagios/*
- [root@localhost local]# cd nagios/etc/
- [root@localhost etc]# vim nagios.cfg
Edit nagios.cfg so that it contains the cfg_file lines below. The first five files are new; timeperiods.cfg and commands.cfg come with the sample configuration. In nagios.cfg only lines that begin with # are comments, so each note sits on its own line:
- # host definitions
- cfg_file=/usr/local/nagios/etc/hosts.cfg
- # host group definitions
- cfg_file=/usr/local/nagios/etc/hostgroups.cfg
- # contact definitions
- cfg_file=/usr/local/nagios/etc/contacts.cfg
- # contact group definitions
- cfg_file=/usr/local/nagios/etc/contactgroups.cfg
- # service definitions
- cfg_file=/usr/local/nagios/etc/services.cfg
- # time period definitions
- cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg
- # command definitions
- cfg_file=/usr/local/nagios/etc/objects/commands.cfg
Then edit cgi.cfg as follows:
- [root@localhost etc]# vim cgi.cfg
- # separate multiple users with commas
- authorized_for_system_information=nagios
- authorized_for_configuration_information=nagios
- authorized_for_system_commands=nagios
- authorized_for_all_services=nagios
- authorized_for_all_hosts=nagios
- authorized_for_all_service_commands=nagios
- authorized_for_all_host_commands=nagios
The user "nagios" specified here can shut down and restart the Nagios service, and perform other operations, from the browser. Alternatively, make the change with this command:
- [root@localhost etc]# sed -i 's/nagiosadmin/nagios/g' cgi.cfg
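The sed one-liner in this article replaces every occurrence of the string nagiosadmin in cgi.cfg. A more targeted variant rewrites only the authorized_for_* lines and accepts a comma-separated user list. This is a sketch under assumptions: the helper name `set_cgi_users` and the second user `ops` are hypothetical, and the user names must not contain `/` or `&`, which are special to sed.

```shell
#!/bin/sh
# set_cgi_users CFG USERS
# Rewrite every uncommented authorized_for_* line in CFG so that it grants
# USERS (a comma-separated list). Hypothetical helper for illustration only.
set_cgi_users() {
    cfg=$1
    users=$2
    tmp=$(mktemp) || return 1
    # Write to a temp file first, then copy back, so the original keeps
    # its owner and permissions and no sed -i extension is needed.
    if sed "s/^\(authorized_for_[a-z_]*\)[[:space:]]*=.*/\1=$users/" "$cfg" > "$tmp"; then
        cat "$tmp" > "$cfg"
        rc=$?
    else
        rc=1
    fi
    rm -f "$tmp"
    return "$rc"
}

# Usage (path as used in this article):
#   set_cgi_users /usr/local/nagios/etc/cgi.cfg nagios
#   set_cgi_users /usr/local/nagios/etc/cgi.cfg nagios,ops
```

Lines that are commented out with a leading # are left as they are.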
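(1) Configure the hosts file hosts.cfg
- define host{
- host_name web1 ; host name is web1, which you can check with the hostname command
- alias Nagios Server ; host alias
- address 192.168.10.223 ; IP address of the host
- check_command check-host-alive ; command used for the check; it must be defined in the commands file and is defined there by default
- check_interval 5 ; interval between checks
- retry_interval 1 ; interval between retries after a failed check
- max_check_attempts 5 ; maximum number of check attempts
- check_period 24x7 ; time period during which checks run
- process_perf_data 0
- retain_nonstatus_information 0
- contact_groups admin ; contact group, that is, the group that receives the email alerts
- notification_interval 30 ; interval between notifications
- notification_period 24x7 ; time period during which notifications are sent
- notification_options d,u,r ; states that trigger a notification. For hosts: d = down, u = unreachable, r = recovery. For services: w = warning, u = unknown, c = critical, r = recovery. In production, w,c,r is usually enough for services
- }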
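(2) Configure the host groups file hostgroups.cfg
- define hostgroup {
- hostgroup_name Nagios-Example ; name of the host group
- alias Nagios Example ; alias of the host group
- members web1 ; members of the host group; must match host_name in hosts.cfg, otherwise Nagios reports an error
- }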
(3) Configure the contacts file contacts.cfg
- define contact{
- contact_name nagiosadmin ; contact name
- alias Nagios Admin ; contact alias
- service_notification_period 24x7 ; service notifications are sent at any time
- host_notification_period 24x7 ; host notifications are sent at any time
- service_notification_options w,u,c,r ; service states that trigger a notification
- host_notification_options d,u,r ; host states that trigger a notification
- service_notification_commands notify-service-by-email ; alert by email
- host_notification_commands notify-host-by-email ; same as above
- email denglei@ctfo.com ; mailbox that receives the alerts
- }
(4) Configure the contact groups file contactgroups.cfg
- define contactgroup{
- contactgroup_name admin ; name of the contact group
- alias Nagios Administrators ; alias of the contact group
- members nagiosadmin ; members of the contact group; must match contact_name in contacts.cfg
- }
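(5) Configure the services file services.cfg
- define service {
- host_name web1 ; must match host_name in hosts.cfg
- service_description check-host-alive ; service description
- check_period 24x7 ; time period during which checks run
- max_check_attempts 4 ; maximum number of check attempts
- normal_check_interval 3 ; interval between checks
- retry_check_interval 2 ; interval between retries
- contact_groups admin ; contact group notified when a fault occurs
- notification_interval 10 ; interval between notifications
- notification_period 24x7 ; time period during which notifications are sent
- notification_options w,u,c,r ; states that trigger a notification
- check_command check-host-alive ; command used for the check
- }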
- define service {
- host_name web1
- service_description PING
- check_period 24x7
- max_check_attempts 4
- normal_check_interval 3
- retry_check_interval 2
- contact_groups admin
- notification_interval 10
- notification_period 24x7
- notification_options w,u,c,r
- check_command check_ping!100.0,20%!500.0,60%
- }
- define service {
- host_name web1
- service_description Root Partition
- check_period 24x7
- max_check_attempts 4
- normal_check_interval 3
- retry_check_interval 2
- contact_groups admin
- notification_interval 10
- notification_period 24x7
- notification_options w,u,c,r
- check_command check_local_disk!20%!10%!/
- }
- define service {
- host_name web1
- service_description Current Users
- check_period 24x7
- max_check_attempts 4
- normal_check_interval 3
- retry_check_interval 2
- contact_groups admin
- notification_interval 10
- notification_period 24x7
- notification_options w,u,c,r
- check_command check_local_users!20!50
- }
- define service {
- host_name web1
- service_description Total Processes
- check_period 24x7
- max_check_attempts 4
- normal_check_interval 3
- retry_check_interval 2
- contact_groups admin
- notification_interval 10
- notification_period 24x7
- notification_options w,u,c,r
- check_command check_local_procs!250!400!RSZDT
- }
- define service {
- host_name web1
- service_description Current Load
- check_period 24x7
- max_check_attempts 4
- normal_check_interval 3
- retry_check_interval 2
- contact_groups admin
- notification_interval 10
- notification_period 24x7
- notification_options w,u,c,r
- check_command check_local_load!5.0,4.0,3.0!10.0,6.0,4.0
- }
- define service {
- host_name web1
- service_description Swap Usage
- check_period 24x7
- max_check_attempts 4
- normal_check_interval 3
- retry_check_interval 2
- contact_groups admin
- notification_interval 10
- notification_period 24x7
- notification_options w,u,c,r
- check_command check_local_swap!20!10
- }
- define service {
- host_name web1
- service_description SSH
- check_period 24x7
- max_check_attempts 4
- normal_check_interval 3
- retry_check_interval 2
- contact_groups admin
- notification_interval 10
- notification_period 24x7
- notifications_enabled 0
- notification_options w,u,c,r
- check_command check_ssh
- }
- define service {
- host_name web1
- service_description HTTP
- check_period 24x7
- max_check_attempts 4
- normal_check_interval 3
- retry_check_interval 2
- contact_groups admin
- notification_interval 10
- notification_period 24x7
- notifications_enabled 0
- notification_options w,u,c,r
- check_command check_http
- }
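The service definitions differ only in service_description and check_command, so they can also be generated rather than typed by hand. The sketch below prints one definition per call, using the shared values from this article. The function name `make_service` is hypothetical, and the two SSH and HTTP definitions would still need their extra `notifications_enabled 0` line.

```shell
#!/bin/sh
# make_service HOST DESCRIPTION CHECK_COMMAND
# Print one Nagios service definition with the values shared by the
# definitions in this article. Hypothetical helper for illustration only.
make_service() {
    cat <<EOF
define service {
    host_name               $1
    service_description     $2
    check_period            24x7
    max_check_attempts      4
    normal_check_interval   3
    retry_check_interval    2
    contact_groups          admin
    notification_interval   10
    notification_period     24x7
    notification_options    w,u,c,r
    check_command           $3
}
EOF
}

# Example: print the PING and Swap Usage services for web1
make_service web1 "PING" 'check_ping!100.0,20%!500.0,60%'
make_service web1 "Swap Usage" 'check_local_swap!20!10'
```

Redirecting the output into services.cfg gives the same result as writing the blocks manually. Nagios object templates (a definition referenced through `use`) are another way to avoid the repetition.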
7. Install NRPE
- [root@localhost etc]# cd /tmp/
- [root@localhost tmp]# tar zxvf nrpe-2.12.tar.gz
- [root@localhost tmp]# cd nrpe-2.12
- [root@localhost nrpe-2.12]# ./configure --prefix=/usr/local/nrpe
- [root@localhost nrpe-2.12]# make
- [root@localhost nrpe-2.12]# make install
Copy the plugin files and create the NRPE configuration
- [root@localhost nrpe-2.12]# cp /usr/local/nrpe/libexec/check_nrpe /usr/local/nagios/libexec
- [root@localhost nrpe-2.12]# cp /usr/local/nagios/libexec/check_disk /usr/local/nrpe/libexec
- [root@localhost nrpe-2.12]# cp /usr/local/nagios/libexec/check_load /usr/local/nrpe/libexec
- [root@localhost nrpe-2.12]# cp /usr/local/nagios/libexec/check_ping /usr/local/nrpe/libexec
- [root@localhost nrpe-2.12]# cp /usr/local/nagios/libexec/check_procs /usr/local/nrpe/libexec
- [root@localhost nrpe-2.12]# mkdir /usr/local/nrpe/etc
- [root@localhost nrpe-2.12]# cp sample-config/nrpe.cfg /usr/local/nrpe/etc/
Edit nrpe.cfg. On the monitoring server it can be left unchanged; on a client, change the following line:
- allowed_hosts=127.0.0.1
Add the monitoring server's IP address to allowed_hosts.
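The allowed_hosts edit can also be scripted, which helps when many clients have to be configured. This is a minimal sketch: the helper name `add_allowed_host` is hypothetical, it assumes the list is comma-separated without spaces, and the usage line assumes 192.168.10.223 is the monitoring server's address.

```shell
#!/bin/sh
# add_allowed_host CFG IP
# Append IP to the allowed_hosts line of an nrpe.cfg unless it is already
# listed. Hypothetical helper for illustration only.
add_allowed_host() {
    cfg=$1
    ip=$2
    # Current value of the first allowed_hosts line, e.g. "127.0.0.1"
    current=$(sed -n 's/^allowed_hosts=//p' "$cfg" | head -n 1)
    case ",$current," in
        *",$ip,"*) return 0 ;;   # already allowed, nothing to do
    esac
    new=${current:+$current,}$ip
    tmp=$(mktemp) || return 1
    if sed "s/^allowed_hosts=.*/allowed_hosts=$new/" "$cfg" > "$tmp"; then
        cat "$tmp" > "$cfg"
        rc=$?
    else
        rc=1
    fi
    rm -f "$tmp"
    return "$rc"
}

# Usage on a client (path as used in this article):
#   add_allowed_host /usr/local/nrpe/etc/nrpe.cfg 192.168.10.223
```

nrpe reads its configuration at startup, so restart the daemon after changing the file.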
- [root@localhost nrpe-2.12]# /usr/local/nrpe/bin/nrpe -c /usr/local/nrpe/etc/nrpe.cfg -d
- [root@localhost nrpe-2.12]# ps -ef|grep nrpe
- nagios 4465 1 0 21:02 ? 00:00:00 /usr/local/nrpe/bin/nrpe -c /usr/local/nrpe/etc/nrpe.cfg -d
- root 4467 12877 0 21:02 pts/2 00:00:00 grep nrpe
- [root@localhost nrpe-2.12]# lsof -i:5666
- COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
- nrpe 4465 nagios 4u IPv4 81685 TCP *:5666 (LISTEN)
Change the owner and group of the nagios and nrpe directories
- [root@localhost local]# chown -R nagios:nagios /usr/local/nagios/*
- [root@localhost local]# chown -R nagios:nagios /usr/local/nrpe/*
8. Start Nagios
First verify the configuration; if no problems are reported, start Nagios.
- [root@localhost etc]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
- Nagios Core 3.3.1
- Copyright (c) 2009-2011 Nagios Core Development Team and Community Contributors
- Copyright (c) 1999-2009 Ethan Galstad
- Last Modified: 07-25-2011
- License: GPL
- Website: http://www.nagios.org
- Reading configuration data...
- Read main config file okay...
- Processing object config file '/usr/local/nagios/etc/objects/commands.cfg'...
- Processing object config file '/usr/local/nagios/etc/objects/timeperiods.cfg'...
- Processing object config file '/usr/local/nagios/etc/hosts.cfg'...
- Processing object config file '/usr/local/nagios/etc/hostgroups.cfg'...
- Processing object config file '/usr/local/nagios/etc/contacts.cfg'...
- Processing object config file '/usr/local/nagios/etc/contactgroups.cfg'...
- Processing object config file '/usr/local/nagios/etc/services.cfg'...
- Read object config files okay...
- Running pre-flight check on configuration data...
- Checking services...
- Checked 9 services.
- Checking hosts...
- Checked 1 hosts.
- Checking host groups...
- Checked 1 host groups.
- Checking service groups...
- Checked 0 service groups.
- Checking contacts...
- Checked 2 contacts.
- Checking contact groups...
- Checked 1 contact groups.
- Checking service escalations...
- Checked 0 service escalations.
- Checking service dependencies...
- Checked 0 service dependencies.
- Checking host escalations...
- Checked 0 host escalations.
- Checking host dependencies...
- Checked 0 host dependencies.
- Checking commands...
- Checked 24 commands.
- Checking time periods...
- Checked 5 time periods.
- Checking for circular paths between hosts...
- Checking for circular host and service dependencies...
- Checking global event handlers...
- Checking obsessive compulsive processor commands...
- Checking misc settings...
- Total Warnings: 0
- Total Errors: 0
- Things look okay - No serious problems were detected during the pre-flight check
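The check can be automated so that Nagios is only started when the pre-flight check reports no errors. The sketch below inspects the "Total Errors" line of the output; the helper name `preflight_ok` is hypothetical.

```shell
#!/bin/sh
# preflight_ok
# Read the output of "nagios -v" on stdin and succeed only if it contains
# a "Total Errors: 0" line. Hypothetical helper for illustration only.
preflight_ok() {
    grep -Eq '^Total Errors:[[:space:]]+0[[:space:]]*$'
}

# Usage on the monitoring server (paths as used in this article):
#   if /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg | preflight_ok; then
#       service nagios start
#   else
#       echo "configuration errors found, not starting nagios" >&2
#   fi
```

Warnings are not treated as fatal here; tighten the check if you want to block on "Total Warnings" as well.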
Register Nagios as a service and start it
- [root@localhost etc]# chkconfig --add nagios    # add nagios to the services
- [root@localhost etc]# chkconfig nagios on    # start the service at boot
- [root@localhost etc]# service nagios start    # start nagios
Create the web authentication user
- [root@localhost etc]# htpasswd -c /usr/local/nagios/etc/htpasswd nagios
- New password:
- Re-type new password:
- Adding password for user nagios
Start nrpe at boot
- [root@localhost etc]# echo "/usr/local/nrpe/bin/nrpe -c /usr/local/nrpe/etc/nrpe.cfg -d" >>/etc/rc.local
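Start sendmail to receive alerts
- [root@localhost etc]# service sendmail start
After that, stopping the httpd service should produce an alert. If you run into a problem you cannot solve, you can contact me, or go straight to my next article, "Why Nagios cannot send alert emails", at http://dl528888.blog.51cto.com/2382721/763079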
This article comes from the 「吟—技術交流」 blog; please keep this attribution: http://dl528888.blog.51cto.com/2382721/763032