Nagios Monitoring Trilogy: Installing and Configuring Nagios (1)

火星人 @ 2014-03-08

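Our company recently needed to bring a monitoring system online. It has to cover a large number of checks, and the environments and devices involved are mostly different from one another, so I wrote this installation guide so that our operations staff can deploy the monitoring by following it.

My system is Red Hat 5.4, with iptables and SELinux turned off.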

1. Install yum (if yum is already set up on this machine, skip this step and continue with step 2)
  [root@localhost yum.repos.d]# wget http://packages.sw.be/rpmforge-release/rpmforge-release-0.5.1-1.el5.rf.i386.rpm
  [root@localhost yum.repos.d]# wget http://dag.wieers.com/rpm/packages/RPM-GPG-KEY.dag.txt
  [root@localhost yum.repos.d]# rpm -Uvh rpmforge-release-0.5.1-1.el5.rf.i386.rpm
  [root@localhost yum.repos.d]# rpm --import RPM-GPG-KEY.dag.txt
  [root@localhost yum.repos.d]# yum install yum-fastestmirror yum-presto

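2. Install Apache (skip the httpd install if Apache was installed by default; otherwise install it with yum)

  [root@localhost ~]# yum -y install httpd

Installing Nagios also requires some basic supporting packages:

  [root@localhost etc]# yum -y install gd gd-devel glibc glibc-common gcc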

3. Configure Apache to support Nagios

1) Create the nagios user

  [root@localhost ~]# useradd nagios
  [root@localhost etc]# /usr/sbin/groupadd nagcmd                 # add the nagcmd group, used to submit external commands from the web interface
  [root@localhost etc]# /usr/sbin/usermod -a -G nagcmd nagios     # add the nagios user to the nagcmd group
  [root@localhost etc]# /usr/sbin/usermod -a -G nagcmd apache     # add the apache user to the nagcmd group
  [root@localhost etc]# /usr/sbin/usermod -a -G apache nagios     # add the nagios user to the apache group
  [root@localhost etc]# /usr/sbin/usermod -a -G nagios apache     # add the apache user to the nagios group
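Before moving on, it can be worth confirming that the four group memberships actually took effect. The following is only a minimal sketch: the helper name in_group is made up for illustration, and it relies on nothing more than the standard id command. On a machine where the users do not exist yet it simply reports every membership as missing.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nagios): succeed if USER is a member of GROUP.
# Usage: in_group USER GROUP
in_group() {
    # id -nG prints the user's group names separated by spaces;
    # put one name per line and look for an exact, literal match.
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qxF "$2"
}

# The four memberships created by the usermod commands above.
for pair in nagios:nagcmd apache:nagcmd nagios:apache apache:nagios; do
    user=${pair%%:*}
    group=${pair##*:}
    if in_group "$user" "$group"; then
        echo "ok:      $user is in $group"
    else
        echo "missing: $user is not in $group"
    fi
done
```

Any line starting with "missing" points at a usermod command that needs to be run again.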

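2) Change the user and group that Apache runs as. The default is daemon (that is the default for an Apache built from source; the httpd package installed by yum runs as apache), and it needs to be changed to nagios so that Apache has permission to access the Nagios installation directory and run the CGI commands, for example shutting down Nagios from the browser or stopping alert messages for a failing object. (This step can be skipped: I did not change Apache's user and group when I deployed Nagios, and had no problems.)

3) Add the Nagios directories to Apache (Nagios is installed under /usr/local/nagios) and protect them with HTTP user authentication. Append the following to the end of httpd.conf: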

  ScriptAlias /nagios/cgi-bin /usr/local/nagios/sbin
  <Directory "/usr/local/nagios/sbin">
      Options ExecCGI
      AllowOverride None
      Order allow,deny
      Allow from all
      AuthName "Nagios Access"
      AuthType Basic
      AuthUserFile /usr/local/nagios/etc/htpasswd
      Require valid-user
  </Directory>
  Alias /nagios /usr/local/nagios/share
  <Directory "/usr/local/nagios/share">
      Options None
      AllowOverride None
      Order allow,deny
      Allow from all
      AuthName "Nagios Access"
      AuthType Basic
      AuthUserFile /usr/local/nagios/etc/htpasswd
      Require valid-user
  </Directory>

4. Install Nagios

  [root@localhost tmp]# tar zxvf nagios-3.3.1.tar.gz
  [root@localhost tmp]# cd nagios
  [root@localhost nagios]# ./configure --prefix=/usr/local/nagios --with-command-group=nagcmd
  [root@localhost nagios]# make all
  [root@localhost nagios]# make install
  [root@localhost nagios]# make install-init
  [root@localhost nagios]# make install-config
  [root@localhost nagios]# make install-commandmode
  [root@localhost nagios]# make install-webconf

5. Install the Nagios plugins (nagios-plugins)

  [root@localhost nagios]# cd /tmp
  [root@localhost tmp]# tar zxvf nagios-plugins-1.4.15.tar.gz
  [root@localhost tmp]# cd nagios-plugins-1.4.15
  [root@localhost nagios-plugins-1.4.15]# ./configure --with-nagios-user=nagios --with-nagios-group=nagios
  [root@localhost nagios-plugins-1.4.15]# make
  [root@localhost nagios-plugins-1.4.15]# make install
6. Configure Nagios
  [root@localhost nagios-plugins-1.4.15]# cd /usr/local/
  [root@localhost local]# chown -R nagios:nagios nagios/
  [root@localhost local]# chown -R nagios:nagios nagios/*
  [root@localhost local]# cd nagios/etc/
  [root@localhost etc]# vim nagios.cfg    # edit nagios.cfg so that it contains the following
  # host definitions
  cfg_file=/usr/local/nagios/etc/hosts.cfg
  # host group definitions
  cfg_file=/usr/local/nagios/etc/hostgroups.cfg
  # contact definitions
  cfg_file=/usr/local/nagios/etc/contacts.cfg
  # contact group definitions
  cfg_file=/usr/local/nagios/etc/contactgroups.cfg
  # service definitions
  cfg_file=/usr/local/nagios/etc/services.cfg
  # time period definitions
  cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg
  # command definitions
  cfg_file=/usr/local/nagios/etc/objects/commands.cfg
Edit the cgi.cfg configuration file as follows:
  [root@localhost etc]# vim cgi.cfg
  # separate multiple users with commas
  authorized_for_system_information=nagios
  authorized_for_configuration_information=nagios
  authorized_for_system_commands=nagios
  authorized_for_all_services=nagios
  authorized_for_all_hosts=nagios
  authorized_for_all_service_commands=nagios
  authorized_for_all_host_commands=nagios
The user "nagios" named here can control Nagios from the browser: shutting it down, restarting it, and so on. The same change can also be made with this command:
  [root@localhost etc]# sed -i 's/nagiosadmin/nagios/g' cgi.cfg
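Because the sed command rewrites every occurrence of nagiosadmin in place, it can be useful to preview the result first. This is a minimal sketch under stated assumptions: preview_rename is a made-up helper name, and the scratch file holds only a few sample cgi.cfg-style lines rather than the real file.

```shell
#!/bin/sh
# Hypothetical helper: print FILE with every OLD user name replaced by NEW,
# without modifying FILE (a preview of what sed -i would write).
# Usage: preview_rename FILE OLD NEW
preview_rename() {
    sed "s/$2/$3/g" "$1"
}

# Scratch file with a few cgi.cfg-style lines (sample content, not the full file).
scratch=$(mktemp)
cat > "$scratch" <<'EOF'
use_authentication=1
authorized_for_system_information=nagiosadmin
authorized_for_all_hosts=nagiosadmin
EOF

# Print the rewritten content; the scratch file itself stays unchanged.
preview_rename "$scratch" nagiosadmin nagios
```

If the preview looks right, running the same expression with sed -i against the real cgi.cfg applies it.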
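(1) Configure the host file hosts.cfg
  define host{
      host_name                     web1              ; host name (web1 here); check it with the hostname command
      alias                         Nagios Server     ; host alias
      address                       192.168.10.223    ; IP address of the host
      check_command                 check-host-alive  ; command used for the check; it must be defined in the command definition file (it is by default)
      check_interval                5                 ; interval between checks
      retry_interval                1                 ; interval between retries after a failed check
      max_check_attempts            5                 ; maximum number of attempts
      check_period                  24x7              ; time period during which checks run
      process_perf_data             0
      retain_nonstatus_information  0
      contact_groups                admin             ; contact group, i.e. the group that receives the email alerts
      notification_interval         30                ; interval between notifications
      notification_period           24x7              ; time period during which notifications are sent
      notification_options          d,u,r             ; states that trigger a notification. For hosts: d = down, u = unreachable, r = recovery (whether to notify after recovery). For services: w = warning, u = unknown, c = critical, r = recovery; in production w,c,r is usually enough for services
  }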
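Since many machines have to be added in a deployment like this, the host block can be generated instead of typed by hand. The following is a rough sketch only: make_host is a hypothetical helper, the settings are copied from the web1 definition above, and the second host's name and address are invented for the example.

```shell
#!/bin/sh
# Hypothetical helper: print a Nagios host definition for one machine,
# reusing the settings of the web1 definition above.
# Usage: make_host NAME ALIAS ADDRESS
make_host() {
    cat <<EOF
define host{
        host_name                       $1
        alias                           $2
        address                         $3
        check_command                   check-host-alive
        check_interval                  5
        retry_interval                  1
        max_check_attempts              5
        check_period                    24x7
        process_perf_data               0
        retain_nonstatus_information    0
        contact_groups                  admin
        notification_interval           30
        notification_period             24x7
        notification_options            d,u,r
}
EOF
}

# Example: a second web server (name and address are made up).
make_host web2 "Web Server 2" 192.168.10.224
```

The output could be appended to hosts.cfg with a shell redirect; the new host name also has to be added to the members line of a host group if it should belong to one.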
(2) Configure the host group file hostgroups.cfg
  define hostgroup {
      hostgroup_name  Nagios-Example  ; name of the host group
      alias           Nagios Example  ; alias of the host group
      members         web1            ; members of the group; must match host_name in hosts.cfg, otherwise Nagios reports an error
  }
(3) Configure the contact file contacts.cfg
  define contact{
      contact_name                   nagiosadmin              ; contact name
      alias                          Nagios Admin             ; contact alias
      service_notification_period    24x7                     ; service notifications at any time
      host_notification_period       24x7                     ; host notifications at any time
      service_notification_options   w,u,c,r                  ; service states that trigger a notification
      host_notification_options      d,u,r                    ; host states that trigger a notification
      service_notification_commands  notify-service-by-email  ; alert by email
      host_notification_commands     notify-host-by-email     ; same as above
      email                          denglei@ctfo.com         ; mailbox that receives the alerts
  }
(4) Configure the contact group file contactgroups.cfg
  define contactgroup{
      contactgroup_name  admin                  ; name of the contact group
      alias              Nagios Administrators  ; alias of the contact group
      members            nagiosadmin            ; members of the group; must match contact_name in contacts.cfg
  }
(5) Configure the service file services.cfg
  define service {
      host_name              web1              ; must match host_name in hosts.cfg
      service_description    check-host-alive  ; service description
      check_period           24x7              ; time period during which checks run
      max_check_attempts     4                 ; maximum number of check attempts
      normal_check_interval  3                 ; interval between checks
      retry_check_interval   2                 ; interval between re-checks
      contact_groups         admin             ; contact group notified when a failure occurs
      notification_interval  10                ; interval between notifications
      notification_period    24x7              ; time period during which notifications are sent
      notification_options   w,u,c,r           ; states that trigger a notification
      check_command          check-host-alive  ; command used for the check
  }
  define service {
      host_name              web1
      service_description    PING
      check_period           24x7
      max_check_attempts     4
      normal_check_interval  3
      retry_check_interval   2
      contact_groups         admin
      notification_interval  10
      notification_period    24x7
      notification_options   w,u,c,r
      check_command          check_ping!100.0,20%!500.0,60%
  }
  define service {
      host_name              web1
      service_description    Root Partition
      check_period           24x7
      max_check_attempts     4
      normal_check_interval  3
      retry_check_interval   2
      contact_groups         admin
      notification_interval  10
      notification_period    24x7
      notification_options   w,u,c,r
      check_command          check_local_disk!20%!10%!/
  }
  define service {
      host_name              web1
      service_description    Current Users
      check_period           24x7
      max_check_attempts     4
      normal_check_interval  3
      retry_check_interval   2
      contact_groups         admin
      notification_interval  10
      notification_period    24x7
      notification_options   w,u,c,r
      check_command          check_local_users!20!50
  }
  define service {
      host_name              web1
      service_description    Total Processes
      check_period           24x7
      max_check_attempts     4
      normal_check_interval  3
      retry_check_interval   2
      contact_groups         admin
      notification_interval  10
      notification_period    24x7
      notification_options   w,u,c,r
      check_command          check_local_procs!250!400!RSZDT
  }
  define service {
      host_name              web1
      service_description    Current Load
      check_period           24x7
      max_check_attempts     4
      normal_check_interval  3
      retry_check_interval   2
      contact_groups         admin
      notification_interval  10
      notification_period    24x7
      notification_options   w,u,c,r
      check_command          check_local_load!5.0,4.0,3.0!10.0,6.0,4.0
  }
  define service {
      host_name              web1
      service_description    Swap Usage
      check_period           24x7
      max_check_attempts     4
      normal_check_interval  3
      retry_check_interval   2
      contact_groups         admin
      notification_interval  10
      notification_period    24x7
      notification_options   w,u,c,r
      check_command          check_local_swap!20!10
  }
  define service {
      host_name              web1
      service_description    SSH
      check_period           24x7
      max_check_attempts     4
      normal_check_interval  3
      retry_check_interval   2
      contact_groups         admin
      notification_interval  10
      notification_period    24x7
      notifications_enabled  0
      notification_options   w,u,c,r
      check_command          check_ssh
  }
  define service {
      host_name              web1
      service_description    HTTP
      check_period           24x7
      max_check_attempts     4
      normal_check_interval  3
      retry_check_interval   2
      contact_groups         admin
      notification_interval  10
      notification_period    24x7
      notifications_enabled  0
      notification_options   w,u,c,r
      check_command          check_http
  }
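Most of the service definitions above differ only in service_description and check_command, so they can be produced from a short list. This is a sketch rather than the method used for this deployment: make_service and generate_services are hypothetical helper names, and only four of the checks that share identical settings are listed (SSH and HTTP also set notifications_enabled 0, so they are left out).

```shell
#!/bin/sh
# Hypothetical helper: print one service definition using the settings
# shared by the definitions above.
# Usage: make_service HOST DESCRIPTION CHECK_COMMAND
make_service() {
    cat <<EOF
define service {
        host_name               $1
        service_description     $2
        check_period            24x7
        max_check_attempts      4
        normal_check_interval   3
        retry_check_interval    2
        contact_groups          admin
        notification_interval   10
        notification_period     24x7
        notification_options    w,u,c,r
        check_command           $3
}
EOF
}

# Hypothetical helper: print definitions for a few of the checks above.
# Each input line is "description|check_command".
# Usage: generate_services HOST
generate_services() {
    while IFS='|' read -r desc cmd; do
        make_service "$1" "$desc" "$cmd"
    done <<'LIST'
PING|check_ping!100.0,20%!500.0,60%
Root Partition|check_local_disk!20%!10%!/
Current Users|check_local_users!20!50
Total Processes|check_local_procs!250!400!RSZDT
LIST
}

generate_services web1
```

Calling generate_services with a different host name prints the same set of checks for that host, which keeps the per-host blocks consistent.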
7. Install nrpe
  [root@localhost etc]# cd /tmp/
  [root@localhost tmp]# tar zxvf nrpe-2.12.tar.gz
  [root@localhost tmp]# cd nrpe-2.12
  [root@localhost nrpe-2.12]# ./configure --prefix=/usr/local/nrpe
  [root@localhost nrpe-2.12]# make
  [root@localhost nrpe-2.12]# make install
Copy the files
  [root@localhost nrpe-2.12]# cp /usr/local/nrpe/libexec/check_nrpe /usr/local/nagios/libexec
  [root@localhost nrpe-2.12]# cp /usr/local/nagios/libexec/check_disk /usr/local/nrpe/libexec
  [root@localhost nrpe-2.12]# cp /usr/local/nagios/libexec/check_load /usr/local/nrpe/libexec
  [root@localhost nrpe-2.12]# cp /usr/local/nagios/libexec/check_ping /usr/local/nrpe/libexec
  [root@localhost nrpe-2.12]# cp /usr/local/nagios/libexec/check_procs /usr/local/nrpe/libexec
Configure nrpe
  [root@localhost nrpe-2.12]# mkdir /usr/local/nrpe/etc
  [root@localhost nrpe-2.12]# cp sample-config/nrpe.cfg /usr/local/nrpe/etc/

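Edit nrpe.cfg. On the monitoring server itself it can be left unchanged; on a monitored client, change the following line:

allowed_hosts=127.0.0.1

Add the monitoring server's IP address to allowed_hosts.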
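As a minimal sketch of that edit, the snippet below appends one address to the comma-separated allowed_hosts list. The helper name add_allowed_host is made up, the scratch file contains only two sample lines, and 192.168.10.223 is assumed here to be the monitoring server's address.

```shell
#!/bin/sh
# Hypothetical helper: print FILE with SERVER_IP appended to the
# comma-separated allowed_hosts list; FILE itself is not modified.
# Usage: add_allowed_host FILE SERVER_IP
add_allowed_host() {
    sed "s/^allowed_hosts=\(.*\)/allowed_hosts=\1,$2/" "$1"
}

# Scratch file with two nrpe.cfg-style lines (sample content, not the full file).
scratch=$(mktemp)
printf 'server_port=5666\nallowed_hosts=127.0.0.1\n' > "$scratch"

# Assumption: 192.168.10.223 is the monitoring server's address.
add_allowed_host "$scratch" 192.168.10.223
```

Redirecting the output to a new file and reviewing it before replacing nrpe.cfg avoids editing the live configuration blindly; nrpe has to be restarted afterwards for the change to apply.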

  [root@localhost nrpe-2.12]# /usr/local/nrpe/bin/nrpe -c /usr/local/nrpe/etc/nrpe.cfg -d
  [root@localhost nrpe-2.12]# ps -ef|grep nrpe
  nagios 4465 1 0 21:02 ? 00:00:00 /usr/local/nrpe/bin/nrpe -c /usr/local/nrpe/etc/nrpe.cfg -d
  root 4467 12877 0 21:02 pts/2 00:00:00 grep nrpe
  [root@localhost nrpe-2.12]# lsof -i:5666
  COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
  nrpe 4465 nagios 4u IPv4 81685 TCP *:5666 (LISTEN)

Change the owning user and group of the nagios and nrpe directories

  [root@localhost local]# chown -R nagios:nagios /usr/local/nagios/*
  [root@localhost local]# chown -R nagios:nagios /usr/local/nrpe/*

8. Start Nagios

First check whether the Nagios configuration has any problems:
  [root@localhost etc]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
  Nagios Core 3.3.1
  Copyright (c) 2009-2011 Nagios Core Development Team and Community Contributors
  Copyright (c) 1999-2009 Ethan Galstad
  Last Modified: 07-25-2011
  License: GPL
  Website: http://www.nagios.org
  Reading configuration data...
  Read main config file okay...
  Processing object config file '/usr/local/nagios/etc/objects/commands.cfg'...
  Processing object config file '/usr/local/nagios/etc/objects/timeperiods.cfg'...
  Processing object config file '/usr/local/nagios/etc/hosts.cfg'...
  Processing object config file '/usr/local/nagios/etc/hostgroups.cfg'...
  Processing object config file '/usr/local/nagios/etc/contacts.cfg'...
  Processing object config file '/usr/local/nagios/etc/contactgroups.cfg'...
  Processing object config file '/usr/local/nagios/etc/services.cfg'...
  Read object config files okay...
  Running pre-flight check on configuration data...
  Checking services...
  Checked 9 services.
  Checking hosts...
  Checked 1 hosts.
  Checking host groups...
  Checked 1 host groups.
  Checking service groups...
  Checked 0 service groups.
  Checking contacts...
  Checked 2 contacts.
  Checking contact groups...
  Checked 1 contact groups.
  Checking service escalations...
  Checked 0 service escalations.
  Checking service dependencies...
  Checked 0 service dependencies.
  Checking host escalations...
  Checked 0 host escalations.
  Checking host dependencies...
  Checked 0 host dependencies.
  Checking commands...
  Checked 24 commands.
  Checking time periods...
  Checked 5 time periods.
  Checking for circular paths between hosts...
  Checking for circular host and service dependencies...
  Checking global event handlers...
  Checking obsessive compulsive processor commands...
  Checking misc settings...
  Total Warnings: 0
  Total Errors: 0
  Things look okay - No serious problems were detected during the pre-flight check
If there are no problems, start Nagios:
  [root@localhost etc]# chkconfig --add nagios    # register nagios as a service
  [root@localhost etc]# chkconfig nagios on       # start the service automatically at boot
  [root@localhost etc]# service nagios start      # start nagios
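The check-then-start order matters every time the configuration changes later on, so it can be wrapped in a small script. This is only a sketch: verify_then is a hypothetical wrapper, and the real Nagios commands are left commented out so that the example has no side effects; stand-in commands demonstrate the behaviour instead.

```shell
#!/bin/sh
# Hypothetical wrapper: run a verification command and run the follow-up
# command only if the verification succeeds.
# Usage: verify_then VERIFY_CMD FOLLOW_UP_CMD
verify_then() {
    if sh -c "$1" >/dev/null 2>&1; then
        sh -c "$2"
    else
        echo "configuration check failed; follow-up command not run" >&2
        return 1
    fi
}

# Real use on the monitoring server would look like this (commented out here):
# verify_then "/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg" \
#             "service nagios restart"

# Harmless demonstration with stand-in commands:
verify_then "true" "echo restarted"
verify_then "false" "echo restarted" || echo "skipped"
```

The design choice is simply to gate the restart on the exit status of the pre-flight check, so a broken configuration never takes down a running Nagios.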

Create the web authentication user
  [root@localhost etc]# htpasswd -c /usr/local/nagios/etc/htpasswd nagios
  New password:
  Re-type new password:
  Adding password for user nagios
Start nrpe automatically at boot
  [root@localhost etc]# echo "/usr/local/nrpe/bin/nrpe -c /usr/local/nrpe/etc/nrpe.cfg -d" >> /etc/rc.local

Start sendmail so that the alerts can be delivered

  [root@localhost etc]# service sendmail start
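After that, stop the httpd service and you should receive an alert. If you run into a problem you cannot solve, feel free to contact me, or go straight to my next article, "Why Nagios cannot send alert emails", at http://dl528888.blog.51cto.com/2382721/763079

This article comes from the 「吟—技術交流」 blog; please keep this attribution: http://dl528888.blog.51cto.com/2382721/763032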

