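Sending alert emails with Nagios
(Note: the method described here was adapted for our company's test environment, where machines cannot reach external networks. If your host can connect to the Internet directly, a single change at the corresponding place in localhost.cfg is enough.)
By default Nagios sends alert mail through the local machine's SMTP service. The mail commands defined in commands.cfg call the local mail command, which requires the SMTP service to be running on this machine. For security, you can have the firewall reject connections from other machines to local port 25.
Our network already has a mail server, and the requirement is to use that server instead of running a local SMTP service. This means redefining the notification command to use a third-party program, sendEmail.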
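First, create an account on the mail server that will be used to send the mail.
Here the mail server address is mail.test.com
The sending account is nagios@test.com
SMTP authentication user name: nagios, password: p#3isoda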
The following describes how to use sendEmail.
sendEmail home page: http://caspian.dotconf.net/menu/Software/SendEmail/
Download: http://caspian.dotconf.net/menu/Software/SendEmail/sendEmail-v1.55.tar.gz
The program is very small: it sends mail over SMTP from the command line. Installation is just as simple (see its README file).
Unpack: tar -zxvf sendEmail-v1.55.tar.gz
cd sendEmail-v1.55
Copy the executable: cp sendEmail /usr/local/bin
Then confirm that it has execute permission:
ll /usr/local/bin/sendEmail
-rwxr-xr-x 1 root root 77882 11-03 14:23 /usr/local/bin/sendEmail
The program is now installed, and using it is simple. Running sendEmail with no arguments prints detailed usage.
Here is a typical example:
/usr/local/bin/sendEmail -f nagios@test.com -t yahoon@test.com -s mail.test.com -u "from nagios" -xu nagios -xp p#3isoda -m happy
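Explanation:
-f the sender's mail address
-t the recipient's mail address
-s the domain name or IP of the SMTP server
-u the subject of the mail
-xu the user name for SMTP authentication
-xp the password for SMTP authentication (note: some passwords seem not to work; for example d!5neyland was not recognized correctly for me. This is probably the shell rather than sendEmail, since ! triggers history expansion in an interactive bash; wrapping the password in single quotes may avoid it)
-m the body of the mail
If you leave out -m, you are prompted to type the body yourself: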
Reading message body from STDIN because the '-m' option was not used.
If you are manually typing in a message:
  - First line must be received within 60 seconds.
  - End manual input with a CTRL-D on its own line
Finish the input with CTRL-D.
You can also send the contents of a file as the body of the mail.
In that case use:
cat filename | /usr/local/bin/sendEmail -f nagios@test.com -t yahoon@test.com -s mail.test.com -u "from nagios" -xu nagios -xp p#3isoda
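The two invocations above can be folded into a small shell function. This is only a sketch: the function name send_alert and the SENDEMAIL_BIN override are made up for illustration, and the account values are the sample ones from this article. Only the sendEmail flags explained above are used.

```shell
#!/bin/sh
# Hypothetical wrapper around sendEmail (not part of sendEmail itself).
# SENDEMAIL_BIN is an assumed override hook; set it to "echo" for a dry run.
SENDEMAIL_BIN="${SENDEMAIL_BIN:-/usr/local/bin/sendEmail}"

# send_alert RECIPIENT SUBJECT
# The message body is read from standard input, as in the "cat filename |" form.
send_alert() {
    recipient="$1"
    subject="$2"
    # Single quotes keep characters such as # and ! away from the shell.
    "$SENDEMAIL_BIN" \
        -f 'nagios@test.com' \
        -t "$recipient" \
        -s 'mail.test.com' \
        -u "$subject" \
        -xu 'nagios' \
        -xp 'p#3isoda'
}
```

Usage would look like: cat filename | send_alert yahoon@test.com "from nagios". Keeping the credentials in one place means only this function changes when the account does.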
That is all on using sendEmail.
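Since Nagios is to send its alert mail through sendEmail, the mail command definition in commands.cfg has to change. Here is the modified notify-by-email command (the part to note is the sendEmail invocation after the pipe):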
# 'notify-by-email' command definition
define command{
        command_name    notify-by-email
        command_line    /usr/bin/printf "%b" "***** Nagios 2.9 *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$" | /usr/local/bin/sendEmail -f nagios@test.com -t $CONTACTEMAIL$ -s mail.test.com -u "** $NOTIFICATIONTYPE$ alert - $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" -xu nagios -xp p#3isoda
        }
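(Note: the sender address, recipient, SMTP server, and credentials in the sendEmail invocation above must be adapted to your environment. For our company's platform, for example, I changed it to: /usr/local/bin/sendEmail -f nagios@onecloud.com.cn -t nagios@onecloud.com.cn -s 192.168.4.26 -u "$HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$") The alert messages then arrive automatically at https://192.168.4.26:446/mail/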
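Before reloading Nagios it can help to see what the printf "%b" half of the command line produces once the macros are filled in. The sketch below imitates that step by hand. The function name build_body and the sample values (PROBLEM, check_disk, web01, CRITICAL) are invented for illustration, and the body is shortened to four fields.

```shell
#!/bin/sh
# build_body TYPE SERVICE HOST STATE
# Imitates the printf "%b" part of the notify-by-email command_line:
# %b makes printf expand the \n sequences into real newlines.
build_body() {
    printf "%b" "***** Nagios *****\n\nNotification Type: $1\n\nService: $2\nHost: $3\nState: $4\n"
}

# Example (not executed here); the output could be piped into sendEmail
# in place of the "cat filename" step:
#   build_body PROBLEM check_disk web01 CRITICAL
```

If the body shows up in the mail as one long line with literal \n in it, the "%b" format argument is the first thing to check.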
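Note: sendEmail is a very handy program. We use it here, but it works just as well elsewhere. The typical benefit is that you do not need to install sendmail and run an SMTP service on every machine; one existing mail server is enough. That clearly improves system security and saves resources.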
Next, some basic Nagios settings that can be used to tailor the notifications.
The definition of local-service in template.cfg:
define service{
name local-service ; The name of this service template
use generic-service ; Inherit default values from the generic-service definition
max_check_attempts 4 ; Re-check the service up to 4 times in order to determine its final (hard) state
normal_check_interval 5 ; (how often to check) Check the service every 5 minutes under normal conditions
retry_check_interval 1 ; Re-check the service every minute until a hard state can be determined
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
}
The definition of generic-service in template.cfg:
define service{
name generic-service ; The 'name' of this service template
active_checks_enabled 1 ; Active service checks are enabled
passive_checks_enabled 1 ; Passive service checks are enabled/accepted
parallelize_check 1 ; Active service checks should be parallelized (disabling this can lead to major performance problems)
obsess_over_service 1 ; We should obsess over this service (if necessary)
check_freshness 0 ; Default is to NOT check service 'freshness'
notifications_enabled 1 ; Service notifications are enabled
event_handler_enabled 1 ; Service event handler is enabled
flap_detection_enabled 1 ; Flap detection is enabled
failure_prediction_enabled 1 ; Failure prediction is enabled
process_perf_data 1 ; Process performance data
retain_status_information 1 ; Retain status information across program restarts
retain_nonstatus_information 1 ; Retain non-status information across program restarts
is_volatile 0 ; The service is not volatile
check_period 24x7 ; The service can be checked at any time of the day
max_check_attempts 3 ; Re-check the service up to 3 times in order to determine its final (hard) state
normal_check_interval 10 ; Check the service every 10 minutes under normal conditions
retry_check_interval 2 ; Re-check the service every two minutes until a hard state can be determined
contact_groups admingroup ; (contact group) Notifications get sent out to everyone in the 'admingroup' group
notification_options w,u,c,r ; Send notifications about warning, unknown, critical, and recovery events
notification_interval 60 ; (notification frequency) Re-notify about service problems every hour
notification_period 24x7 ; Notifications can be sent out at any time
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
}
Configuration file /etc/nagios/nrpe.cfg:
command[check_load]=/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
command[check_disk]=/usr/lib/nagios/plugins/check_disk -w 15% -c 10% -p /dev/sda1
#command[check_disk2]=/usr/lib/nagios/plugins/check_disk -w 20 -c 10 -p /dev/hdb1
command[check_zombie_procs]=/usr/lib/nagios/plugins/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/lib/nagios/plugins/check_procs -w 250 -c 400
command[check_cpu]=/usr/lib/nagios/plugins/check_procs -w 50 -c 70 --metric=CPU
command[check_mem]=/usr/lib/nagios/plugins/check_mem 450 500
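These are the commands that get called; each one comes from an installed plugin. From a remote machine, running check_nrpe -H <IP address> -c <command> (for example check_mem) is all that is needed.
check_nrpe is not among the default plugins; it has to be installed from nagios-nrpe_2.12.orig.tar.gz.
/etc/rc.d/init.d/nrpe start starts the daemon, which listens on port 5666.
If a call reports "no xxxxx output", check the plugin's owner and whether it has execute permission.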
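The permission check just mentioned can be scripted. The following is a minimal sketch; the function name check_plugin_file and its return codes are my own choice, not anything defined by NRPE. It should be run as the user the nrpe daemon runs as, because the -x test reflects the current user's rights.

```shell
#!/bin/sh
# check_plugin_file PATH
# Reports whether a plugin file exists and is executable by the current user.
# Returns 0 if usable, 1 if not executable, 2 if missing.
check_plugin_file() {
    plugin="$1"
    if [ ! -e "$plugin" ]; then
        echo "missing: $plugin"
        return 2
    fi
    if [ ! -x "$plugin" ]; then
        echo "not executable: $plugin"
        return 1
    fi
    echo "ok: $plugin"
    return 0
}

# Example (not executed here):
#   check_plugin_file /usr/lib/nagios/plugins/check_mem
```

For the owner, ls -l on the same path shows the user and group next to the permission bits.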
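Writing your own plugins (I use shell, it's simple O(∩_∩)O):
They are called remotely in the same way, with check_nrpe -H <IP address> -c <command> (for example check_mem).
Whatever the script should report back to the server is written with echo.
The script's exit status tells the server how healthy the remote machine is: 0 = OK, 1 = warning, 2 = critical.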
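A plugin built on these two rules (echo for the message, exit status for the state) might look like the sketch below. The function name check_value and the threshold logic are illustrative assumptions; the argument order follows the warning-then-critical style of check_mem 450 500 in nrpe.cfg above.

```shell
#!/bin/sh
# check_value VALUE WARN CRIT
# Echoes one status line and returns 0 (OK), 1 (warning) or 2 (critical),
# following the exit-status convention described above.
check_value() {
    value="$1"
    warn="$2"
    crit="$3"
    if [ "$value" -ge "$crit" ]; then
        echo "CRITICAL - value is $value"
        return 2
    elif [ "$value" -ge "$warn" ]; then
        echo "WARNING - value is $value"
        return 1
    fi
    echo "OK - value is $value"
    return 0
}

# In a real plugin script the last lines would measure something, call the
# function and pass its status on, for example:
#   check_value "$measured" "$1" "$2"
#   exit $?
```

The script then gets its own command[...] line in nrpe.cfg, just like the entries shown earlier.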
############################
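$USER1$ is defined in resource.cfg. $ARG1$, $ARG2$ and so on are the arguments that follow the command name where a service definition calls the command. $HOSTADDRESS$ is a built-in macro. See the manual for details.
Note: everything referenced as use XXXXXXXX is defined in template.cfg.
Notification options such as w,u,c,r: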
notification_options: This directive is used to determine when notifications for the host should be sent out. Valid options are a combination of one or more of the following: d = send notifications on a DOWN state, u = send notifications on an UNREACHABLE state, r = send notifications on recoveries (OK state), f = send notifications when the host starts and stops flapping, and s = send notifications when scheduled downtime starts and ends. If you specify n (none) as an option, no host notifications will be sent out. If you do not specify any notification options, Nagios will assume that you want notifications to be sent out for all possible states. Example: If you specify d,r in this field, notifications will only be sent out when the host goes DOWN and when it recovers from a DOWN state.
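The manual explains these. Note that the excerpt above covers the host options; for services the letters are w (warning), u (unknown), c (critical), and r (recovery), as in the generic-service template.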
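#notification_interval 120 ; in minutes: once a problem has been detected and notified, how long to wait before notifying again
(On CentOS, disable SELinux; otherwise the CGIs, logs, and so on do not work (files cannot be read, or the page comes up blank). Alternatively: chcon -R -t httpd_sys_content_t /usr/local/nagios/sbin)
max_check_attempts, normal_check_interval, notification_interval:
max_check_attempts is the number of times a failure is checked and normal_check_interval is the time between checks, so as a rule of thumb the time until an alert is max_check_attempts × check interval. (Strictly, once a check has failed Nagios rechecks at retry_check_interval, so the delay after the first failed check is closer to (max_check_attempts - 1) × retry_check_interval.) If the problem is still unhandled after the alert, further alerts follow every notification_interval.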
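The arithmetic can be tried with the values from the local-service template (4 attempts, 5-minute checks, 1-minute retries). This is a sketch with made-up function names. The first function is the rule of thumb given above. The second is my own refinement, assuming rechecks after the first failure happen at retry_check_interval.

```shell
#!/bin/sh
# estimate_alert_delay ATTEMPTS CHECK_INTERVAL
# Rule of thumb from the text: attempts * check interval (minutes).
estimate_alert_delay() {
    echo $(( $1 * $2 ))
}

# delay_after_first_failure ATTEMPTS RETRY_INTERVAL
# Assumption: the remaining (attempts - 1) checks are spaced by
# retry_check_interval once the first check has failed.
delay_after_first_failure() {
    echo $(( ($1 - 1) * $2 ))
}

# With the local-service values:
#   estimate_alert_delay 4 5        prints 20
#   delay_after_first_failure 4 1   prints 3
```

Either way, lowering max_check_attempts or the intervals shortens the wait, at the price of more alerts for short glitches.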
Web page refresh: refresh_rate=90 (in cgi.cfg)
Email notification:
define command{
command_name notify-host-by-email
command_line /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /bin/mail -s "** $NOTIFICATIONTYPE$ Host Alert: $HOSTNAME$ is $HOSTSTATE$ **" $CONTACTEMAIL$
}
SMS notification (Fetion can be used; install smstools):
# 'notify-service-by-sms' command definition
define command{
command_name notify-service-by-sms
command_line /usr/local/bin/sendsms 1589534xxxx "**Nagios**\n$NOTIFICATIONTYPE$: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\nInfo:\n$SERVICEOUTPUT$"
}
Reference links
Nagios manual (Chinese): http://www.itnms.net/docs/nagios/cn/build/html/index.html
http://blog.chinaunix.net/space.php?uid=20367477&do=blog&id=127245
http://hi.baidu.com/171892549/blog/item/df86c4179551300ec93d6d13.html
http://hi.baidu.com/171892549/blog/item/9ecd76898b8e5ab40f244482.html