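The check_mk documentation does not seem to describe how to change the interval of passive checks, but I noticed that Icinga has a check_interval variable that can be set in a service definition.
Add this variable to the CPU load service in check_mk_objects.cfg: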
define service {
use check_mk_passive_perf
host_name StaticFileServer
service_description CPU load
check_command check_mk-cpu.loads
check_interval 0.05
}
The unit is minutes (with Icinga's default interval_length of 60 seconds), so 0.05 stands for a 3-second interval.
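The conversion between seconds and the interval value can be sketched as follows. This is only an illustration: the function names are made up for this note, and it assumes interval_length in icinga.cfg is left at its default of 60.

```python
def seconds_to_check_interval(seconds, interval_length=60):
    """Convert a desired interval in seconds into Icinga interval units.

    interval_length is the number of seconds per unit (assumed to be the
    default of 60, i.e. one unit is one minute).
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return seconds / interval_length


def check_interval_to_seconds(check_interval, interval_length=60):
    """Convert an Icinga interval value back into seconds."""
    return check_interval * interval_length


print(seconds_to_check_interval(3))     # 0.05
print(check_interval_to_seconds(0.05))  # 3.0
```

If interval_length has been changed in icinga.cfg, pass that value as the second argument instead.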
Then restart Icinga:
service icinga restart
The web page now shows an interval of 3 seconds.
To change the check interval of all services, edit the service template named check_mk_default in conf.d/check_mk_templates.cfg:
# Template used by all other check_mk templates
define service {
name check_mk_default
register 0
active_checks_enabled 1
passive_checks_enabled 1
parallelize_check 1
obsess_over_service 1
check_freshness 0
notifications_enabled 1
event_handler_enabled 0
flap_detection_enabled 1
failure_prediction_enabled 1
process_perf_data 0
retain_status_information 1
retain_nonstatus_information 1
notification_interval 0
is_volatile 0
normal_check_interval 0.05
retry_check_interval 0.05
max_check_attempts 1
notification_options u,c,w,r,f,s
notification_period 24X7
check_period 24X7
}
Above, normal_check_interval and retry_check_interval have been changed to 0.05 minutes.
Next, edit icinga.cfg:
command_check_interval=1s
external_command_buffer_slots=32768
Also enable logging of external commands and passive checks:
log_external_commands=1
log_passive_checks=1
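To double-check that these four settings really ended up in icinga.cfg, a small script along these lines could be used. It is a minimal sketch: parse_main_config and differing_settings are hypothetical helper names, and the parser only handles plain key=value lines and # comments.

```python
def parse_main_config(text):
    """Parse key=value lines from icinga.cfg-style text into a dict.

    Blank lines and lines starting with '#' are skipped. If a key is
    repeated, the last value wins (good enough for the keys checked here).
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


# The values used in this note.
EXPECTED = {
    "command_check_interval": "1s",
    "external_command_buffer_slots": "32768",
    "log_external_commands": "1",
    "log_passive_checks": "1",
}


def differing_settings(settings, expected=EXPECTED):
    """Return {key: actual_value} for every expected key that does not match."""
    return {k: settings.get(k) for k, v in expected.items() if settings.get(k) != v}


sample = """
# sample fragment, not a full icinga.cfg
command_check_interval=1s
external_command_buffer_slots=32768
log_external_commands=1
log_passive_checks=0
"""
print(differing_settings(parse_main_config(sample)))  # {'log_passive_checks': '0'}
```

For a real check, read the text from the actual icinga.cfg (its path depends on the installation) instead of the sample string.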
Use grep to filter out the log entries of the CPU load check for one server:
./icinga.log:127508:[1369051243] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs
./icinga.log:127538:[1369051247] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs
./icinga.log:127585:[1369051252] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs
./icinga.log:127631:[1369051257] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs
./icinga.log:127677:[1369051262] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs
./icinga.log:127724:[1369051267] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs
./icinga.log:127770:[1369051272] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs
./icinga.log:127832:[1369051278] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs
./icinga.log:127878:[1369051283] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs
./icinga.log:127909:[1369051287] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs
./icinga.log:127955:[1369051292] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs
./icinga.log:128002:[1369051297] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs
./icinga.log:128048:[1369051302] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs
./icinga.log:128125:[1369051309] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs
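The actual spacing between passive check results can be read from the bracketed Unix timestamps. The sketch below computes the gaps for one host and service. passive_check_gaps is a hypothetical helper, and the regular expression assumes the log line format shown above. In the excerpt above the gaps are roughly 4 to 7 seconds, not exactly 3.

```python
import re

# Matches "[<unix time>] PASSIVE SERVICE CHECK: <host>;<service>;..."
PASSIVE_RE = re.compile(r"\[(\d+)\] PASSIVE SERVICE CHECK: ([^;]+);([^;]+);")


def passive_check_gaps(lines, host, service):
    """Return the seconds between consecutive passive results for host/service."""
    timestamps = []
    for line in lines:
        match = PASSIVE_RE.search(line)
        if match and match.group(2) == host and match.group(3) == service:
            timestamps.append(int(match.group(1)))
    timestamps.sort()
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]


# First four lines of the grep output above.
sample = [
    "./icinga.log:127508:[1369051243] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs",
    "./icinga.log:127538:[1369051247] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs",
    "./icinga.log:127585:[1369051252] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs",
    "./icinga.log:127631:[1369051257] PASSIVE SERVICE CHECK: StaticFileServer;CPU load;0;OK - 15min load 0.05 at 4 CPUs",
]
gaps = passive_check_gaps(sample, "StaticFileServer", "CPU load")
print(gaps)  # [4, 5, 5]
```

Feeding the whole grep output through the same function makes it easy to tell whether the configured interval is really being honored.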
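That was the global setting: the checks of all services were switched to a 3s interval. But is it possible to change the interval of just one service? I tried putting the following settings into a single service while leaving the global setting at 1 minute: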
normal_check_interval 0.05
retry_check_interval 0.05
The log still shows a 60-second interval, even though the web page already displays 3s:
Service normal/retry check interval | 3s/3s |
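Conclusions:
1. So far I have only found a way to change the interval globally; changing it for a single service has no effect.
2. The server's CPU load is under no real pressure at the moment, so the practical effect cannot be seen yet. A stress test is still needed to confirm it.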