Download and install
mkdir -p /data/alertmanager/{bin,conf,logs,data,templates}
wget https://github.com/prometheus/alertmanager/releases/download/v0.21.0/alertmanager-0.21.0.linux-amd64.tar.gz
tar xvf alertmanager-0.21.0.linux-amd64.tar.gz -C /usr/local/
mv /usr/local/alertmanager-0.21.0.linux-amd64 /usr/local/alertmanager
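Alertmanager configuration file overview:
Global configuration (global): defines parameters shared across the whole configuration, such as the global SMTP and Slack settings.
Templates (templates): defines the templates used to render notifications, such as HTML or email templates.
Alert routing (route): matches alerts by label to decide how each alert is handled.
Receivers (receivers): a receiver is an abstraction; it can be an email address, WeChat, Slack, a webhook, and so on. Receivers are normally used together with routes.
Inhibition rules (inhibit_rules): well-chosen inhibition rules cut down on noisy, redundant alerts.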
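group_by: the labels used to group alerts; alerts with the same values for these labels are batched into one group.
group_wait: how long to wait before sending the first notification for a new group. Alerts that arrive for the same group during this window are merged into a single notification to the receiver.
group_interval: how long to wait before sending a notification about new alerts added to a group that has already been notified.
repeat_interval: how long to wait before re-sending a notification for alerts that are still firing (that is, not yet resolved).
match_re: a map of label names to regular expressions; a route matches only if the value of each listed label matches its expression.
continue: if false (the default), an alert stops at the first child route it matches. If true, the alert goes on to be matched against the subsequent sibling routes.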
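The interaction between match_re and continue is easier to see in a small model. The following is a minimal Python sketch of the routing walk, not Alertmanager's own code; the dictionary layout and the 'pager' receiver are assumptions made for illustration. It relies on two documented behaviors: regex matchers are fully anchored, and a label that is absent is treated as an empty string.

```python
import re

def route_matches(route, labels):
    """Return True if every match_re pattern fully matches the alert's label value."""
    for name, pattern in route.get("match_re", {}).items():
        # Regex matchers are anchored, so use fullmatch; a missing label counts as "".
        if re.fullmatch(pattern, labels.get(name, "")) is None:
            return False
    return True

def resolve_receivers(route, labels):
    """Walk the routing tree depth-first and collect the receivers that would be notified."""
    if not route_matches(route, labels):
        return []
    matched = []
    for child in route.get("routes", []):
        hits = resolve_receivers(child, labels)
        matched.extend(hits)
        # Stop at the first matching child unless it sets continue: true.
        if hits and not child.get("continue", False):
            break
    # If no child matched, the current node handles the alert itself.
    return matched or [route["receiver"]]

# Hypothetical routing tree; 'pager' is an illustrative receiver name.
tree = {
    "receiver": "default",
    "routes": [
        {"receiver": "email", "match_re": {"severity": "warning|error|critical"}, "continue": True},
        {"receiver": "pager", "match_re": {"severity": "critical"}},
    ],
}

print(resolve_receivers(tree, {"alertname": "X", "severity": "critical"}))  # ['email', 'pager']
print(resolve_receivers(tree, {"alertname": "X", "severity": "warning"}))   # ['email']
print(resolve_receivers(tree, {"alertname": "X", "severity": "info"}))      # ['default']
```

Without continue: true on the first child, the critical alert would stop at 'email' and never reach 'pager'.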
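inhibit_rules: an inhibition rule mutes alerts that match target_match / target_match_re while another alert that matches source_match / source_match_re is firing, provided both alerts carry identical values for every label listed in equal. Notifications for the muted (target) alerts are not sent.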
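As a rough illustration of that rule, here is a minimal Python sketch of the inhibition check. It models only the equality matchers (source_match, target_match) and the equal list; the function names and the sample alerts are hypothetical, and this is not Alertmanager's implementation.

```python
def matches(matchers, labels):
    """Equality matchers, as in source_match / target_match."""
    return all(labels.get(name, "") == value for name, value in matchers.items())

def is_inhibited(alert, firing_alerts, rule):
    """Return True if `alert` is a target that some firing source alert suppresses."""
    if not matches(rule["target_match"], alert):
        return False
    for other in firing_alerts:
        if other is alert or not matches(rule["source_match"], other):
            continue
        # Every label in `equal` must have the same value on both alerts;
        # a label missing on both sides counts as equal.
        if all(other.get(name, "") == alert.get(name, "") for name in rule["equal"]):
            return True
    return False

rule = {
    "source_match": {"severity": "critical"},
    "target_match": {"severity": "warning"},
    "equal": ["alertname", "dev", "instance"],
}

# Hypothetical alerts, represented by their label sets only.
critical = {"alertname": "DiskFull", "instance": "node1:9100", "severity": "critical"}
warning_same = {"alertname": "DiskFull", "instance": "node1:9100", "severity": "warning"}
warning_other = {"alertname": "DiskFull", "instance": "node2:9100", "severity": "warning"}
firing = [critical, warning_same, warning_other]

print(is_inhibited(warning_same, firing, rule))   # True: same alertname and instance as the critical alert
print(is_inhibited(warning_other, firing, rule))  # False: different instance
print(is_inhibited(critical, firing, rule))       # False: not a target
```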
Alert notification configuration file
vim /usr/local/alertmanager/alertmanager.yml
# Example using QQ Mail SMTP (port 465)
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.qq.com:465'
  smtp_from: 'xxxxx@qq.com'
  smtp_auth_username: 'xxxx@qq.com'
  smtp_auth_password: 'xxxxx'
  smtp_require_tls: false
route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5s
  repeat_interval: 1h
  receiver: 'default'
  routes:
    # Every receiver named in a route must be defined under receivers.
    - match_re:
        severity: 'warning|error|critical'
      receiver: 'default'
receivers:
  - name: 'default'
    email_configs:
      - to: 'xxxx@qq.com'
        send_resolved: true
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
# Alternative example using 163 Mail SMTP (port 25)
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.163.com:25'
  smtp_from: 'xxx@163.com'
  smtp_auth_username: 'xxx@163.com'
  smtp_auth_password: 'xxx'
route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5s
  repeat_interval: 1h
  receiver: 'default'
  routes:
    - receiver: 'email'
      match_re:
        severity: 'warning|error|critical'
receivers:
  # 'default' has no notification configs, so alerts that fall through to it are dropped silently.
  - name: 'default'
  - name: 'email'
    email_configs:
      - to: 'xxx@qq.com'
        send_resolved: true
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
Alert rules file
mkdir /etc/rules
vim /etc/rules/nodes.rules.yml
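Rule file fields:
alert: the name of the alert rule.
expr: the PromQL expression that defines the trigger condition; the alert becomes active for every time series the expression returns.
for: optional hold duration. The alert is sent only after the condition has held continuously for this long; until then, a newly active alert is in the pending state.
labels: extra labels to attach to the alert.
annotations: extra descriptive information, such as text describing the alert in detail. Annotations are sent to Alertmanager together with the alert when it fires.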
groups:
  - name: nodes
    rules:
      - alert: AvailableMemoryBelow30%
        expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 30
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Host available memory is below 30% (instance {{ $labels.instance }})"
          description: "Node memory is filling up (< 30% left)\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
      - alert: AvailableMemoryBelow20%
        expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 20
        for: 2m
        labels:
          severity: error
        annotations:
          summary: "Host available memory is below 20% (instance {{ $labels.instance }})"
          description: "Node memory is filling up (< 20% left)\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
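To make the effect of expr and for concrete, the sketch below replays the 30% rule over a series of memory readings in plain Python. It is a simplified model, not Prometheus code: the evaluate function, the sample values, and the 30-second evaluation spacing are all assumptions chosen for illustration.

```python
def evaluate(samples, threshold=30.0, for_seconds=120):
    """Replay rule evaluations.

    samples: list of (timestamp_seconds, available_bytes, total_bytes), one per evaluation.
    Returns the alert state after each evaluation.
    """
    states = []
    active_since = None
    for ts, available, total in samples:
        percent = available / total * 100  # same arithmetic as the expr above
        if percent < threshold:
            if active_since is None:
                active_since = ts  # condition just became true
            # The alert fires once the condition has held for the whole `for` duration.
            state = "firing" if ts - active_since >= for_seconds else "pending"
        else:
            active_since = None  # condition cleared, so the hold timer resets
            state = "inactive"
        states.append(state)
    return states

GIB = 1024 ** 3
samples = [
    (0, 6 * GIB, 16 * GIB),    # 37.5% available
    (30, 4 * GIB, 16 * GIB),   # 25% available, condition becomes true
    (60, 4 * GIB, 16 * GIB),
    (90, 4 * GIB, 16 * GIB),
    (120, 4 * GIB, 16 * GIB),
    (150, 4 * GIB, 16 * GIB),  # condition has now held for 120 seconds
    (180, 8 * GIB, 16 * GIB),  # 50% available, condition clears
]

print(evaluate(samples))
# ['inactive', 'pending', 'pending', 'pending', 'pending', 'firing', 'inactive']
```

Only the 'firing' state results in the alert being sent to Alertmanager; a condition that clears while still pending never produces a notification.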
Connect Prometheus to Alertmanager
vim /usr/local/prometheus/prometheus.yml
global:
  scrape_interval: 30s
alerting:
  alertmanagers:
    - static_configs:
        # Replace the placeholder with the IP address of the Alertmanager host.
        - targets: ['<alertmanager-host-ip>:9093']
rule_files:
  - "/etc/rules/*.yml"
Configure systemd startup
cat > /lib/systemd/system/alertmanager.service <<EOF
[Unit]
Description=alertmanager

[Service]
ExecStart=/usr/local/alertmanager/alertmanager --config.file=/usr/local/alertmanager/alertmanager.yml

[Install]
WantedBy=multi-user.target
EOF
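Reload systemd so that it picks up the new unit file, then start and enable the service:
systemctl daemon-reload
systemctl start alertmanager.service
systemctl enable alertmanager.service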
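Once the service is running, one way to check the routing and email setup end to end is to push a hand-made alert into Alertmanager's v2 HTTP API (POST /api/v2/alerts). The sketch below uses only the Python standard library; the alert name ManualTestAlert, the instance label, and the http://localhost:9093 address are assumptions, so adjust them to your environment.

```python
import json
import urllib.request
from datetime import datetime, timezone

def build_test_alert(severity="warning", instance="test-host:9100"):
    """Build a one-element alert list in the shape the v2 alerts endpoint accepts."""
    return [{
        "labels": {
            "alertname": "ManualTestAlert",  # hypothetical name, for testing only
            "severity": severity,
            "instance": instance,
        },
        "annotations": {"summary": "Manual test alert"},
        # RFC 3339 timestamp in UTC, e.g. 2024-01-01T12:00:00+00:00
        "startsAt": datetime.now(timezone.utc).isoformat(timespec="seconds"),
    }]

def send_alerts(alerts, base_url="http://localhost:9093"):
    """POST the alerts to Alertmanager and return the HTTP status code."""
    request = urllib.request.Request(
        base_url + "/api/v2/alerts",
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status

payload = build_test_alert()
print(json.dumps(payload, indent=2))
# To actually send it (requires a running Alertmanager):
# send_alerts(payload)
```

With the configuration above, an alert whose severity is warning, error, or critical should match the child route, and a notification should be sent after group_wait elapses.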