所有告警都应该立刻处理掉,不应该存在长时间未解决的告警。所以具体的表现就是高频的数据采集,和告警的自动恢复(默认5分钟)
使用如下命令即可手工制造告警,注意startsAt和endsAt时间为当前实际时间的UTC格式。
curl -H "Content-Type: application/json" -X POST -d '[{"labels":{"字段1": "值1", "字段2": "值2", "字段3": "值3"},"annotations":{"desc": "xxxx"},"generatorURL":"http://1.1.1.1","startsAt":"2022-08-10T20:57:46.000+08:00"}]' "http://127.0.0.1:9093/api/v2/alerts"
alertmanager发送给receiver的为一个json,多条告警形成alerts数组,示例如下:
'{"receiver": "email", "status": "firing", "alerts": [{"status": "firing", "labels": {"字段1": "值1", "字段2": "值2", "字段3": "值3"}, "annotations": {"desc": "xxxx"}, "startsAt": "2023-02-09T09:58:45+08:00", "endsAt": "2023-02-09T10:00:45+08:00", "generatorURL": "http://1.1.1.1", "fingerprint": "12345"},{"status": "firing", "labels": {"字段1": "值1", "字段2": "值2", "字段3": "值3"}, "annotations": {"desc": "xxxx"}, "startsAt": "2023-02-09T09:58:45+08:00", "endsAt": "2023-02-09T10:00:45+08:00", "generatorURL": "http://1.1.1.1", "fingerprint": "12345"},{"status": "firing", "labels": {"字段1": "值1", "字段2": "值2", "字段3": "值3"}, "annotations": {"desc": "xxxx"}, "startsAt": "2023-02-09T09:58:45+08:00", "endsAt": "2023-02-09T10:00:45+08:00", "generatorURL": "http://1.1.1.1", "fingerprint": "12345"}], "groupLabels": {"字段1": "值1"}, "commonLabels": {"字段1": "值1", "字段2"}, "commonAnnotations": {"desc": "xxxx"}, "externalURL": "http://prometheus:9093", "version": "4", "truncatedAlerts": 0}'
告警恢复之后,对应的status字段会被置为resolved,只有alerts数组中所有告警都变为resolved状态,整条json的status才会置为resolved。
假设group_wait设置为30秒,group_interval设置为1分钟,repeat_interval设置为10分钟
{"receiver": "email", "status": "firing", "alerts": [{"status": "firing", "labels": {"字段1": "值1", "字段2": "值2", "字段3": "值3"}...},{"status": "firing", "labels": {"字段1": "值1", "字段2": "值2", "字段3": "值3"}...}], ...}
{"receiver": "email", "status": "firing", "alerts": [{"status": "firing", "labels": {"字段1": "值1", "字段2": "值2", "字段3": "值3"}...},{"status": "firing", "labels": {"字段1": "值1", "字段2": "值2", "字段3": "值3"}...},{"status": "firing", "labels": {"字段1": "值1", "字段2": "值2", "字段3": "值3"}...}], ...}
{"receiver": "email", "status": "firing", "alerts": [{"status": "resolve", "labels": {"字段1": "值1", "字段2": "值2", "字段3": "值3"}...},{"status": "firing", "labels": {"字段1": "值1", "字段2": "值2", "字段3": "值3"}...},{"status": "firing", "labels": {"字段1": "值1", "字段2": "值2", "字段3": "值3"}...}], ...}
{"receiver": "email", "status": "resolve", "alerts": [{"status": "resolve", "labels": {"字段1": "值1", "字段2": "值2", "字段3": "值3"}...},{"status": "resolve", "labels": {"字段1": "值1", "字段2": "值2", "字段3": "值3"}...}], ...}
假如10:00:30发送第一条json之后,2、3、4步骤都没有发生,且告警一直没有恢复,则10:10:30(t0+repeat_interval)会重复发送第一条json。