System Detail
1. Context Switches / Interrupts
type: Graph
Unit: short
Label: Counter
Context switches - per-second rate of CPU context switches (irate, from the last two samples within a 5-minute window)
metrics:
irate(node_context_switches_total{instance=~"$node:$port",job=~"$job"}[5m])
Interrupts - per-second rate of interrupts serviced by the system (irate, from the last two samples within a 5-minute window)
metrics:
irate(node_intr_total{instance=~"$node:$port",job=~"$job"}[5m])
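The queries in this panel use irate, which looks only at the last two samples in the range, whereas rate (used in the Processes Forks panel) averages over the whole range. The following is a minimal sketch of that difference, not Prometheus's actual implementation: it ignores the boundary extrapolation that rate performs, and the function names and sample data are hypothetical.

```python
from typing import List, Tuple

# (unix timestamp in seconds, counter value); sample data is made up
Sample = Tuple[float, float]


def irate(samples: List[Sample]) -> float:
    """Per-second rate from the last two samples only (the idea behind irate)."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (t0, v0), (t1, v1) = samples[-2], samples[-1]
    # A decreasing counter is treated as a reset: count from zero.
    delta = v1 - v0 if v1 >= v0 else v1
    return delta / (t1 - t0)


def simple_rate(samples: List[Sample]) -> float:
    """Average per-second rate over the whole window (simplified rate)."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += cur - prev if cur >= prev else cur
    return increase / (samples[-1][0] - samples[0][0])


# A counter that climbs steadily, then spikes in the final scrape interval.
window = [(0.0, 1000.0), (15.0, 1150.0), (30.0, 1300.0), (45.0, 1450.0), (60.0, 2950.0)]
print(irate(window))        # 100.0 (reacts to the spike)
print(simple_rate(window))  # 32.5  (spike smoothed over the window)
```

This is why irate graphs look spikier: the range in brackets only bounds how far back the two samples may lie, it is not an averaging period.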
2. System Load
type: Graph
Unit: short
Label: Load
Load 1m - system load average over the last 1 minute
metrics:
node_load1{instance=~"$node:$port",job=~"$job"}
Load 5m - system load average over the last 5 minutes
metrics:
node_load5{instance=~"$node:$port",job=~"$job"}
Load 15m - system load average over the last 15 minutes
metrics:
node_load15{instance=~"$node:$port",job=~"$job"}
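Raw load averages are hard to compare across hosts with different CPU counts. One common way to read them, sketched here, is to divide by the number of cores. The helper names and the threshold of 1.0 are assumptions for illustration, not part of the dashboard.

```python
def load_per_core(load_average: float, cpu_count: int) -> float:
    """Normalize a load average by the number of CPU cores."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return load_average / cpu_count


def is_saturated(load_average: float, cpu_count: int, threshold: float = 1.0) -> bool:
    """Rule of thumb (assumed): more than `threshold` runnable tasks per core."""
    return load_per_core(load_average, cpu_count) > threshold


print(load_per_core(6.0, 4))  # 1.5
print(is_saturated(6.0, 4))   # True
print(is_saturated(2.0, 4))   # False
```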
3. Interrupts Detail /proc/interrupts
type: Graph
Unit: short
Label: Counter
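{{ type }} - {{ info }} - per-second rate of each interrupt listed in /proc/interrupts, by interrupt number (irate, from the last two samples within a 5-minute window)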
metrics:
irate(node_interrupts_total{instance=~"$node:$port",job=~"$job"}[5m])
4. File Descriptors
type: Graph
Unit: short
Label: Descriptors
Maximum open file descriptors - maximum number of file descriptors the process may open
metrics:
process_max_fds{instance=~"$node:$port",job=~"$job"}
Open file descriptors - number of file descriptors currently open
metrics:
process_open_fds{instance=~"$node:$port",job=~"$job"}
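The two series in this panel are most useful read together, as the fraction of the limit that is in use. A minimal sketch of that calculation follows; the function name and the sample values are hypothetical.

```python
def fd_usage_ratio(open_fds: float, max_fds: float) -> float:
    """Fraction of the file descriptor limit currently in use."""
    if max_fds <= 0:
        raise ValueError("max_fds must be positive")
    return open_fds / max_fds


# Example values as they might be read from process_open_fds and process_max_fds.
print(f"{fd_usage_ratio(256, 1024):.0%}")  # 25%
```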
5. Entropy
type: Graph
Unit: short
Label: Entropy
Entropy available to random number generators
metrics:
node_entropy_available_bits{instance=~"$node:$port",job=~"$job"}
6. Processes State
type: Graph
Unit: short
Label: Processes
Processes blocked - number of tasks currently blocked waiting for I/O (/proc/stat procs_blocked)
metrics:
node_procs_blocked{instance=~"$node:$port",job=~"$job"}
Processes in runnable state - number of tasks currently in the run queue (/proc/stat procs_running)
metrics:
node_procs_running{instance=~"$node:$port",job=~"$job"}
7. Processes Forks
type: Graph
Unit: short
Label: Forks / sec
Process forks per second - number of processes created per second
metrics:
rate(node_forks_total{instance=~"$node:$port",job=~"$job"}[5m])
8. Processes Memory
type: Graph
Unit: bytes
Label: Bytes
Virtual memory size used by the process:
metrics:
process_virtual_memory_bytes{instance=~"$node:$port",job=~"$job"}
Resident memory size of the process:
metrics:
process_resident_memory_bytes{instance=~"$node:$port",job=~"$job"}
9. Time Synchronized Status
type: Graph
Unit: short
Label: Counter
Is clock synchronized to a reliable server (1 = yes, 0 = no):
metrics:
node_timex_sync_status{instance=~"$node:$port",job=~"$job"}
Local clock frequency adjustment:
metrics:
node_timex_frequency_adjustment_ratio{instance=~"$node:$port",job=~"$job"}
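The frequency adjustment is easier to read in parts per million than as a raw ratio. The sketch below assumes, based on the metric name, that the value is a multiplicative ratio centered on 1.0; the function name is hypothetical.

```python
def frequency_adjustment_ppm(ratio: float) -> float:
    """Convert a frequency adjustment ratio (assumed centered on 1.0) to ppm."""
    return (ratio - 1.0) * 1_000_000


# A ratio of 1.00002 corresponds to roughly +20 ppm.
print(round(frequency_adjustment_ppm(1.00002), 3))
```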
10. Time Synchronized Drift
type: Graph
Unit: seconds
Label: Seconds
Estimated error in seconds:
metrics:
node_timex_estimated_error_seconds{instance=~"$node:$port",job=~"$job"}
Time offset between the local system and the reference clock:
metrics:
node_timex_offset_seconds{instance=~"$node:$port",job=~"$job"}
Maximum error in seconds:
metrics:
node_timex_maxerror_seconds{instance=~"$node:$port",job=~"$job"}
11. Hardware temperature monitor
type: Graph
Unit: Celsius
Label: Temperature
{{ chip }} {{ sensor }} temp - current sensor temperature
metrics:
node_hwmon_temp_celsius{instance=~"$node:$port",job=~"$job"}
{{ chip }} {{ sensor }} Critical Alarm
metrics:
node_hwmon_temp_crit_alarm_celsius{instance=~"$node:$port",job=~"$job"}
{{ chip }} {{ sensor }} Critical
metrics:
node_hwmon_temp_crit_celsius{instance=~"$node:$port",job=~"$job"}
{{ chip }} {{ sensor }} Critical Historical (hysteresis value for the critical limit)
metrics:
node_hwmon_temp_crit_hyst_celsius{instance=~"$node:$port",job=~"$job"}
{{ chip }} {{ sensor }} Max
metrics:
node_hwmon_temp_max_celsius{instance=~"$node:$port",job=~"$job"}