Heapster Metrics

强阳曜
2023-12-01

Metrics

Heapster exports the following metrics to its backends.

| Metric Name | Description |
|---|---|
| cpu/limit | CPU hard limit in millicores. |
| cpu/node_capacity | CPU capacity of a node. |
| cpu/node_allocatable | CPU allocatable of a node. |
| cpu/node_reservation | Share of CPU that is reserved on the node allocatable. |
| cpu/node_utilization | CPU utilization as a share of node allocatable. |
| cpu/request | CPU request (the guaranteed amount of resources) in millicores. |
| cpu/usage | Cumulative CPU usage on all cores. |
| cpu/usage_rate | CPU usage on all cores in millicores. |
| filesystem/usage | Total number of bytes consumed on a filesystem. |
| filesystem/limit | The total size of the filesystem in bytes. |
| filesystem/available | The number of available bytes remaining in the filesystem. |
| filesystem/inodes | The number of available inodes in the filesystem. |
| filesystem/inodes_free | The number of free inodes remaining in the filesystem. |
| disk/io_read_bytes | Number of bytes read from a disk partition. |
| disk/io_write_bytes | Number of bytes written to a disk partition. |
| disk/io_read_bytes_rate | Number of bytes read from a disk partition per second. |
| disk/io_write_bytes_rate | Number of bytes written to a disk partition per second. |
| memory/limit | Memory hard limit in bytes. |
| memory/major_page_faults | Number of major page faults. |
| memory/major_page_faults_rate | Number of major page faults per second. |
| memory/node_capacity | Memory capacity of a node. |
| memory/node_allocatable | Memory allocatable of a node. |
| memory/node_reservation | Share of memory that is reserved on the node allocatable. |
| memory/node_utilization | Memory utilization as a share of memory allocatable. |
| memory/page_faults | Number of page faults. |
| memory/page_faults_rate | Number of page faults per second. |
| memory/request | Memory request (the guaranteed amount of resources) in bytes. |
| memory/usage | Total memory usage. |
| memory/cache | Cache memory usage. |
| memory/rss | RSS memory usage. |
| memory/working_set | Total working set usage. Working set is the memory being used and not easily dropped by the kernel. |
| accelerator/memory_total | Memory capacity of an accelerator. |
| accelerator/memory_used | Memory used by an accelerator. |
| accelerator/duty_cycle | Duty cycle of an accelerator. |
| network/rx | Cumulative number of bytes received over the network. |
| network/rx_errors | Cumulative number of errors while receiving over the network. |
| network/rx_errors_rate | Number of errors while receiving over the network per second. |
| network/rx_rate | Number of bytes received over the network per second. |
| network/tx | Cumulative number of bytes sent over the network. |
| network/tx_errors | Cumulative number of errors while sending over the network. |
| network/tx_errors_rate | Number of errors while sending over the network per second. |
| network/tx_rate | Number of bytes sent over the network per second. |
| uptime | Number of milliseconds since the container was started. |

All custom (aka application) metrics are prefixed with 'custom/'.

Labels

Heapster tags each metric with the following labels.

| Label Name | Description |
|---|---|
| pod_id | Unique ID of a Pod. |
| pod_name | User-provided name of a Pod. |
| container_base_image | Base image for the container. |
| container_name | User-provided name of the container, or full cgroup name for system containers. |
| host_id | Cloud-provider-specified or user-specified identifier of a node. |
| hostname | Hostname where the container ran. |
| nodename | Nodename where the container ran. |
| labels | List of user-provided labels, comma-separated by default. Format is 'key:value'. |
| namespace_id | UID of the namespace of a Pod. |
| namespace_name | User-provided name of a Namespace. |
| resource_id | A unique identifier used to differentiate multiple metrics of the same type, for example filesystem partitions under filesystem/usage or the disk device name under disk/io_read_bytes. |
| make | Make of the accelerator (nvidia, amd, google, etc.). |
| model | Model of the accelerator (tesla-p100, tesla-k80, etc.). |
| accelerator_id | ID of the accelerator. |
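The concatenated labels value described above can be unpacked on the consumer side. The following is a minimal sketch, assuming the default comma separator and the 'key:value' format; the function name `parse_labels` is hypothetical and not part of Heapster.

```python
def parse_labels(value, separator=","):
    """Parse a concatenated 'key:value' label string into a dict.

    Assumes keys contain no ':' and that neither keys nor values
    contain the separator; this is an illustration, not Heapster code.
    """
    labels = {}
    for pair in value.split(separator):
        if not pair:
            continue  # tolerate empty segments such as a trailing separator
        key, _, val = pair.partition(":")  # split on the first ':' only
        labels[key] = val
    return labels


example = (
    "beta.kubernetes.io/arch:amd64,"
    "beta.kubernetes.io/os:linux,"
    "kubernetes.io/hostname:127.0.0.1"
)
parsed = parse_labels(example)
```

Splitting on the first colon keeps values that themselves contain a colon intact, but a label value containing the separator would still be split incorrectly.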

Note

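Comma-separated labels can cause problems when Bosun evaluates a query that groups by labels, such as:

$limit = avg(influx("k8s", '''SELECT mean(value) as value FROM "memory/limit" WHERE type = 'node' GROUP BY nodename, labels''', "${INTERVAL}s", "", ""))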

With comma-separated labels, the resulting tag set is:

nodename=127.0.0.1,labels=beta.kubernetes.io/arch:amd64,beta.kubernetes.io/os:linux,kubernetes.io/hostname:127.0.0.1

Bosun splits tag key-value pairs on commas, so it splits this tag set incorrectly into:

nodename=127.0.0.1
labels=labels:beta.kubernetes.io/arch:amd64
beta.kubernetes.io/os:linux
kubernetes.io/hostname:127.0.0.1

The last two tag key-value pairs are wrong. They should not exist as separate tags and should instead remain part of the labels value:

nodename=127.0.0.1
labels=labels:beta.kubernetes.io/arch:amd64,beta.kubernetes.io/os:linux,kubernetes.io/hostname:127.0.0.1

This confuses Bosun, which panics with an error such as "panic: opentsdb: bad tag: beta.kubernetes.io/os:linux".
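The failure mode can be reproduced in a few lines. This sketch is not Bosun's implementation; it only illustrates why splitting a tag set on every comma yields pieces that are not valid key=value tags, and how a tolerant parser could fold such pieces back into the preceding value. Both function names are hypothetical.

```python
def naive_split_tags(tag_set):
    """Split a tag set on every comma, as a naive parser would."""
    return tag_set.split(",")


def merge_split_tags(tag_set):
    """Fold pieces without '=' back into the previous tag's value."""
    tags = {}
    last_key = None
    for piece in tag_set.split(","):
        if "=" in piece:
            last_key, value = piece.split("=", 1)
            tags[last_key] = value
        elif last_key is not None:
            # No '=' means this piece belonged to the previous value.
            tags[last_key] += "," + piece
    return tags


tag_set = (
    "nodename=127.0.0.1,"
    "labels=beta.kubernetes.io/arch:amd64,"
    "beta.kubernetes.io/os:linux,"
    "kubernetes.io/hostname:127.0.0.1"
)

pieces = naive_split_tags(tag_set)
bad_tags = [p for p in pieces if "=" not in p]  # pieces a strict parser rejects
merged = merge_split_tags(tag_set)
```

Here `bad_tags` holds the two fragments that have no key, which is the kind of input behind the "bad tag" panic, while `merged` keeps the full labels value under a single key.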

  • User-provided labels can additionally be stored as separate labels with the Heapster --store-label flag. Similarly, the --ignore-label flag omits labels from the concatenated labels value.

Aggregates

The metrics are initially collected for nodes and containers and later aggregated for pods, namespaces and clusters. Disk and network metrics are not available at container level (only at pod and node level).
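The roll-up from containers to pods and namespaces can be pictured as grouping samples by label. The sketch below assumes plain summation and uses made-up sample data; Heapster's actual aggregation may treat individual metrics differently, and the `aggregate` helper is hypothetical.

```python
from collections import defaultdict

# Made-up container-level samples, keyed by labels from the Labels section.
samples = [
    {"namespace_name": "default", "pod_name": "web-1",
     "container_name": "nginx", "metric": "memory/usage", "value": 100},
    {"namespace_name": "default", "pod_name": "web-1",
     "container_name": "sidecar", "metric": "memory/usage", "value": 20},
    {"namespace_name": "default", "pod_name": "web-2",
     "container_name": "nginx", "metric": "memory/usage", "value": 80},
]


def aggregate(samples, key_fields):
    """Sum sample values per (key_fields..., metric) group."""
    totals = defaultdict(int)
    for sample in samples:
        key = tuple(sample[f] for f in key_fields) + (sample["metric"],)
        totals[key] += sample["value"]
    return dict(totals)


pod_totals = aggregate(samples, ("namespace_name", "pod_name"))
namespace_totals = aggregate(samples, ("namespace_name",))
```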

Storage Schema

InfluxDB

Default

Each metric translates to a separate 'series' in InfluxDB. Labels are stored as tags. The metric name is not modified.

Using fields

To use InfluxDB fields, add withfields=true as a parameter to the InfluxDB sink URL. (More information here: https://docs.influxdata.com/influxdb/v0.9/concepts/key_concepts/)

In that case, metrics no longer each translate to a separate 'series' in InfluxDB. Instead, related metrics are grouped in the same 'measurement' and stored as its fields. For example, the measurement 'cpu' has the fields 'node_reservation', 'node_utilization', 'request', 'usage', and 'usage_rate'. All labels are still stored as tags. The measurements are: cpu, filesystem, memory, network, uptime.
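The difference between the two schemas can be sketched as a mapping from a metric name to a measurement and a field. This is an illustration rather than the sink's code: the `to_point` helper is hypothetical, and using a field named "value" for metrics without a slash (such as uptime) in the fields schema is an assumption.

```python
def to_point(metric_name, value, labels, with_fields=False):
    """Map one metric sample to an InfluxDB-style point (as a dict)."""
    measurement, field = metric_name, "value"  # default schema: name unchanged
    if with_fields:
        head, sep, rest = metric_name.partition("/")
        measurement = head
        if sep:
            field = rest  # e.g. 'cpu/usage_rate' -> field 'usage_rate'
        # Assumption: names without '/' (e.g. 'uptime') keep field 'value'.
    return {
        "measurement": measurement,
        "tags": dict(labels),  # labels are stored as tags in both schemas
        "fields": {field: value},
    }


labels = {"nodename": "127.0.0.1", "type": "node"}
default_point = to_point("cpu/usage_rate", 250, labels)
fields_point = to_point("cpu/usage_rate", 250, labels, with_fields=True)
uptime_point = to_point("uptime", 60000, labels, with_fields=True)
```

With the default schema a query selects `value` from the series named after the metric; with fields enabled the same sample is read from the field `usage_rate` of the measurement `cpu`.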

The standard Grafana dashboard does not work with this schema, so new dashboards are required.

Google Cloud Monitoring

Metrics mentioned above are stored along with corresponding labels as custom metrics in Google Cloud Monitoring.

  • Metrics are collected every 2 minutes by default and pushed with a 1 minute precision.

  • Each metric has a custom metric prefix - custom.cloudmonitoring.googleapis.com

  • Each metric is pushed with an additional namespace prefix - kubernetes.io.

  • GCM does not support visualizing cumulative metrics yet. To work around that, Heapster exports an equivalent gauge metric for all cumulative metrics mentioned above.

    The gauge metrics use their parent cumulative metric name as the prefix, followed by a "_rate" suffix. For example, "cpu/usage", which is cumulative, has a corresponding gauge metric "cpu/usage_rate". NOTE: The gauge metrics will be deprecated as soon as GCM supports visualizing cumulative metrics.
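The naming rules in this list can be sketched as two small helpers. The prefixes and the "_rate" suffix come from the text above; joining the parts with "/" is an assumption, and both function names are hypothetical.

```python
GCM_PREFIX = "custom.cloudmonitoring.googleapis.com"
NAMESPACE_PREFIX = "kubernetes.io"


def gauge_name(cumulative_name):
    """Name of the gauge metric exported alongside a cumulative metric."""
    return cumulative_name + "_rate"


def gcm_metric_name(metric_name):
    """Full custom metric name; the '/' joiner is an assumption."""
    return "/".join([GCM_PREFIX, NAMESPACE_PREFIX, metric_name])


rate = gauge_name("cpu/usage")
full_name = gcm_metric_name(rate)
```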

TODO: Add a snapshot of all the metrics stored in GCM.

Hawkular

Each metric is stored as a separate time series (metric) in Hawkular-Metrics, with tags inherited from a common ancestor type. The metric name has the format containerName/podId/metricName (/ is the separator). Each definition stores the labels as tags, with the following additions:

  • All the Label descriptions are stored as label_description
  • The ancestor metric name (such as cpu/usage) is stored under the tag descriptor_name
  • To ease search, a tag with group_id stores the key containerName/metricName so each podId can be linked under a single timeseries if necessary.
  • Units are stored under the units tag
  • If the labelToTenant parameter is given, any metric with that label uses the label's value as the target tenant. If the metric does not have the label defined, the default tenant is used.
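The naming and tagging rules above can be put together as follows. This is a sketch under stated assumptions rather than the sink's implementation: the `hawkular_definition` helper, the returned dict layout, and the default tenant name "heapster" are placeholders, and the label_description tags are omitted.

```python
def hawkular_definition(container_name, pod_id, metric_name, units, labels,
                        label_to_tenant=None, default_tenant="heapster"):
    """Build the metric id, tags and tenant for one metric definition."""
    tags = dict(labels)  # labels are stored as tags
    tags["descriptor_name"] = metric_name  # ancestor metric name
    tags["group_id"] = "/".join([container_name, metric_name])
    tags["units"] = units

    tenant = default_tenant
    if label_to_tenant and label_to_tenant in labels:
        tenant = labels[label_to_tenant]  # the label's value selects the tenant

    return {
        "id": "/".join([container_name, pod_id, metric_name]),
        "tenant": tenant,
        "tags": tags,
    }


definition = hawkular_definition(
    "nginx", "abc-123", "memory/usage", "bytes",
    {"namespace_name": "prod"}, label_to_tenant="namespace_name",
)
fallback = hawkular_definition(
    "nginx", "abc-123", "memory/usage", "bytes",
    {}, label_to_tenant="namespace_name",
)
```

Because group_id leaves out the pod ID, every pod running the same container and metric shares one group_id value, which is what allows them to be searched together.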

At the start, all the definitions are fetched from the Hawkular-Metrics tenant and filtered to cache only the Heapster metrics. It is recommended to use a separate tenant for Heapster information if you have lots of metrics from other systems, but not required.

The Hawkular-Metrics instance can be a standalone installation of Hawkular-Metrics or the full installation of Hawkular.
