6.6 Monitoring

小牛编辑
2023-12-01
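Kafka uses Yammer Metrics for metrics reporting in the server and the Scala clients. The Java clients use Kafka Metrics, a built-in metrics registry that minimizes the transitive dependencies pulled into client applications. Both expose metrics via JMX and can be configured to report stats through pluggable stats reporters that hook up to your monitoring system.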

The easiest way to see the available metrics is to start jconsole and point it at a running Kafka client or server; this lets you browse all metrics through JMX.
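Each entry that jconsole lists is identified by a JMX object name of the form domain:key=value,key=value. As a minimal sketch of how such a name breaks down (the helper name `parse_mbean_name` is hypothetical, and the parser assumes that values contain no quoted commas or colons), it can be split in Python like this:

```python
from typing import Dict, Tuple


def parse_mbean_name(name: str) -> Tuple[str, Dict[str, str]]:
    """Split 'domain:key=value,...' into (domain, {key: value})."""
    # Assumption: no quoted values, so ':' and ',' are plain separators.
    domain, _, key_props = name.partition(":")
    properties = {}
    for pair in key_props.split(","):
        key, _, value = pair.partition("=")
        properties[key] = value
    return domain, properties


domain, props = parse_mbean_name(
    "kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec"
)
print(domain, props)
```

The domain (here kafka.server) corresponds to the top-level folder in the jconsole MBeans tree, and the key properties select the individual metric.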

We do graphing and alerting on the following metrics:

| Description | MBean name | Normal value |
|---|---|---|
| Message in rate | kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec | |
| Byte in rate | kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec | |
| Request rate | kafka.network:type=RequestMetrics,name=RequestsPerSec,request={Produce\|FetchConsumer\|FetchFollower} | |
| Byte out rate | kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec | |
| Log flush rate and time | kafka.log:type=LogFlushStats,name=LogFlushRateAndTimeMs | |
| # of under replicated partitions (\|ISR\| < \|all replicas\|) | kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions | 0 |
| # of under minIsr partitions (\|ISR\| < min.insync.replicas) | kafka.server:type=ReplicaManager,name=UnderMinIsrPartitionCount | 0 |
| # of offline log directories | kafka.log:type=LogManager,name=OfflineLogDirectoryCount | 0 |
| Is controller active on broker | kafka.controller:type=KafkaController,name=ActiveControllerCount | only one broker in the cluster should have 1 |
| Leader election rate | kafka.controller:type=ControllerStats,name=LeaderElectionRateAndTimeMs | non-zero when there are broker failures |
| Unclean leader election rate | kafka.controller:type=ControllerStats,name=UncleanLeaderElectionsPerSec | 0 |
| Partition counts | kafka.server:type=ReplicaManager,name=PartitionCount | mostly even across brokers |
| Leader replica counts | kafka.server:type=ReplicaManager,name=LeaderCount | mostly even across brokers |
| ISR shrink rate | kafka.server:type=ReplicaManager,name=IsrShrinksPerSec | If a broker goes down, ISR for some of the partitions will shrink. When that broker is up again, ISR will be expanded once the replicas are fully caught up. Other than that, the expected value for both ISR shrink rate and expansion rate is 0. |
| ISR expansion rate | kafka.server:type=ReplicaManager,name=IsrExpandsPerSec | See above |
| Max lag in messages between follower and leader replicas | kafka.server:type=ReplicaFetcherManager,name=MaxLag,clientId=Replica | lag should be proportional to the maximum batch size of a produce request. |
| Lag in messages per follower replica | kafka.server:type=FetcherLagMetrics,name=ConsumerLag,clientId=([-.\w]+),topic=([-.\w]+),partition=([0-9]+) | lag should be proportional to the maximum batch size of a produce request. |
| Requests waiting in the producer purgatory | kafka.server:type=DelayedOperationPurgatory,name=PurgatorySize,delayedOperation=Produce | non-zero if acks=-1 is used |
| Requests waiting in the fetch purgatory | kafka.server:type=DelayedOperationPurgatory,name=PurgatorySize,delayedOperation=Fetch | size depends on fetch.wait.max.ms in the consumer |
| Request total time | kafka.network:type=RequestMetrics,name=TotalTimeMs,request={Produce\|FetchConsumer\|FetchFollower} | broken into queue, local, remote and response send time |
| Time the request waits in the request queue | kafka.network:type=RequestMetrics,name=RequestQueueTimeMs,request={Produce\|FetchConsumer\|FetchFollower} | |
| Time the request is processed at the leader | kafka.network:type=RequestMetrics,name=LocalTimeMs,request={Produce\|FetchConsumer\|FetchFollower} | |
| Time the request waits for the follower | kafka.network:type=RequestMetrics,name=RemoteTimeMs,request={Produce\|FetchConsumer\|FetchFollower} | non-zero for produce requests when acks=-1 |
| Time the request waits in the response queue | kafka.network:type=RequestMetrics,name=ResponseQueueTimeMs,request={Produce\|FetchConsumer\|FetchFollower} | |
| Time to send the response | kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request={Produce\|FetchConsumer\|FetchFollower} | |
| Number of messages the consumer lags behind the producer by. Published by the consumer, not the broker. | Old consumer: kafka.consumer:type=ConsumerFetcherManager,name=MaxLag,clientId=([-.\w]+). New consumer: kafka.consumer:type=consumer-fetch-manager-metrics,client-id={client-id}, attribute records-lag-max | |
| The average fraction of time the network processors are idle | kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent | between 0 and 1, ideally > 0.3 |
| The average fraction of time the request handler threads are idle | kafka.server:type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent | between 0 and 1, ideally > 0.3 |
| Bandwidth quota metrics per (user, client-id), user or client-id | kafka.server:type={Produce\|Fetch},user=([-.\w]+),client-id=([-.\w]+) | Two attributes. throttle-time indicates the amount of time in ms the client was throttled. Ideally = 0. byte-rate indicates the data produce/consume rate of the client in bytes/sec. For (user, client-id) quotas, both user and client-id are specified. If a per-client-id quota is applied to the client, user is not specified. If a per-user quota is applied, client-id is not specified. |
| Request quota metrics per (user, client-id), user or client-id | kafka.server:type=Request,user=([-.\w]+),client-id=([-.\w]+) | Two attributes. throttle-time indicates the amount of time in ms the client was throttled. Ideally = 0. request-time indicates the percentage of time spent in broker network and I/O threads to process requests from the client group. For (user, client-id) quotas, both user and client-id are specified. If a per-client-id quota is applied to the client, user is not specified. If a per-user quota is applied, client-id is not specified. |
| Requests exempt from throttling | kafka.server:type=Request | exempt-throttle-time indicates the percentage of time spent in broker network and I/O threads to process requests that are exempt from throttling. |
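To illustrate how the "normal value" column can drive alerting, here is a minimal sketch. It assumes the metric values have already been sampled into a plain dictionary keyed by the `name=` part of the MBean; how they are collected is out of scope, and the function names are hypothetical.

```python
from typing import Dict, List

# Metrics whose normal value in the table above is 0.
EXPECTED_ZERO = (
    "UnderReplicatedPartitions",
    "UnderMinIsrPartitionCount",
    "OfflineLogDirectoryCount",
    "UncleanLeaderElectionsPerSec",
)
# Idle ratios lie between 0 and 1 and should ideally stay above 0.3.
IDLE_RATIOS = ("NetworkProcessorAvgIdlePercent", "RequestHandlerAvgIdlePercent")


def broker_warnings(samples: Dict[str, float]) -> List[str]:
    """Return one warning per sampled metric outside its normal value."""
    warnings = []
    for name in EXPECTED_ZERO:
        if samples.get(name, 0) != 0:
            warnings.append(f"{name} = {samples[name]}, expected 0")
    for name in IDLE_RATIOS:
        if name in samples and samples[name] <= 0.3:
            warnings.append(f"{name} = {samples[name]}, ideally > 0.3")
    return warnings


def single_active_controller(counts_per_broker: Dict[str, int]) -> bool:
    """ActiveControllerCount should be 1 on exactly one broker in the cluster."""
    return sum(counts_per_broker.values()) == 1


sample = {"UnderReplicatedPartitions": 2, "RequestHandlerAvgIdlePercent": 0.15}
for line in broker_warnings(sample):
    print(line)
```

Metrics that are legitimately non-zero for a while, such as the ISR shrink and expansion rates during a broker restart, are left out on purpose; a real alert would need a time window rather than a single sample.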

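Common monitoring metrics for producer/consumer/connect/streams

The following metrics are available on producer/consumer/connector/streams instances. For specific metrics, please see the following sections.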
| Metric/Attribute name | Description | MBean name |
|---|---|---|
| connection-close-rate | Connections closed per second in the window. | kafka.[producer\|consumer\|connect]:type=[producer\|consumer\|connect]-metrics,client-id=([-.\w]+) |
| connection-creation-rate | New connections established per second in the window. | kafka.[producer\|consumer\|connect]:type=[producer\|consumer\|connect]-metrics,client-id=([-.\w]+) |
| network-io-rate | The average number of network operations (reads or writes) on all connections per second. | kafka.[producer\|consumer\|connect]:type=[producer\|consumer\|connect]-metrics,client-id=([-.\w]+) |
| outgoing-byte-rate | The average number of outgoing bytes sent per second to all servers. | kafka.[producer\|consumer\|connect]:type=[producer\|consumer\|connect]-metrics,client-id=([-.\w]+) |
| request-rate | The average number of requests sent per second. | kafka.[producer\|consumer\|connect]:type=[producer\|consumer\|connect]-metrics,client-id=([-.\w]+) |
| request-size-avg | The average size of all requests in the window. | kafka.[producer\|consumer\|connect]:type=[producer\|consumer\|connect]-metrics,client-id=([-.\w]+) |
| request-size-max | The maximum size of any request sent in the window. | kafka.[producer\|consumer\|connect]:type=[producer\|consumer\|connect]-metrics,client-id=([-.\w]+) |
| incoming-byte-rate | Bytes/second read off all sockets. | kafka.[producer\|consumer\|connect]:type=[producer\|consumer\|connect]-metrics,client-id=([-.\w]+) |
| response-rate | Responses received per second. | kafka.[producer\|consumer\|connect]:type=[producer\|consumer\|connect]-metrics,client-id=([-.\w]+) |
| select-rate | Number of times the I/O layer checked for new I/O to perform per second. | kafka.[producer\|consumer\|connect]:type=[producer\|consumer\|connect]-metrics,client-id=([-.\w]+) |
| io-wait-time-ns-avg | The average length of time the I/O thread spent waiting for a socket ready for reads or writes in nanoseconds. | kafka.[producer\|consumer\|connect]:type=[producer\|consumer\|connect]-metrics,client-id=([-.\w]+) |
| io-wait-ratio | The fraction of time the I/O thread spent waiting. | kafka.[producer\|consumer\|connect]:type=[producer\|consumer\|connect]-metrics,client-id=([-.\w]+) |
| io-time-ns-avg | The average length of time for I/O per select call in nanoseconds. | kafka.[producer\|consumer\|connect]:type=[producer\|consumer\|connect]-metrics,client-id=([-.\w]+) |
| io-ratio | The fraction of time the I/O thread spent doing I/O. | kafka.[producer\|consumer\|connect]:type=[producer\|consumer\|connect]-metrics,client-id=([-.\w]+) |
| connection-count | The current number of active connections. | kafka.[producer\|consumer\|connect]:type=[producer\|consumer\|connect]-metrics,client-id=([-.\w]+) |
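The MBean name column above is a pattern rather than a literal name. As a rough sketch, it can be turned into a Python regular expression to pick out the client kind and client id from concrete names. The backreference, which requires the domain and the type prefix to agree, is an assumption of this example, and `client_of` is a hypothetical helper.

```python
import re
from typing import Optional, Tuple

# kafka.<kind>:type=<kind>-metrics,client-id=<id>, with <kind> repeated via \1.
CLIENT_METRICS = re.compile(
    r"kafka\.(producer|consumer|connect):type=\1-metrics,client-id=([-.\w]+)"
)


def client_of(mbean_name: str) -> Optional[Tuple[str, str]]:
    """Return (kind, client_id) for a common client metrics MBean, else None."""
    match = CLIENT_METRICS.fullmatch(mbean_name)
    return (match.group(1), match.group(2)) if match else None


print(client_of("kafka.producer:type=producer-metrics,client-id=app-1"))
print(client_of("kafka.server:type=ReplicaManager,name=LeaderCount"))
```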

Common per-broker metrics for producer/consumer/connect/streams

The following metrics are available on producer/consumer/connector/streams instances. For specific metrics, please see the following sections.
| Metric/Attribute name | Description | MBean name |
|---|---|---|
| outgoing-byte-rate | The average number of outgoing bytes sent per second for a node. | kafka.[producer\|consumer\|connect]:type=[consumer\|producer\|connect]-node-metrics,client-id=([-.\w]+),node-id=([0-9]+) |
| request-rate | The average number of requests sent per second for a node. | kafka.[producer\|consumer\|connect]:type=[consumer\|producer\|connect]-node-metrics,client-id=([-.\w]+),node-id=([0-9]+) |
| request-size-avg | The average size of all requests in the window for a node. | kafka.[producer\|consumer\|connect]:type=[consumer\|producer\|connect]-node-metrics,client-id=([-.\w]+),node-id=([0-9]+) |
| request-size-max | The maximum size of any request sent in the window for a node. | kafka.[producer\|consumer\|connect]:type=[consumer\|producer\|connect]-node-metrics,client-id=([-.\w]+),node-id=([0-9]+) |
| incoming-byte-rate | The average number of bytes received per second for a node. | kafka.[producer\|consumer\|connect]:type=[consumer\|producer\|connect]-node-metrics,client-id=([-.\w]+),node-id=([0-9]+) |
| request-latency-avg | The average request latency in ms for a node. | kafka.[producer\|consumer\|connect]:type=[consumer\|producer\|connect]-node-metrics,client-id=([-.\w]+),node-id=([0-9]+) |
| request-latency-max | The maximum request latency in ms for a node. | kafka.[producer\|consumer\|connect]:type=[consumer\|producer\|connect]-node-metrics,client-id=([-.\w]+),node-id=([0-9]+) |
| response-rate | Responses received per second for a node. | kafka.[producer\|consumer\|connect]:type=[consumer\|producer\|connect]-node-metrics,client-id=([-.\w]+),node-id=([0-9]+) |

Producer monitoring

The following metrics are available on producer instances.
| Metric/Attribute name | Description | MBean name |
|---|---|---|
| waiting-threads | The number of user threads blocked waiting for buffer memory to enqueue their records. | kafka.producer:type=producer-metrics,client-id=([-.\w]+) |
| buffer-total-bytes | The maximum amount of buffer memory the client can use (whether or not it is currently used). | kafka.producer:type=producer-metrics,client-id=([-.\w]+) |
| buffer-available-bytes | The total amount of buffer memory that is not being used (either unallocated or in the free list). | kafka.producer:type=producer-metrics,client-id=([-.\w]+) |
| bufferpool-wait-time | The fraction of time an appender waits for space allocation. | kafka.producer:type=producer-metrics,client-id=([-.\w]+) |
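The two buffer gauges above are easier to alert on when combined into a single utilization figure. The following is a small sketch of that arithmetic; the function name and the idea of alerting on a ratio are assumptions of this example, not something the metrics themselves provide.

```python
def buffer_utilization(total_bytes: float, available_bytes: float) -> float:
    """Fraction of the producer buffer in use, from the two gauges above."""
    if total_bytes <= 0:
        raise ValueError("buffer-total-bytes must be positive")
    return (total_bytes - available_bytes) / total_bytes


# Example: a 32 MiB buffer with 8 MiB still available.
print(buffer_utilization(32 * 1024 * 1024, 8 * 1024 * 1024))
```

A utilization close to 1 together with a non-zero waiting-threads value suggests that user threads are blocking on buffer memory.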

Producer sender metrics

New consumer monitoring

The following metrics are available on new consumer instances.

Consumer Group Metrics

| Metric/Attribute name | Description | MBean name |
|---|---|---|
| commit-latency-avg | The average time taken for a commit request | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) |
| commit-latency-max | The max time taken for a commit request | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) |
| commit-rate | The number of commit calls per second | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) |
| assigned-partitions | The number of partitions currently assigned to this consumer | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) |
| heartbeat-response-time-max | The max time taken to receive a response to a heartbeat request | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) |
| heartbeat-rate | The average number of heartbeats per second | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) |
| join-time-avg | The average time taken for a group rejoin | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) |
| join-time-max | The max time taken for a group rejoin | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) |
| join-rate | The number of group joins per second | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) |
| sync-time-avg | The average time taken for a group sync | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) |
| sync-time-max | The max time taken for a group sync | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) |
| sync-rate | The number of group syncs per second | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) |
| last-heartbeat-seconds-ago | The number of seconds since the last coordinator heartbeat | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) |
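As one possible use of last-heartbeat-seconds-ago, the sketch below flags a consumer whose last heartbeat is older than its session timeout. Using the session timeout as the threshold is an assumption of this example, and `heartbeat_is_stale` is a hypothetical helper; pick a threshold that matches your own configuration.

```python
def heartbeat_is_stale(last_heartbeat_seconds_ago: float,
                       session_timeout_seconds: float) -> bool:
    """True when no heartbeat has been sent within the session timeout."""
    return last_heartbeat_seconds_ago >= session_timeout_seconds


print(heartbeat_is_stale(2.0, 10.0))
print(heartbeat_is_stale(12.5, 10.0))
```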

Consumer Fetch Metrics

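Connect monitoring

A Connect worker process contains all the producer and consumer metrics as well as metrics specific to Connect. The worker process itself has a number of metrics, while each connector and task has additional metrics.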

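Streams monitoring

A Kafka Streams instance contains all the producer and consumer metrics as well as additional metrics specific to streams. By default, Kafka Streams has metrics with two recording levels: debug and info. The debug level records all metrics, while the info level records only the thread-level metrics.

Note that the metrics have a 3-layer hierarchy. At the top level there are per-thread metrics. Each thread has tasks, with their own metrics. Each task has a number of processor nodes, with their own metrics. Each task also has a number of state stores and record caches, all with their own metrics.

Use the following configuration option to specify which metrics you want collected: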
metrics.recording.level="info"
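The effect of the two recording levels on the metric hierarchy can be summarized in a small lookup. This is only an illustration of the description above; the layer labels and the function name are made up for this sketch and are not Kafka identifiers.

```python
from typing import Dict, List

# Which layers of the hierarchy record metrics at each level.
LAYERS_BY_LEVEL: Dict[str, List[str]] = {
    "info": ["thread"],
    "debug": ["thread", "task", "processor-node", "state-store", "record-cache"],
}


def recorded_layers(level: str) -> List[str]:
    """Return the metric layers recorded for a metrics.recording.level value."""
    try:
        return LAYERS_BY_LEVEL[level.lower()]
    except KeyError:
        raise ValueError(f"unknown recording level: {level!r}") from None


print(recorded_layers("info"))
print(recorded_layers("debug"))
```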

Thread metrics

All the following metrics have a recording level of ``info``:
| Metric/Attribute name | Description | MBean name |
|---|---|---|
| commit-latency-avg | The average execution time in ms for committing, across all running tasks of this thread. | kafka.streams:type=stream-metrics,client-id=([-.\w]+) |
| commit-latency-max | The maximum execution time in ms for committing across all running tasks of this thread. | kafka.streams:type=stream-metrics,client-id=([-.\w]+) |
| poll-latency-avg | The average execution time in ms for polling, across all running tasks of this thread. | kafka.streams:type=stream-metrics,client-id=([-.\w]+) |
| poll-latency-max | The maximum execution time in ms for polling across all running tasks of this thread. | kafka.streams:type=stream-metrics,client-id=([-.\w]+) |
| process-latency-avg | The average execution time in ms for processing, across all running tasks of this thread. | kafka.streams:type=stream-metrics,client-id=([-.\w]+) |
| process-latency-max | The maximum execution time in ms for processing across all running tasks of this thread. | kafka.streams:type=stream-metrics,client-id=([-.\w]+) |
| punctuate-latency-avg | The average execution time in ms for punctuating, across all running tasks of this thread. | kafka.streams:type=stream-metrics,client-id=([-.\w]+) |
| punctuate-latency-max | The maximum execution time in ms for punctuating across all running tasks of this thread. | kafka.streams:type=stream-metrics,client-id=([-.\w]+) |
| commit-rate | The average number of commits per second across all tasks. | kafka.streams:type=stream-metrics,client-id=([-.\w]+) |
| poll-rate | The average number of polls per second across all tasks. | kafka.streams:type=stream-metrics,client-id=([-.\w]+) |
| process-rate | The average number of process calls per second across all tasks. | kafka.streams:type=stream-metrics,client-id=([-.\w]+) |
| punctuate-rate | The average number of punctuates per second across all tasks. | kafka.streams:type=stream-metrics,client-id=([-.\w]+) |
| task-created-rate | The average number of newly created tasks per second. | kafka.streams:type=stream-metrics,client-id=([-.\w]+) |
| task-closed-rate | The average number of tasks closed per second. | kafka.streams:type=stream-metrics,client-id=([-.\w]+) |
| skipped-records-rate | The average number of skipped records per second. | kafka.streams:type=stream-metrics,client-id=([-.\w]+) |

Task metrics

All the following metrics have a recording level of ``debug``:
| Metric/Attribute name | Description | MBean name |
|---|---|---|
| commit-latency-avg | The average commit time in ns for this task. | kafka.streams:type=stream-task-metrics,client-id=([-.\w]+),task-id=([-.\w]+) |
| commit-latency-max | The maximum commit time in ns for this task. | kafka.streams:type=stream-task-metrics,client-id=([-.\w]+),task-id=([-.\w]+) |
| commit-rate | The average number of commit calls per second. | kafka.streams:type=stream-task-metrics,client-id=([-.\w]+),task-id=([-.\w]+) |

Processor node metrics

All the following metrics have a recording level of ``debug``:
| Metric/Attribute name | Description | MBean name |
|---|---|---|
| process-latency-avg | The average process execution time in ns. | kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+) |
| process-latency-max | The maximum process execution time in ns. | kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+) |
| punctuate-latency-avg | The average punctuate execution time in ns. | kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+) |
| punctuate-latency-max | The maximum punctuate execution time in ns. | kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+) |
| create-latency-avg | The average create execution time in ns. | kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+) |
| create-latency-max | The maximum create execution time in ns. | kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+) |
| destroy-latency-avg | The average destroy execution time in ns. | kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+) |
| destroy-latency-max | The maximum destroy execution time in ns. | kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+) |
| process-rate | The average number of process operations per second. | kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+) |
| punctuate-rate | The average number of punctuate operations per second. | kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+) |
| create-rate | The average number of create operations per second. | kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+) |
| destroy-rate | The average number of destroy operations per second. | kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+) |
| forward-rate | The average rate of records being forwarded downstream, from source nodes only, per second. | kafka.streams:type=stream-processor-node-metrics,client-id=([-.\w]+),task-id=([-.\w]+),processor-node-id=([-.\w]+) |

State store metrics

All the following metrics have a recording level of ``debug``:
| Metric/Attribute name | Description | MBean name |
|---|---|---|
| put-latency-avg | The average put execution time in ns. | kafka.streams:type=stream-[store-type]-state-metrics,client-id=([-.\w]+),task-id=([-.\w]+),[store-type]-state-id=([-.\w]+) |
| put-latency-max | The maximum put execution time in ns. | kafka.streams:type=stream-[store-type]-state-metrics,client-id=([-.\w]+),task-id=([-.\w]+),[store-type]-state-id=([-.\w]+) |
| put-if-absent-latency-avg | The average put-if-absent execution time in ns. | kafka.streams:type=stream-[store-type]-state-metrics,client-id=([-.\w]+),task-id=([-.\w]+),[store-type]-state-id=([-.\w]+) |
| put-if-absent-latency-max | The maximum put-if-absent execution time in ns. | kafka.streams:type=stream-[store-type]-state-metrics,client-id=([-.\w]+),task-id=([-.\w]+),[store-type]-state-id=([-.\w]+) |
| get-latency-avg | The average get execution time in ns. | kafka.streams:type=stream-[store-type]-state-metrics,client-id=([-.\w]+),task-id=([-.\w]+),[store-type]-state-id=([-.\w]+) |
| get-latency-max | The maximum get execution time in ns. | kafka.streams:type=stream-[store-type]-state-metrics,client-id=([-.\w]+),task-id=([-.\w]+),[store-type]-state-id=([-.\w]+) |
| delete-latency-avg | The average delete execution time in ns. | kafka.streams:type=stream-[store-type]-state-metrics,client-id=([-.\w]+),task-id=([-.\w]+),[store-type]-state-id=([-.\w]+) |
| delete-latency-max | The maximum delete execution time in ns. | kafka.streams:type=stream-[store-type]-state-metrics,client-id=([-.\w]+),task-id=([-.\w]+),[store-type]-state-id=([-.\w]+) |
| put-all-latency-avg | The average put-all execution time in ns. | kafka.streams:type=stream-[store-type]-state-metrics,client-id=([-.\w]+),task-id=([-.\w]+),[store-type]-state-id=([-.\w]+) |
| put-all-latency-max | The maximum put-all execution time in ns. | kafka.streams:type=stream-[store-type]-state-metrics,client-id=([-.\w]+),task-id=([-.\w]+),[store-type]-state-id=([-.\w]+) |
| all-latency-avg | The average all operation execution time in ns. | kafka.streams:type=stream-[store-type]-state-metrics,client-id=([-.\w]+),task-id=([-.\w]+),[store-type]-state-id=([-.\w]+) |
| all-latency-max | The maximum all operation execution time in ns. | kafka.streams:type=stream-[store-type]-state-metrics,client-id=([-.\w]+),task-id=([-.\w]+),[store-type]-state-id=([-.\w]+) |
| range-latency-avg | The average range execution time in ns. | kafka.streams:type=stream-[store-type]-state-metrics,client-id=([-.\w]+),task-id=([-.\w]+),[store-type]-state-id=([-.\w]+) |
| range-latency-max | The maximum range execution time in ns. | kafka.streams:type=stream-[store-type]-state-metrics,client-id=([-.\w]+),task-id=([-.\w]+),[store-type]-state-id=([-.\w]+) |
| flush-latency-avg | The average flush execution time in ns. | kafka.streams:type=stream-[store-type]-state-metrics,client-id=([-.\w]+),task-id=([-.\w]+),[store-type]-state-id=([-.\w]+) |
| flush-latency-max | The maximum flush execution time in ns. | kafka.streams:type=stream-[store-type]-state-metrics,client-id=([-.\w]+),task-id=([-.\w]+),[store-type]-state-id=([-.\w]+) |
| restore-latency-avg | The average restore execution time in ns. | kafka.streams:type=stream-[store-type]-state-metrics,client-id=([-.\w]+),task-id=([-.\w]+),[store-type]-state-id=([-.\w]+) |
| restore-latency-max | The maximum restore execution time in ns. | kafka.streams:type=stream-[store-type]-state-metrics,client-id=([-.\w]+),task-id=([-.\w]+),[store-type]-state-id=([-.\w]+) |
| put-rate | The average put rate for this store. | kafka.streams:type=stream-[store-type]-state-metrics,client-id=([-.\w]+),task-id=([-.\w]+),[store-type]-state-id=([-.\w]+) |
| put-if-absent-rate | The average put-if-absent rate for this store. | kafka.streams:type=stream-[store-type]-state-metrics,client-id=([-.\w]+),task-id=([-.\w]+),[store-type]-state-id=([-.\w]+) |
| get-rate | The average get rate for this store. | kafka.streams:type=stream-[store-type]-state-metrics,client-id=([-.\w]+),task-id=([-.\w]+),[store-type]-state-id=([-.\w]+) |
| delete-rate | The average delete rate for this store. | kafka.streams:type=stream-[store-type]-state-metrics,client-id=([-.\w]+),task-id=([-.\w]+),[store-type]-state-id=([-.\w]+) |
| put-all-rate | The average put-all rate for this store. | kafka.streams:type=stream-[store-type]-state-metrics,client-id=([-.\w]+),task-id=([-.\w]+),[store-type]-state-id=([-.\w]+) |
| all-rate | The average all operation rate for this store. | kafka.streams:type=stream-[store-type]-state-metrics,client-id=([-.\w]+),task-id=([-.\w]+),[store-type]-state-id=([-.\w]+) |
| range-rate | The average range rate for this store. | kafka.streams:type=stream-[store-type]-state-metrics,client-id=([-.\w]+),task-id=([-.\w]+),[store-type]-state-id=([-.\w]+) |
| flush-rate | The average flush rate for this store. | kafka.streams:type=stream-[store-type]-state-metrics,client-id=([-.\w]+),task-id=([-.\w]+),[store-type]-state-id=([-.\w]+) |
| restore-rate | The average restore rate for this store. | kafka.streams:type=stream-[store-type]-state-metrics,client-id=([-.\w]+),task-id=([-.\w]+),[store-type]-state-id=([-.\w]+) |
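The state store MBean name above is a template with a [store-type] placeholder that appears twice. As a sketch of how a concrete name is formed from it, the helper below fills in the template. The helper name and all example values (store type "rocksdb", client id "app-1", task id "0_0", store id "counts") are hypothetical and only chosen to match the ([-.\w]+) pattern.

```python
import re

# Identifier pattern taken from the MBean name column: ([-.\w]+)
_IDENTIFIER = re.compile(r"[-.\w]+")


def state_store_mbean(store_type: str, client_id: str,
                      task_id: str, store_id: str) -> str:
    """Fill the [store-type] template with concrete identifiers."""
    for value in (store_type, client_id, task_id, store_id):
        if not _IDENTIFIER.fullmatch(value):
            raise ValueError(f"invalid identifier: {value!r}")
    return (
        f"kafka.streams:type=stream-{store_type}-state-metrics,"
        f"client-id={client_id},task-id={task_id},"
        f"{store_type}-state-id={store_id}"
    )


# All argument values here are made-up examples.
print(state_store_mbean("rocksdb", "app-1", "0_0", "counts"))
```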

Record cache metrics

All the following metrics have a recording level of ``debug``:
| Metric/Attribute name | Description | MBean name |
|---|---|---|
| hitRatio-avg | The average cache hit ratio defined as the ratio of cache read hits over the total cache read requests. | kafka.streams:type=stream-record-cache-metrics,client-id=([-.\w]+),task-id=([-.\w]+),record-cache-id=([-.\w]+) |
| hitRatio-min | The minimum cache hit ratio. | kafka.streams:type=stream-record-cache-metrics,client-id=([-.\w]+),task-id=([-.\w]+),record-cache-id=([-.\w]+) |
| hitRatio-max | The maximum cache hit ratio. | kafka.streams:type=stream-record-cache-metrics,client-id=([-.\w]+),task-id=([-.\w]+),record-cache-id=([-.\w]+) |
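The hit ratio definition above (read hits over total read requests) amounts to the following arithmetic. This sketch only illustrates the definition; Kafka Streams computes the reported values itself, and the function names and the sample numbers are made up.

```python
from typing import Dict, List, Optional


def hit_ratio(read_hits: int, total_reads: int) -> Optional[float]:
    """Cache read hits divided by total cache read requests."""
    if total_reads == 0:
        return None  # undefined when nothing has been read yet
    return read_hits / total_reads


def summarize(ratios: List[float]) -> Dict[str, float]:
    """Average, minimum and maximum over a non-empty list of hit ratios."""
    return {
        "avg": sum(ratios) / len(ratios),
        "min": min(ratios),
        "max": max(ratios),
    }


print(hit_ratio(3, 4))
print(summarize([0.5, 0.75, 1.0]))
```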

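Others

We recommend monitoring GC time and other stats, as well as various server stats such as CPU utilization and I/O service time. On the client side, we recommend monitoring the message/byte rate (global and per topic), the request rate/size/time, and, on the consumer side, the max lag in messages among all partitions and the min fetch request rate. For a consumer to keep up, the max lag needs to be less than a threshold and the min fetch rate needs to be larger than 0.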
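The keep-up condition for a consumer can be written as a single predicate. This is a minimal sketch: the lag threshold is application specific, and both the function name and the example numbers are assumptions.

```python
def consumer_keeping_up(max_lag: float, min_fetch_rate: float,
                        lag_threshold: float) -> bool:
    """Max lag must stay below the threshold and min fetch rate above 0."""
    return max_lag < lag_threshold and min_fetch_rate > 0


print(consumer_keeping_up(max_lag=120, min_fetch_rate=4.2, lag_threshold=1000))
print(consumer_keeping_up(max_lag=120, min_fetch_rate=0.0, lag_threshold=1000))
```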

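Audit

The final alerting we do is on the correctness of the data delivery. We audit that every message that is sent is consumed by all consumers and measure the lag for this to occur. For important topics, we alert if a certain completeness is not achieved within a certain time period. The details of this are discussed in KAFKA-260.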
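A completeness check of this kind can be sketched as a comparison of per-topic message counts for one time period. This is not the audit system discussed in KAFKA-260, only an illustration of the idea; the function names, the 0.999 default and the topic names are assumptions.

```python
from typing import Dict, List


def completeness(sent: int, consumed: int) -> float:
    """Fraction of sent messages that were consumed in the period."""
    return 1.0 if sent == 0 else consumed / sent


def topics_to_alert(sent: Dict[str, int], consumed: Dict[str, int],
                    required: float = 0.999) -> List[str]:
    """Topics whose completeness for the period is below the required level."""
    return sorted(
        topic
        for topic, count in sent.items()
        if completeness(count, consumed.get(topic, 0)) < required
    )


# Hypothetical counts for a single audit period.
sent_counts = {"orders": 1000, "clicks": 1000}
consumed_counts = {"orders": 1000, "clicks": 990}
print(topics_to_alert(sent_counts, consumed_counts))
```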