Question:

Kafka consumer CommitFailedException

司寇昱

2023-03-14

I am working on a Kafka consumer program. We recently deployed it to the PROD environment, where we are facing the following issue:

[main] INFO com.cisco.kafka.consumer.RTRKafkaConsumer - No. of records fetched: 1
[kafka-coordinator-heartbeat-thread | otm-opl-group] INFO org.apache.kafka.clients.consumer.internals.AbstractCoordinator - [Consumer clientId=consumer-otm-opl-group-1, groupId=otm-opl-group] Group coordinator opl-kafka-prd2-01:9092 (id: 2147483644 rack: null) is unavailable or invalid, will attempt rediscovery
[kafka-coordinator-heartbeat-thread | otm-opl-group] INFO org.apache.kafka.clients.consumer.internals.AbstractCoordinator - [Consumer clientId=consumer-otm-opl-group-1, groupId=otm-opl-group] Discovered group coordinator opl-kafka-prd2-01:9092 (id: 2147483644 rack: null)
[kafka-coordinator-heartbeat-thread | otm-opl-group] INFO org.apache.kafka.clients.consumer.internals.AbstractCoordinator - [Consumer clientId=consumer-otm-opl-group-1, groupId=otm-opl-group] Attempt to heartbeat failed for since member id consumer-otm-opl-group-1-953dfa46-9ced-472f-b24f-36d78c6b940b is not valid.
[main] INFO com.cisco.kafka.consumer.RTRKafkaConsumer - Batch start offset: 9329428
[main] INFO com.cisco.kafka.consumer.RTRKafkaConsumer - Batch Processing Successful.
[main] INFO org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - [Consumer clientId=consumer-otm-opl-group-1, groupId=otm-opl-group] Failing OffsetCommit request since the consumer is not part of an active group
Exception in thread "main" org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records.
    at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:1061)
    at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:936)
    at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1387)
    at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1349)
    at com.cisco.kafka.consumer.RTRKafkaConsumer.main(RTRKafkaConsumer.java:72)

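My understanding is that when the group coordinator becomes unavailable and is rediscovered, the heartbeat interval (3 seconds according to the documentation) expires and the consumer is kicked out of the group. Is this correct? If so, what is the workaround? If I am wrong, please help me understand the issue and suggest any ideas you have for fixing it. I can share the code if needed.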

微生啸

2023-03-14

The exception you are referring to

Exception in thread "main" org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records.

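gives a hint about what is happening and how to solve the problem. In the code, this exception is described as

“This exception is raised when an offset commit with KafkaConsumer#commitSync() fails with an unrecoverable error. This can happen when a group rebalance completes before the commit could be successfully applied. In this case, the commit cannot generally be retried because some of the partitions may have already been assigned to another member in the group.”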

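Scenarios in which this exception can occur:

Creating more and more consumers without closing them
Poll timeout
Heartbeat timeout
Expired Kerberos ticket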

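A rebalance occurs whenever a consumer is added to an existing ConsumerGroup. It is therefore essential to close a consumer after use, or to always reuse the same instance rather than creating a new KafkaConsumer object for every message or iteration.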

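[...] that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing.

The error message also gives a possible solution:

You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records.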
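As a rough illustration of that advice, the sketch below derives a max.poll.records value from a measured per-record processing time so that one batch fits inside max.poll.interval.ms. The class and method names (PollTuning, maxPollRecords, pollSettings), the 2-second per-record cost, and the 50% headroom factor are assumptions made for this example and do not come from the question; only the two property names are Kafka's.

```java
import java.util.Properties;

public class PollTuning {

    // Largest batch size whose processing time stays within a fraction
    // ("headroom", 0 < headroom <= 1) of max.poll.interval.ms, given an
    // assumed, measured per-record processing cost in milliseconds.
    public static int maxPollRecords(long maxPollIntervalMs, long perRecordMs, double headroom) {
        if (maxPollIntervalMs <= 0 || perRecordMs <= 0 || headroom <= 0 || headroom > 1) {
            throw new IllegalArgumentException("intervals must be positive and 0 < headroom <= 1");
        }
        long budgetMs = (long) (maxPollIntervalMs * headroom);
        long records = budgetMs / perRecordMs;
        // Never return less than 1, otherwise poll() could not make progress.
        return (int) Math.max(1L, Math.min(records, Integer.MAX_VALUE));
    }

    // Builds only the two poll-related settings; merge them into the full
    // consumer configuration (bootstrap.servers, group.id, deserializers, ...).
    public static Properties pollSettings(long maxPollIntervalMs, long perRecordMs) {
        Properties props = new Properties();
        props.setProperty("max.poll.interval.ms", Long.toString(maxPollIntervalMs));
        props.setProperty("max.poll.records",
                Integer.toString(maxPollRecords(maxPollIntervalMs, perRecordMs, 0.5)));
        return props;
    }

    public static void main(String[] args) {
        // 300_000 ms is the client default for max.poll.interval.ms;
        // 2_000 ms per record is a hypothetical measurement.
        Properties props = pollSettings(300_000L, 2_000L);
        System.out.println("max.poll.records = " + props.getProperty("max.poll.records"));
    }
}
```

With these assumed numbers the batch size comes out at 75 records, well below the client default of 500. If processing time per record varies a lot, a smaller headroom factor or a larger max.poll.interval.ms may be the safer choice.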

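The consumer reads all messages again because (as the error shows) it was unable to commit the offsets. This means that if you start a consumer with the same group.id, it resumes from the last successfully committed offset and, if no offset was ever committed, behaves as if it had never read anything from the topic.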

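Increase the heartbeat.interval.ms and session.timeout.ms settings while following this recommendation: “heartbeat.interval.ms must be set lower than session.timeout.ms, but typically should be set no higher than 1/3 of that value.”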

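Keep in mind that changing these values always involves a trade-off. You have either

more frequent rebalances but a shorter reaction time to identify dead consumers, or
less frequent rebalances and a longer reaction time to identify dead consumers.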
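The heartbeat recommendation quoted above can be checked before the consumer is created. The following is a minimal sketch: HeartbeatTuning and heartbeatSettings are hypothetical names, the 30-second and 10-second values are arbitrary examples, and treating the "no higher than 1/3" guideline as a hard error is a choice made for this illustration (Kafka itself only documents it as a recommendation).

```java
import java.util.Properties;

public class HeartbeatTuning {

    // Returns the two heartbeat-related settings after checking them against
    // the documented guideline: heartbeat.interval.ms must be lower than
    // session.timeout.ms and should be no higher than one third of it.
    public static Properties heartbeatSettings(int sessionTimeoutMs, int heartbeatIntervalMs) {
        if (sessionTimeoutMs <= 0 || heartbeatIntervalMs <= 0) {
            throw new IllegalArgumentException("timeouts must be positive");
        }
        // Multiply instead of dividing to avoid integer rounding; this check
        // also implies heartbeatIntervalMs < sessionTimeoutMs.
        if ((long) heartbeatIntervalMs * 3 > sessionTimeoutMs) {
            throw new IllegalArgumentException(
                    "heartbeat.interval.ms should be at most 1/3 of session.timeout.ms");
        }
        Properties props = new Properties();
        props.setProperty("session.timeout.ms", Integer.toString(sessionTimeoutMs));
        props.setProperty("heartbeat.interval.ms", Integer.toString(heartbeatIntervalMs));
        return props;
    }

    public static void main(String[] args) {
        // Example values only: a longer session timeout means fewer
        // rebalances but slower detection of dead consumers.
        Properties props = heartbeatSettings(30_000, 10_000);
        System.out.println(props);
    }
}
```

Note that the broker also bounds session.timeout.ms through its group.min.session.timeout.ms and group.max.session.timeout.ms settings, so a value accepted by this check can still be rejected when the consumer joins the group.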

On our production cluster, we saw the CommitFailedException after the application was unable to renew its Kerberos ticket.
