Identifying a Hidden Kafka Consumer After a Rebalance

Recently we got stuck trying to find a consumer of one of our Kafka topics that was causing problems and turned out to be difficult to identify.

Problem:

We started noticing that, at times, our Kafka consumer group would rebalance and our application’s consumer would suddenly stop receiving messages. Then, before we could run any diagnostics, the group would rebalance again and our consumer would start receiving messages again, but we would have lost some messages in between.

Solution:

  • Identifying the consumer IP address using kafka-consumer-groups.sh: For a given group ID, this command’s --describe option shows the host (IP address) of the consumer assigned to each partition of the topic. But, as described in the problem, our group would rebalance again before we noticed, so we couldn’t tell which consumer held the topic during that intermittent period.
  • Changing the client-side logging for Apache Kafka to DEBUG: We set the Kafka client logger for the package org.apache.kafka.clients.consumer to DEBUG level. This prints a lot of information, such as heartbeats and offset commits, but what we were interested in were the lines about rebalances of our consumer group. They showed that after a rebalance our consumer lost ownership of the topic’s partitions. This confirmed that another, mysterious consumer was taking them over. But we still didn’t know who it was or where it was running.
  • Changing the Kafka server logging to DEBUG: So we went to the Kafka server side. We stopped all the Kafka servers in the cluster except one and changed the log levels in log4j.properties in the config folder as shown below:

log4j.logger.kafka=DEBUG
log4j.logger.org.apache.kafka=DEBUG

This started printing everything, so we began drilling into the Kafka server’s server.log file and, voilà!! we found it. Near the start of the Kafka server’s log we found a couple of lines like these:

[2019-07-18 06:01:31,773] DEBUG Processor 0 listening to new connection from /10.50.1.2:40404 (kafka.network.Processor)
[2019-07-18 06:01:35,476] DEBUG Processor 1 listening to new connection from /10.55.3.49:60514 (kafka.network.Processor)

We recognized one of the servers but not the other, so we looked into the unknown one. A process there was creating connections, causing the rebalances and stealing our messages.
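The comparison we did by eye, listing every IP that opened a connection to the broker and checking it against the hosts we recognize, can be sketched in a few lines of Python. This is a minimal sketch, assuming the broker’s DEBUG log contains kafka.network.Processor lines worded like the two samples above (other Kafka versions may phrase them differently) and that you supply the set of IP addresses you already know. The helpers connecting_ips and unknown_clients are hypothetical names, not part of any Kafka tool.

```python
import re
from typing import Iterable, Set

# Assumption: the broker's DEBUG line looks like the samples above, e.g.
# "... listening to new connection from /10.50.1.2:40404 (kafka.network.Processor)".
# Only IPv4 addresses are matched.
CONNECTION_RE = re.compile(
    r"listening to new connection from /(?P<ip>[0-9.]+):(?P<port>\d+)"
)


def connecting_ips(log_lines: Iterable[str]) -> Set[str]:
    """Collect the client IP address from every new-connection log line."""
    ips: Set[str] = set()
    for line in log_lines:
        match = CONNECTION_RE.search(line)
        if match:
            ips.add(match.group("ip"))
    return ips


def unknown_clients(log_lines: Iterable[str], known_ips: Set[str]) -> Set[str]:
    """Return IPs that connected to the broker but are not in known_ips."""
    return connecting_ips(log_lines) - known_ips


if __name__ == "__main__":
    sample = [
        "[2019-07-18 06:01:31,773] DEBUG Processor 0 listening to new connection "
        "from /10.50.1.2:40404 (kafka.network.Processor)",
        "[2019-07-18 06:01:35,476] DEBUG Processor 1 listening to new connection "
        "from /10.55.3.49:60514 (kafka.network.Processor)",
    ]
    # Hypothetical: pretend 10.50.1.2 is the host we already recognize.
    print(sorted(unknown_clients(sample, {"10.50.1.2"})))  # prints ['10.55.3.49']
```

To run it against a real broker log, open the file (for example with open("logs/server.log") as f, where the path is a placeholder) and pass f instead of the sample list. Keep in mind that every client connects to the broker, including producers, admin tools, and other brokers, so an unfamiliar IP is a lead to investigate rather than proof that it is the rogue consumer.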

So what next? We killed it, and all’s well!!

Blog at WordPress.com.