Retention period and issue with Kafka data not getting deleted

Problem

The default value of the retention.ms attribute on Kafka topics is 7 days, but data older than 7 days still persists in the topic.

Kafka version: 2.1.11

Interestingly, even though the data was older than 7 days and retention.ms had not been overridden (it was left at 7 days), we could still see data in the topic older than that.

This is normally fine, but it can cause problems when Kafka topics are used as the source of truth to build an in-memory cache at application startup:

  • Applications have to read more data on startup
  • The cache might end up holding more data than intended

The problem is caused by another parameter that is rarely talked about: segment.ms. This parameter plays a major role.

This parameter decides when the internal segment of a topic gets rolled. By default it is also 7 days.

Retention works on whole segments, not on individual messages: the log cleanup thread deletes a segment only when the last (newest) message in that segment is older than 7 days. So if the last message arrived on Saturday and the segment rolled on Sunday (after its week was up), the whole segment's data (from last week's Monday to Sunday) stays available until the next Saturday. By then the oldest messages in that segment are almost two weeks old.
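The timing in that example can be checked with a few lines of Python. This is only a sketch of the rule described above, not Kafka's actual code: the function name segment_deletable_at and the calendar dates are made up for illustration, and it ignores the delay until the broker's next retention check.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=7)  # retention.ms left at its 7-day default


def segment_deletable_at(last_message_time, retention=RETENTION):
    """Earliest time a rolled segment becomes eligible for deletion.

    Hypothetical helper: models the rule that a segment is deleted only
    once its newest message is older than the retention period.
    """
    return last_message_time + retention


# Illustrative dates only: the segment's first message lands on a Monday,
# its last message on the following Saturday.
first_message = datetime(2024, 1, 1, 9, 0)  # Monday
last_message = datetime(2024, 1, 6, 9, 0)   # Saturday

eligible = segment_deletable_at(last_message)
oldest_age_at_deletion = eligible - first_message

print("segment eligible for deletion on:", eligible.strftime("%Y-%m-%d"))
print("age of oldest message by then (days):", oldest_age_at_deletion.days)
```

With these dates the segment only becomes deletable on the Saturday of the following week, when Monday's message is already 12 days old.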

Solution

To resolve this, it is sufficient to set the segment.ms parameter to 24 hours (86400000 ms), so that segments are rolled every day and each day's segment is deleted once a week has passed since its last message.
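To see why a smaller segment.ms helps, here is a rough back-of-the-envelope calculation in Python. The helper names to_ms and worst_case_age are made up for this sketch, and the bound it computes (retention.ms plus segment.ms) is an approximation that assumes segments roll on time only and ignores the retention check interval.

```python
from datetime import timedelta


def to_ms(delta):
    """Convert a timedelta into the millisecond value used in topic configs."""
    return int(delta.total_seconds() * 1000)


def worst_case_age(retention, segment):
    """Rough upper bound on how old a message can get before deletion.

    Assumption for this sketch: a message written right after a roll waits
    up to one full segment period for the roll, then one retention period.
    """
    return retention + segment


retention = timedelta(days=7)

for segment in (timedelta(days=7), timedelta(hours=24)):
    oldest = worst_case_age(retention, segment)
    print(f"segment.ms={to_ms(segment)}: data may linger up to about {oldest.days} days")
```

Under these assumptions the default settings can keep data for about 14 days, while a 24 hour segment.ms brings that down to about 8 days.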
