The default value of the retention.ms attribute on Kafka topics is 7 days, but data older than 7 days still persists in the topic.
Kafka version: 2.1.11
An interesting problem: even though the data was older than 7 days and retention.ms had not been overridden (it was left at the 7-day default), we could still see data in the topic older than that.
This is normally fine, but in scenarios where Kafka topics are used as the source of truth to build an in-memory cache when an application starts, it can cause problems:
- The application has to read more data on startup
- The cache might end up holding more data than intended
The problem comes from another parameter that is not talked about much: segment.ms. This parameter plays a major role.
It decides when the active log segment of a topic partition gets rolled (closed, with a new segment started). By default it is also 7 days.
Retention works on whole segments, not on individual messages: the broker's retention check only deletes a segment once the last (newest) message in that segment is older than 7 days. So if a segment holds last week's data from Monday onward, its last message arrived on Saturday, and the segment rolled on Sunday (after the week was up), then the whole segment, including Monday's data, stays available until the following Saturday.
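The Monday-to-Saturday example can be worked through with plain date arithmetic. This is only a sketch of the rule described above, not broker code: the function name `deletion_eligible_at` and the concrete dates are made up for illustration, and it ignores the short delay until the broker's next periodic retention check.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=7)  # retention.ms left at its 7-day default


def deletion_eligible_at(last_message_time, retention=RETENTION):
    """Hypothetical helper: a closed segment becomes eligible for deletion
    once its newest message is older than the retention period."""
    return last_message_time + retention


# Illustrative dates: the segment's first message arrives on a Monday,
# its last message on the Saturday of the same week.
first_message = datetime(2024, 1, 1, 9, 0)  # Monday
last_message = datetime(2024, 1, 6, 9, 0)   # Saturday

eligible = deletion_eligible_at(last_message)
oldest_age_at_deletion = eligible - first_message

print(eligible.strftime("%A %Y-%m-%d"))  # Saturday 2024-01-13
print(oldest_age_at_deletion.days)       # 12
```

Under these assumptions, Monday's message is 12 days old by the time its segment can be removed, well past the 7 days that retention.ms alone suggests.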
To resolve this, it should be sufficient to set segment.ms to 24 hours (86400000 ms), so that segments roll every day and old data is deleted roughly one day at a time as it passes the 1-week mark.
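To see what the change buys, the rough upper bound on how old data can get is retention.ms plus segment.ms. The sketch below compares the default with the 24-hour setting. It assumes purely time-based rolling on a topic that keeps receiving messages (size-based rolling via segment.bytes can only roll segments sooner), and the helper names `to_ms` and `worst_case_age` are hypothetical.

```python
from datetime import timedelta


def to_ms(delta):
    """Convert a timedelta to milliseconds, the unit Kafka uses for *.ms configs."""
    return int(delta.total_seconds() * 1000)


def worst_case_age(retention, segment_roll):
    """Rough upper bound on message age: the first message of a segment waits
    one full roll interval, then the retention period after the last message."""
    return retention + segment_roll


retention = timedelta(days=7)  # retention.ms default

for roll in (timedelta(days=7), timedelta(hours=24)):
    print(f"segment.ms={to_ms(roll)} -> data may live up to {worst_case_age(retention, roll)}")
# segment.ms=604800000 -> data may live up to 14 days, 0:00:00
# segment.ms=86400000 -> data may live up to 8 days, 0:00:00
```

With daily rolling the overshoot shrinks from up to a week to up to a day. The trade-off is more, smaller segment files per partition.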