At a high level, Kafka provides the following guarantees:
- Messages sent by a producer to a particular topic partition will be appended in the order they are sent.
- If a message M1 is sent by the same producer as a message M2, and M1 is sent first, then M1 will have a lower offset than M2 and appear earlier in the log.
- A consumer instance sees messages in the order they are stored in the log.
- For a topic with replication factor N, Kafka will tolerate up to N-1 server failures without losing any messages committed to the log.
There are multiple possible message delivery semantics:
- At most once: messages are never redelivered but may be lost.
- At least once: messages may be redelivered but never lost.
- Exactly once: messages are delivered once and only once.
When publishing, a message is considered published once it is committed to the log. If a producer experiences a network error while publishing, it can never be sure whether the error happened before or after the message was committed. Once committed, however, the message will not be lost.
Guaranteed message publishing
For guaranteed message publishing, the producer can be configured with settings such as whether to wait for acknowledgements and how long to wait for a message to be committed.
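As a sketch, such producer settings might look like the following. The configuration names are assumed from the 0.8-era producer API (newer clients use `acks` and `delivery.timeout.ms` instead), and the broker address is illustrative:

```java
import java.util.Properties;

// Sketch of producer settings for guaranteed publishing, assuming
// 0.8-era producer configuration names.
public class GuaranteedProducerConfig {
    public static Properties build() {
        Properties props = new Properties();
        props.put("metadata.broker.list", "localhost:9092"); // assumed broker address
        // -1 waits until the message is committed to all in-sync replicas
        props.put("request.required.acks", "-1");
        // how long to wait for the commit acknowledgement, in milliseconds
        props.put("request.timeout.ms", "10000");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(build());
    }
}
```

These properties would then be passed to the producer's constructor.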
From the consumer's point of view, all replicas have exactly the same log with the same offsets, and the consumer controls its position in this log.
For consumers, Kafka guarantees at-least-once delivery when the consumer reads messages, processes them, and only then saves its position. If the consumer process crashes after processing messages but before saving its position, another consumer process takes over the topic partition and may receive the first few messages again, even though they have already been processed.
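This process-then-commit ordering can be sketched with a purely in-memory simulation (no broker involved; all names are illustrative). Because the position is saved only after processing, a crash between the two steps causes the same record to be processed twice:

```java
import java.util.ArrayList;
import java.util.List;

// In-memory sketch of at-least-once consumption: the position is saved
// only AFTER processing, so a crash between the two steps leads to
// reprocessing, never to lost messages.
public class AtLeastOnceSketch {
    static List<String> log = List.of("m0", "m1", "m2", "m3");
    static int savedPosition = 0;                  // last saved offset
    static List<String> processed = new ArrayList<>();

    // Consume up to 'limit' records; optionally "crash" after processing
    // a record but before saving the position.
    static void consume(int limit, boolean crashAfterProcessing) {
        int pos = savedPosition;
        for (int i = 0; i < limit && pos < log.size(); i++, pos++) {
            processed.add(log.get(pos));           // 1. process the message
            if (crashAfterProcessing) return;      // crash: position not saved
            savedPosition = pos + 1;               // 2. save the position
        }
    }

    public static void main(String[] args) {
        consume(2, true);   // processes m0, then crashes before committing
        consume(4, false);  // takeover restarts at offset 0: m0 is seen twice
        System.out.println(processed); // [m0, m0, m1, m2, m3]
    }
}
```

Reversing the two steps (save position first, then process) would instead give at-most-once semantics: a crash between them loses the unprocessed message.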
Message Compression
Kafka provides a message group compression feature for efficient message delivery:
- It allows recursive message sets, so a compressed message may itself contain a message set, nested to arbitrary depth.
- It reduces network overhead, since a batch of messages is compressed together and sent as a single compressed message set.
- It may, however, degrade broker performance, since compressed messages add processing overhead on the broker.
Data is compressed by the message producer using either the GZIP or Snappy compression codec.
The following producer configurations need to be provided to use compression at the producer's end:
- compression.codec
- compressed.topics
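A sketch of these two settings, using the configuration names mentioned above (0.8-era producer; newer clients use `compression.type` instead). The topic names are illustrative:

```java
import java.util.Properties;

// Sketch of producer-side compression settings, assuming 0.8-era
// configuration names.
public class CompressionConfig {
    public static Properties build() {
        Properties props = new Properties();
        // codec used to compress message sets: "gzip" or "snappy"
        props.put("compression.codec", "gzip");
        // comma-separated list of topics to compress; if unset,
        // compression applies to all topics
        props.put("compressed.topics", "events,metrics");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(build());
    }
}
```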
The ByteBufferMessageSet class, which represents message sets, may contain both uncompressed and compressed data. To differentiate between compressed and uncompressed messages, a compression-attributes byte is introduced in the message header.
Log compaction
The last known value for each message key in a topic partition's log is retained; older records with the same key are removed once a more recent update arrives.
Log compaction is a mechanism to achieve finer-grained, per-record retention rather than time-based retention.
Log compaction also addresses cases such as system failures and restarts.
Log compaction is handled by a pool of background threads that recopy log segment files, removing records whose key appears in the head of the log.
The retention policy can be set on a per-topic basis: time-based, size-based, or log compaction-based.
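As a sketch, per-topic overrides for the three retention styles might look like the following (property names assumed from Kafka's topic-level configuration; the values are illustrative):

```properties
# Per-topic retention overrides (illustrative values)
cleanup.policy=compact      # log compaction-based retention
retention.ms=604800000      # time-based retention (7 days)
retention.bytes=1073741824  # size-based retention (1 GiB)
```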
Log compaction ensures the following:
- Ordering of messages is always maintained.
- Messages keep their sequential offsets, and the offset of a message never changes.
- Reads progressing from offset 0, or a consumer progressing from the start of the log, will see at least the final state of all records in the order they were written.
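The guarantees above can be sketched as a small in-memory simulation (all class and method names are illustrative, not Kafka's internals): for each key only the record with the highest offset survives, and surviving records keep their original offsets and relative order.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of the log-compaction guarantee: keep the last record per key,
// preserving the offsets and ordering of the surviving records.
public class CompactionSketch {
    record Record(long offset, String key, String value) {}

    static List<Record> compact(List<Record> log) {
        Map<String, Record> latest = new LinkedHashMap<>();
        for (Record r : log) {
            latest.remove(r.key());   // drop the older entry for this key
            latest.put(r.key(), r);   // re-insert at the current position
        }
        return List.copyOf(latest.values());
    }

    public static void main(String[] args) {
        List<Record> log = List.of(
            new Record(0, "k1", "a"),
            new Record(1, "k2", "b"),
            new Record(2, "k1", "c"));  // newer update for k1
        // surviving records: offset 1 (k2 -> b), offset 2 (k1 -> c);
        // offset 0 is removed, offsets 1 and 2 are unchanged
        System.out.println(compact(log));
    }
}
```

A reader scanning the compacted log from offset 0 still sees the final state of every key, in write order, matching the last guarantee above.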