Consumers are clients that read data from Kafka via the KafkaConsumer API.
Consumer Groups are sets of Consumers that share state and coordinate how to distribute the work among themselves. For example, the group records the progress it has made reading each partition in the __consumer_offsets topic.
Two consumers in the same group CANNOT read from the same partition.
Two different groups can read from the same partition, though. Each group is essentially a separate marker on the log.
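To make the "separate markers" idea concrete, here is a toy in-memory model. It is not how Kafka actually stores offsets: the `OffsetStore` class, the `poll` helper, and the group names are all made up for illustration. The point is only that offsets are tracked per group, so each group advances independently over the same log.

```python
from collections import defaultdict


class OffsetStore:
    """Toy stand-in for the per-group offsets a cluster keeps track of."""

    def __init__(self):
        # (group, topic, partition) -> next offset that group will read
        self._offsets = defaultdict(int)

    def commit(self, group, topic, partition, offset):
        self._offsets[(group, topic, partition)] = offset

    def position(self, group, topic, partition):
        return self._offsets[(group, topic, partition)]


log = ["m0", "m1", "m2", "m3", "m4"]  # one partition of a hypothetical topic "A"
store = OffsetStore()


def poll(group, max_records):
    """Read up to max_records for this group and advance only its marker."""
    start = store.position(group, "A", 0)
    records = log[start:start + max_records]
    store.commit(group, "A", 0, start + len(records))
    return records


billing_batch = poll("billing", 3)      # the billing group reads three messages
analytics_batch = poll("analytics", 1)  # analytics starts from its own marker
```

Here the two groups end up at different positions on the same partition, and neither affects the other.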
💡 {Consumer Fanout, Read Fanout} is a rough term for the number of times a single byte in Kafka is read.
In Kafka, an offset that a consumer has read once is (usually) not read again by the same group. Because of this, the number of consumer groups targeting a topic dictates that topic’s fanout ratio. If we have 3 consumer groups reading from topic A, then that topic has a read fanout of 3x because its message at offset X will be read three separate times.
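The arithmetic can be sketched in a few lines. This assumes every group reads each byte exactly once; the group names and the 50 MB/s write rate are made-up numbers for illustration.

```python
def read_fanout(consumer_groups):
    """Fanout = number of distinct groups reading the topic."""
    return len(set(consumer_groups))


def read_throughput(write_mb_per_s, consumer_groups):
    """Each group reads every byte once, so reads scale with the fanout."""
    return write_mb_per_s * read_fanout(consumer_groups)


groups_on_topic_a = ["billing", "analytics", "search-indexer"]  # hypothetical
fanout = read_fanout(groups_on_topic_a)
read_rate = read_throughput(50, groups_on_topic_a)
```

With three groups, a topic written at 50 MB/s is read at roughly 150 MB/s in aggregate.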
Partition Assignment
Within a single group, there can be N consumers (say, 10) reading M partitions (say, 100 topic-partitions). Roughly speaking, an even balance would have each consumer reading from 10 partitions.
How do you decide which consumer reads which partitions, and how do you handle cases where new consumers come online, and/or some go offline?
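As a rough illustration of the problem (this is not one of Kafka's built-in assignors), the sketch below deals partitions out round-robin to the sorted member list and recomputes the whole assignment whenever membership changes. The function name `assign` and the member ids are hypothetical.

```python
def assign(members, partitions):
    """Deal partitions round-robin across members sorted by id."""
    ordered = sorted(members)
    if not ordered:
        return {}
    assignment = {m: [] for m in ordered}
    for i, p in enumerate(sorted(partitions)):
        assignment[ordered[i % len(ordered)]].append(p)
    return assignment


partitions = list(range(100))
members = [f"consumer-{i}" for i in range(10)]

before = assign(members, partitions)      # 10 consumers, 10 partitions each
after = assign(members[:-1], partitions)  # one consumer goes offline

sizes_before = sorted(len(ps) for ps in before.values())
sizes_after = sorted(len(ps) for ps in after.values())
```

With 10 consumers every member gets 10 partitions. With 9, the 100 partitions no longer divide evenly, so one consumer ends up with 12 and the rest with 11, while each partition still has exactly one owner in the group.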
The Consumer Group Protocol
Kafka has a whole protocol around this idea of assigning consumers to partitions. KIP-848 recently overhauled it, but the old protocol is still supported. I call them v1 and v2.
- v1: a random consumer client is picked to be the Group Leader. It drives an involved protocol dance that, through two request/response cycles (JoinGroup and SyncGroup), collects information about every consumer in the group and assigns partitions to each one.
- v2: the group logic is simplified, and the request/response dance is removed in favor of a heartbeat approach. A lot of the logic, including the part that decides which partition to assign to which consumer, is moved to the server (the Kafka broker).
- See my write-up and video on it here: https://blog.2minutestreaming.com/p/kafka-kip-848-new-consumer-group-protocol
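A toy, in-process model of the v1 flow may help. It is not the wire protocol: the class and method names are invented, and picking the first member to join as the leader is an assumption of this sketch. Round one collects the members; round two hands each member the assignment that the leader computed on the client side.

```python
class ToyCoordinator:
    """In-memory stand-in for the broker-side group coordinator (v1 style)."""

    def __init__(self):
        self.members = []
        self.assignment = {}

    def join(self, member_id):
        # Round 1: every member announces itself and learns who the leader is.
        self.members.append(member_id)
        return self.members[0]  # assumption: the first member to join leads

    def sync(self, member_id, leader_assignment=None):
        # Round 2: only the leader submits a plan; each member gets its share.
        if leader_assignment is not None:
            self.assignment = leader_assignment
        return self.assignment.get(member_id, [])


def leader_assign(members, partitions):
    """Client-side assignment logic, run only by the leader in this model."""
    return {m: partitions[i::len(members)] for i, m in enumerate(members)}


coordinator = ToyCoordinator()
for member in ["c1", "c2", "c3"]:
    leader = coordinator.join(member)

plan = leader_assign(coordinator.members, list(range(6)))
c1_partitions = coordinator.sync("c1", plan)  # the leader sends the plan
c2_partitions = coordinator.sync("c2")        # followers just fetch theirs
c3_partitions = coordinator.sync("c3")
```

In v2 terms, the equivalent of `leader_assign` would run on the broker instead of in a client, with members kept up to date through heartbeats rather than the two explicit rounds.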