The Log is the data structure that Kafka is based on. It’s very simple.

It’s an ordered sequence of records (each one just bytes) that only supports appends: you can’t edit or delete records in place.

[Figure: the log]

💡 In Kafka, the order is denoted by a monotonically increasing offset. The log starts at 0, so the first record is at offset 0, the second at offset 1, and so on.
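To make offsets concrete, here is a minimal in-memory sketch of an append-only log. The `Log` class and its `append`/`read` methods are hypothetical names made up for illustration; this is not Kafka's code or API, only the idea of a structure where each record gets the next offset and nothing is ever changed in place.

```python
class Log:
    """A minimal in-memory append-only log (illustrative only, not Kafka's code)."""

    def __init__(self) -> None:
        self._records = []  # list of bytes objects; the list index is the offset

    def append(self, record: bytes) -> int:
        """Add a record at the tail and return the offset assigned to it."""
        offset = len(self._records)  # offsets start at 0 and grow by 1 per record
        self._records.append(record)
        return offset

    def read(self, offset: int) -> bytes:
        """Return the record stored at the given offset."""
        if not 0 <= offset < len(self._records):
            raise IndexError(f"offset {offset} is out of range")
        return self._records[offset]

    # Deliberately no update() or delete(): existing records never change.


log = Log()
first = log.append(b"user signed up")
second = log.append(b"user logged in")
print(first, second)     # 0 1
print(log.read(second))  # b'user logged in'
```

Because the offset is simply the record's position in the sequence, the writer never needs to search for where a record goes. It always goes at the end.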

There are a lot of benefits to it:

  1. ordering: records are stored sequentially in the order they’re ingested, so you always know which record comes before which.

  2. fast: writes to and reads from the end (tail) of the log are O(1).

  3. HDD-friendly: it’s incredibly efficient on hard drives because of its sequential read/write patterns. This aligns with how HDDs perform best: large, linear IO operations that don’t force the mechanical actuator arm to keep repositioning over the spinning platters. It also benefits from OS-level and hardware-level IO batching.

  4. big data friendly: its append and sequential read performance doesn’t degrade with size. A 20TB log performs roughly the same as a 1GB log.

  5. read parallelism friendly: because it’s append-only, there is no need for locking when multiple readers are accessing it.

  6. simple: it’s easy to understand, easy to optimize for and easy to implement. That’s worth a lot.
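A few of these properties (sequential writes, cheap appends, lock-free readers) can be sketched with a plain file. This is a toy under stated assumptions, not how Kafka lays out its files: the `FileLog` class, the 4-byte length prefix, and the in-memory position index are all invented for illustration, and the sketch always starts from an empty file.

```python
import os
import struct
import tempfile
from concurrent.futures import ThreadPoolExecutor


class FileLog:
    """Hypothetical file-backed append-only log (a sketch, not Kafka's implementation)."""

    def __init__(self, path):
        self._path = path
        self._file = open(path, "wb")  # sketch assumption: always start from an empty file
        self._positions = []           # byte position of each record, indexed by offset
        self._end = 0                  # current size of the file in bytes

    def append(self, record: bytes) -> int:
        """Write a length-prefixed record at the end of the file; return its offset."""
        offset = len(self._positions)
        frame = struct.pack(">I", len(record)) + record
        self._file.write(frame)  # always written at the tail: purely sequential IO
        self._file.flush()       # make the bytes visible to readers using other handles
        self._positions.append(self._end)
        self._end += len(frame)
        return offset

    def read(self, offset: int) -> bytes:
        """Read one record through a private file handle, without taking any lock."""
        position = self._positions[offset]
        with open(self._path, "rb") as reader:
            reader.seek(position)
            (length,) = struct.unpack(">I", reader.read(4))
            return reader.read(length)

    def close(self):
        self._file.close()


def demo():
    with tempfile.TemporaryDirectory() as directory:
        log = FileLog(os.path.join(directory, "demo.log"))
        for i in range(3):
            log.append(f"event-{i}".encode())
        # Several readers fetch records at the same time; written bytes never
        # change, so the readers do not need to coordinate with each other.
        with ThreadPoolExecutor(max_workers=3) as pool:
            records = list(pool.map(log.read, range(3)))
        log.close()
        return records


print(demo())  # [b'event-0', b'event-1', b'event-2']
```

The cost of `append` here does not depend on how much data is already in the file, which is the intuition behind the "big data friendly" point. A real system would also need to handle crash recovery, durability (`fsync`), and splitting the file into segments, all of which this sketch leaves out.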

It is a core reason why Kafka is so fast. 🔥🏎️

💡 {Log, Write-Ahead Log (WAL), Commit Log, Transaction Log} are all the same thing underneath - a log. The only difference is what payload it carries.

Application Logs also use the same underlying structure - an append-only sequence of string messages coming from your app.