Why Is Kafka So Fast?

Recently I was exploring RabbitMQ and Kafka for some POCs. I found that Kafka provides much higher throughput (the number of messages it can process in a given amount of time) than RabbitMQ (not RabbitMQ Streams), so I started exploring why Kafka is so fast.

The main reasons Kafka is so fast are the way it stores data on disk and the operating system optimization techniques it uses.

A Little Correction

Most people think Kafka is a queue, but that's wrong. At its core, Kafka is a distributed commit log. It is not a traditional queue data structure where you push data in at one end and consume it from the other.

So What Is a Commit Log?

A commit log is one of the simplest yet most powerful data storage concepts used in distributed systems. At its core, a commit log is an append-only sequence of records stored on disk. Every new piece of data is simply appended to the end of the log, and nothing is modified or removed from the middle.

Before we start, we need to understand some Kafka internal components and how it stores data.

Offsets

Before we talk about how messages are stored using store and index files, we need to understand one important concept in commit logs: offsets.

An offset is a unique identifier assigned to every message written to the log. Instead of identifying messages using IDs or keys, the system simply assigns an incrementing number to each message as it is appended.

For example, if a log starts empty, the first few messages written to it might look like this:

Offset 0 → Message A
Offset 1 → Message B
Offset 2 → Message C
Offset 3 → Message D
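
The append-and-number behaviour can be sketched with a tiny in-memory log (a hypothetical `CommitLog` class, not Kafka's actual code):

```python
class CommitLog:
    """A minimal in-memory append-only log that assigns offsets."""

    def __init__(self):
        self.records = []

    def append(self, message: bytes) -> int:
        offset = len(self.records)   # next offset = number of records so far
        self.records.append(message)
        return offset

    def read(self, offset: int) -> bytes:
        return self.records[offset]

log = CommitLog()
print(log.append(b"Message A"))  # 0
print(log.append(b"Message B"))  # 1
print(log.read(1))               # b'Message B'
```

Because offsets are just positions in an append-only sequence, assigning them costs nothing and consumers can resume from any offset they remember.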

Where Does Kafka Store Its Data?

Kafka and other commit logs usually store all their data directly on disk. They do not use a database for this. Instead, Kafka relies on operating system optimizations for high throughput. That's why Kafka's throughput depends heavily on how fast your disk is.

How Messages Are Stored in a Commit Log

Now that we understand what a commit log is, let's look at how messages are actually stored on disk.

At a high level, a commit log stores data using two main components:

  1. The store file – where the actual message bytes are written.
  2. The index file – which maps offsets to positions in the store file.

These two structures allow the system to append messages sequentially while still supporting fast lookups.

The Store File

The store file is where the actual message payload is written. Messages are written sequentially to the end of the file in an append-only fashion. However, simply writing messages one after another would create a problem: when reading data back, how would the system know where one message ends and the next begins?

To solve this, each message is stored with a small header. Before writing the message payload, the system first writes the length of the message. So the data layout inside the store file looks like this:

| message length | message bytes |
| message length | message bytes |
| message length | message bytes |

When reading a message, the system first reads the length and then reads the corresponding number of bytes. This makes it possible to efficiently traverse the log.
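The length-prefixed layout can be sketched in a few lines. This uses an in-memory `io.BytesIO` as a stand-in for the on-disk store file, and a 4-byte length header is an assumption for illustration, not Kafka's exact record format:

```python
import io
import struct

def append_message(f, payload: bytes) -> int:
    """Append a length-prefixed record; return the byte position it starts at."""
    pos = f.tell()
    f.write(struct.pack(">I", len(payload)))  # 4-byte big-endian length header
    f.write(payload)
    return pos

def read_message_at(f, pos: int) -> bytes:
    """Read the length header at `pos`, then read exactly that many bytes."""
    f.seek(pos)
    (length,) = struct.unpack(">I", f.read(4))
    return f.read(length)

store = io.BytesIO()  # stand-in for the on-disk store file
p0 = append_message(store, b"Message A")
p1 = append_message(store, b"Message B")
print(read_message_at(store, p1))  # b'Message B'
```

Note that `append_message` returns the byte position of each record; that position is exactly what the index file (described next) needs to remember.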

The Index File

While sequential writes are great for performance, reading messages directly from the store file would be slow if we had to scan from the beginning every time. This is where the index file comes in.

The index stores a mapping between:

  • Offset → Position in the store file

Each index entry contains two pieces of information:

  • The offset of the message within the segment
  • The byte position where that message begins in the store file

Conceptually it looks like this:

| offset | position |
| offset | position |
| offset | position |

When a message is appended:

  1. The message is written to the store file.
  2. The system records the byte position where the message was written.
  3. An entry is added to the index mapping the offset to that position.

This allows the system to jump directly to the correct location in the store file when reading a message.
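
Putting the three steps together, here is a toy sketch of a store file plus index (a hypothetical `LogSegment` class, not Kafka's actual implementation; real index files are sorted binary entries, not a Python dict):

```python
import io
import struct

class LogSegment:
    """Toy sketch: a store file plus an offset -> position index."""

    def __init__(self):
        self.store = io.BytesIO()  # stand-in for the on-disk store file
        self.index = {}            # offset -> byte position in the store
        self.next_offset = 0

    def append(self, payload: bytes) -> int:
        pos = self.store.tell()                       # 1. note where we write
        self.store.write(struct.pack(">I", len(payload)) + payload)
        self.index[self.next_offset] = pos            # 2-3. record offset -> pos
        self.next_offset += 1
        return self.next_offset - 1

    def read(self, offset: int) -> bytes:
        self.store.seek(self.index[offset])           # jump straight to the record
        (length,) = struct.unpack(">I", self.store.read(4))
        return self.store.read(length)

seg = LogSegment()
for msg in [b"A", b"B", b"C"]:
    seg.append(msg)
print(seg.read(2))  # b'C'
```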

Segments

If a commit log stored all messages in a single growing file, the file would eventually become extremely large and difficult to manage. To solve this problem, commit logs divide the log into multiple smaller files called segments.

Each segment is responsible for storing a continuous range of offsets. A segment typically consists of two files: a store file and an index file. We talked about both of them earlier.

Example

Suppose a log starts with offset 0 and each segment can store 1000 messages.

Segment 0 (base offset: 0)
Offsets: 0 - 999

Segment 1 (base offset: 1000)
Offsets: 1000 - 1999

Segment 2 (base offset: 2000)
Offsets: 2000 - 2999

If a consumer wants to read message offset 1450, the system can quickly determine that this offset belongs to the segment with base offset 1000, and then use that segment’s index file to find the exact position of the message in the store file.
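
That segment lookup is just a binary search over the sorted base offsets; a minimal sketch:

```python
import bisect

base_offsets = [0, 1000, 2000]  # sorted base offsets of the segments above

def find_segment(offset: int) -> int:
    """Return the base offset of the segment that contains `offset`."""
    i = bisect.bisect_right(base_offsets, offset) - 1
    return base_offsets[i]

print(find_segment(1450))  # 1000
```

Once the segment is found, its index file narrows the search down to an exact byte position, so no scan of the store file is ever needed.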

Techniques Kafka Uses

Indexing Based Storage Model

Kafka uses an indexing-based storage model. This way, whenever a message has to be read, instead of reading from the large store file, Kafka just uses the index file to locate the message position in the store file and directly reads it.

Sequential I/O

There are two types of I/O: random I/O and sequential I/O. Random I/O reads data from non-contiguous locations on disk, which is slow because the disk head has to seek back and forth. Kafka instead uses sequential I/O: data is written to and read from contiguous blocks on disk, which is faster because the disk head only needs to keep moving in one direction instead of jumping here and there.

Zero Copy Principle

It is the technique of minimizing the number of times data is copied between different memory locations. For example, when a web server serves a file the traditional way, the data is read from disk into a kernel buffer, copied into the application's user-space buffer, copied again into the kernel's socket buffer, and finally sent to the network card. Zero copy (for example, the sendfile system call, which Kafka uses to serve messages to consumers) lets the kernel move data from the page cache to the network card without routing it through user space.

Memory Mapping: This is one way to reduce copies. It maps file contents directly into the application's address space, so the application and the kernel share the same pages and no separate copy into user-space buffers is needed. You can then read and write file data byte-wise, just like you read and write an array. This provides a huge speed boost.
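
A minimal memory-mapping sketch in Python (the file name and size here are made up for illustration):

```python
import mmap
import os
import tempfile

# Create a small file to map. mmap cannot grow a file, so pre-size it.
path = os.path.join(tempfile.mkdtemp(), "store.log")
with open(path, "wb") as f:
    f.write(b"\x00" * 16)

with open(path, "r+b") as f:
    mm = mmap.mmap(f.fileno(), 0)  # map the whole file into memory
    mm[0:7] = b"Kafka!!"           # write file bytes as if it were an array
    print(mm[0:7])                 # b'Kafka!!'
    mm.close()                     # flush changes back to the file
```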

Page Cache Utilization

The page cache is the portion of RAM the operating system uses to cache file data. When Kafka writes data, instead of forcing it to disk immediately, it writes to the page cache and lets the OS flush it to disk later. Reads of recently written data are often served straight from RAM. This also increases speed.

Batching

Kafka batches messages together, allowing producers to send large chunks of data rather than small, individual messages. This reduces network overhead.
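
A toy sketch of the idea, using a hypothetical `BatchingProducer` that buffers messages and flushes them in one "request" (real producers also flush on a time limit, such as the Java client's linger.ms setting):

```python
class BatchingProducer:
    """Toy sketch: buffer messages and 'send' each full batch in one call."""

    def __init__(self, batch_size: int):
        self.batch_size = batch_size
        self.buffer = []
        self.sent_batches = []  # stands in for network requests

    def send(self, message: bytes):
        self.buffer.append(message)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.sent_batches.append(list(self.buffer))  # one request per batch
            self.buffer.clear()

p = BatchingProducer(batch_size=3)
for i in range(7):
    p.send(f"msg-{i}".encode())
p.flush()
print(len(p.sent_batches))  # 3 batches instead of 7 requests
```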

Partitioning & Parallelism

A Kafka topic can be divided into partitions. This is done to provide horizontal scalability. Each partition can live on a different server. This allows producers and consumers to produce and consume messages in parallel.
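
A sketch of key-based partition selection, which is what keeps all messages for one key on the same partition. Kafka's Java client hashes keys with murmur2 by default; md5 here is just for illustration:

```python
import hashlib

def choose_partition(key: bytes, num_partitions: int) -> int:
    """Hash the key so the same key always lands on the same partition."""
    h = int.from_bytes(hashlib.md5(key).digest()[:4], "big")
    return h % num_partitions

# The same key maps to the same partition every time.
print(choose_partition(b"user-42", 3) == choose_partition(b"user-42", 3))  # True
```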

Before You Go

If you made it this far, thank you.

I usually write about backend engineering, distributed systems, and things I learn while working on real problems. Not theory — mostly practical stuff that I wish someone had explained to me earlier.

I run a free newsletter where I share these kinds of write-ups. No spam. Just occasional backend engineering notes.
