How Kafka Works: The Log That Never Forgets

Kafka is not just a message queue; it's a distributed streaming platform. We explain Topics, Partitions, and Offsets.

Abstract Algorithms · 6 min read

TL;DR: Kafka is a distributed event store. Unlike a traditional queue (RabbitMQ) where messages disappear after reading, Kafka stores them in a persistent Log. This allows multiple consumers to read the same data at their own pace, replay history, and handle massive scale via Partitions.


📖 The Bookshelf Analogy: Kafka vs Traditional Queues

Imagine a Magazine Subscription Service.

  • Producer (Publisher): Writes articles and drops them into a pile.
  • Topic (The Magazine Category): A named channel, e.g. "Sports", "Tech", "user-clicks".
  • Consumer (Subscriber): Reads the magazines.
| Traditional Queue (RabbitMQ) | Kafka |
| --- | --- |
| Publisher hands you a copy. You read it. It's gone. | Publisher places every issue on a numbered bookshelf. |
| If you were on vacation, you missed it. | You read issue #3. Your friend reads #1. Both are still there. |
| One consumer per message. | Many consumers, independent pacing. |

This bookshelf model is the foundational insight: Kafka never deletes a message just because it was read.


🔢 Topics, Partitions, and Offsets: The Core Vocabulary

Topics

A Topic is a named log. Think of it as a category: user-clicks, payment-events, driver-locations.

Partitions: The Unit of Scale

A topic is split into Partitions to enable parallel processing:

  • Partition 0: Stores events for Users A–M.
  • Partition 1: Stores events for Users N–Z.
```mermaid
flowchart LR
    subgraph Topic: user-clicks
        P0["Partition 0\n(Users A–M)"]
        P1["Partition 1\n(Users N–Z)"]
    end
    Producer -->|key hash| P0
    Producer -->|key hash| P1
    P0 --> Broker1["Broker 1"]
    P1 --> Broker2["Broker 2"]
```

Two brokers now handle reads and writes in parallel; throughput scales horizontally.
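The key-hash routing in the diagram above can be sketched in a few lines of Python. This is a stand-in, not a client API: Kafka's default partitioner actually uses murmur2, and the `choose_partition` helper here is ours, using md5 only to keep the sketch dependency-free. The property that matters is the same: equal keys always map to the same partition.

```python
import hashlib

def choose_partition(key: str, num_partitions: int) -> int:
    """Map a message key to a partition deterministically.

    Kafka's default partitioner hashes the key with murmur2; md5 is a
    stand-in here. Same key -> same partition, so all events for one
    user stay in order on one partition.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# Every event keyed by the same user lands on the same partition.
p_alice = choose_partition("user-alice", 2)
assert choose_partition("user-alice", 2) == p_alice
assert 0 <= p_alice < 2
```

Because routing depends only on the key and the partition count, note that changing the number of partitions later remaps keys, which is why partition counts are usually chosen generously up front.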

Offsets: Your Bookmark in the Log

Each message in a partition gets an Offset, a monotonically increasing integer (0, 1, 2, …).

```
Partition 0: [Msg 0] [Msg 1] [Msg 2] [Msg 3] ...
                ^               ^
             Group A         Group B
           (offset 0)      (offset 2)
```

Group A and Group B read the same partition independently. Kafka just tracks each group's current offset.
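A toy model makes the bookmark idea concrete. The `Partition` class below is illustrative only (real Kafka stores the log on disk and commits group offsets to an internal topic); it shows the one invariant that matters: reading never deletes, it just advances a per-group pointer.

```python
class Partition:
    """An append-only log; each message's offset is its list index."""

    def __init__(self):
        self.log = []            # offset == index into this list
        self.group_offsets = {}  # group name -> next offset to read

    def append(self, msg):
        self.log.append(msg)
        return len(self.log) - 1  # the offset just assigned

    def poll(self, group):
        """Return the next message for this group and advance its bookmark."""
        offset = self.group_offsets.get(group, 0)
        if offset >= len(self.log):
            return None           # this group is caught up
        self.group_offsets[group] = offset + 1
        return self.log[offset]

p0 = Partition()
for m in ["click-0", "click-1", "click-2"]:
    p0.append(m)

# Two groups read the same partition at independent paces;
# nothing is removed from the log.
assert p0.poll("A") == "click-0"
assert p0.poll("B") == "click-0"
assert p0.poll("B") == "click-1"
assert p0.log == ["click-0", "click-1", "click-2"]
```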


โš™๏ธ Consumer Groups: Parallel Reading Without Coordination

A Consumer Group is a team of consumers that collectively read a topic.

  • Kafka assigns each partition to exactly one consumer in the group at a time.
  • If a consumer crashes, Kafka reassigns its partition to another group member.

Example: 4 partitions, 4 consumers in the "Analytics" group:

| Partition | Consumer |
| --- | --- |
| 0 | Consumer A |
| 1 | Consumer B |
| 2 | Consumer C |
| 3 | Consumer D |

If Consumer D crashes → Consumer A picks up Partition 3 automatically.

Important: If you add a 5th consumer to a 4-partition topic, the 5th consumer sits idle. The rule: at most one consumer per partition within a group. Add more partitions to scale beyond that.

Offset commit: Kafka tracks "Group=Analytics, Partition=0, Offset=5". On restart, the group resumes from offset 5, so no messages are lost. The default delivery guarantee is at-least-once, which can produce duplicates if a consumer crashes between processing and committing; exactly-once semantics require idempotent producers and transactions.
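The assignment and rebalance behavior described above can be simulated without a broker. Kafka ships several real strategies (range, round-robin, cooperative-sticky); the round-robin sketch below only demonstrates the invariant: each partition is owned by exactly one consumer in the group, and surplus consumers sit idle.

```python
def assign(partitions, consumers):
    """Round-robin assignment of partitions within one consumer group."""
    assignment = {c: [] for c in consumers}
    for i, partition in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment

partitions = [0, 1, 2, 3]

# 4 partitions, 4 consumers: one partition each.
assert assign(partitions, ["A", "B", "C", "D"]) == {
    "A": [0], "B": [1], "C": [2], "D": [3],
}

# Consumer D crashes: a rebalance hands partition 3 to a survivor.
assert assign(partitions, ["A", "B", "C"]) == {"A": [0, 3], "B": [1], "C": [2]}

# A 5th consumer joins: nothing left to own, so it sits idle.
assert assign(partitions, ["A", "B", "C", "D", "E"])["E"] == []
```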


🧠 Log Compaction and Retention Policies

By default Kafka uses time-based retention (e.g., delete messages older than 7 days). This suits event streams where history has a time horizon.

Log compaction is an alternative: Kafka retains only the latest value per key. This is ideal for changelog topics (e.g., user-profile-updates) where you only need the most recent state per user ID.

| Policy | Keeps | Best for |
| --- | --- | --- |
| Time-based retention | All messages up to N days | Audit logs, analytics pipelines |
| Size-based retention | All messages up to N GB | Bounded storage environments |
| Log compaction | Latest value per key | State snapshots, CDC topics |
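The compaction policy reduces to one rule: keep only the newest value per key. A minimal sketch (Kafka's compactor works segment by segment in the background; this collapses a whole in-memory changelog at once):

```python
def compact(log):
    """Log compaction: keep only the latest value per key.

    `log` is a list of (key, value) pairs in offset order; later
    offsets overwrite earlier ones for the same key.
    """
    latest = {}
    for key, value in log:
        latest[key] = value
    return list(latest.items())

changelog = [
    ("user-1", {"city": "Berlin"}),
    ("user-2", {"city": "Lagos"}),
    ("user-1", {"city": "Madrid"}),  # newer state for user-1
]

# After compaction only the current state per user survives.
assert compact(changelog) == [
    ("user-1", {"city": "Madrid"}),
    ("user-2", {"city": "Lagos"}),
]
```

A new consumer reading a compacted topic from offset 0 therefore rebuilds the full current state without replaying every historical update.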

๐ŸŒ Kafka in Production: LinkedIn, Uber, Netflix

LinkedIn (Kafka's birthplace): 7 trillion messages per day across pipelines for feed ranking, notifications, and metrics collection.

Uber: The driver-locations topic ingests GPS pings every 5 seconds. Three independent consumer groups read the same stream:

  1. ETA Service → calculate arrival time.
  2. Audit Service → store history for billing.
  3. Fraud Service → detect teleporting drivers.

Zero coordination between the three. No data duplication at the source.

Netflix: Chaos event streams flow through Kafka so multiple observability systems can react independently without coupling.


โš–๏ธ When Kafka Is Overkill vs When It Shines

| Situation | Use Kafka | Use Something Else |
| --- | --- | --- |
| Multiple consumers need the same event | ✅ | — |
| Replay past events | ✅ | — |
| High throughput (millions/sec) | ✅ | — |
| Simple job queue, one consumer | — | RabbitMQ or SQS |
| Request/response semantics | — | gRPC or REST |
| Sub-millisecond latency required | — | In-memory queues |

Hot partition warning: If all producers use the same partition key (e.g., a single constant tenant ID), all traffic lands on one partition. (A null key is the opposite case: the default partitioner spreads keyless records across partitions.) Monitor per-partition consumer lag and redesign the key strategy before it becomes a production incident.
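Key skew is easy to check offline before it bites. The sketch below (reusing the md5 stand-in for Kafka's murmur2 hash; `partition_load` is a hypothetical helper, not a client API) counts how many messages a sample of keys would send to each partition:

```python
import hashlib
from collections import Counter

def partition_for(key: str, num_partitions: int) -> int:
    """md5 stand-in for Kafka's murmur2-based default partitioner."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

def partition_load(keys, num_partitions):
    """Count how many messages land on each partition for a stream of keys."""
    return Counter(partition_for(k, num_partitions) for k in keys)

# A constant key funnels everything onto one partition (a hot partition) ...
hot = partition_load(["checkout"] * 1000, 4)
assert len(hot) == 1

# ... while well-distributed keys spread the load across all four.
spread = partition_load([f"user-{i}" for i in range(1000)], 4)
assert len(spread) == 4
```

Running this against a day's worth of real keys gives a quick skew estimate; in production the same signal shows up as one consumer's lag growing while its groupmates stay caught up.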


📌 Summary

  • Log-based: Messages are stored on disk and retained (default: 7 days). Multiple consumers can re-read.
  • Topics + Partitions: The unit of throughput. More partitions = more parallel consumers.
  • Offsets + Consumer Groups: Durable bookmarks enabling crash-safe resume and independent read pacing.
  • Log compaction: Alternative retention keeping only the latest value per key โ€” great for change data capture.

๐Ÿ“ Practice Quiz

  1. You have a topic with 4 partitions and a consumer group with 5 consumers. What happens to the 5th consumer?

    • A) It reads from a random partition.
    • B) It sits idle.
    • C) It creates a new partition.
      Answer: B
  2. Why is Kafka faster than a traditional database for sequential writes?

    • A) It uses RAM only.
    • B) It appends to the end of a file (sequential I/O), which is much faster than random I/O.
    • C) It compresses data before writing.
      Answer: B
  3. You need strict message ordering per customer. How do you configure Kafka?

    • A) Use 1 partition and route all messages for a customer to it via partition key.
    • B) Use 100 partitions for maximum throughput.
    • C) Use a random partitioner.
      Answer: A

Written by Abstract Algorithms (@abstractalgorithms)