How Kafka Works: The Log That Never Forgets
Kafka is not just a message queue; it's a distributed streaming platform. We explain Topics, Partitions, and Offsets.
TLDR: Kafka is a distributed event store. Unlike a traditional queue (RabbitMQ) where messages disappear after reading, Kafka stores them in a persistent Log. This allows multiple consumers to read the same data at their own pace, replay history, and handle massive scale via Partitions.
The Bookshelf Analogy: Kafka vs Traditional Queues
Imagine a Magazine Subscription Service.
- Producer (Publisher): Writes articles and drops them into a pile.
- Topic (The Magazine Category): A named channel, e.g. "Sports", "Tech", "user-clicks".
- Consumer (Subscriber): Reads the magazines.
| Traditional Queue (RabbitMQ) | Kafka |
| --- | --- |
| Publisher hands you a copy. You read it. It's gone. | Publisher places every issue on a numbered bookshelf. |
| If you were on vacation, you missed it. | You read issue #3. Your friend reads #1. Both are still there. |
| One consumer per message. | Many consumers, independent pacing. |
This bookshelf model is the foundational insight: Kafka never deletes a message just because it was read.
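To make the bookshelf concrete, here is a minimal pure-Python sketch of an append-only log (an illustration of the concept, not a real Kafka client):

```python
class PartitionLog:
    """Append-only log: messages are kept after being read."""

    def __init__(self):
        self.messages = []

    def append(self, msg):
        self.messages.append(msg)
        return len(self.messages) - 1  # the new message's offset

    def read(self, offset):
        return self.messages[offset]  # reading removes nothing

log = PartitionLog()
log.append("issue #1")
log.append("issue #2")

# Two readers, independent positions; both see the same data.
assert log.read(0) == "issue #1"
assert log.read(0) == "issue #1"  # re-reading (replay) still works
```

Because `read` is just an indexed lookup, any number of readers can consume the same log at their own pace, and "replaying history" is simply reading from an earlier offset again.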
Topics, Partitions, and Offsets: The Core Vocabulary
Topics
A Topic is a named log. Think of it as a category: user-clicks, payment-events, driver-locations.
Partitions: The Unit of Scale
A topic is split into Partitions to enable parallel processing:
- Partition 0: Stores events for Users A–M.
- Partition 1: Stores events for Users N–Z.
```mermaid
flowchart LR
  subgraph T["Topic: user-clicks"]
    P0["Partition 0\n(Users A–M)"]
    P1["Partition 1\n(Users N–Z)"]
  end
  Producer -->|key hash| P0
  Producer -->|key hash| P1
  P0 --> Broker1["Broker 1"]
  P1 --> Broker2["Broker 2"]
```
Two brokers now handle reads and writes in parallel, so throughput scales horizontally.
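The "key hash" routing above can be sketched in a few lines. Note the hedge: real Kafka clients use murmur2 for key hashing; the CRC32 below is a deterministic stand-in to illustrate the idea.

```python
import zlib

def pick_partition(key: bytes, num_partitions: int) -> int:
    """Hash the key so the same key always lands on the same partition.
    (Illustrative; Kafka's Java client uses murmur2, not CRC32.)"""
    return zlib.crc32(key) % num_partitions

# The same user always maps to the same partition,
# which is what guarantees per-key ordering.
assert pick_partition(b"user-42", 2) == pick_partition(b"user-42", 2)
```

Because routing depends only on the key and the partition count, no coordination is needed between producers, but changing the partition count reshuffles which keys land where.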
Offsets: Your Bookmark in the Log
Each message in a partition gets an Offset: a monotonically increasing integer (0, 1, 2, …).
```
Partition 0: [Msg 0] [Msg 1] [Msg 2] [Msg 3] ...
                ^                ^
             Group A          Group B
          (at offset 0)    (at offset 2)
```
Group A and Group B read the same partition independently. Kafka just tracks each group's current offset.
Consumer Groups: Parallel Reading Without Coordination
A Consumer Group is a team of consumers that collectively read a topic.
- Kafka assigns each partition to exactly one consumer in the group at a time.
- If a consumer crashes, Kafka reassigns its partition to another group member.
Example: 4 partitions, 4 consumers in the "Analytics" group:
| Partition | Consumer |
| --- | --- |
| 0 | Consumer A |
| 1 | Consumer B |
| 2 | Consumer C |
| 3 | Consumer D |
If Consumer D crashes, Consumer A picks up Partition 3 automatically.
Important: If you add a 5th consumer to a 4-partition topic, the 5th consumer sits idle. Within a group, Kafka assigns at most one consumer per partition; add more partitions to scale beyond that.
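The assignment rule can be sketched as a simple round-robin over partitions (a toy model; Kafka's actual assignors, such as range or cooperative-sticky, are more sophisticated):

```python
def assign_partitions(partitions, consumers):
    """Round-robin assignment: each partition goes to exactly one consumer.
    Consumers beyond the partition count receive nothing (they idle)."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

a = assign_partitions([0, 1, 2, 3], ["A", "B", "C", "D", "E"])
assert a["A"] == [0]
assert a["E"] == []  # the 5th consumer sits idle
```

On a crash, the same function rerun without the dead consumer yields the rebalanced assignment: the orphaned partition simply lands on a surviving member.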
Offset commit: Kafka tracks "Group=Analytics, Partition=0, Offset=5". On restart, the group resumes from offset 5, so no data is lost. The default guarantee is at-least-once: a consumer that crashes after processing but before committing will reprocess some messages, producing duplicates. Exactly-once semantics require idempotent producers and transactions.
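The bookmark mechanism is just a per-(group, partition) map; a minimal sketch of commit and crash-safe resume:

```python
# Committed offsets: (group, partition) -> next offset to read.
committed = {}

def commit(group, partition, offset):
    """Durably record how far this group has read in this partition."""
    committed[(group, partition)] = offset

def resume_from(group, partition):
    """After a restart, pick up at the last committed offset.
    A group with no commits starts from the beginning (offset 0)."""
    return committed.get((group, partition), 0)

commit("Analytics", 0, 5)
assert resume_from("Analytics", 0) == 5  # resumes where it left off
assert resume_from("Billing", 0) == 0    # other groups are unaffected
```

Because each group keeps its own entry, groups never interfere with one another's reading position, which is exactly what enables independent pacing.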
Log Compaction and Retention Policies
By default Kafka uses time-based retention (e.g., delete messages older than 7 days). This suits event streams where history has a time horizon.
Log compaction is an alternative: Kafka retains only the latest value per key. This is ideal for changelog topics (e.g., user-profile-updates) where you only need the most recent state per user ID.
| Policy | Keeps | Best for |
| --- | --- | --- |
| Time-based retention | All messages up to N days | Audit logs, analytics pipelines |
| Size-based retention | All messages up to N GB | Bounded storage environments |
| Log compaction | Latest value per key | State snapshots, CDC topics |
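Compaction's "latest value per key" rule is easy to model. This sketch ignores real-world details like compaction lag and tombstones; it only shows the core idea:

```python
def compact(log):
    """Keep only the last value seen for each key
    (simplified model of Kafka log compaction)."""
    latest = {}
    for key, value in log:
        latest[key] = value  # later writes overwrite earlier ones
    return list(latest.items())

changelog = [
    ("user-1", "name=Ann"),
    ("user-2", "name=Bob"),
    ("user-1", "name=Anna"),  # supersedes the first user-1 record
]
assert compact(changelog) == [("user-1", "name=Anna"),
                              ("user-2", "name=Bob")]
```

A new consumer of a compacted topic therefore replays one record per key, enough to rebuild the current state without reading the full history.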
Kafka in Production: LinkedIn, Uber, Netflix
LinkedIn (Kafka's birthplace): 7 trillion messages per day across pipelines for feed ranking, notifications, and metrics collection.
Uber: The driver-locations topic ingests GPS pings every 5 seconds. Three independent consumer groups read the same stream:
- ETA Service: calculates arrival times.
- Audit Service: stores history for billing.
- Fraud Service: detects teleporting drivers.
Zero coordination between the three. No data duplication at the source.
Netflix: Chaos event streams flow through Kafka so multiple observability systems can react independently without coupling.
When Kafka Is Overkill vs When It Shines
| Situation | Use Kafka | Use Something Else |
| --- | --- | --- |
| Multiple consumers need the same event | ✅ | ❌ |
| Replay past events | ✅ | ❌ |
| High throughput (millions/sec) | ✅ | ❌ |
| Simple job queue, one consumer | ❌ | RabbitMQ or SQS |
| Request/response semantics | ❌ | gRPC or REST |
| Sub-millisecond latency required | ❌ | In-memory queues |
Hot partition warning: If all producers use the same partition key (e.g., userId=null), all traffic lands on one partition. Monitor partition lag per consumer group and redesign key strategy before it becomes a production incident.
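The hot-partition failure mode is easy to demonstrate with the same kind of key hashing sketched earlier (CRC32 as an illustrative stand-in for Kafka's real partitioner hash):

```python
import zlib
from collections import Counter

def partition_counts(keys, num_partitions):
    """Count how many messages land on each partition."""
    return Counter(zlib.crc32(k) % num_partitions for k in keys)

# Every producer uses the same key -> one hot partition:
hot = partition_counts([b"null"] * 1000, 4)
assert len(hot) == 1  # all 1000 messages pile onto a single partition

# Distinct keys spread the load across partitions:
spread = partition_counts([f"user-{i}".encode() for i in range(1000)], 4)
assert len(spread) > 1
```

A `Counter` like this over a sample of production keys is a quick way to check key skew before it shows up as consumer lag on one partition.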
Summary
- Log-based: Messages are stored on disk and retained (default: 7 days). Multiple consumers can re-read.
- Topics + Partitions: The unit of throughput. More partitions = more parallel consumers.
- Offsets + Consumer Groups: Durable bookmarks enabling crash-safe resume and independent read pacing.
- Log compaction: Alternative retention keeping only the latest value per key; great for change data capture.
Practice Quiz
You have a topic with 4 partitions and a consumer group with 5 consumers. What happens to the 5th consumer?
- A) It reads from a random partition.
- B) It sits idle.
- C) It creates a new partition.
Answer: B
Why is Kafka faster than a traditional database for sequential writes?
- A) It uses RAM only.
- B) It appends to the end of a file (sequential I/O), which is much faster than random I/O.
- C) It compresses data before writing.
Answer: B
You need strict message ordering per customer. How do you configure Kafka?
- A) Use 1 partition and route all messages for a customer to it via partition key.
- B) Use 100 partitions for maximum throughput.
- C) Use a random partitioner.
Answer: A

Written by
Abstract Algorithms
@abstractalgorithms