Consistent Hashing: Scaling Without Chaos
How do you distribute data across 100 servers so that adding a new server doesn't require moving...
TLDR: Standard hashing (`key % N`) breaks when $N$ changes: adding or removing a server reshuffles almost all keys. Consistent hashing maps both servers and keys onto a ring (picture 0–360°). When a server is added, only its immediate neighbors' keys move, minimizing data migration.
📖 The Problem with key % N: Why Normal Hashing Fails at Scale
Suppose you have 3 cache servers and you route requests like this:
```
server = hash(key) % 3
```
Works great — until you add a 4th server. Now hash(key) % 4 is different for almost every key. Nearly all cached data needs to be remapped, causing a cache stampede — every backend database query misses cache simultaneously.
Same problem when a server goes down: hash(key) % 2 reroutes most keys to different servers. Data that was cached is no longer reachable.
The core issue: modular hashing is globally sensitive to $N$.
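You can measure this sensitivity directly. The sketch below (illustrative, not from any particular library) hashes 10,000 keys with `% 3` and `% 4` and counts how many land on a different server:

```python
import hashlib

def bucket(key: str, n: int) -> int:
    """Stable hash of a key, modulo the server count."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % n

keys = [f"key-{i}" for i in range(10_000)]

# Count keys whose server assignment changes when we grow from 3 to 4 servers.
moved = sum(bucket(k, 3) != bucket(k, 4) for k in keys)
print(f"{moved / len(keys):.0%} of keys remapped")
```

A key survives only when `h % 3 == h % 4`, which holds for just 3 of every 12 residues, so roughly 75% of keys move.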
⚙️ The Ring: One Hash Space for Both Servers and Keys
Consistent hashing fixes this by mapping everything — both servers and keys — onto the same circular hash space (commonly 0 to $2^{32} - 1$).
```mermaid
graph TD
    subgraph Hash Ring
    direction LR
    A((0)) --> B((ServerA\n@90))
    B --> C((ServerB\n@180))
    C --> D((ServerC\n@270))
    D --> A
    end
```
Placement rules:
- Hash each server's name/IP to find its position on the ring.
- Hash each key to find its position on the ring.
- A key belongs to the first server encountered when walking clockwise from the key's position.
Example:
- `ServerA` is at position 90, `ServerB` at 180, `ServerC` at 270.
- A key at position 100 walks clockwise; the first server it hits is `ServerB` (position 180).
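The placement rules above fit in a few lines of Python. This is a minimal sketch (the `ConsistentHashRing` name and `md5`-based positioning are illustrative choices, not a specific library's API), using binary search over sorted ring positions:

```python
import bisect
import hashlib

RING_SIZE = 2**32

def ring_pos(name: str) -> int:
    """Hash a server name or key onto the 0..2^32-1 ring."""
    return int(hashlib.md5(name.encode()).hexdigest(), 16) % RING_SIZE

class ConsistentHashRing:
    def __init__(self, servers):
        # Sorted (position, server) pairs enable binary search lookups.
        self._ring = sorted((ring_pos(s), s) for s in servers)
        self._positions = [p for p, _ in self._ring]

    def lookup(self, key: str) -> str:
        """Return the first server clockwise from the key, wrapping past 0."""
        pos = ring_pos(key)
        i = bisect.bisect_right(self._positions, pos) % len(self._ring)
        return self._ring[i][1]

ring = ConsistentHashRing(["ServerA", "ServerB", "ServerC"])
print(ring.lookup("user:42"))
```

The modulo in `lookup` handles the wrap-around case: a key hashed past the last server belongs to the first server on the ring.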
🔢 Adding or Removing a Server: Only Neighbors Move
This is the breakthrough. When you add ServerD at position 135:
- Keys previously assigned to `ServerB` (the range 91–180) that now fall in the range 91–135 are reassigned to `ServerD`.
- Everything else is unaffected.
```mermaid
flowchart LR
    A[Add ServerD at 135] --> B{Which keys move?}
    B --> C[Keys 91-135\nmove to ServerD]
    B --> D[Keys 0-90 → ServerA: unchanged]
    B --> E[Keys 136-180 → ServerB: unchanged]
    B --> F[All other keys: unchanged]
```
Consistent hashing: adding one server to an $N$-server ring moves approximately $K/(N+1)$ keys ($K$ = total keys).
Naive `% N`: adding one server remaps almost all keys.
That's the efficiency win of consistent hashing.
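You can verify the claim empirically. This sketch (helper names are illustrative) assigns 10,000 keys before and after adding `ServerD`, and checks that every key that moved landed on the new server:

```python
import bisect
import hashlib

def ring_pos(name: str) -> int:
    return int(hashlib.md5(name.encode()).hexdigest(), 16) % 2**32

def owner(key: str, servers) -> str:
    """Clockwise successor of the key on a single-position-per-server ring."""
    ring = sorted((ring_pos(s), s) for s in servers)
    pos = ring_pos(key)
    i = bisect.bisect_right([p for p, _ in ring], pos) % len(ring)
    return ring[i][1]

keys = [f"key-{i}" for i in range(10_000)]
before = ["ServerA", "ServerB", "ServerC"]
after = before + ["ServerD"]

# Only keys in ServerD's new arc change owner; everything else stays put.
moved = sum(owner(k, before) != owner(k, after) for k in keys)
print(f"{moved / len(keys):.0%} of keys moved")
```

On average the moved fraction is about $1/(N+1)$, though a single-token-per-server ring is noisy; the invariant that holds exactly is that every moved key now belongs to `ServerD`.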
🧠 Virtual Nodes: Fixing Uneven Distribution
Raw consistent hashing has a problem: if servers hash to clustered positions on the ring, one server gets most keys and others get few.
Solution: Virtual Nodes (vnodes)
Each physical server is placed at multiple positions on the ring by hashing variants of its name:
ServerA-0 → position 10
ServerA-1 → position 50
ServerA-2 → position 200
ServerB-0 → position 30
ServerB-1 → position 120
ServerB-2 → position 250
With on the order of 100–200 virtual nodes per server, the distribution becomes statistically uniform, regardless of where the physical servers happen to hash.
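The effect is easy to demonstrate. This sketch (naming scheme `ServerA-0`, `ServerA-1`, … is illustrative) compares per-server load with 1 vnode versus 150 vnodes per server:

```python
import bisect
import hashlib
from collections import Counter

def ring_pos(name: str) -> int:
    return int(hashlib.md5(name.encode()).hexdigest(), 16) % 2**32

def build_ring(servers, vnodes: int):
    # Each physical server appears at `vnodes` positions: "ServerA-0", "ServerA-1", ...
    return sorted((ring_pos(f"{s}-{i}"), s) for s in servers for i in range(vnodes))

def owner(key: str, ring) -> str:
    pos = ring_pos(key)
    i = bisect.bisect_right([p for p, _ in ring], pos) % len(ring)
    return ring[i][1]

keys = [f"key-{i}" for i in range(10_000)]
servers = ["ServerA", "ServerB", "ServerC"]

for vnodes in (1, 150):
    ring = build_ring(servers, vnodes)
    load = Counter(owner(k, ring) for k in keys)
    print(f"{vnodes:>3} vnodes/server: {dict(load)}")
```

With 1 vnode the counts can be wildly lopsided (pure position lottery); with 150 vnodes each server's share converges toward $1/3$ of the keys.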
| Setting | Distribution | Failure impact |
|---|---|---|
| 1 vnode/server | Uneven, a lottery of positions | One neighbor absorbs all of the failed server's load |
| 150 vnodes/server | Near-uniform | Failed server's keys spread across all healthy servers |
🌍 Where Consistent Hashing Powers Real Systems
- Apache Cassandra: Uses vnodes (default 256 per server) for token assignment.
- Amazon DynamoDB: Consistent hashing of partition keys across storage nodes.
- Memcached / Redis Cluster: Client-side consistent hashing libraries route cache keys to shards.
- Nginx / HAProxy: Built-in options (Nginx's `hash ... consistent` directive, HAProxy's `hash-type consistent`) use consistent hashing for upstream affinity.
When Cassandra adds a node, only the data in that node's token range is streamed over — not a full reshuffle.
⚖️ Limits and When It Doesn't Help
| Concern | Detail |
|---|---|
| Hot keys | A single key (e.g., a celebrity tweet) can overwhelm one server despite balanced distribution |
| Data skew | Non-uniform key distributions still concentrate load |
| Read replicas | Consistent hashing defines primary placement; the replication factor adds complexity |
| Small clusters | With only 3 servers, variance in arc lengths still leaves measurable imbalance even with many vnodes |
Virtual nodes help with structural balance but do not fix hot-key problems. Those require application-level solutions (key sharding with suffixes, caching at a higher layer).
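One common application-level mitigation mentioned above is key sharding with suffixes. A minimal sketch (the `#i` suffix convention and fanout of 8 are illustrative assumptions): writers append a random suffix so the hot key spreads across several ring positions, and readers aggregate over all suffixes.

```python
import random

def shard_hot_key(key: str, fanout: int = 8) -> str:
    """Writer side: spread a hot key across `fanout` sub-keys.

    Each sub-key hashes to a different ring position, so no single
    server absorbs all the traffic for the hot key.
    """
    return f"{key}#{random.randrange(fanout)}"

def all_shards(key: str, fanout: int = 8):
    """Reader side: enumerate every sub-key to aggregate results."""
    return [f"{key}#{i}" for i in range(fanout)]

print(shard_hot_key("celebrity:tweet:123"))
print(all_shards("celebrity:tweet:123")[:3])
```

The trade-off is that reads become a fan-out: every lookup of the hot key must query all suffixes and merge, which is why this is applied selectively rather than to every key.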
📌 Key Takeaways
- `key % N` remaps almost all keys when $N$ changes; consistent hashing avoids this.
- Both servers and keys are placed on a shared ring; a key maps to the next server clockwise.
- Adding/removing a server only moves keys in the adjacent arc — not the full key space.
- Virtual nodes (typically 100–256 per server) smooth out the distribution despite unlucky server placements.
- Used in Cassandra, DynamoDB, Memcached, and Redis Cluster.
🧩 Test Your Understanding
- With `key % 3`, what percentage of keys need remapping when you add a 4th server?
- A key is at position 100 on the ring. `ServerA` is at 90, `ServerB` is at 200. Which server owns the key?
- Why do virtual nodes improve fault recovery compared to single-position servers?
- Consistent hashing solves uneven distribution. Does it solve hot keys? Why or why not?
Written by
Abstract Algorithms
@abstractalgorithms