Consistent Hashing: Scaling Without Chaos
How do you distribute data across 100 servers so that adding a new server doesn't require moving...

Abstract Algorithms
Helping engineers master software engineering topics.
TLDR: Standard hashing (
key % N) breaks when $N$ changes — adding or removing a server reshuffles almost all keys. Consistent Hashing maps both servers and keys onto a ring (0–360°). When a server is added, only its immediate neighbors' keys move, minimizing data migration.
📖 The Problem with key % N: Why Normal Hashing Fails at Scale
Suppose you have 3 cache servers and you route requests like this:
server = hash(key) % 3
Works great — until you add a 4th server. Now hash(key) % 4 is different for almost every key. Nearly all cached data needs to be remapped, causing a cache stampede — every backend database query misses cache simultaneously.
Same problem when a server goes down: hash(key) % 2 reroutes most keys to different servers. Data that was cached is no longer reachable.
The core issue: modular hashing is globally sensitive to $N$.
🔍 Core Vocabulary: Keys, Hashes, and Rings
Before diving into the ring mechanics, clarify the key terms:
| Term | Meaning |
| Hash function | Maps an arbitrary key to a number in a fixed range (e.g., 0 to 2³²−1) |
| Key | The identifier you want to route — a user ID, session token, or cache key |
| Node / Server | A physical or virtual machine that stores or processes data |
| Token | The position a node occupies on the hash ring |
| Virtual node (vnode) | A logical copy of a physical node, placed at multiple ring positions |
| Ring | The circular representation of the full hash space |
The critical insight: consistent hashing uses the same hash function for both keys and servers, so they share the same address space.
⚙️ The Ring: One Hash Space for Both Servers and Keys
Consistent hashing fixes this by mapping everything — both servers and keys — onto the same circular hash space (commonly 0 to $2^{32} - 1$).
graph TD
subgraph Hash Ring
direction LR
A((0)) --> B((ServerA @90))
B --> C((ServerB @180))
C --> D((ServerC @270))
D --> A
end
Placement rules:
- Hash each server's name/IP to find its position on the ring.
- Hash each key to find its position on the ring.
- A key belongs to the first server encountered when walking clockwise from the key's position.
Example:
ServerAis at position 90.ServerBis at position 180.ServerCis at position 270.- Key at position 100 → walks clockwise → first server hit is
ServerB(position 180).
📊 Adding or Removing a Server: Only Neighbors Move
This is the breakthrough. When you add ServerD at position 135:
- Keys previously assigned to
ServerB(in the range 91–180) that now fall in range 91–135 are reassigned toServerD. - Everything else is unaffected.
flowchart LR
A[Add ServerD at 135] --> B{Which keys move?}
B --> C[Keys 91-135 move to ServerD]
B --> D[Keys 0-90 ServerA: unchanged]
B --> E[Keys 136-180 ServerB: unchanged]
B --> F[All other keys: unchanged]
Before: $N$ servers → adding 1 requires moving approximately $K/N$ keys ($K$ = total keys).
Naive % N: adding 1 requires remapping most keys.
That's the efficiency win of consistent hashing.
🧠 Deep Dive: Virtual Nodes: Fixing Uneven Distribution
Raw consistent hashing has a problem: if servers hash to clustered positions on the ring, one server gets most keys and others get few.
Solution: Virtual Nodes (vnodes)
Each physical server is placed at multiple positions on the ring by hashing variants of its name:
ServerA-0 → position 10
ServerA-1 → position 50
ServerA-2 → position 200
ServerB-0 → position 30
ServerB-1 → position 120
ServerB-2 → position 250
With 150 virtual nodes per server (Cassandra's default), the distribution becomes statistically uniform, regardless of how physical servers hash.
| Setting | Distribution | Failure Impact |
| 1 vnode/server | Uneven, lottery of positions | One server takes on all of failed server's load |
| 150 vnodes/server | Near-uniform | Failed server's keys spread across all healthy servers |
📊 Virtual Nodes: Key Assignment on the Ring
flowchart LR
K["Key hash value (e.g. position 55)"] --> R["Hash Ring (walk clockwise)"]
R --> V1["vnode ServerA-1 @position 50 (skip)"]
R --> V2["vnode ServerB-0 @position 60 (first hit)"]
V2 --> N[Assigned to ServerB]
Note1[ServerA also at 10, 200 ServerB also at 120, 250]
This diagram shows how a key at hash position 55 is assigned to a physical server using virtual nodes. The ring walk proceeds clockwise from position 55, skips ServerA-1's vnode at position 50 (behind the key), and lands on ServerB-0's vnode at position 60 — so the key is owned by ServerB. Each physical server holds multiple vnodes at different ring positions (shown in the note), so when a server fails its load scatters evenly across all remaining servers rather than collapsing onto a single neighbor.
📊 Node Addition: Minimal Key Rebalance
sequenceDiagram
participant C as Client
participant NR as Hash Ring
participant New as New Node D
participant Old as Existing Node B
C->>NR: add Node D at position 135
NR->>Old: which keys are in range 91-135?
Old-->>NR: keys {key1, key2, key3}
NR->>New: transfer keys {key1, key2, key3}
New-->>NR: transfer complete
Note over NR: Only ~1/N keys moved (minimal rebalance)
C->>NR: lookup key at position 100
NR-->>C: route to Node D (new owner)
🌍 Real-World Applications: Where Consistent Hashing Powers Real Systems
- Apache Cassandra: Uses vnodes (default 256 per server) for token assignment.
- Amazon DynamoDB: Consistent hashing of partition keys across storage nodes.
- Memcached / Redis Cluster: Client-side consistent hashing libraries route cache keys to shards.
- Nginx / HAProxy: Some load balancing plugins use consistent hashing for session affinity.
When Cassandra adds a node, only the data in that node's token range is streamed over — not a full reshuffle.
⚖️ Trade-offs & Failure Modes: Limits and When It Doesn't Help
| Concern | Detail |
| Hot keys | A single key (e.g., celebrity tweet) can overwhelm one vnode despite balanced distribution |
| Data skew | Non-uniform key distributions still concentrate load |
| Read replicas | Consistent hashing defines primary placement; replication factor adds complexity |
| Small clusters | With 3 servers, even 150 vnodes overlap enough to cause imbalance |
Virtual nodes help with structural balance but do not fix hot-key problems. Those require application-level solutions (key sharding with suffixes, caching at a higher layer).
🔬 Deep Dive: Replication and Token Ring Internals
Replication Factor
Consistent hashing defines which node is the primary owner of a key, but production systems need replication for durability. Cassandra uses the ring position to determine replicas: after finding the primary node (the first clockwise match), it walks clockwise to find the next N-1 unique physical servers. With replication_factor=3, data lives on 3 servers.
flowchart LR
K["Key K (position 150)"] -->|primary owner| A[ServerB @180]
A -->|replica 1| B[ServerC @270]
B -->|replica 2| C["ServerA @90 (wraps)"]
This means a key is always available even if 2 of 3 replicas are down — the CAP theorem tradeoff made explicit by where you set your read/write quorum (QUORUM, ONE, ALL).
Token Ranges in Cassandra
In Cassandra, each vnode owns a token range — a slice of the 64-bit integer space. The full space is divided into as many ranges as there are vnodes. With 3 servers × 256 vnodes = 768 token ranges, each server owns 256 non-contiguous slices spread evenly around the ring.
When you run nodetool status, you see each node's token assignments. Adding a node causes Cassandra to transfer only the affected token ranges — typically 1/N of total data.
Consistent Hashing in Redis Cluster
Redis Cluster uses a variation: 16,384 hash slots (not a continuous ring, but a segmented space). The CRC16(key) % 16384 formula assigns keys to slots; slots are manually assigned to nodes. This gives operators explicit control over rebalancing — you move slots between nodes rather than relying on automatic ring placement.
🧭 Decision Guide: When to Use Consistent Hashing: Decision Guide
Use consistent hashing when:
- Building a distributed cache (Memcached, Redis Cluster) where servers join/leave regularly.
- Partitioning a distributed database (Cassandra, DynamoDB) and minimizing data movement is critical.
- Implementing a load balancer with session affinity — route a user to the same backend on every request.
- Building a content delivery system where edge nodes are added or removed based on demand.
Skip consistent hashing when:
- Your dataset fits on one server — no sharding needed.
- You need range queries across sorted keys — B-tree-based range sharding is better.
- You need exact control over which shard holds which data — explicit mapping beats ring placement.
| Signal | Recommendation |
| Nodes join/leave frequently | ✅ Consistent hashing |
| Distributed cache with many servers | ✅ Consistent hashing |
| Need sorted range scans | ❌ Use range-based sharding |
| Single server or very small cluster | ❌ Overkill — use key % N |
🌐 Real-World Architecture: Amazon DynamoDB
DynamoDB's storage layer uses consistent hashing with virtual nodes to distribute table partitions across its fleet of storage servers. Here is a simplified read path:
flowchart LR
Client -->|PutItem: userId=123| R[DynamoDB Router]
R -->|hash(userId=123)| Ring[Virtual Node Ring]
Ring -->|token match| S1["Storage Node A (primary)"]
S1 -->|replicate| S2[Storage Node B]
S1 -->|replicate| S3[Storage Node C]
S1 -->|ack| R
R -->|200 OK| Client
When AWS adds capacity to a DynamoDB region, new storage nodes join the ring and absorb only the adjacent token ranges. No global reshuffle happens. A customer writing to userId=123 at the same moment the ring is being rebalanced continues to succeed — the router just uses updated ring metadata for new requests.
The same principle applies to:
- Amazon S3 internal object placement.
- Apache Cassandra deployed by Netflix for viewing history at petabyte scale.
- Discord using Cassandra's consistent hashing ring for message storage across 19 billion messages daily.
🧪 Practical Setup: Consistent Hashing in Python
Here is a minimal consistent hashing implementation showing the core data structure:
import hashlib
from sortedcontainers import SortedDict
class ConsistentHashRing:
def __init__(self, vnodes=150):
self.ring = SortedDict()
self.vnodes = vnodes
def _hash(self, key: str) -> int:
return int(hashlib.md5(key.encode()).hexdigest(), 16)
def add_server(self, server: str):
for i in range(self.vnodes):
token = self._hash(f"{server}-{i}")
self.ring[token] = server
def remove_server(self, server: str):
self.ring = SortedDict(
{t: s for t, s in self.ring.items() if s != server}
)
def get_server(self, key: str) -> str:
if not self.ring:
raise Exception("No servers in ring")
token = self._hash(key)
idx = self.ring.bisect_left(token)
if idx == len(self.ring):
idx = 0 # wrap around
return self.ring.peekitem(idx)[1]
# Usage
ring = ConsistentHashRing(vnodes=150)
ring.add_server("server-a")
ring.add_server("server-b")
ring.add_server("server-c")
print(ring.get_server("user-123")) # server-b
The SortedDict gives O(log N) lookup for bisect_left. Adding or removing a server only iterates over its 150 vnode tokens — not all keys.
🛠️ Java TreeMap: A Production-Ready Consistent Hash Ring Without External Dependencies
java.util.TreeMap is the natural data structure for a consistent hash ring in Java: integer tokens are the keys (sorted automatically), server names are the values, and ceilingKey(token) implements the clockwise-lookup in O(log N) — zero external dependencies required.
import java.security.MessageDigest;
import java.util.TreeMap;
public class ConsistentHashRing {
private final int virtualNodes;
private final TreeMap<Long, String> ring = new TreeMap<>();
public ConsistentHashRing(int virtualNodes) {
this.virtualNodes = virtualNodes;
}
public void addServer(String server) {
for (int i = 0; i < virtualNodes; i++) {
long token = hash(server + "-vnode-" + i);
ring.put(token, server);
}
}
public void removeServer(String server) {
for (int i = 0; i < virtualNodes; i++) {
ring.remove(hash(server + "-vnode-" + i));
}
}
public String getServer(String key) {
if (ring.isEmpty()) throw new IllegalStateException("No servers in ring");
long token = hash(key);
// ceilingKey: first token >= hash(key) → first server clockwise from key
Long ceiling = ring.ceilingKey(token);
// Wrap around the ring if the key lands past the last token
return ring.get(ceiling != null ? ceiling : ring.firstKey());
}
private long hash(String input) {
try {
byte[] digest = MessageDigest.getInstance("MD5").digest(input.getBytes());
long h = 0;
for (int i = 0; i < 8; i++) h = (h << 8) | (digest[i] & 0xFFL);
return h;
} catch (Exception e) { throw new RuntimeException(e); }
}
public static void main(String[] args) {
ConsistentHashRing ring = new ConsistentHashRing(150); // 150 vnodes per server
ring.addServer("cache-a");
ring.addServer("cache-b");
ring.addServer("cache-c");
System.out.println(ring.getServer("user-123")); // e.g. cache-b
System.out.println(ring.getServer("session-456")); // e.g. cache-a
// Add a 4th server — only ~1/4 of keys remap (vs. nearly all with key%N)
ring.addServer("cache-d");
System.out.println(ring.getServer("user-123")); // may now route to cache-d
}
}
Ketama and Guava are the two standard production alternatives in the Java ecosystem:
- Ketama (used by
spymemcachedandXMemcached) is the industry-standard Memcached consistent hashing algorithm in Java, using MD5-based token distribution for maximum client-server compatibility across polyglot stacks. - Guava
Hashing.consistentHashprovides a simpler rendezvous-hashing variant — no ring data structure needed; the bucket is computed directly from the hash code.
// Guava consistent hash — maps a key to one of N buckets, minimizes remapping on resize
import com.google.common.hash.Hashing;
import java.nio.charset.StandardCharsets;
int bucket = Hashing.consistentHash(
Hashing.murmur3_32().hashString("user-123", StandardCharsets.UTF_8),
3 // number of buckets (servers)
);
System.out.println("Routes to bucket: " + bucket); // 0, 1, or 2
// When you grow from 3 to 4 buckets, only ~1/4 of keys remap — same guarantee as the ring
Choose the TreeMap ring for full Cassandra/DynamoDB-style token management with virtual nodes; choose Guava.consistentHash for simple stateless routing where you only need the bucket number.
For a full deep-dive on Ketama algorithm internals, Guava's rendezvous hashing math, and Redis Cluster hash slot management in Java, a dedicated follow-up post is planned.
📚 Lessons Learned: Pitfalls in Production
1. Hash function choice matters. MD5 and SHA-1 are common choices but have distribution quirks. CRC32 is faster and well-tested in production consistent hashing (used by Memcached). Avoid simple sum(bytes) % range — it clusters keys.
2. Vnode count is a tuning knob, not a default. Cassandra defaults to 256 vnodes. Higher vnodes = better balance but more gossip overhead. A cluster of 3 nodes can have all 768 tokens assigned; a cluster of 300 nodes needs at least 300 vnodes to avoid token collisions.
3. Consistent hashing ≠ balanced load under skewed access. If 80% of reads hit 20% of keys (Pareto distribution), the servers holding those key ranges are hot regardless of ring balance. Layer in a read cache (Redis) in front of hot partitions.
4. Token ownership is metadata, not data. When you add a server, you must stream the actual data from the previous owner before the new server can serve reads. Plan for this migration window — Cassandra calls it "streaming" and it can take hours on large datasets.
📌 TLDR: Summary & Key Takeaways
key % Nremaps almost all keys when $N$ changes — consistent hashing avoids this.- Both servers and keys are placed on a shared ring; a key maps to the next server clockwise.
- Adding/removing a server only moves keys in the adjacent arc — not the full key space.
- Virtual nodes (150–256 per server) ensure uniform distribution despite hash collisions.
- Used in Cassandra, DynamoDB, Memcached, and Redis Cluster.
Article tools
Reader feedback
Was this article useful?
Rate it if it helped, then continue with the next deep dive when you are ready.
Article metadata