
Consistent Hashing: Scaling Without Chaos

How do you distribute data across 100 servers so that adding a new server doesn't require moving...

Abstract Algorithms · 13 min read


TLDR: Standard hashing (key % N) breaks when $N$ changes — adding or removing a server reshuffles almost all keys. Consistent Hashing maps both servers and keys onto a ring (pictured here as 0–360°; in practice 0 to $2^{32}-1$). When a server is added, only its immediate neighbors' keys move, minimizing data migration.


📖 The Problem with key % N: Why Normal Hashing Fails at Scale

Suppose you have 3 cache servers and you route requests like this:

server = hash(key) % 3

Works great — until you add a 4th server. Now hash(key) % 4 is different for almost every key. Nearly all cached data must be remapped, causing a cache stampede: misses flood the backend database all at once.

Same problem when a server goes down: hash(key) % 2 reroutes most keys to different servers. Data that was cached is no longer reachable.

The core issue: modular hashing is globally sensitive to $N$.
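The claim is easy to verify with a short simulation (a sketch; the key names and server counts are arbitrary):

```python
import hashlib

def server_for(key: str, n: int) -> int:
    # Stable hash so results are reproducible across runs (unlike built-in hash()).
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % n

keys = [f"user-{i}" for i in range(10_000)]
moved = sum(server_for(k, 3) != server_for(k, 4) for k in keys)
print(f"{moved / len(keys):.0%} of keys remapped going from 3 to 4 servers")
# Expect roughly 75%: with % N, almost every key changes owner.
```

A key keeps its owner only when hash % 3 == hash % 4, which happens for just a quarter of hash values, so around three quarters of the cache is invalidated by one added server.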


🔍 Core Vocabulary: Keys, Hashes, and Rings

Before diving into the ring mechanics, let's clarify the key terms:

| Term | Meaning |
| --- | --- |
| Hash function | Maps an arbitrary key to a number in a fixed range (e.g., 0 to 2³²−1) |
| Key | The identifier you want to route: a user ID, session token, or cache key |
| Node / Server | A physical or virtual machine that stores or processes data |
| Token | The position a node occupies on the hash ring |
| Virtual node (vnode) | A logical copy of a physical node, placed at multiple ring positions |
| Ring | The circular representation of the full hash space |

The critical insight: consistent hashing uses the same hash function for both keys and servers, so they share the same address space.


⚙️ The Ring: One Hash Space for Both Servers and Keys

Consistent hashing fixes this by mapping everything — both servers and keys — onto the same circular hash space (commonly 0 to $2^{32} - 1$).

graph TD
    subgraph Hash Ring
        direction LR
        A((0)) --> B((ServerA @90))
        B --> C((ServerB @180))
        C --> D((ServerC @270))
        D --> A
    end

Placement rules:

  1. Hash each server's name/IP to find its position on the ring.
  2. Hash each key to find its position on the ring.
  3. A key belongs to the first server encountered when walking clockwise from the key's position.

Example:

  • ServerA is at position 90.
  • ServerB is at position 180.
  • ServerC is at position 270.
  • Key at position 100 → walks clockwise → first server hit is ServerB (position 180).
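The clockwise walk is just a lookup in a sorted list of tokens. A minimal sketch of the rule, using the toy degree positions from the example above:

```python
from bisect import bisect_left

# Toy ring: server positions in degrees, matching the example above.
ring = {90: "ServerA", 180: "ServerB", 270: "ServerC"}
tokens = sorted(ring)

def owner(position: int) -> str:
    # First token >= the key's position; wrap to the lowest token past 270.
    idx = bisect_left(tokens, position)
    return ring[tokens[idx % len(tokens)]]

print(owner(100))  # ServerB: first server clockwise from 100
print(owner(300))  # ServerA: wraps past 0
```

The modulo on the index implements the wrap-around: a key past the last token circles back to the first server on the ring.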

📊 Adding or Removing a Server: Only Neighbors Move

This is the breakthrough. When you add ServerD at position 135:

  • Keys previously assigned to ServerB (in the range 91–180) that now fall in range 91–135 are reassigned to ServerD.
  • Everything else is unaffected.
flowchart LR
    A[Add ServerD at 135] --> B{Which keys move?}
    B --> C[Keys 91-135 move to ServerD]
    B --> D[Keys 0-90  ServerA: unchanged]
    B --> E[Keys 136-180  ServerB: unchanged]
    B --> F[All other keys: unchanged]

With consistent hashing, adding a server to an $N$-server ring moves only about $K/N$ keys ($K$ = total keys). With naive % N, adding one server remaps nearly all of them.

That's the efficiency win of consistent hashing.
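The arithmetic can be checked on the toy degree ring: add ServerD at 135 and count which of the 360 positions change owner (a sketch; integer positions stand in for key hashes):

```python
def owner(pos: int, tokens: set[int]) -> int:
    # First server token clockwise from pos, wrapping past 0.
    for t in sorted(tokens):
        if t >= pos:
            return t
    return min(tokens)

before = {90, 180, 270}
after = before | {135}  # add ServerD at 135

moved = [p for p in range(360) if owner(p, before) != owner(p, after)]
print(moved[0], moved[-1], len(moved))  # positions 91..135 move; nothing else
```

Only the arc between ServerA (90) and the new ServerD (135) changes hands; every other position keeps its owner, exactly as the flowchart above shows.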


🧠 Deep Dive: Virtual Nodes Fix Uneven Distribution

Raw consistent hashing has a problem: if servers hash to clustered positions on the ring, one server gets most keys and others get few.

Solution: Virtual Nodes (vnodes)

Each physical server is placed at multiple positions on the ring by hashing variants of its name:

ServerA-0 → position 10
ServerA-1 → position 50
ServerA-2 → position 200

ServerB-0 → position 30
ServerB-1 → position 120
ServerB-2 → position 250

With on the order of 150 virtual nodes per server (Cassandra's classic default is 256 tokens), the distribution becomes statistically uniform, regardless of where the physical servers happen to hash.

| Setting | Distribution | Failure impact |
| --- | --- | --- |
| 1 vnode/server | Uneven; a lottery of positions | One server takes on all of the failed server's load |
| 150 vnodes/server | Near-uniform | Failed server's keys spread across all healthy servers |
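The effect of vnode count on balance can be measured with a quick simulation (a sketch; server names and sample counts are illustrative):

```python
import hashlib
from bisect import bisect_left
from collections import Counter

def build_ring(servers: list[str], vnodes: int) -> dict[int, str]:
    # Place each server at `vnodes` ring positions by hashing name variants.
    ring = {}
    for s in servers:
        for i in range(vnodes):
            token = int(hashlib.md5(f"{s}-{i}".encode()).hexdigest(), 16)
            ring[token] = s
    return ring

def ownership(ring: dict[int, str], samples: int = 20_000) -> dict[str, float]:
    # Fraction of sample keys owned by each server (clockwise lookup with wrap).
    tokens = sorted(ring)
    counts = Counter()
    for i in range(samples):
        h = int(hashlib.md5(f"key-{i}".encode()).hexdigest(), 16)
        counts[ring[tokens[bisect_left(tokens, h) % len(tokens)]]] += 1
    return {s: c / samples for s, c in counts.items()}

servers = ["server-a", "server-b", "server-c"]
print("1 vnode/server:   ", ownership(build_ring(servers, 1)))
print("150 vnodes/server:", ownership(build_ring(servers, 150)))
```

With a single vnode the shares are whatever the position lottery produced; with 150 vnodes per server each share lands close to the ideal 33%.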

📊 Virtual Nodes: Key Assignment on the Ring

flowchart LR
    K["Key hash value (e.g. position 55)"] --> R["Hash Ring (walk clockwise)"]
    R --> V1["vnode ServerA-1 @position 50 (skip)"]
    R --> V2["vnode ServerB-0 @position 60 (first hit)"]
    V2 --> N[Assigned to ServerB]
    Note1["ServerA also at 10, 200; ServerB also at 120, 250"]

This diagram shows how a key at hash position 55 is assigned to a physical server using virtual nodes. The ring walk proceeds clockwise from position 55, skips ServerA-1's vnode at position 50 (behind the key), and lands on ServerB-0's vnode at position 60 — so the key is owned by ServerB. Each physical server holds multiple vnodes at different ring positions (shown in the note), so when a server fails its load scatters evenly across all remaining servers rather than collapsing onto a single neighbor.

📊 Node Addition: Minimal Key Rebalance

sequenceDiagram
    participant C as Client
    participant NR as Hash Ring
    participant New as New Node D
    participant Old as Existing Node B

    C->>NR: add Node D at position 135
    NR->>Old: which keys are in range 91-135?
    Old-->>NR: keys {key1, key2, key3}
    NR->>New: transfer keys {key1, key2, key3}
    New-->>NR: transfer complete
    Note over NR: Only ~1/N keys moved (minimal rebalance)
    C->>NR: lookup key at position 100
    NR-->>C: route to Node D (new owner)

🌍 Real-World Applications: Where Consistent Hashing Powers Real Systems

  • Apache Cassandra: Uses vnodes (default 256 per server) for token assignment.
  • Amazon DynamoDB: Consistent hashing of partition keys across storage nodes.
  • Memcached / Redis Cluster: Client-side consistent hashing libraries route cache keys to shards.
  • Nginx / HAProxy: Some load balancing plugins use consistent hashing for session affinity.

When Cassandra adds a node, only the data in that node's token range is streamed over — not a full reshuffle.


⚖️ Trade-offs & Failure Modes: Limits and When It Doesn't Help

| Concern | Detail |
| --- | --- |
| Hot keys | A single key (e.g., a celebrity tweet) can overwhelm one vnode despite balanced distribution |
| Data skew | Non-uniform key distributions still concentrate load |
| Read replicas | Consistent hashing defines primary placement; the replication factor adds complexity |
| Small clusters | With only 3 servers, noticeable imbalance remains even at 150 vnodes per server |

Virtual nodes help with structural balance but do not fix hot-key problems. Those require application-level solutions (key sharding with suffixes, caching at a higher layer).


🔬 Deep Dive: Replication and Token Ring Internals

Replication Factor

Consistent hashing defines which node is the primary owner of a key, but production systems need replication for durability. Cassandra uses the ring position to determine replicas: after finding the primary node (the first clockwise match), it walks clockwise to find the next N-1 unique physical servers. With replication_factor=3, data lives on 3 servers.

flowchart LR
    K["Key K (position 150)"] -->|primary owner| A["ServerB @180"]
    A -->|replica 1| B["ServerC @270"]
    B -->|replica 2| C["ServerA @90 (wraps)"]

This means a key is always available even if 2 of 3 replicas are down — the CAP theorem tradeoff made explicit by where you set your read/write quorum (QUORUM, ONE, ALL).
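The replica walk described above can be sketched in a few lines (a hypothetical helper; the toy ring matches the diagram's positions):

```python
def replicas(key_pos: int, ring: dict[int, str], rf: int = 3) -> list[str]:
    # Walk clockwise from the key, collecting unique physical servers
    # until the replication factor is met.
    tokens = sorted(ring)
    start = next((i for i, t in enumerate(tokens) if t >= key_pos), 0)
    owners: list[str] = []
    for step in range(len(tokens)):
        server = ring[tokens[(start + step) % len(tokens)]]
        if server not in owners:  # skip extra vnodes of already-chosen servers
            owners.append(server)
        if len(owners) == rf:
            break
    return owners

ring = {90: "ServerA", 180: "ServerB", 270: "ServerC"}
print(replicas(150, ring))  # ['ServerB', 'ServerC', 'ServerA'] -- wraps past 0
```

The "unique physical servers" check matters once vnodes are in play: without it, two adjacent vnodes of the same machine would count as two replicas and silently reduce durability.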

Token Ranges in Cassandra

In Cassandra, each vnode owns a token range — a slice of the 64-bit integer space. The full space is divided into as many ranges as there are vnodes. With 3 servers × 256 vnodes = 768 token ranges, each server owns 256 non-contiguous slices spread evenly around the ring.

When you run nodetool status, you see each node's token assignments. Adding a node causes Cassandra to transfer only the affected token ranges — typically 1/N of total data.

Consistent Hashing in Redis Cluster

Redis Cluster uses a variation: 16,384 hash slots (not a continuous ring, but a segmented space). The CRC16(key) % 16384 formula assigns keys to slots; slots are manually assigned to nodes. This gives operators explicit control over rebalancing — you move slots between nodes rather than relying on automatic ring placement.
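The slot formula is easy to reproduce: Python's binascii.crc_hqx computes the same CRC16/XMODEM variant Redis uses. This sketch ignores Redis hash tags (the {…} sub-key syntax), so it only matches CLUSTER KEYSLOT for keys without braces:

```python
import binascii

def redis_slot(key: str) -> int:
    # CRC16/XMODEM (poly 0x1021, init 0) mod the 16,384 hash slots.
    return binascii.crc_hqx(key.encode(), 0) % 16384

print(redis_slot("user-123"))  # a slot in 0..16383; compare with CLUSTER KEYSLOT
```

Because the slot count is fixed, rebalancing in Redis Cluster is a matter of reassigning slot ranges between nodes, not rehashing keys.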


🧭 Decision Guide: When to Use Consistent Hashing

Use consistent hashing when:

  • Building a distributed cache (Memcached, Redis Cluster) where servers join/leave regularly.
  • Partitioning a distributed database (Cassandra, DynamoDB) and minimizing data movement is critical.
  • Implementing a load balancer with session affinity — route a user to the same backend on every request.
  • Building a content delivery system where edge nodes are added or removed based on demand.

Skip consistent hashing when:

  • Your dataset fits on one server — no sharding needed.
  • You need range queries across sorted keys — B-tree-based range sharding is better.
  • You need exact control over which shard holds which data — explicit mapping beats ring placement.

| Signal | Recommendation |
| --- | --- |
| Nodes join/leave frequently | ✅ Consistent hashing |
| Distributed cache with many servers | ✅ Consistent hashing |
| Need sorted range scans | ❌ Use range-based sharding |
| Single server or very small cluster | ❌ Overkill; use key % N |

🌐 Real-World Architecture: Amazon DynamoDB

DynamoDB's storage layer uses consistent hashing with virtual nodes to distribute table partitions across its fleet of storage servers. Here is a simplified read path:

flowchart LR
    Client -->|"PutItem: userId=123"| R[DynamoDB Router]
    R -->|"hash(userId=123)"| Ring[Virtual Node Ring]
    Ring -->|token match| S1["Storage Node A (primary)"]
    S1 -->|replicate| S2[Storage Node B]
    S1 -->|replicate| S3[Storage Node C]
    S1 -->|ack| R
    R -->|200 OK| Client

When AWS adds capacity to a DynamoDB region, new storage nodes join the ring and absorb only the adjacent token ranges. No global reshuffle happens. A customer writing to userId=123 at the same moment the ring is being rebalanced continues to succeed — the router just uses updated ring metadata for new requests.

The same principle applies to:

  • Amazon S3 internal object placement.
  • Apache Cassandra deployed by Netflix for viewing history at petabyte scale.
  • Discord, which uses Cassandra's consistent hashing ring to store billions of messages.

🧪 Practical Setup: Consistent Hashing in Python

Here is a minimal consistent hashing implementation showing the core data structure:

import hashlib
from sortedcontainers import SortedDict

class ConsistentHashRing:
    def __init__(self, vnodes=150):
        self.ring = SortedDict()
        self.vnodes = vnodes

    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_server(self, server: str):
        for i in range(self.vnodes):
            token = self._hash(f"{server}-{i}")
            self.ring[token] = server

    def remove_server(self, server: str):
        self.ring = SortedDict(
            {t: s for t, s in self.ring.items() if s != server}
        )

    def get_server(self, key: str) -> str:
        if not self.ring:
            raise Exception("No servers in ring")
        token = self._hash(key)
        idx = self.ring.bisect_left(token)
        if idx == len(self.ring):
            idx = 0  # wrap around
        return self.ring.peekitem(idx)[1]

# Usage
ring = ConsistentHashRing(vnodes=150)
ring.add_server("server-a")
ring.add_server("server-b")
ring.add_server("server-c")

print(ring.get_server("user-123"))  # e.g. server-b

The SortedDict gives O(log N) lookup for bisect_left. Adding or removing a server only iterates over its 150 vnode tokens — not all keys.


🛠️ Java TreeMap: A Production-Ready Consistent Hash Ring Without External Dependencies

java.util.TreeMap is the natural data structure for a consistent hash ring in Java: integer tokens are the keys (sorted automatically), server names are the values, and ceilingKey(token) implements the clockwise lookup in O(log N) — zero external dependencies required.

import java.security.MessageDigest;
import java.util.TreeMap;

public class ConsistentHashRing {

    private final int             virtualNodes;
    private final TreeMap<Long, String> ring = new TreeMap<>();

    public ConsistentHashRing(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    public void addServer(String server) {
        for (int i = 0; i < virtualNodes; i++) {
            long token = hash(server + "-vnode-" + i);
            ring.put(token, server);
        }
    }

    public void removeServer(String server) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.remove(hash(server + "-vnode-" + i));
        }
    }

    public String getServer(String key) {
        if (ring.isEmpty()) throw new IllegalStateException("No servers in ring");
        long token   = hash(key);
        // ceilingKey: first token >= hash(key) → first server clockwise from key
        Long ceiling = ring.ceilingKey(token);
        // Wrap around the ring if the key lands past the last token
        return ring.get(ceiling != null ? ceiling : ring.firstKey());
    }

    private long hash(String input) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(input.getBytes());
            long h = 0;
            for (int i = 0; i < 8; i++) h = (h << 8) | (digest[i] & 0xFFL);
            return h;
        } catch (Exception e) { throw new RuntimeException(e); }
    }

    public static void main(String[] args) {
        ConsistentHashRing ring = new ConsistentHashRing(150);  // 150 vnodes per server
        ring.addServer("cache-a");
        ring.addServer("cache-b");
        ring.addServer("cache-c");

        System.out.println(ring.getServer("user-123"));     // e.g. cache-b
        System.out.println(ring.getServer("session-456"));  // e.g. cache-a

        // Add a 4th server — only ~1/4 of keys remap (vs. nearly all with key%N)
        ring.addServer("cache-d");
        System.out.println(ring.getServer("user-123"));     // may now route to cache-d
    }
}

Ketama and Guava are the two standard production alternatives in the Java ecosystem:

  • Ketama (used by spymemcached and XMemcached) is the industry-standard Memcached consistent hashing algorithm in Java, using MD5-based token distribution for maximum client-server compatibility across polyglot stacks.
  • Guava's Hashing.consistentHash provides a simpler bucket-based variant — no ring data structure needed; the bucket is computed directly from the hash code, and remapping is minimized when the bucket count grows.
// Guava consistent hash — maps a key to one of N buckets, minimizes remapping on resize
import com.google.common.hash.Hashing;
import java.nio.charset.StandardCharsets;

int bucket = Hashing.consistentHash(
    Hashing.murmur3_32().hashString("user-123", StandardCharsets.UTF_8),
    3   // number of buckets (servers)
);
System.out.println("Routes to bucket: " + bucket);  // 0, 1, or 2

// When you grow from 3 to 4 buckets, only ~1/4 of keys remap — same guarantee as the ring

Choose the TreeMap ring for full Cassandra/DynamoDB-style token management with virtual nodes; choose Guava's Hashing.consistentHash for simple stateless routing where you only need a bucket number.

For a full deep-dive on Ketama algorithm internals, Guava's rendezvous hashing math, and Redis Cluster hash slot management in Java, a dedicated follow-up post is planned.


📚 Lessons Learned: Pitfalls in Production

1. Hash function choice matters. Cryptographic hashes like MD5 and SHA-1 distribute uniformly but are relatively slow; non-cryptographic hashes like CRC32 and MurmurHash are faster and well proven in production (Ketama uses MD5; many Memcached clients use CRC32). Avoid simple sum(bytes) % range: it clusters keys.

2. Vnode count is a tuning knob, not a default. Cassandra's classic default is 256 vnodes per node (newer releases lowered it). More vnodes mean better balance but more gossip and repair overhead, so tune the count to your cluster size rather than copying a default.

3. Consistent hashing ≠ balanced load under skewed access. If 80% of reads hit 20% of keys (Pareto distribution), the servers holding those key ranges are hot regardless of ring balance. Layer in a read cache (Redis) in front of hot partitions.
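One common application-level fix is the key-suffix sharding mentioned above. A hypothetical sketch (the suffix scheme and replica count are illustrative):

```python
import random

# Hot-key mitigation: replicate one hot cache key under N suffixed copies,
# so the suffixed keys hash to different ring positions and different servers.
def cache_keys(key: str, replicas: int = 8) -> list[str]:
    # Writer stores the same value under every suffixed copy.
    return [f"{key}#{i}" for i in range(replicas)]

def read_key(key: str, replicas: int = 8) -> str:
    # Reader picks one copy at random, so no single server absorbs all reads.
    return f"{key}#{random.randrange(replicas)}"

print(cache_keys("celebrity-tweet-42")[:3])  # first three suffixed copies
print(read_key("celebrity-tweet-42"))        # e.g. celebrity-tweet-42#5
```

The trade-off is write amplification: every update must touch all replicas (or tolerate brief staleness), which is why this is reserved for the few keys that are genuinely hot.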

4. Token ownership is metadata, not data. When you add a server, you must stream the actual data from the previous owner before the new server can serve reads. Plan for this migration window — Cassandra calls it "streaming" and it can take hours on large datasets.


📌 TLDR: Summary & Key Takeaways

  • key % N remaps almost all keys when $N$ changes — consistent hashing avoids this.
  • Both servers and keys are placed on a shared ring; a key maps to the next server clockwise.
  • Adding/removing a server only moves keys in the adjacent arc — not the full key space.
  • Virtual nodes (150–256 per server) ensure near-uniform distribution despite unlucky server placements on the ring.
  • Used in Cassandra, DynamoDB, Memcached, and Redis Cluster.
