Consistent Hashing: Scaling Without Chaos

How do you distribute data across 100 servers so that adding a new server doesn't require moving...

Abstract Algorithms · 5 min read

TLDR: Standard hashing (key % N) breaks when $N$ changes: adding or removing a server reshuffles almost all keys. Consistent hashing maps both servers and keys onto a ring (illustrated below with positions 0–360). When a server joins, only the keys in one neighboring arc move to it, minimizing data migration.


📖 The Problem with key % N: Why Normal Hashing Fails at Scale

Suppose you have 3 cache servers and you route requests like this:

server = hash(key) % 3

Works great, until you add a 4th server. Now hash(key) % 4 differs for almost every key, so nearly all cached data must be remapped. The result is a cache stampede: almost every request misses the cache at once and falls through to the backend database.

Same problem when a server goes down: hash(key) % 2 reroutes most keys to different servers. Data that was cached is no longer reachable.

The core issue: modular hashing is globally sensitive to $N$.
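That sensitivity is easy to measure. The sketch below (a minimal simulation with made-up keys, using MD5 only for a stable hash) counts how many keys land in a different bucket when the cluster grows from 3 to 4 servers:

```python
import hashlib

def bucket(key: str, n: int) -> int:
    """Map a key to one of n servers with naive modular hashing."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % n

keys = [f"user:{i}" for i in range(10_000)]
moved = sum(bucket(k, 3) != bucket(k, 4) for k in keys)
print(f"{moved / len(keys):.0%} of keys remapped")  # roughly 75%
```

In general, only keys where h % 3 == h % 4 stay put, which is 1 in 4 of them: three quarters of the cache is invalidated by adding a single server.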


⚙️ The Ring: One Hash Space for Both Servers and Keys

Consistent hashing fixes this by mapping everything — both servers and keys — onto the same circular hash space (commonly 0 to $2^{32} - 1$).

graph TD
    subgraph Hash Ring
        direction LR
        A((0)) --> B((ServerA\n@90))
        B --> C((ServerB\n@180))
        C --> D((ServerC\n@270))
        D --> A
    end

Placement rules:

  1. Hash each server's name/IP to find its position on the ring.
  2. Hash each key to find its position on the ring.
  3. A key belongs to the first server encountered when walking clockwise from the key's position.

Example:

  • ServerA is at position 90.
  • ServerB is at position 180.
  • ServerC is at position 270.
  • Key at position 100 → walks clockwise → first server hit is ServerB (position 180).
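The clockwise walk above can be sketched with a sorted list and a binary search. This is a minimal illustration using the article's fixed positions (real implementations hash server names to get positions on a $2^{32}$ ring):

```python
import bisect

class HashRing:
    """Minimal ring: servers sit at fixed positions, and a key goes to
    the first server clockwise, wrapping past the top of the ring."""

    def __init__(self, servers: dict[str, int]):
        # servers maps name -> ring position (normally hash(name)).
        self.points = sorted((pos, name) for name, pos in servers.items())
        self.positions = [pos for pos, _ in self.points]

    def owner(self, key_pos: int) -> str:
        i = bisect.bisect_left(self.positions, key_pos)
        return self.points[i % len(self.points)][1]  # i == len wraps to start

ring = HashRing({"ServerA": 90, "ServerB": 180, "ServerC": 270})
print(ring.owner(100))  # ServerB
print(ring.owner(300))  # wraps past 270 back to ServerA
```

The wrap-around in `owner` is what makes the space circular: a key past the last server position belongs to the first server on the ring.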

🔢 Adding or Removing a Server: Only Neighbors Move

This is the breakthrough. When you add ServerD at position 135:

  • Keys previously assigned to ServerB (in the range 91–180) that now fall in range 91–135 are reassigned to ServerD.
  • Everything else is unaffected.

flowchart LR
    A[Add ServerD at 135] --> B{Which keys move?}
    B --> C[Keys 91-135\nmove to ServerD]
    B --> D[Keys 0-90 → ServerA: unchanged]
    B --> E[Keys 136-180 → ServerB: unchanged]
    B --> F[All other keys: unchanged]

Consistent hashing: adding 1 server to $N$ moves approximately $K/(N+1)$ keys ($K$ = total keys), the single arc handed to the newcomer. Naive % N: adding 1 server remaps most keys (a fraction of roughly $N/(N+1)$).
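You can verify that only the one arc moves by comparing ownership before and after the join. A small self-contained check over every position on a 0–359 illustration ring (positions as in the example above):

```python
import bisect

def owner(key_pos: int, positions: list[int]) -> int:
    """Return the position of the first server clockwise of key_pos."""
    positions = sorted(positions)
    i = bisect.bisect_left(positions, key_pos)
    return positions[i % len(positions)]  # i == len wraps to the start

before = [90, 180, 270]        # ServerA, ServerB, ServerC
after = [90, 135, 180, 270]    # ServerD joins at 135
moved = [k for k in range(360) if owner(k, before) != owner(k, after)]
print(min(moved), max(moved))  # 91 135: exactly the arc handed to ServerD
```

Every position outside 91–135 keeps its old owner, including the keys that wrap around past 270.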

That's the efficiency win of consistent hashing.


🧠 Virtual Nodes: Fixing Uneven Distribution

Raw consistent hashing has a problem: if servers hash to clustered positions on the ring, one server gets most keys and others get few.

Solution: Virtual Nodes (vnodes)

Each physical server is placed at multiple positions on the ring by hashing variants of its name:

ServerA-0 → position 10
ServerA-1 → position 50
ServerA-2 → position 200

ServerB-0 → position 30
ServerB-1 → position 120
ServerB-2 → position 250

With enough virtual nodes per server (commonly 100–256 in practice), the distribution becomes statistically uniform, regardless of where any individual server name happens to hash.

| Setting | Distribution | Failure Impact |
| --- | --- | --- |
| 1 vnode/server | Uneven; a lottery of positions | One neighbor takes on all of the failed server's load |
| 150 vnodes/server | Near-uniform | Failed server's keys spread across all healthy servers |

🌍 Where Consistent Hashing Powers Real Systems

  • Apache Cassandra: Uses vnodes (the num_tokens setting, which defaulted to 256 per node for years) for token assignment.
  • Amazon DynamoDB: Consistent hashing of partition keys across storage nodes.
  • Memcached / Redis Cluster: Client-side consistent hashing libraries route cache keys to shards.
  • Nginx / HAProxy: Some load balancing plugins use consistent hashing for session affinity.

When Cassandra adds a node, only the data in that node's token range is streamed over — not a full reshuffle.


⚖️ Limits and When It Doesn't Help

| Concern | Detail |
| --- | --- |
| Hot keys | A single key (e.g., a celebrity tweet) can overwhelm one vnode despite balanced distribution |
| Data skew | Non-uniform key distributions still concentrate load |
| Read replicas | Consistent hashing defines primary placement; the replication factor adds complexity |
| Small clusters | With only 3 servers, even 150 vnodes each can leave noticeable imbalance |

Virtual nodes help with structural balance but do not fix hot-key problems. Those require application-level solutions (key sharding with suffixes, caching at a higher layer).
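One of those application-level fixes, key sharding with suffixes, can be sketched in a few lines. Everything here is hypothetical (the key name, the fan-out of 8, a plain dict standing in for the cache), but the idea carries over to any key-value store:

```python
import random

HOT_KEY = "tweet:12345"  # hypothetical hot key
FANOUT = 8               # number of suffixed copies to spread load over

def write_shards(cache: dict, key: str, value: str) -> None:
    """Replicate a hot value under several suffixed keys, so the
    copies hash to different places on the ring."""
    for i in range(FANOUT):
        cache[f"{key}#{i}"] = value

def read_shard(cache: dict, key: str) -> str:
    """Pick a random suffix so reads fan out across servers."""
    return cache[f"{key}#{random.randrange(FANOUT)}"]

cache = {}
write_shards(cache, HOT_KEY, "payload")
print(read_shard(cache, HOT_KEY))  # payload
```

The trade-off is write amplification (every update touches FANOUT keys) in exchange for reads that no single server has to absorb alone.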


📌 Key Takeaways

  • key % N remaps almost all keys when $N$ changes — consistent hashing avoids this.
  • Both servers and keys are placed on a shared ring; a key maps to the next server clockwise.
  • Adding/removing a server only moves keys in the adjacent arc — not the full key space.
  • Virtual nodes (commonly 100–256 per server) smooth out the distribution despite unlucky hash placements.
  • Used in Cassandra, DynamoDB, Memcached, and Redis Cluster.

🧩 Test Your Understanding

  1. With key % 3, what percentage of keys need remapping when you add a 4th server?
  2. A key is at position 100 on the ring. ServerA is at 90, ServerB is at 200. Which server owns the key?
  3. Why do virtual nodes improve fault recovery compared to single-position servers?
  4. Consistent hashing solves uneven distribution. Does it solve hot keys? Why or why not?

Written by Abstract Algorithms (@abstractalgorithms)