All Posts

System Design HLD Example: Real-Time Leaderboard

A practical interview-ready HLD for a real-time gaming leaderboard with fast rank queries.

Abstract AlgorithmsAbstract Algorithms
Β·Β·15 min read
πŸ“š

Intermediate

For developers with some experience. Builds on fundamentals.

Estimated read time: 15 min

AI-assisted content.

TLDR: Real-time leaderboards for 10M+ active users require an in-memory ranking engine. Redis Sorted Sets (ZSET) are the industry standard, providing $O(\log N)$ updates and rank lookups via an internal Skip List data structure. Relational databases fail at this scale because calculating rank is an $O(N)$ operation (counting rows with higher scores) that becomes a bottleneck during peak traffic spikes.

πŸ† The Last-Minute Snipe

Imagine it’s the final minute of a week-long gaming tournament. Ten million players are grinding for the top spot. You’re currently ranked #10, and you just scored 5,000 points. You expect to see your rank jump to #2 immediately.

If the system uses a standard SQL database, your score update triggers a massive re-calculation. The database has to count exactly how many rows are now below you. With millions of players submitting scores every second, the database's write-ahead log (WAL) gets backed up, and the "Rank" query starts taking 5 seconds. By the time the leaderboard refreshes, the tournament is over, and you’ve been "sniped" by someone whose score processed faster because their database partition wasn't locked.

The challenge of a leaderboard isn't storing scoresβ€”it's ranking. You need a data structure that can handle 50,000 updates per second while providing any player's exact global rank in logarithmic time, ensuring the competition remains fair and the experience remains instantaneous.

πŸ“– Real-Time Leaderboard: Use Cases & Requirements

Actors & Journeys

  • Player: Plays matches, earns points, and checks their global and friends-list ranking.
  • Game Server: The authoritative source that validates gameplay and submits the final score to the leaderboard service.
  • Tournament Admin: Configures seasons, resets scores, and monitors for cheating behavior.

Functional Requirements

  • Score Submission: Update a player's score after a match.
  • Top-N Query: Retrieve the top 10, 100, or 1000 players for the global leaderboard.
  • Personal Rank Query: Find a specific player's exact rank (e.g., "You are #4,502 in the world").
  • Social Leaderboard: View rankings specifically among a subset of users (friends).
  • Time-Bounded Views: Daily, Weekly, and All-time rankings.

Non-Functional Requirements (NFRs)

  • Real-Time Accuracy: Score updates must reflect in rankings within $< 500$ ms.
  • High Read Availability: 99.99% (Players check their rank far more often than they play matches).
  • Scalability: Handle 100 million total registered players and 50,000 score updates/sec during peak events.
  • Durability: A cache failure should not permanently lose player scores; a persistent store must act as the backup.

πŸ” Basics: How Ranking Differs from Sorting

At first glance, a leaderboard seems like a simple sorting problem. Just sort the table by score and pick the top rows. However, at scale, there is a massive difference between sorting and ranking.

  1. Sorting: Ordering a set of elements. In SQL, this is ORDER BY score DESC.
  2. Ranking: Determining the relative position of a single element within that set. In SQL, this requires counting all elements with a higher score.

In a 100M row table, sorting is slow, but ranking is catastrophic. Even with an index on the score, a database cannot easily tell you that you are "exactly the 4,502,123rd player" without traversing millions of index entries. We need a data structure that maintains the order as scores are added, rather than recalculating it on every read.

βš™οΈ Mechanics: The Logic of Score Updates

The core mechanic of a modern leaderboard is the Accumulation and Fan-out of score events.

  • Incremental Updates: Instead of overwriting the score, we often send "score deltas" (e.g., +50 points). This allows for easier concurrent updates.
  • Tie-Breaking: To ensure deterministic ranking, we use a composite score. If two players have the same points, the one who achieved it first is ranked higher. We implement this by appending a timestamp to the score: Total_Score = (Points << 32) | (MAX_INT - Timestamp).
  • Pre-computed Windows: For weekly leaderboards, we don't recalculate the week from scratch. We maintain a specific Redis ZSET for each week, allowing for $O(\log N)$ updates to the active leaderboard.

πŸ“ Estimations & Design Goals

The Math of Massive Ranking

  • Total Players: 100 Million.
  • Daily Active Users (DAU): 10 Million.
  • Peak Write Throughput: 50,000 score updates/sec.
  • Peak Read Throughput: 100,000 rank queries/sec.
  • Memory Footprint: Each entry in Redis (Player ID + Score) $\approx 100$ bytes.
    • $100M \times 100 \text{ bytes} = \mathbf{10 \text{ GB}}$. This easily fits in a single high-memory Redis instance, but we'll shard for high availability.

Design Goals

  • $O(\log N)$ Rank Retrieval: Avoid $O(N)$ scans at all costs.
  • Write-Back Durability: Use a message queue to ensure that even if the ranking engine is busy, no score update is ever lost.

πŸ“Š High-Level Design: The Ranking Pipeline

The architecture separates the fast-path ranking engine from the durable-path storage.

graph TD
    Game[Game Server] -->|Score| Ingest[Ingestion Service]
    Ingest -->|Kafka| MQ[Message Queue]
    MQ --> Processor[Score Processor]

    Processor -->|ZADD| Redis[(Redis ZSET Index)]
    Processor -->|Write| Postgres[(Postgres: Score History)]

    Player -->|Get Rank| API[Leaderboard API]
    API -->|ZREVRANK| Redis
    API -.->|Fallback| Postgres

The Ranking Pipeline above separates two concerns that must never share a bottleneck: the write path (score ingestion via Kafka β†’ Score Processor β†’ Redis ZSET) and the read path (Leaderboard API β†’ Redis ZREVRANK). The Game Server never writes directly to Redis β€” it goes through the Ingestion Service and Kafka, ensuring durability before the score is considered committed. If Redis is unavailable, the API falls back to Postgres for correctness at the cost of higher latency.

🧠 Deep Dive: Why Redis Sorted Sets Achieve What SQL Cannot

The performance gulf between Redis ZSET and SQL for leaderboard rank queries is not a matter of hardware speed β€” it is a fundamental algorithmic difference. Understanding the internals of the Redis Skip List reveals why this gap exists and why no amount of database tuning closes it.

Internals: The Skip List and Hash Map That Power ZREVRANK

A Redis Sorted Set is implemented internally as two data structures working in tandem:

  1. A Skip List for ordered traversal by score. A skip list maintains multiple levels of linked lists, where each higher level is a coarser "express lane" that allows the traversal algorithm to jump over large sections of the sorted order. When you call ZREVRANK, Redis traverses the skip list from the highest score downward, counting the number of elements above the target. Because the traversal uses the express lanes, this count takes O(log N) time β€” not O(N) as a full SQL COUNT(*) would.

  2. A Hash Map for O(1) member lookups by key. When you call ZADD with a new score for an existing player, Redis uses the hash map to locate the player's current skip list node in O(1), removes it from its current position, and re-inserts it at the new score position in O(log N). The combination means both updates and rank queries are logarithmic.

OperationRedis ZSETSQL with Index
Update a player's scoreO(log N) β€” skip list re-insertionO(log N) β€” index update
Get player's exact rankO(log N) β€” skip list traversalO(N) β€” COUNT(*) of all rows with higher score
Get top-K playersO(log N + K) β€” start from top, take KO(K log N) β€” index scan
Get players around rank RO(log N + K) β€” ZRANGE with offsetO(N) β€” no efficient SQL equivalent
Memory per entry~100 bytes~200 bytes (row + index overhead)

The SQL O(N) rank calculation is catastrophic at 100M players. Even with a B-tree index on the score column, counting every entry above a given score requires examining millions of index nodes. At 50,000 score updates per second, the index is constantly being rewritten, making these COUNT(*) queries fight for access to the same index pages β€” creating severe lock contention.

Performance Analysis: Scaling the Ranking Engine to 100 Million Players

A single Redis instance storing a 100M-entry ZSET requires approximately 10 GB of RAM (100M Γ— ~100 bytes per entry). At 50,000 ZADD operations per second, each taking ~0.2 ms of Redis CPU time, a single Redis node operates at ~10,000 ZADD/sec per CPU core β€” well within the capacity of a modern high-memory Redis instance with multiple cores.

However, a single Redis instance is a single point of failure. The production architecture requires sharding and replication:

LayerConfigurationCapacityFailure Behavior
Primary shard1 leader per shard10M players / shardAutomatic failover to replica in < 30 seconds
Read replicas2 replicas per shardAbsorbs all ZREVRANK read trafficSlight stale reads (< 100 ms replication lag)
Sharding keyHash of leaderboard key10 shards for 100M playersRe-sharding requires a migration pipeline
Write throughput5,000 updates/sec per shardLinear scaling with shard countβ€”

For time-bounded leaderboards (daily, weekly, all-time), each time window has its own ZSET key. A daily ZSET is created at midnight UTC and discarded at end of day with DEL or via EXPIRE. This avoids the complexity of retroactively removing old scores from a single global set and allows each time window's memory to be reclaimed automatically.

ZSET Key PatternScopeLifecycle
leaderboard:global:alltimeAll registered playersPermanent; grows with player count
leaderboard:global:weekly:2024-W12Active week's accumulationExpires at end of week + 24h grace period
leaderboard:global:daily:2024-03-29Single day's score deltasExpires after 48 hours
leaderboard:friends:{user_id}Per-user friend subsetRebuilt on friend list change

🌍 Real-World Leaderboards: Stack Overflow, League of Legends, and Duolingo

Stack Overflow uses a reputation leaderboard where every upvote increments a user's score and every downvote decrements it. Their system uses a Redis ZSET for the real-time reputation index with Postgres as the source of truth. The interesting challenge: reputation can decrease (downvotes, vote reversals), so the ZADD must support score replacement with potentially lower values. Redis handles this correctly β€” ZADD always updates the score in-place, regardless of direction.

League of Legends maintains seasonal leaderboards for 150 million players across multiple geographic regions (NA, EUW, KR, etc.). Each regional server has its own ZSET, and a separate cross-region aggregation pipeline runs hourly to build a global "World Ranking." The per-region isolation ensures that a Redis failure in Europe does not bring down NA player rankings. Season resets β€” where all scores are zeroed and a new ZSET key is activated β€” are carefully coordinated operational procedures involving weeks of planning.

Duolingo uses a friend-list leaderboard where each user sees only their friends' rankings. Because friend lists are small (average 50 friends), Duolingo computes friend-list leaderboards on-demand: fetch each friend's score via ZSCORE (O(1) each), sort in application memory (O(K log K) where K ≀ 50), and cache the result in Redis with a 30-second TTL. This approach avoids maintaining O(users Γ— average_friends) separate ZSETs β€” a critical storage optimization at Duolingo's scale.

βš–οΈ In-Memory Speed vs. Durability: Trade-offs in Leaderboard Design

Design DecisionAdvantageRisk
Redis ZSET as ranking engineO(log N) rank update and retrievalRAM-resident; large datasets require multi-GB high-memory nodes
Composite score for tie-breakingDeterministic ordering; no secondary sort neededScore field loses direct human-readable meaning; requires documentation
Kafka for score ingestionDurable buffer; handles 50k/sec burst smoothlyAdds 10–50 ms latency vs. direct Redis write
Separate ZSET per time windowCheap daily/weekly reset; automatic memory reclaimMany ZSET keys to manage; expiry timing must be coordinated
Postgres as durable fallbackSurvive Redis failures without permanent score lossFull re-hydration from Postgres takes 3–20 minutes

Critical Failure Mode β€” The Redis Restart Without Persistence: If the Redis instance restarts and AOF (Append Only File) persistence was not enabled, the entire ZSET is lost. Re-hydrating 100M entries from Postgres requires streaming all score records, computing each player's cumulative score, and executing 100M ZADD commands. At Redis's sustained ingestion rate of ~500K ZADDs/sec, this process takes approximately 200 seconds (~3 minutes) for 100M players β€” during which rank queries return empty or stale results. Mitigation: Enable AOF persistence with appendfsync everysec and maintain at least one read replica per shard with continuous replication. AOF recovery is measured in seconds, not minutes.

🧭 Choosing the Right Ranking Technology for Your Player Count and Latency Requirements

Choose Redis ZSET when:

  • Player count is up to 500M (manageable in a sharded Redis cluster with reasonable RAM budget).
  • Rank queries require sub-10 ms latency to support real-time "your rank just changed" notifications.
  • Score updates occur at more than 1,000 per second during peak play.
  • You need exact real-time rank β€” not an approximation.

Choose approximate ranking (percentile buckets) when:

  • Player count exceeds 500M and the exact rank number is not a core product feature.
  • Showing "Top 1%" is as valuable to the user as showing "Rank #4,502,123."
  • Storage cost of a 500M-entry ZSET (50 GB) is prohibitive given your infrastructure budget.

Use SQL for leaderboards only when:

  • Update rate is under 100 per second and player count is under 1M.
  • Rich multi-dimensional filtering is needed β€” by country, age bracket, device type β€” and a ZSET-per-segment approach would create an unmanageable number of keys.
  • The leaderboard is a secondary feature with relaxed latency requirements (> 500 ms is acceptable).

πŸ§ͺ Delivering This Design in a System Design Interview

Act 1 β€” The SQL Leaderboard Failure (2 minutes): Open with the last-minute tournament scenario. Draw the SQL table with 100M rows. Write out the rank query: SELECT COUNT(*) FROM players WHERE score > (SELECT score FROM players WHERE user_id = ?). Annotate it as O(N). Show that at 50,000 score updates per second, this query is backed up by index contention β€” rank results lag the actual scores by 5+ seconds. This immediately motivates the need for a purpose-built ranking data structure.

Act 2 β€” The Skip List Architecture (5 minutes): Draw the Ranking Pipeline diagram. Explain why the Game Server writes to the Ingestion Service rather than Redis directly β€” you need Kafka durability before confirming the score to the player. Walk through ZADD (O(log N) update) and ZREVRANK (O(log N) rank retrieval). Contrast with the O(N) SQL COUNT(*) to reinforce the algorithmic advantage.

Act 3 β€” Edge Cases (3 minutes): Address tie-breaking (composite score encoding), time-bounded leaderboards (separate ZSET per window), and the Redis failure recovery procedure.

Interviewer QuestionStrong Answer
How do you handle score decreases (e.g., cheating penalty)?ZADD with the lower new score; ZSET updates in-place in O(log N) β€” works for decreases just as well as increases
How do you implement weekly leaderboards?A separate ZSET per week key; Score Processor writes to both the alltime ZSET and the current week's ZSET simultaneously
How do you detect cheating (score anomalies)?Stream score events to an anomaly detection service as a Kafka consumer; flag statistical outliers for manual review

πŸ› οΈ Open Source Tools for Leaderboard Infrastructure

Redis with its native Sorted Set commands (ZADD, ZREVRANK, ZREVRANGE, ZRANGEBYSCORE) is the industry-standard choice for the ranking engine. Redis Cluster provides horizontal sharding and automatic failover with minimal configuration.

Apache Kafka serves as the durable score ingestion buffer between the Game Server and the Score Processor. Kafka's consumer group model allows multiple Score Processor instances to consume in parallel, partitioning the update load across workers. The Postgres write consumer runs as a separate consumer group from the Redis write consumer, ensuring each can scale and fail independently.

ClickHouse is increasingly used as an alternative to Postgres for the score history store, especially when analytics queries (e.g., "top players per game mode over the last 30 days") become expensive on a transactional Postgres schema. ClickHouse's columnar storage and OLAP query engine handle these aggregations 10–100Γ— faster than Postgres at leaderboard scale.

πŸ“š Lessons Learned From Operating Real-Time Leaderboards at Scale

Lesson 1 β€” The composite score encoding must be documented prominently. A developer joining the team months later will see a score like 892000001707868800 in Redis and have no idea it encodes both points and a timestamp. Document the encoding formula in the service README, a comment in the Score Processor, and the data dictionary.

Lesson 2 β€” Season resets are operationally complex events that require weeks of planning. At the end of a competitive season, zeroing 100M scores atomically requires coordinating between the Leaderboard Service, Kafka (to drain any in-flight score updates before reset), and Postgres (to archive the season's score history). A poorly executed reset can cause score data loss or contaminate the new season with old scores.

Lesson 3 β€” The "surrounding players" query is not free. Displaying the 3 players above and below a player's rank requires a ZRANGE with an offset β€” still O(log N + K), but involves two Redis calls. Cache this "neighborhood view" at the CDN or application cache layer with a 5-second TTL to absorb the read amplification from players who obsessively refresh their rank during tournaments.

Lesson 4 β€” Monitor ZSET cardinality and memory growth weekly. As players join the game, the ZSET grows linearly. Set up a weekly alert if ZSET memory usage grows faster than the expected new-player rate. An anomalously fast growth rate may indicate score entries from deleted accounts not being cleaned up, or a bug creating phantom player entries.

πŸ“Œ TLDR & Key Takeaways for Real-Time Leaderboard Design

  • Core problem: SQL COUNT(*) for rank calculation is O(N) β€” catastrophically slow at 100M players under 50k updates/sec.
  • Solution: Redis Sorted Set (ZSET) backed by a Skip List provides O(log N) ZADD (score update) and ZREVRANK (rank lookup) β€” the fundamental algorithmic win.
  • Architecture: Game Server β†’ Ingestion Service β†’ Kafka β†’ Score Processor β†’ Redis ZSET (ranking) + Postgres (durability).
  • Tie-breaking: Composite score (points << 32) | (MAX_INT - timestamp_ms) ensures deterministic ordering when two players share the same point total.
  • Redis memory: 100M players Γ— 100 bytes β‰ˆ 10 GB β€” fits on a single high-memory Redis node; shard by leaderboard key for HA.
  • Critical failure: Redis restart without AOF requires 3+ minutes to rebuild from Postgres β€” enable appendfsync everysec and maintain read replicas.
  • System Design HLD: Distributed Cache β€” Redis architecture fundamentals β€” sharding, replication, eviction, and persistence β€” that underpin the leaderboard's ranking engine.
  • System Design HLD: News Feed β€” Fan-out write patterns and Kafka-driven event pipelines that share the same ingestion architecture as the score update path.
  • System Design HLD: Proximity Service β€” Geospatial Redis data structures that complement leaderboard design for location-aware tournament features.
Share

Test Your Knowledge

🧠

Ready to test what you just learned?

AI will generate 4 questions based on this article's content.

Abstract Algorithms

Written by

Abstract Algorithms

@abstractalgorithms