System Design HLD Example: Real-Time Leaderboard

A practical interview-ready HLD for a real-time gaming leaderboard with fast rank queries.

Abstract Algorithms · 24 min read

TLDR: Real-time leaderboards for 10M+ players require Redis Sorted Sets as the primary ranking index: ZADD updates a player's score in O(log N) and ZREVRANK returns their global rank in O(log N), keeping rank queries under 10ms at any scale. SQL ORDER BY cannot match this because counting rows above a given score requires an index scan that grows linearly with the player's rank. Time-windowed leaderboards (daily/weekly) use separate sorted sets per window with TTL expiry, so no cleanup jobs are needed. MySQL stores the full score history for durability; the Kafka topic is the disaster-recovery replay buffer.

Fortnite runs seasonal competitions with 10M concurrent players. After each match, a player's rank on the global leaderboard must update within seconds and be visible to all. A naive SQL query, SELECT COUNT(*) FROM scores WHERE score > ?, scans the entire table to compute rank. At 10M rows, that query takes seconds per call; at 500K leaderboard reads per second during peak hours, it would instantly saturate any database cluster. How do you make rank queries instant at this scale?

The answer is not a faster database. It is a different data structure: the Redis Sorted Set, which stores every player's score in a skip list that makes ZADD, ZREVRANK, and ZREVRANGE operations O(log N): about 23 comparisons for 10M players, completing in under 100 microseconds. This walkthrough designs the full system: ingestion pipeline, Redis Sorted Set operations, time-windowed leaderboards, nearby-player queries, persistence strategy, and sharding approach for 100M-player scale.

📖 Use Cases and Actors: What a Real-Time Leaderboard Must Do

Actors

| Actor | Role |
| --- | --- |
| Game Server | Emits score events after each match ends; publishes to the ingestion pipeline |
| Score Ingestion Service | Receives raw score submissions; validates, deduplicates by match_id, and enqueues to Kafka |
| Score Processor | Kafka consumer; issues ZADD commands to Redis Sorted Sets for every time window |
| Redis Sorted Set | Primary ranking index; stores player_id → score pairs in a skip list; O(log N) rank queries |
| Leaderboard API | Serves ZREVRANGE (top-N), ZREVRANK (player rank), and ZRANGE (nearby players) via REST |
| MySQL | Durable persistence for full score history; snapshotted from Kafka, not from Redis |
| Game Client | Reads leaderboard via the Leaderboard API; renders rank, nearby competitors, and top-100 |

Use Cases

  • Submit score: Game Server sends player_id + score + match_id after each match
  • Get top-N leaderboard: return the top 100 players by score for a given time window
  • Get a player's global rank: return a player's exact 1-based rank among all active players
  • Get nearby players (relative rank): return the 5 players above and below a given player
  • Time-windowed leaderboards: daily, weekly, and all-time views; each is a separate sorted set with its own TTL

Out of Scope (v1 boundary)

  • Fraud and cheat detection (a separate score validation service would gate submissions upstream)
  • Game server architecture and match logic
  • Friend leaderboards (requires a social graph query layer, a different access pattern)
  • Cross-region leaderboard federation (single-region deployment for v1)

๐Ÿ” Functional Requirements: Score Submissions, Rank Queries, and Time Windows

| Feature | Redis Operation | Key Design Decision |
| --- | --- | --- |
| Score submission | ZADD leaderboard:all-time GT score player_id | GT flag: only update if new score exceeds the current best, preventing rank regression |
| Top-N leaderboard | ZREVRANGE leaderboard:all-time 0 99 WITHSCORES | O(log N + N): direct index lookup, no sort step |
| Player rank | ZREVRANK leaderboard:all-time player_id | O(log N): skip list span-count traversal |
| Nearby players | ZREVRANK → ZRANGE around that position | Two commands; combined O(log N + K) for K neighbours |
| Time-windowed leaderboard | Separate sorted set per window: leaderboard:daily:2026-03-28 | Key expires automatically via TTL when the window closes |
| Pagination on top list | ZREVRANGE key offset count | Native Redis; no cursor state needed in the API layer |

This table surfaces the critical design choice before any architecture diagram: every use case maps directly to a specific Redis command with a known time complexity, validating the choice of Redis Sorted Set as the primary data structure.

โš™๏ธ Capacity Estimation and Design Goals: Making Rank Queries Instant

Estimations

| Dimension | Assumption | Derivation |
| --- | --- | --- |
| Active players | 10M | Fortnite-scale seasonal competition |
| Score submissions/sec | ~50K (burst) | 10M matches/day, concentrated into short match-end bursts rather than spread evenly across 86,400s |
| Leaderboard reads/sec | ~500K | 10:1 read-to-write ratio during active sessions |
| Redis sorted set memory | ~800 MB | 10M members × ~80 bytes per entry (score + player_id + skip list overhead) |
| MySQL score history | ~50 GB/year | 365 days × 10M players × ~14 bytes per row |
| Redis nodes (v1) | 1 primary + 1 replica | Single 4 GB Redis node holds all sorted sets with 3× headroom |

Key insight: at 500K rank reads per second, a database query taking 10ms would require 5,000 concurrent connections just for rank lookups. Redis ZREVRANK at under 1ms with 1M+ ops/sec per node makes 500K reads/sec trivial on a single node.

Design Goals

| Goal | Target | Decision It Drives |
| --- | --- | --- |
| Rank query latency | < 10ms p99 for 10M players | Redis Sorted Set primary index; no SQL on the hot path |
| Score update end-to-end | < 2 seconds from match end to leaderboard update | Kafka buffer + dedicated Score Processor with lag SLO < 1s |
| Top-100 bulk read | < 5ms p99 | Cache as pre-serialized JSON string; refresh every 10 seconds |
| Time window accuracy | Daily window aligned to UTC midnight | Sorted set key includes date: leaderboard:daily:2026-03-28 |
| Score regression prevention | Player rank can only improve | ZADD GT flag on every score submission |

🧠 Deep Dive: How Redis Sorted Sets Enable O(log N) Ranking at Scale

The Internals: Skip List and Hash Table Working Together

A Redis Sorted Set is not one data structure; it is two cooperating ones:

  1. Skip List: a probabilistic layered linked list. Each node holds a member (player_id) and score, maintained in sorted order by score. Skip lists achieve O(log N) insertion and rank queries by maintaining multiple "express lane" layers: on a 32-level skip list with 10M members, ZREVRANK requires at most ~23 pointer hops. Crucially, each forward pointer stores a span count (how many nodes it skips), so global rank is computed by summing span counts along the traversal path, with no row scanning required.

  2. Hash Table: maps each member to its current score in O(1). ZSCORE uses this path. ZADD uses the hash table to locate the old entry before repositioning it in the skip list.

When you call ZADD leaderboard:all-time GT 4500 p:42:

  • Hash table checks existing score for p:42 (O(1))
  • If 4500 > existing score: skip list removes old node (O(log N)) and inserts at new rank (O(log N))
  • Entire operation is atomic; no lock is needed because Redis is single-threaded
Skip List layout for a small leaderboard (4 levels):

Level 3: [HEAD] ──────────────────────────────── [p:42: 4500] ─── [TAIL]
Level 2: [HEAD] ───────── [p:07: 2800] ────────── [p:42: 4500] ─── [TAIL]
Level 1: [HEAD] ─ [p:03: 1200] ─ [p:07: 2800] ─ [p:15: 3900] ─ [p:42: 4500] ─ [TAIL]
Level 0: [HEAD] ─ [p:01: 800] ─ [p:03: 1200] ─ [p:07: 2800] ─ [p:15: 3900] ─ [p:42: 4500] ─ [TAIL]

ZREVRANK p:42 → start at Level 3, accumulate span counts top-down → rank = 0 (highest score)
ZREVRANK p:07 → span accumulation finds 2 players above (p:42, p:15) → rank = 2

Why not a B-Tree? PostgreSQL uses a B-Tree on (score DESC) for ORDER BY score DESC queries. A B-Tree locates any given score in O(log N), but computing the count of rows above that score requires traversing all index pages above that position, which is O(R) where R is the number of players ranked above the target. For a player ranked 5 millionth out of 10 million, that is 5 million index entries scanned. A Redis skip list with span counts avoids this entirely: rank = sum of span values along the traversal path, computed in O(log N).
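The span-count idea can be illustrated without a real skip list: any structure that maintains partial counts along a logarithmic path can answer "how many players score higher than X" without scanning them. A minimal sketch using a Fenwick tree over integer score buckets (an assumption for the demo; Redis itself uses the skip list described above, with RankIndex a hypothetical name):

```java
// Illustrative sketch, not Redis's actual skip list: the same "rank by
// summing partial counts along a logarithmic path" idea, here with a
// Fenwick tree over integer score buckets (scores assumed >= 1).
public class RankIndex {
    private final int[] tree;     // tree[i] counts players in a power-of-two range of buckets
    private final int maxScore;

    public RankIndex(int maxScore) {
        this.maxScore = maxScore;
        this.tree = new int[maxScore + 1];
    }

    /** Record one player at this score: O(log M) bucket-count updates. */
    public void addScore(int score) {
        for (int i = score; i <= maxScore; i += i & (-i)) tree[i]++;
    }

    /** Players with score <= s, via O(log M) partial sums. */
    private int countAtMost(int s) {
        int sum = 0;
        for (int i = s; i > 0; i -= i & (-i)) sum += tree[i];
        return sum;
    }

    /** 0-indexed rank from the top = players with a strictly higher score. */
    public int revRank(int score) {
        return countAtMost(maxScore) - countAtMost(score);
    }

    public static void main(String[] args) {
        RankIndex idx = new RankIndex(100_000);
        for (int s : new int[]{800, 1200, 2800, 3900, 4500}) idx.addScore(s);
        System.out.println(idx.revRank(4500)); // 0 (highest, like p:42 in the diagram)
        System.out.println(idx.revRank(2800)); // 2 (two players above, like p:07)
    }
}
```

The real skip list stores span counts on its forward pointers instead of bucket sums, but the complexity argument is identical: rank falls out of O(log N) additions, never a scan over higher-ranked players.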

Performance Analysis: ZADD, ZRANK, and ZRANGE at Scale

| Operation | Redis Command | Time Complexity | At 10M Members |
| --- | --- | --- | --- |
| Score update | ZADD key GT score member | O(log N) | ~23 comparisons |
| Get player rank | ZREVRANK key member | O(log N) | ~23 comparisons |
| Get top-N | ZREVRANGE key 0 N-1 | O(log N + N) | Sub-millisecond for N = 100 |
| Nearby players window | ZRANGE key R-5 R+5 REV | O(log N + K) | Sub-millisecond for K = 10 |
| Member count | ZCARD key | O(1) | Instant |
| Score lookup | ZSCORE key member | O(1) | Instant (hash table path) |

SQL vs Redis rank query at 10M rows:

| Approach | Operation | Complexity | At 10M rows (rank 5M) |
| --- | --- | --- | --- |
| SQL ORDER BY score DESC LIMIT 1 OFFSET 5M | Full index scan with offset | O(R) for R rows above target | ~5M index entries scanned → seconds |
| SQL COUNT(*) WHERE score > ? | Index range scan | O(R) for R rows above target | ~5M comparisons → hundreds of ms |
| Redis ZREVRANK | Skip list span traversal | O(log N) always | ~23 comparisons → < 1ms |

At 500K reads/second targeting median-ranked players, the SQL approach is physically impossible. The Redis approach is trivially achievable on a single node.

📊 High-Level Architecture: The Full Leaderboard Data Flow

The architecture separates the write path (score ingestion โ†’ Redis) from the read path (API โ†’ Redis) so each can be scaled independently.

flowchart TD
    GS[Game Server] -->|score event: player_id + score + match_id| SI[Score Ingestion Service]
    SI -->|validate + deduplicate| K[Kafka: scores topic]
    K -->|consume partition| SP[Score Processor]
    SP -->|ZADD GT score player_id| RS[Redis: leaderboard:all-time]
    SP -->|ZADD GT score player_id| RD[Redis: leaderboard:daily:DATE]
    SP -->|ZADD GT score player_id| RW[Redis: leaderboard:weekly:WEEK]
    SP -->|async INSERT| DB[(MySQL: score_history)]

    CL[Game Client] -->|GET /rank/player_id| API[Leaderboard API]
    CL -->|GET /top/100| API
    CL -->|GET /nearby/player_id| API
    API -->|ZREVRANK| RS
    API -->|GET top-100 cache key| TC[Top-100 JSON Cache]
    TC -->|cache miss: ZREVRANGE 0 99| RS
    API -->|ZRANGE around rank| RS

This flowchart shows the complete lifecycle of a leaderboard interaction. The Game Server feeds scores through Kafka to the Score Processor, which is the only writer to Redis, eliminating concurrent ZADD races. MySQL receives async inserts for durability without being in the read hot path. The Leaderboard API hits Redis directly for rank queries and reads the pre-serialized Top-100 JSON Cache for bulk list requests, reducing sorted set scans from 500K/sec to 6 per minute for the most popular query.

Write Path: From Match End to Leaderboard Update

1. Game Server โ†’ Score Ingestion: POST /ingest/score
   { player_id: "p:42", score: 4500, match_id: "m:9981", ts: 1743120000 }

2. Score Ingestion validates: match_id not previously seen (idempotency check) →
   publish to Kafka partition hash("p:42") % 32

3. Score Processor consumes Kafka event:
   ZADD leaderboard:all-time          GT 4500 p:42
   ZADD leaderboard:daily:2026-03-28  GT 4500 p:42
   ZADD leaderboard:weekly:2026-W13   GT 4500 p:42
   (each ZADD is O(log N); all three complete in < 1ms total)

4. Async: INSERT INTO score_history VALUES ('p:42', 4500, 'm:9981', '2026-03-28', NOW())

5. Leaderboard live: ZREVRANK p:42 now returns p:42's updated rank

Why Kafka? Post-match bursts hit 50K submissions/sec. Kafka buffers the burst, provides ordered delivery per player partition (preventing out-of-order ZADD GT conflicts), and retains events for 7 days as the disaster-recovery replay buffer if Redis loses data.
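The ingestion service's two responsibilities can be sketched in a few lines; ScoreIngestion and accept are hypothetical names standing in for the real service, and a production deduplication set would live in a TTL'd store, not on the heap:

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the ingestion service's two jobs: drop duplicate match_id
// submissions, and route each event to a Kafka partition keyed on
// player_id so one player's scores stay ordered.
public class ScoreIngestion {
    static final int PARTITIONS = 32;
    private final Set<String> seenMatchIds = new HashSet<>(); // production: TTL'd external store

    /** Returns the target Kafka partition, or -1 if the event is a duplicate. */
    public int accept(String playerId, String matchId) {
        if (!seenMatchIds.add(matchId)) return -1;             // duplicate submission: drop
        return Math.floorMod(playerId.hashCode(), PARTITIONS); // same player -> same partition
    }

    public static void main(String[] args) {
        ScoreIngestion svc = new ScoreIngestion();
        int p1 = svc.accept("p:42", "m:9981");
        int p2 = svc.accept("p:42", "m:9981");  // replayed event
        int p3 = svc.accept("p:42", "m:9982");  // same player, new match
        System.out.println(p2);        // -1: duplicate dropped
        System.out.println(p1 == p3);  // true: one player always maps to one partition
    }
}
```

Keying the partition on player_id is what makes the per-player ordering guarantee above hold: Kafka only orders within a partition.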

Read Path: From API Call to Rank Response

sequenceDiagram
    participant CL as Game Client
    participant API as Leaderboard API
    participant TC as Top-100 Cache
    participant RS as Redis Sorted Set

    CL->>API: GET /rank/p:42
    API->>RS: ZREVRANK leaderboard:all-time p:42
    RS-->>API: 4999999 (0-indexed)
    API-->>CL: {rank: 5000000, score: 4500, total: 10000000}

    CL->>API: GET /top/100
    API->>TC: GET leaderboard:top100:all-time:cache
    alt Cache fresh (set within last 10s)
        TC-->>API: serialized JSON string
    else Cache stale or missing
        TC->>RS: ZREVRANGE leaderboard:all-time 0 99 WITHSCORES
        RS-->>TC: 100 scored members
        TC-->>API: refreshed JSON string
    end
    API-->>CL: top-100 list (JSON)

This sequence diagram shows why the Top-100 JSON Cache exists. A single GET on a pre-serialized string is O(1) and avoids running ZREVRANGE for every top-100 page load. The cache is refreshed every 10 seconds, so it is stale by at most 10 seconds, which is acceptable for a leaderboard visible to millions of players.

๐ŸŒ Real-World Applications: Component Deep Dives for Time Windows, Bulk Reads, and Nearby Players

Time-Windowed Leaderboards: One Sorted Set Per Window

Use a separate sorted set per time window. This is simpler, faster, and more correct than timestamp-filtered queries on a shared sorted set.

Key naming convention:

leaderboard:all-time              → global permanent leaderboard
leaderboard:weekly:2026-W13       → ISO week 13 of 2026
leaderboard:daily:2026-03-28      → single calendar day
leaderboard:season:s2026-q1       → custom season window

TTL for automatic expiry:

EXPIRE leaderboard:daily:2026-03-27    604800   # expire 7 days after day closes
EXPIRE leaderboard:weekly:2026-W12    2592000   # expire 30 days after week closes

When the window closes, the sorted set self-deletes. The next window's sorted set is created on the first ZADD. At any moment there are roughly 15 active sorted sets: 1 all-time, ~5 weekly sets still within their 30-day TTL, and ~8 daily sets still within their 7-day TTL. Each window is independently ranked: a player can be rank 1 today but rank 50,000 this week.

| Window | Key Pattern | TTL After Close | Memory |
| --- | --- | --- | --- |
| All-time | leaderboard:all-time | Never expires | ~800 MB |
| Weekly | leaderboard:weekly:YYYY-WNN | 30 days | Up to ~800 MB (only players active that week) |
| Daily | leaderboard:daily:YYYY-MM-DD | 7 days | Up to ~800 MB (only players active that day) |

Why not filter by timestamp in one sorted set? ZREVRANGEBYSCORE returns members in a score range, not a time range. Filtering by submission time requires either encoding time into the score (which distorts rank ordering) or a secondary index; neither is a native sorted set operation. Separate sorted sets preserve the O(log N) rank property per window.
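The Lettuce submitScore example later in this post assumes an isoWeek helper; one plausible sketch of the key builders using java.time, where WeekFields.ISO supplies the week-based year so keys roll over correctly around New Year (WindowKeys is a hypothetical name):

```java
import java.time.LocalDate;
import java.time.temporal.WeekFields;

// Sketch of the key-naming convention above. WeekFields.ISO gives both the
// ISO week number and the week-based year, so late-December dates that
// belong to week 1 of the next year get the right key.
public class WindowKeys {
    static String allTime()           { return "leaderboard:all-time"; }
    static String daily(LocalDate d)  { return "leaderboard:daily:" + d; }  // ISO yyyy-MM-dd
    static String weekly(LocalDate d) { return "leaderboard:weekly:" + isoWeek(d); }

    static String isoWeek(LocalDate d) {
        int year = d.get(WeekFields.ISO.weekBasedYear());
        int week = d.get(WeekFields.ISO.weekOfWeekBasedYear());
        return String.format("%d-W%02d", year, week);
    }

    public static void main(String[] args) {
        LocalDate d = LocalDate.of(2026, 3, 28);
        System.out.println(daily(d));   // leaderboard:daily:2026-03-28
        System.out.println(weekly(d));  // leaderboard:weekly:2026-W13
    }
}
```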

Top-N Bulk Read: Caching the List as a Pre-Serialized JSON String

ZREVRANGE 0 99 is O(log N + 100): fast in isolation, but called 500K times per second it consumes most of a Redis node's CPU. The solution is a pre-serialized JSON cache string:

Key:   leaderboard:top100:all-time:cache
Value: '[{"rank":1,"player_id":"p:7712","score":98500},{"rank":2,...}]'
TTL:   15 seconds

The Leaderboard API reads this string with a single O(1) Redis GET. A background job (or the API on a cache miss) issues ZREVRANGE, serializes to JSON, and sets the cache key. This reduces ZREVRANGE calls from 500K/sec to 6 refreshes per minute, a 5,000× reduction.

Nearby Players: Two-Command ZREVRANK + ZRANGE

1. ZREVRANK leaderboard:all-time p:42         → rank R (e.g., 4999, 0-indexed)
2. ZRANGE leaderboard:all-time (R-5) (R+5) REV WITHSCORES
   → returns p:42 plus the 5 players on either side, in descending score order

This two-command sequence is not atomic. A score update between the two commands could shift ranks by ±1. For a leaderboard display this is acceptable: the nearby-players view is informational and a one-position shift between commands is invisible to users. If strict atomicity is required, wrap both commands in a Lua script executed with EVAL.
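One edge case worth sketching is the index arithmetic for the second command: the start index must be clamped at 0, because ZRANGE treats negative indices as offsets from the tail, so an unclamped R-5 for a top-3 player would silently return the bottom of the leaderboard (NearbyWindow is a hypothetical helper name):

```java
// Sketch of the index arithmetic for the nearby-players query.
public class NearbyWindow {
    /** [start, stop] range (inclusive, 0-indexed) to pass to ZRANGE ... REV. */
    static int[] range(long rank, int span) {
        long start = Math.max(0, rank - span); // clamp: negative indices count from the tail
        long stop  = rank + span;              // past-the-end stop is safe: Redis truncates
        return new int[]{(int) start, (int) stop};
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(range(4999, 5))); // [4994, 5004]
        System.out.println(java.util.Arrays.toString(range(2, 5)));    // [0, 7] -- clamped
    }
}
```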

โš–๏ธ Trade-offs and Failure Modes: Sharding, Hot Seasons, and Redis Persistence

Scaling: From Single Node to Sharded Cluster

| Bottleneck | Symptom | Mitigation |
| --- | --- | --- |
| Single node memory | Redis memory > 4 GB as players grow past 50M | Shard sorted sets by hash(player_id) % N across N Redis nodes |
| Throughput ceiling | Redis CPU > 80% sustained at 1M+ ops/sec | Add read replicas for ZREVRANK / ZRANGE; writes stay on primary |
| End-of-season burst | 10× normal write rate as season deadline hits | Pre-scale: provision read replicas 1 hour before season end |
| Redis node failure | Rank queries fail; client errors spike | Redis Sentinel / Cluster with replica promotion < 5 seconds |
| Score regression | Late score submission overwrites a higher score | Always use ZADD GT, which ensures a score can only increase |

Persistence Strategy: Redis-as-Primary, MySQL-as-Archive

Redis holds only each player's current best score per window. MySQL stores the full submission history (every match result) for analytics, audit, and reconstruction.

CREATE TABLE score_history (
    id           BIGINT AUTO_INCREMENT PRIMARY KEY,
    player_id    VARCHAR(64)  NOT NULL,
    score        BIGINT       NOT NULL,
    match_id     VARCHAR(64)  NOT NULL,
    window_date  DATE         NOT NULL,
    submitted_at TIMESTAMP    NOT NULL DEFAULT CURRENT_TIMESTAMP,
    INDEX idx_player       (player_id),
    INDEX idx_window_score (window_date, score DESC)
);

Failure recovery: if Redis loses sorted set data (node failure without replica, or corrupted RDB snapshot), the Score Processor replays the Kafka topic (retained 7 days), re-issuing ZADD GT for each event to reconstruct all sorted sets. If the loss window exceeds Kafka retention, it falls back to reading the score_history table ordered by (player_id, submitted_at ASC) and replaying the same way. Replay of 10M entries at 50K writes/sec takes ~3 minutes.

ZADD Flag Semantics: GT vs NX vs Plain

| Flag | Behavior | When to Use |
| --- | --- | --- |
| ZADD (plain) | Always overwrite score | When the latest score is always authoritative (score can decrease) |
| ZADD NX | Insert only; never update | When first submission is final (lock-in semantics) |
| ZADD GT | Update only if new score > existing | Standard gaming leaderboard: best-score-ever wins |
| ZADD LT | Update only if new score < existing | Speedruns, golf, any minimum-score contest |

Use ZADD GT for gaming leaderboards. It is idempotent (safe to replay from Kafka), prevents regression, and handles out-of-order event delivery gracefully: if Kafka delivers a lower-score event after a higher-score event, GT silently discards it.
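The replay-safety claim can be sketched in miniature: "keep the maximum" is commutative and idempotent, so applying events in any order, any number of times, converges on the same best-score-per-player state (GtReplay is a hypothetical stand-in for the real Score Processor):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of why ZADD GT is safe to replay: merging with max is
// order-insensitive and duplicate-insensitive.
public class GtReplay {
    static Map<String, Long> replay(List<Map.Entry<String, Long>> events) {
        Map<String, Long> best = new HashMap<>();
        // best.merge(player, score, max) mirrors: ZADD key GT score player
        for (Map.Entry<String, Long> e : events) best.merge(e.getKey(), e.getValue(), Math::max);
        return best;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> inOrder = List.of(
                Map.entry("p:42", 3000L), Map.entry("p:42", 4500L));
        List<Map.Entry<String, Long>> replayed = List.of(   // out of order + duplicated
                Map.entry("p:42", 4500L), Map.entry("p:42", 3000L), Map.entry("p:42", 4500L));
        System.out.println(replay(inOrder).equals(replay(replayed))); // true
    }
}
```

Plain ZADD has neither property: the last event to arrive wins, so replays and reordering corrupt the leaderboard.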

🧭 Decision Guide: Redis Sorted Sets vs SQL vs Alternatives

| Situation | Recommendation |
| --- | --- |
| 10M+ players, rank query < 10ms | Redis Sorted Set: ZREVRANK is O(log N), always sub-millisecond |
| Time-windowed leaderboards | One sorted set per window with TTL; simpler and faster than timestamp filtering |
| Score history / analytics | MySQL append-only score_history table; Redis holds only the current best score per player |
| < 100K players | PostgreSQL with a covering index on (score DESC, player_id) is sufficient; no Redis needed |
| Friend leaderboards | ZUNIONSTORE across friend-subset sorted sets, or a separate social graph query; not the global sorted set |
| Sharding at 100M+ players | Shard by hash(player_id) % N; each shard gives local rank; global rank is computed in a periodic batch job |
| Score can decrease | Use plain ZADD and track a separate best_score column in MySQL; do not use GT |
| Tie-breaking by match time | Encode timestamp into score: effective_score = (best_score × 10^10) − match_timestamp; earlier finishers rank higher at equal point totals |
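The tie-breaking encoding deserves a worked check, since Redis scores are doubles with exact integers only up to 2^53 (so the 10^10 multiplier is safe for best scores up to roughly 900,000); TieBreak is a hypothetical helper name:

```java
// Worked sketch of the tie-breaking encoding: pack points into the high
// digits and subtract the finish timestamp, so equal point totals order by
// earlier finish while points still dominate.
public class TieBreak {
    static long effectiveScore(long bestScore, long matchTimestamp) {
        // bestScore * 1e10 stays under 2^53 (~9e15) for bestScore < ~900,000,
        // so the value survives Redis's double-precision score exactly.
        return bestScore * 10_000_000_000L - matchTimestamp;
    }

    public static void main(String[] args) {
        long early  = effectiveScore(4500, 1_743_100_000L); // same points, finished earlier
        long late   = effectiveScore(4500, 1_743_120_000L);
        long higher = effectiveScore(4501, 1_743_120_000L); // one more point, later finish
        System.out.println(early > late);    // true: earlier finisher wins the tie
        System.out.println(higher > early);  // true: points still dominate timestamps
    }
}
```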

🧪 Practical Example: Walking the Full Write and Read Path for a Match Result

Scenario: Player p:42 finishes a Fortnite match on March 28, 2026 (Week 13) with 4,500 points, their new personal best. Walk the complete flow from match end to leaderboard display.

Write path:

  1. Game Server emits: POST /ingest/score {player_id: "p:42", score: 4500, match_id: "m:9981"}
  2. Score Ingestion validates: match_id m:9981 not previously seen → publish to Kafka partition hash("p:42") % 32 = 17
  3. Score Processor consumes from partition 17:
    ZADD leaderboard:all-time          GT 4500 p:42  → 1 (new entry, or 0 if update)
    ZADD leaderboard:daily:2026-03-28  GT 4500 p:42
    ZADD leaderboard:weekly:2026-W13   GT 4500 p:42
    EXPIRE leaderboard:daily:2026-03-28 604800
    EXPIRE leaderboard:weekly:2026-W13 2592000
    
  4. Async MySQL write: INSERT INTO score_history VALUES ('p:42', 4500, 'm:9981', '2026-03-28', NOW())
  5. Leaderboard is live within < 2 seconds of match end.

Read path (player checks their rank):

  1. Client calls: GET /rank/p:42?window=all-time
  2. Leaderboard API: ZREVRANK leaderboard:all-time p:42 → 4999999 (0-indexed)
  3. API returns: {"player_id": "p:42", "rank": 5000000, "score": 4500, "total_players": 10000000}
  4. Total API latency: < 5ms including network and serialization

Read path (player checks nearby competitors):

  1. Client calls: GET /nearby/p:42?window=weekly&range=5
  2. API: ZREVRANK leaderboard:weekly:2026-W13 p:42 → rank 2399 (0-indexed)
  3. API: ZRANGE leaderboard:weekly:2026-W13 2394 2404 REV WITHSCORES
  4. Returns 11 players: p:42 plus 5 above and 5 below, sorted by score descending

๐Ÿ—๏ธ Advanced Scaling: Sharding Sorted Sets for 100M Players

For 100M players, a single sorted set requires ~8 GB and a single Redis node begins to saturate. Two sharding strategies apply:

1. Player-ID hash sharding:

leaderboard:all-time:shard:0  →  players where hash(player_id) % 4 == 0
leaderboard:all-time:shard:1  →  players where hash(player_id) % 4 == 1
leaderboard:all-time:shard:2  →  hash % 4 == 2
leaderboard:all-time:shard:3  →  hash % 4 == 3
  • ZADD routes to a shard based on hash(player_id) % 4
  • ZREVRANK is local to the shard: it returns the rank within that shard (~25M players), not the global rank
  • Global rank requires a cross-shard merge: count the members with a strictly higher score on every shard and sum the results; computed in a periodic batch job, not per-request
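The cross-shard merge can be sketched in miniature: per shard, the higher-score count is a single ZCOUNT key (score +inf call, and the global 0-indexed rank is the sum (ShardMerge and the in-memory arrays are stand-ins for real shard reads):

```java
import java.util.List;

// Sketch of the periodic batch computation for global rank under hash
// sharding: sum, across shards, the number of members with a strictly
// higher score than the target player.
public class ShardMerge {
    // each long[] models one shard's member scores (stand-in for a per-shard sorted set)
    static long globalRank(List<long[]> shards, long playerScore) {
        long above = 0;
        for (long[] shard : shards) {
            for (long s : shard) {
                if (s > playerScore) above++;  // per shard: ZCOUNT key (playerScore +inf
            }
        }
        return above;                          // 0-indexed global rank
    }

    public static void main(String[] args) {
        List<long[]> shards = List.of(
                new long[]{4500, 800}, new long[]{3900, 1200}, new long[]{2800});
        System.out.println(globalRank(shards, 2800)); // 2: two players above, across all shards
    }
}
```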

2. Score-tier sharding:

leaderboard:all-time:tier:gold    →  scores >= 10000
leaderboard:all-time:tier:silver  →  scores 1000 to 9999
leaderboard:all-time:tier:bronze  →  scores < 1000
  • Global rank for a bronze player = ZCARD(gold) + ZCARD(silver) + ZREVRANK(bronze, player)
  • Simple 3-command cross-shard rank, but requires knowing tier boundaries in advance
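The three-command formula can also be checked in miniature, modeling each tier as a score → count map (TierRank is a hypothetical stand-in for three per-tier sorted sets):

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch of the score-tier global rank formula:
// globalRank(bronze player) = ZCARD(gold) + ZCARD(silver) + ZREVRANK within bronze.
public class TierRank {
    /** Members with a strictly higher score inside one tier (0-indexed ZREVRANK analogue). */
    static long rankWithin(TreeMap<Long, Long> tier, long score) {
        return tier.tailMap(score, false).values().stream().mapToLong(Long::longValue).sum();
    }

    /** Total members in a tier (ZCARD analogue). */
    static long size(TreeMap<Long, Long> tier) {
        return tier.values().stream().mapToLong(Long::longValue).sum();
    }

    static long globalRank(TreeMap<Long, Long> gold, TreeMap<Long, Long> silver,
                           TreeMap<Long, Long> bronze, long bronzeScore) {
        return size(gold) + size(silver) + rankWithin(bronze, bronzeScore);
    }

    public static void main(String[] args) {
        TreeMap<Long, Long> gold   = new TreeMap<>(Map.of(12_000L, 3L));            // 3 gold players
        TreeMap<Long, Long> silver = new TreeMap<>(Map.of(5_000L, 2L, 2_000L, 1L)); // 3 silver players
        TreeMap<Long, Long> bronze = new TreeMap<>(Map.of(900L, 1L, 400L, 2L));
        System.out.println(globalRank(gold, silver, bronze, 400)); // 7 = 3 + 3 + 1
    }
}
```

The fragility the text warns about is visible here: if score inflation pushes most players into gold, the gold set degenerates into the original single-set problem.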

Recommendation for v2: player-ID hash sharding. It avoids cross-shard reads for the common case (per-player rank lookups at 500K/sec) and accepts a periodic batch job for the global top-100 list. Score-tier sharding breaks if score distributions shift mid-season.

๐Ÿ› ๏ธ Redis Sorted Sets: ZADD, ZRANK, and ZRANGE in Practice

Redis is a single-threaded, in-memory data structure server that achieves 1M+ ops/sec on a single node by executing every command atomically without locking. Its Sorted Set implementation combines a skip list (rank and range operations) with a hash table (score lookup), making it purpose-built for leaderboards.

Core commands with leaderboard examples:

# Submit score: only update if new score is higher than existing (ZADD GT)
ZADD leaderboard:all-time GT 4500 p:42
# → (integer) 1 if inserted (new player), 0 if score was updated

# Get 0-indexed rank from the top (rank 0 = highest score)
ZREVRANK leaderboard:all-time p:42
# → (integer) 4999999

# Get rank + score in one round trip (Redis 7.2+)
ZREVRANK leaderboard:all-time p:42 WITHSCORE
# → 1) (integer) 4999999    2) "4500"

# Get top 10 players with scores
ZREVRANGE leaderboard:all-time 0 9 WITHSCORES
# → p:7712  98500  p:1834  95200  p:0093  94750  ...

# Daily key with auto-expiry after 7 days
ZADD leaderboard:daily:2026-03-28 GT 4500 p:42
EXPIRE leaderboard:daily:2026-03-28 604800

# Nearby players: get the 11 players around rank 4999 (0-indexed)
ZRANGE leaderboard:all-time 4994 5004 REV WITHSCORES

Java (Lettuce async): score submission across all windows:

public CompletableFuture<Void> submitScore(String playerId, long score, LocalDate date) {
    RedisAsyncCommands<String, String> async = redisClient.connect().async();

    String allTimeKey = "leaderboard:all-time";
    String dailyKey   = "leaderboard:daily:" + date;           // e.g., leaderboard:daily:2026-03-28
    String weeklyKey  = "leaderboard:weekly:" + isoWeek(date); // e.g., leaderboard:weekly:2026-W13

    ZAddArgs onlyIfGreater = ZAddArgs.Builder.gt();

    // Fire all three ZADDs in parallel: no ordering dependency between windows
    return CompletableFuture.allOf(
        async.zadd(allTimeKey, onlyIfGreater, (double) score, playerId).toCompletableFuture(),
        async.zadd(dailyKey,   onlyIfGreater, (double) score, playerId).toCompletableFuture(),
        async.zadd(weeklyKey,  onlyIfGreater, (double) score, playerId).toCompletableFuture()
    ).thenRun(() -> {
        // Set TTL on time-windowed keys: idempotent, safe to call on every submit
        async.expire(dailyKey,  604800L);   // 7 days
        async.expire(weeklyKey, 2592000L);  // 30 days
    });
}

Top-100 JSON cache refresh (background job):

@Scheduled(fixedDelay = 10_000)  // refresh every 10 seconds
public void refreshTop100Cache() throws JsonProcessingException {
    List<ScoredValue<String>> top100 = redisSync
        .zrevrangeWithScores("leaderboard:all-time", 0, 99);

    List<Map<String, Object>> ranked = IntStream.range(0, top100.size())
        .mapToObj(i -> Map.of(
            "rank",      i + 1,
            "player_id", top100.get(i).getValue(),
            "score",     (long) top100.get(i).getScore()
        ))
        .toList();

    String json = objectMapper.writeValueAsString(ranked);
    // Cache for 15 seconds: slightly longer than the refresh interval as a safety net
    redisSync.setex("leaderboard:top100:all-time:cache", 15, json);
}

For a full deep-dive on Redis Cluster consistent hashing, slot migration, and EVCache-style multi-region topology, see the Distributed Cache Platform HLD.

📚 Lessons Learned

  • Never use SQL ORDER BY for rank queries at scale. Even with a B-Tree covering index, computing rank requires counting rows above the target player: O(R) where R equals the player's rank. Median players cause full half-table scans.
  • ZADD GT is the correct semantic for gaming leaderboards. Plain ZADD allows score regressions. ZADD NX locks after the first entry. Only ZADD GT gives "update only on improvement", and it is also safe to replay from Kafka without idempotency guards.
  • Use separate sorted sets per time window, not timestamp filters. leaderboard:daily:2026-03-28 with TTL is self-cleaning and preserves O(log N) rank semantics. A single sorted set with timestamp fields requires a secondary index and cannot serve rank queries without scanning.
  • The top-100 JSON cache eliminates almost all ZREVRANGE load. Even though ZREVRANGE 0 99 is fast in isolation, running it 500K times per second is wasteful. A 10-second pre-serialized string cache reduces this to 6 Redis reads per minute, a 5,000× improvement.
  • Kafka is both the ingestion buffer and the disaster-recovery mechanism. 7-day retention means any Redis data loss can be replayed and reconstructed without touching MySQL or corrupted RDB snapshots.
  • Score update end-to-end latency is dominated by Kafka consumer lag, not ZADD time. Monitor consumer lag as the primary freshness SLO, not Redis write time.

📌 TLDR: Summary & Key Takeaways

  • Redis Sorted Set is the correct data structure for leaderboards at scale. ZADD, ZREVRANK, and ZREVRANGE are all O(log N): approximately 23 operations for 10M players, completing in under 100 microseconds.
  • SQL ORDER BY rank computation is O(R) for a player ranked R, which is catastrophic at scale. Even with a B-Tree index, counting rows above a target score grows linearly with the player's rank.
  • Use ZADD GT for score submissions. It ensures scores can only improve, handles out-of-order Kafka delivery gracefully, and is safe to replay for disaster recovery.
  • One sorted set per time window. Key pattern leaderboard:daily:YYYY-MM-DD with TTL auto-cleans old windows. Never timestamp-filter a single shared sorted set.
  • Cache the top-100 list as a pre-serialized JSON string. A 10-second SETEX cache reduces 500K/sec sorted set scans to 6 per minute for the highest-traffic query.
  • MySQL stores score history; Redis holds current best scores. Kafka retains events for 7 days as the reconstruction buffer after any Redis failure.
  • Rule of thumb: if rank query budget is < 10ms and player count > 1M, use Redis Sorted Sets. Below 100K players, a PostgreSQL covering index on (score DESC, player_id) is sufficient.

๐Ÿ“ Practice Quiz

  1. Why does Redis ZREVRANK outperform SELECT COUNT(*) FROM scores WHERE score > ? for a player ranked 5 millionth out of 10 million?

    • A) Redis is always faster than SQL because it uses in-memory storage
    • B) Redis ZREVRANK traverses the skip list accumulating span counts in O(log N), about 23 operations regardless of rank; the SQL query must scan ~5 million index entries above the target score, making it O(R) where R is the player's rank
    • C) Redis pre-computes all ranks at write time so reads are O(1)

    Correct Answer: B

  2. A new engineer proposes storing all leaderboard data in one sorted set and computing the daily leaderboard by filtering entries with submitted_at >= today_start. What is the fundamental flaw with this design?

    • A) Redis does not support storing timestamps alongside scores
    • B) A sorted set is ordered by score only; filtering by timestamp requires scanning all members above the target player and checking each entry's timestamp, an O(N) operation that destroys the O(log N) rank guarantee. A separate sorted set per day preserves O(log N) semantics natively
    • C) Redis would consume more memory with one large sorted set than with multiple smaller daily ones

    Correct Answer: B

  3. Which ZADD flag is correct for a gaming leaderboard where a player's rank should reflect their best score ever, and a lower score from a later match must never overwrite it?

    • A) ZADD NX: only insert if the player has no existing entry; never updates
    • B) ZADD GT: only update the score if the new value is strictly greater than the current score
    • C) ZADD LT: only update if the new score is less than the current score

    Correct Answer: B

  4. The top-100 leaderboard page is loaded 500,000 times per second. ZREVRANGE 0 99 completes in 200 microseconds. What is the problem, and what is the correct mitigation?

    • A) No problem: 500K × 200µs = 100 seconds total, well within Redis capacity
    • B) 500K × 200µs = 100 CPU-seconds per wall-clock second, far beyond a single Redis node's throughput; cache the result as a pre-serialized JSON string with a 10-second TTL so 500K requests/sec hit a single O(1) GET instead of triggering 500K sorted set scans
    • C) Increase the number of Redis replicas to distribute ZREVRANGE reads across them

    Correct Answer: B

  5. Open-ended challenge: Your leaderboard has grown to 100M players and a single Redis node is at 90% memory utilisation. You are evaluating two sharding strategies: (a) shard by hash(player_id) % 4, and (b) shard by score tier (bronze/silver/gold). For each strategy, describe in detail how you would compute a player's exact global rank across shards. Then explain which strategy you would choose for a product where per-player rank queries are 10× more frequent than global top-100 list requests, and justify your choice.
