System Design HLD Example: Real-Time Leaderboard
A practical interview-ready HLD for a real-time gaming leaderboard with fast rank queries.
TLDR: Real-time leaderboards for 10M+ players require Redis Sorted Sets as the primary ranking index — ZADD updates a player's score in O(log N) and ZREVRANK returns their global rank in O(log N), keeping rank queries under 10ms at any scale. SQL ORDER BY cannot match this because counting rows above a given score requires an index scan that grows with the player's rank. Time-windowed leaderboards (daily/weekly) use separate sorted sets per window with TTL expiry — no cleanup jobs needed. MySQL stores the full score history for durability; the Kafka topic is the disaster-recovery replay buffer.
Fortnite runs seasonal competitions with 10M concurrent players. After each match, a player's rank on the global leaderboard must update within seconds and be visible to all. A naive SQL query — SELECT COUNT(*) FROM scores WHERE score > ? — scans the entire table to compute rank. At 10M rows, that query takes seconds per call; at 500K leaderboard reads per second during peak hours, it would instantly saturate any database cluster. How do you make rank queries instant at this scale?
The answer is not a faster database. It is a different data structure: the Redis Sorted Set, which stores every player's score in a skip list that makes ZADD, ZREVRANK, and ZREVRANGE operations O(log N) — about 23 comparisons for 10M players, completing in under 100 microseconds. This walkthrough designs the full system: ingestion pipeline, Redis Sorted Set operations, time-windowed leaderboards, nearby-player queries, persistence strategy, and sharding approach for 100M-player scale.
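That log factor is easy to sanity-check; a minimal sketch (the class name `LogDepthCheck` is illustrative), assuming roughly one comparison per skip-list level:

```java
// Sanity-check the "~23 comparisons for 10M players" claim: a skip-list
// rank query touches roughly log2(N) nodes in expectation.
public class LogDepthCheck {
    public static long expectedHops(long members) {
        // log2(members), rounded: the expected traversal depth is proportional to this
        return Math.round(Math.log(members) / Math.log(2));
    }

    public static void main(String[] args) {
        System.out.println(expectedHops(10_000_000L));  // 23 at the 10M scale discussed here
        System.out.println(expectedHops(100_000_000L)); // 27 at the 100M scale covered later
    }
}
```

Going from 10M to 100M players adds only about four extra hops, which is why the same structure keeps working as the player base grows.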
Use Cases and Actors: What a Real-Time Leaderboard Must Do
Actors
| Actor | Role |
| --- | --- |
| Game Server | Emits score events after each match ends; publishes to the ingestion pipeline |
| Score Ingestion Service | Receives raw score submissions; validates, deduplicates by match_id, and enqueues to Kafka |
| Score Processor | Kafka consumer; issues ZADD commands to Redis Sorted Sets for every time window |
| Redis Sorted Set | Primary ranking index; stores player_id → score pairs in a skip list; O(log N) rank queries |
| Leaderboard API | Serves ZREVRANGE (top-N), ZREVRANK (player rank), and ZRANGE (nearby players) via REST |
| MySQL | Durable persistence for full score history; snapshotted from Kafka, not from Redis |
| Game Client | Reads leaderboard via the Leaderboard API; renders rank, nearby competitors, and top-100 |
Use Cases
- Submit score — Game Server sends `player_id + score + match_id` after each match
- Get top-N leaderboard — return the top 100 players by score for a given time window
- Get a player's global rank — return a player's exact 1-based rank among all active players
- Get nearby players (relative rank) — return the 5 players above and below a given player
- Time-windowed leaderboards — daily, weekly, and all-time views; each is a separate sorted set with its own TTL
Out of Scope (v1 boundary)
- Fraud and cheat detection (a separate score validation service would gate submissions upstream)
- Game server architecture and match logic
- Friend leaderboards (requires a social graph query layer — different access pattern)
- Cross-region leaderboard federation (single-region deployment for v1)
Functional Requirements: Score Submissions, Rank Queries, and Time Windows
| Feature | Redis Operation | Key Design Decision |
| --- | --- | --- |
| Score submission | ZADD leaderboard:all-time GT score player_id | GT flag: only update if new score exceeds the current best — prevents rank regression |
| Top-N leaderboard | ZREVRANGE leaderboard:all-time 0 99 WITHSCORES | O(log N + N) — direct index lookup, no sort step |
| Player rank | ZREVRANK leaderboard:all-time player_id | O(log N) — skip list span-count traversal |
| Nearby players | ZREVRANK → ZRANGE around that position | Two commands; combined O(log N + K) for K neighbours |
| Time-windowed leaderboard | Separate sorted set per window: leaderboard:daily:2026-03-28 | Key expires automatically via TTL when the window closes |
| Pagination on top list | ZREVRANGE key offset count | Native Redis; no cursor state needed in the API layer |
This table surfaces the critical design choice before any architecture diagram: every use case maps directly to a specific Redis command with a known time complexity, validating the choice of Redis Sorted Set as the primary data structure.
Capacity Estimation and Design Goals: Making Rank Queries Instant
Estimations
| Dimension | Assumption | Derivation |
| --- | --- | --- |
| Active players | 10M | Fortnite-scale seasonal competition |
| Score submissions/sec | ~50K (burst) | 10M concurrent players in ~20-minute matches → ~8K match-ends/sec on average; synchronized end-of-match waves burst to ~50K |
| Leaderboard reads/sec | ~500K | 10:1 read-to-write ratio during active sessions |
| Redis sorted set memory | ~800 MB | 10M members × ~80 bytes per entry (score + player_id + skip list overhead) |
| MySQL score history | ~50 GB/year | 365 days × 10M players × ~14 bytes per row |
| Redis nodes (v1) | 1 primary + 1 replica | Single 4 GB Redis node holds all sorted sets with 3× headroom |
Key insight: at 500K rank reads per second, a database query taking 10ms would require 5,000 concurrent connections just for rank lookups. Redis ZREVRANK at under 1ms with 1M+ ops/sec per node makes 500K reads/sec trivial on a single node.
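A quick back-of-envelope check of these figures (a hypothetical `CapacityCheck` helper; the inputs are the table's own assumptions):

```java
public class CapacityCheck {
    // Little's law: in-flight queries = arrival rate * latency
    static long inflightQueries(long readsPerSec, double latencySec) {
        return (long) (readsPerSec * latencySec);
    }

    // Sorted set footprint: members * per-entry overhead, in MiB
    static long memoryMegabytes(long members, long bytesPerEntry) {
        return members * bytesPerEntry / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println(inflightQueries(500_000, 0.010)); // 5000 connections tied up by 10ms SQL queries
        System.out.println(memoryMegabytes(10_000_000, 80)); // 762 MiB, in line with the ~800 MB estimate
    }
}
```

The same arithmetic with Redis latencies (< 1ms) gives under 500 in-flight requests, which is why a single node suffices.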
Design Goals
| Goal | Target | Decision It Drives |
| Rank query latency | < 10ms p99 for 10M players | Redis Sorted Set primary index; no SQL on the hot path |
| Score update end-to-end | < 2 seconds from match end to leaderboard update | Kafka buffer + dedicated Score Processor with lag SLO < 1s |
| Top-100 bulk read | < 5ms p99 | Cache as pre-serialized JSON string; refresh every 10 seconds |
| Time window accuracy | Daily window aligned to UTC midnight | Sorted set key includes date: leaderboard:daily:2026-03-28 |
| Score regression prevention | Player rank can only improve | ZADD GT flag on every score submission |
Deep Dive: How Redis Sorted Sets Enable O(log N) Ranking at Scale
The Internals: Skip List and Hash Table Working Together
A Redis Sorted Set is not one data structure — it is two cooperating ones:
- Skip List — a probabilistic layered linked list. Each node holds a member (`player_id`) and score, maintained in sorted order by score. Skip lists achieve O(log N) insertion and rank queries by maintaining multiple "express lane" layers: on a 32-level skip list with 10M members, ZREVRANK requires at most ~23 pointer hops. Crucially, each forward pointer stores a span count (how many nodes it skips), so global rank is computed by summing span counts along the traversal path — no row scanning required.
- Hash Table — maps each member to its current score in O(1). ZSCORE uses this path. ZADD uses the hash table to locate the old entry before repositioning it in the skip list.
When you call ZADD leaderboard:all-time GT 4500 p:42:
- The hash table checks the existing score for `p:42` (O(1))
- If 4500 > existing score: the skip list removes the old node (O(log N)) and inserts it at the new rank (O(log N))
- The entire operation is atomic — no lock needed because Redis is single-threaded
Skip List layout for a small leaderboard (4 levels):
Level 3: [HEAD] ----------------------------------------------------> [p:42: 4500] -> [TAIL]
Level 2: [HEAD] -----------------> [p:07: 2800] --------------------> [p:42: 4500] -> [TAIL]
Level 1: [HEAD] -> [p:03: 1200] -> [p:07: 2800] -> [p:15: 3900] ----> [p:42: 4500] -> [TAIL]
Level 0: [HEAD] -> [p:01: 800] -> [p:03: 1200] -> [p:07: 2800] -> [p:15: 3900] -> [p:42: 4500] -> [TAIL]
ZREVRANK p:42 → start at Level 3, accumulate span counts top-down → rank = 0 (highest score)
ZREVRANK p:07 → span accumulation finds 2 players above (p:42, p:15) → rank = 2
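The rank-by-position idea can be mimicked in a few lines. Here a sorted array stands in for the skip list and a binary search for the layered traversal; both locate a position in O(log N). The scores are the four-level example above, and the class name is illustrative:

```java
import java.util.Arrays;

public class RevRankToy {
    // Scores from the diagram, ascending: p:01, p:03, p:07, p:15, p:42
    static final long[] SCORES = {800, 1200, 2800, 3900, 4500};

    // 0-indexed rank from the top, like ZREVRANK: the count of members with a higher score
    static int revRank(long score) {
        int i = Arrays.binarySearch(SCORES, score); // O(log N), like the skip-list descent
        return SCORES.length - 1 - i;
    }

    public static void main(String[] args) {
        System.out.println(revRank(4500)); // 0: p:42 holds the top score
        System.out.println(revRank(2800)); // 2: p:42 and p:15 rank above p:07
    }
}
```

The real skip list gets the same answer without a contiguous array, which is what lets it also support O(log N) inserts; the sorted array here would pay O(N) per insert.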
Why not a B-Tree? PostgreSQL uses a B-Tree on (score DESC) for ORDER BY score DESC queries. A B-Tree makes locating a given score O(log N) too — but computing the count of rows above that score requires traversing all index entries above that position, which is O(R) where R is the number of players ranked above the target. For a player ranked 5 millionth out of 10 million, that is 5 million index entries scanned. A Redis skip list with span counts avoids this entirely: rank = sum of span values along the traversal path, computed in O(log N).
Performance Analysis: ZADD, ZRANK, and ZRANGE at Scale
| Operation | Redis Command | Time Complexity | At 10M Members |
| --- | --- | --- | --- |
| Score update | ZADD key GT score member | O(log N) | ~23 comparisons |
| Get player rank | ZREVRANK key member | O(log N) | ~23 comparisons |
| Get top-N | ZREVRANGE key 0 N-1 | O(log N + N) | Sub-millisecond for N = 100 |
| Nearby players window | ZRANGE key R-5 R+5 REV | O(log N + K) | Sub-millisecond for K = 10 |
| Member count | ZCARD key | O(1) | Instant |
| Score lookup | ZSCORE key member | O(1) | Instant — hash table path |
SQL vs Redis rank query at 10M rows:
| Approach | Operation | Complexity | At 10M rows (rank 5M) |
| --- | --- | --- | --- |
| SQL ORDER BY score DESC LIMIT 1 OFFSET 5M | Full index scan with offset | O(R) — R rows above target | ~5M index entries scanned — seconds |
| SQL COUNT(*) WHERE score > ? | Index range scan | O(R) for R rows above target | ~5M comparisons — hundreds of ms |
| Redis ZREVRANK | Skip list span traversal | O(log N) always | ~23 comparisons — < 1ms |
At 500K reads/second targeting median-ranked players, the SQL approach is physically impossible. The Redis approach is trivially achievable on a single node.
High-Level Architecture: The Full Leaderboard Data Flow
The architecture separates the write path (score ingestion โ Redis) from the read path (API โ Redis) so each can be scaled independently.
flowchart TD
GS[Game Server] -->|score event: player_id + score + match_id| SI[Score Ingestion Service]
SI -->|validate + deduplicate| K[Kafka: scores topic]
K -->|consume partition| SP[Score Processor]
SP -->|ZADD GT score player_id| RS[Redis: leaderboard:all-time]
SP -->|ZADD GT score player_id| RD[Redis: leaderboard:daily:DATE]
SP -->|ZADD GT score player_id| RW[Redis: leaderboard:weekly:WEEK]
SP -->|async INSERT| DB[(MySQL: score_history)]
CL[Game Client] -->|GET /rank/player_id| API[Leaderboard API]
CL -->|GET /top/100| API
CL -->|GET /nearby/player_id| API
API -->|ZREVRANK| RS
API -->|GET top-100 cache key| TC[Top-100 JSON Cache]
TC -->|cache miss: ZREVRANGE 0 99| RS
API -->|ZRANGE around rank| RS
This flowchart shows the complete lifecycle of a leaderboard interaction. The Game Server feeds scores through Kafka to the Score Processor, which is the only writer to Redis — this eliminates concurrent ZADD races. MySQL receives async inserts for durability without being in the read hot path. The Leaderboard API hits Redis directly for rank queries and reads the pre-serialized Top-100 JSON Cache for bulk list requests, reducing sorted set scans from 500K/sec to 6 per minute for the most popular query.
Write Path: From Match End to Leaderboard Update
1. Game Server โ Score Ingestion: POST /ingest/score
{ player_id: "p:42", score: 4500, match_id: "m:9981", ts: 1743120000 }
2. Score Ingestion validates: match_id not previously seen (idempotency check) → publish to Kafka partition hash("p:42") % 32
3. Score Processor consumes Kafka event:
ZADD leaderboard:all-time GT 4500 p:42
ZADD leaderboard:daily:2026-03-28 GT 4500 p:42
ZADD leaderboard:weekly:2026-W13 GT 4500 p:42
(each ZADD is O(log N); all three complete in < 1ms total)
4. Async: INSERT INTO score_history VALUES ('p:42', 4500, 'm:9981', '2026-03-28', NOW())
5. Leaderboard live — ZREVRANK p:42 now returns p:42's updated rank
Why Kafka? Post-match bursts hit 50K submissions/sec. Kafka buffers the burst, provides ordered delivery per player partition (preventing out-of-order ZADD GT conflicts), and retains events for 7 days as the disaster-recovery replay buffer if Redis loses data.
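The partition-routing step can be sketched as follows. Kafka's default partitioner actually hashes the key bytes with murmur2; `String.hashCode` is a simplified stand-in here, and the class name is illustrative:

```java
public class PartitionRouter {
    static final int PARTITIONS = 32;

    // The same player always maps to the same partition, so that player's
    // events are consumed in order and ZADD GT conflicts cannot interleave.
    static int partitionFor(String playerId) {
        // floorMod guards against negative hashCode values
        return Math.floorMod(playerId.hashCode(), PARTITIONS);
    }

    public static void main(String[] args) {
        int first = partitionFor("p:42");
        int second = partitionFor("p:42");
        System.out.println(first == second);                    // true: deterministic routing
        System.out.println(first >= 0 && first < PARTITIONS);   // true: always a valid partition
    }
}
```

Keying by player_id trades perfectly even load for per-player ordering, which is the property the ZADD GT pipeline needs.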
Read Path: From API Call to Rank Response
sequenceDiagram
participant CL as Game Client
participant API as Leaderboard API
participant TC as Top-100 Cache
participant RS as Redis Sorted Set
CL->>API: GET /rank/p:42
API->>RS: ZREVRANK leaderboard:all-time p:42
RS-->>API: 4999999 (0-indexed)
API-->>CL: {rank: 5000000, score: 4500, total: 10000000}
CL->>API: GET /top/100
API->>TC: GET leaderboard:top100:all-time:cache
alt Cache fresh (set within last 10s)
TC-->>API: serialized JSON string
else Cache stale or missing
TC->>RS: ZREVRANGE leaderboard:all-time 0 99 WITHSCORES
RS-->>TC: 100 scored members
TC-->>API: refreshed JSON string
end
API-->>CL: top-100 list (JSON)
This sequence diagram shows why the Top-100 JSON Cache exists. A single GET on a pre-serialized string is O(1) and avoids running ZREVRANGE for every top-100 page load. The cache is refreshed every 10 seconds โ stale by at most 10 seconds, which is acceptable for a leaderboard visible to millions of players.
Real-World Applications: Component Deep Dives for Time Windows, Bulk Reads, and Nearby Players
Time-Windowed Leaderboards: One Sorted Set Per Window
Use a separate sorted set per time window. This is simpler, faster, and more correct than timestamp-filtered queries on a shared sorted set.
Key naming convention:
leaderboard:all-time → global permanent leaderboard
leaderboard:weekly:2026-W13 → ISO week 13 of 2026
leaderboard:daily:2026-03-28 → single calendar day
leaderboard:season:s2026-q1 → custom season window
TTL for automatic expiry:
EXPIRE leaderboard:daily:2026-03-27 604800 # expire 7 days after day closes
EXPIRE leaderboard:weekly:2026-W12 2592000 # expire 30 days after week closes
When the window closes, the sorted set self-deletes. The next window's sorted set is created on the first ZADD. At any moment there are only about 14 active sorted sets (1 all-time + ~5 weekly keys still within their 30-day TTL + ~8 daily keys within their 7-day TTL). Each window is independently ranked — a player can be rank 1 today but rank 50,000 this week.
| Window | Key Pattern | TTL After Close | Memory |
| --- | --- | --- | --- |
| All-time | leaderboard:all-time | Never expires | ~800 MB |
| Weekly | leaderboard:weekly:YYYY-WNN | 30 days | ~800 MB |
| Daily | leaderboard:daily:YYYY-MM-DD | 7 days | ~800 MB |
Why not filter by timestamp in one sorted set? ZREVRANGEBYSCORE returns members in a score range, not a time range. Filtering by submission time requires either encoding time into the score (which distorts rank ordering) or a secondary index — neither is a native sorted set operation. Separate sorted sets preserve the O(log N) rank property per window.
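Deriving the window key names from a match date is a one-liner with `java.time` (the helper names are illustrative; only the key formats come from the conventions above):

```java
import java.time.LocalDate;
import java.time.temporal.IsoFields;

public class WindowKeys {
    static String dailyKey(LocalDate d) {
        return "leaderboard:daily:" + d; // LocalDate.toString is ISO-8601 yyyy-MM-dd
    }

    static String weeklyKey(LocalDate d) {
        // ISO week-based year + zero-padded week number, e.g. 2026-W13
        return String.format("leaderboard:weekly:%d-W%02d",
                d.get(IsoFields.WEEK_BASED_YEAR),
                d.get(IsoFields.WEEK_OF_WEEK_BASED_YEAR));
    }

    public static void main(String[] args) {
        LocalDate matchDay = LocalDate.of(2026, 3, 28);
        System.out.println(dailyKey(matchDay));  // leaderboard:daily:2026-03-28
        System.out.println(weeklyKey(matchDay)); // leaderboard:weekly:2026-W13
    }
}
```

Using the week-based year (not the calendar year) matters around New Year, when the ISO week can belong to the previous or following year.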
Top-N Bulk Read: Caching the List as a Pre-Serialized JSON String
ZREVRANGE 0 99 is O(log N + 100) — fast, but called 500K times per second it consumes most of a Redis node's CPU. The solution is a pre-serialized JSON cache string:
Key: leaderboard:top100:all-time:cache
Value: '[{"rank":1,"player_id":"p:7712","score":98500},{"rank":2,...}]'
TTL: 15 seconds
The Leaderboard API reads this string with a single O(1) Redis GET. A background job (or the API on a cache miss) issues ZREVRANGE, serializes to JSON, and sets the cache key. This reduces ZREVRANGE calls from 500K/sec to 6 refreshes per minute.
Nearby Players: Two-Command ZREVRANK + ZRANGE
1. ZREVRANK leaderboard:all-time p:42 → rank R (e.g., 4999, 0-indexed)
2. ZRANGE leaderboard:all-time R-5 R+5 REV WITHSCORES
→ returns the 10 players surrounding p:42 in descending score order
This two-command sequence is not atomic. A score update between the two commands could shift ranks by ±1. For a leaderboard display this is acceptable — the nearby-players view is informational and a one-position shift between commands is invisible to users. If strict atomicity is required, wrap both commands in a Lua script executed with EVAL.
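One edge case the API layer should handle: for a player near the top of the board, R-5 goes negative, and Redis interprets negative indexes as offsets from the end of the set. A small clamp (hypothetical helper) keeps the window valid:

```java
public class NearbyWindow {
    // Clamp [rank - radius, rank + radius] so a top-ranked player never produces
    // a negative start index, which Redis would read as an offset from the tail.
    static long[] windowBounds(long rank, long radius, long totalMembers) {
        long start = Math.max(0, rank - radius);
        long stop = Math.min(totalMembers - 1, rank + radius);
        return new long[]{start, stop};
    }

    public static void main(String[] args) {
        long[] top = windowBounds(2, 5, 10_000_000);     // rank 2 clamps to [0, 7]
        long[] mid = windowBounds(4999, 5, 10_000_000);  // rank 4999 stays [4994, 5004]
        System.out.println(top[0] + ".." + top[1]);
        System.out.println(mid[0] + ".." + mid[1]);
    }
}
```

The same clamp on the upper bound protects the bottom of the board, where rank + radius can run past the last member.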
Trade-offs and Failure Modes: Sharding, Hot Seasons, and Redis Persistence
Scaling: From Single Node to Sharded Cluster
| Bottleneck | Symptom | Mitigation |
| --- | --- | --- |
| Single node memory | Redis memory > 4 GB as players grow past 50M | Shard sorted sets by hash(player_id) % N across N Redis nodes |
| Throughput ceiling | Redis CPU > 80% sustained at 1M+ ops/sec | Add read replicas for ZREVRANK / ZRANGE; writes stay on primary |
| End-of-season burst | 10× normal write rate as season deadline hits | Pre-scale: provision read replicas 1 hour before season end |
| Redis node failure | Rank queries fail; client errors spike | Redis Sentinel / Cluster with replica promotion < 5 seconds |
| Score regression | Late score submission overwrites a higher score | Always use ZADD GT — ensures score can only increase |
Persistence Strategy: Redis-as-Primary, MySQL-as-Archive
Redis holds only each player's current best score per window. MySQL stores the full submission history (every match result) for analytics, audit, and reconstruction.
CREATE TABLE score_history (
id BIGINT AUTO_INCREMENT PRIMARY KEY,
player_id VARCHAR(64) NOT NULL,
score BIGINT NOT NULL,
match_id VARCHAR(64) NOT NULL,
window_date DATE NOT NULL,
submitted_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
INDEX idx_player (player_id),
INDEX idx_window_score (window_date, score DESC)
);
Failure recovery: if Redis loses sorted set data (node failure without replica, or corrupted RDB snapshot), the Score Processor replays the Kafka topic (retained 7 days). For anything older than the retention window, a rebuild job reads the score_history table ordered by (player_id, submitted_at ASC), replays ZADD GT for each row, and reconstructs all sorted sets. Replay of 10M entries at 50K writes/sec takes ~3 minutes.
ZADD Flag Semantics: GT vs NX vs Plain
| Flag | Behavior | When to Use |
| --- | --- | --- |
| ZADD (plain) | Always overwrite score | When the latest score is always authoritative (score can decrease) |
| ZADD NX | Insert only; never update | When first submission is final (lock-in semantics) |
| ZADD GT | Update only if new score > existing | Standard gaming leaderboard — best-score-ever wins |
| ZADD LT | Update only if new score < existing | Speedruns, golf, any minimum-score contest |
Use ZADD GT for gaming leaderboards. It is idempotent (safe to replay from Kafka), prevents regression, and handles out-of-order event delivery gracefully — if Kafka delivers a lower-score event after a higher-score event, GT silently discards it.
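The flag semantics are easy to pin down with a plain map standing in for the sorted set's hash-table side; this is an illustrative model, not Redis itself. Note how GT turns out-of-order replay into a no-op:

```java
import java.util.HashMap;
import java.util.Map;

public class ZaddSemantics {
    enum Flag { PLAIN, NX, GT, LT }

    final Map<String, Long> scores = new HashMap<>();

    // Returns true if the member's stored score changed
    boolean zadd(Flag flag, String member, long score) {
        Long current = scores.get(member);
        boolean apply = switch (flag) {
            case PLAIN -> true;                               // latest write always wins
            case NX -> current == null;                       // first submission locks in
            case GT -> current == null || score > current;    // only improvements land
            case LT -> current == null || score < current;    // golf / speedrun semantics
        };
        if (apply) scores.put(member, score);
        return apply;
    }

    public static void main(String[] args) {
        ZaddSemantics z = new ZaddSemantics();
        z.zadd(Flag.GT, "p:42", 4500); // new entry
        z.zadd(Flag.GT, "p:42", 3000); // late, lower score: silently discarded
        System.out.println(z.scores.get("p:42")); // 4500 regardless of delivery order
    }
}
```

Replaying the same event stream in any order through the GT rule converges on the same final map, which is exactly the property the Kafka disaster-recovery replay relies on.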
Decision Guide: Redis Sorted Sets vs SQL vs Alternatives
| Situation | Recommendation |
| --- | --- |
| 10M+ players, rank query < 10ms | Redis Sorted Set: ZREVRANK is O(log N), always sub-millisecond |
| Time-windowed leaderboards | One sorted set per window with TTL — simpler and faster than timestamp filtering |
| Score history / analytics | MySQL append-only score_history table — Redis holds only the current best score per player |
| < 100K players | PostgreSQL with a covering index on (score DESC, player_id) is sufficient; no Redis needed |
| Friend leaderboards | ZUNIONSTORE across friend-subset sorted sets, or a separate social graph query; not the global sorted set |
| Sharding at 100M+ players | Shard by hash(player_id) % N; each shard gives local rank; global rank is computed in a periodic batch job |
| Score can decrease | Use plain ZADD and track a separate best_score column in MySQL; do not use GT |
| Tie-breaking by match time | Encode timestamp into score: effective_score = (best_score × 10^10) − match_timestamp; earlier finishers rank higher at equal point totals |
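The tie-breaking row deserves a quick check (constants follow the table's formula; the class name is illustrative). The timestamp term stays far below 10^10, so it can break ties but never reorder distinct point totals, and the encoded values stay under 2^53, so Redis's double-precision scores hold them exactly:

```java
public class TieBreak {
    // effective_score = best_score * 10^10 - match_timestamp
    static long effective(long points, long epochSeconds) {
        return points * 10_000_000_000L - epochSeconds;
    }

    public static void main(String[] args) {
        long early = effective(4500, 1_743_120_000L);
        long late  = effective(4500, 1_743_120_600L); // same points, 10 minutes later
        System.out.println(early > late);             // true: earlier finisher ranks higher
        System.out.println(effective(4501, late) > early); // extra points always dominate the timestamp
    }
}
```

For point totals up to ~900,000 the encoded value stays below 2^53; beyond that, the score would lose integer precision inside Redis and the scheme needs a smaller multiplier or a coarser timestamp.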
Practical Example: Walking the Full Write and Read Path for a Match Result
Scenario: Player p:42 finishes a Fortnite match on March 28, 2026 (Week 13) with 4,500 points — their new personal best. Walk the complete flow from match end to leaderboard display.
Write path:
- Game Server emits: POST /ingest/score {player_id: "p:42", score: 4500, match_id: "m:9981"}
- Score Ingestion validates: match_id m:9981 not previously seen → publish to Kafka partition hash("p:42") % 32 = 17
- Score Processor consumes from partition 17:
  ZADD leaderboard:all-time GT 4500 p:42 → 1 (new entry, or 0 if update)
  ZADD leaderboard:daily:2026-03-28 GT 4500 p:42
  ZADD leaderboard:weekly:2026-W13 GT 4500 p:42
  EXPIRE leaderboard:daily:2026-03-28 604800
  EXPIRE leaderboard:weekly:2026-W13 2592000
- Async MySQL write: INSERT INTO score_history VALUES ('p:42', 4500, 'm:9981', '2026-03-28', NOW())
- Leaderboard is live within < 2 seconds of match end.
Read path โ player checks their rank:
- Client calls: GET /rank/p:42?window=all-time
- Leaderboard API: ZREVRANK leaderboard:all-time p:42 → 4999999 (0-indexed)
- API returns: {"player_id": "p:42", "rank": 5000000, "score": 4500, "total_players": 10000000}
- Total API latency: < 5ms including network and serialization
Read path โ player checks nearby competitors:
- Client calls: GET /nearby/p:42?window=weekly&range=5
- API: ZREVRANK leaderboard:weekly:2026-W13 p:42 → rank 2399 (0-indexed)
- API: ZRANGE leaderboard:weekly:2026-W13 2394 2404 REV WITHSCORES
- Returns 11 players: p:42 plus 5 above and 5 below, sorted by score descending
Advanced Scaling: Sharding Sorted Sets for 100M Players
For 100M players, a single sorted set requires ~8 GB and a single Redis node begins to saturate. Two sharding strategies apply:
1. Player-ID hash sharding:
leaderboard:all-time:shard:0 → players where hash(player_id) % 4 == 0
leaderboard:all-time:shard:1 → players where hash(player_id) % 4 == 1
leaderboard:all-time:shard:2 → hash % 4 == 2
leaderboard:all-time:shard:3 → hash % 4 == 3
- ZADD routes to a shard based on hash(player_id) % 4
- ZREVRANK is local to the shard — it returns rank within that shard (~25M players), not global rank
- Global rank requires a cross-shard merge: count the members with a higher score on every shard (ZCOUNT) and sum the results — computed in a periodic batch job, not per-request
2. Score-tier sharding:
leaderboard:all-time:tier:gold → scores >= 10000
leaderboard:all-time:tier:silver → scores 1000 to 9999
leaderboard:all-time:tier:bronze → scores < 1000
- Global rank for a bronze player = ZCARD(gold) + ZCARD(silver) + ZREVRANK(bronze, player)
- Simple 3-command cross-shard rank, but requires knowing tier boundaries in advance
Recommendation for v2: player-ID hash sharding. It avoids cross-shard reads for the common case (per-player rank lookups at 500K/sec) and accepts a periodic batch job for the global top-100 list. Score-tier sharding breaks if score distributions shift mid-season.
Redis Sorted Sets: ZADD, ZRANK, and ZRANGE in Practice
Redis is a single-threaded, in-memory data structure server that achieves 1M+ ops/sec on a single node by executing every command atomically without locking. Its Sorted Set implementation combines a skip list (rank and range operations) with a hash table (score lookup), making it purpose-built for leaderboards.
Core commands with leaderboard examples:
# Submit score — only update if new score is higher than existing (ZADD GT)
ZADD leaderboard:all-time GT 4500 p:42
# → (integer) 1 if inserted (new player), 0 if score was updated
# Get 0-indexed rank from the top (rank 0 = highest score)
ZREVRANK leaderboard:all-time p:42
# → (integer) 4999999
# Get rank + score in one round trip (Redis 7.2+)
ZREVRANK leaderboard:all-time p:42 WITHSCORE
# → 1) (integer) 4999999 2) "4500"
# Get top 10 players with scores
ZREVRANGE leaderboard:all-time 0 9 WITHSCORES
# → p:7712 98500 p:1834 95200 p:0093 94750 ...
# Daily key with auto-expiry after 7 days
ZADD leaderboard:daily:2026-03-28 GT 4500 p:42
EXPIRE leaderboard:daily:2026-03-28 604800
# Nearby players: get the players around rank 4999 (0-indexed)
ZRANGE leaderboard:all-time 4994 5004 REV WITHSCORES
Java (Lettuce async) โ score submission across all windows:
// Assumes a long-lived connection created once at startup — Lettuce connections
// are thread-safe, and opening one per call would leak connections and add latency
private final RedisAsyncCommands<String, String> async;

public CompletableFuture<Void> submitScore(String playerId, long score, LocalDate date) {
    String allTimeKey = "leaderboard:all-time";
    String dailyKey = "leaderboard:daily:" + date; // e.g., leaderboard:daily:2026-03-28
    String weeklyKey = "leaderboard:weekly:" + isoWeek(date); // e.g., leaderboard:weekly:2026-W13
    ZAddArgs onlyIfGreater = ZAddArgs.Builder.gt();
    // Fire all three ZADDs in parallel — no ordering dependency between windows
    return CompletableFuture.allOf(
        async.zadd(allTimeKey, onlyIfGreater, (double) score, playerId).toCompletableFuture(),
        async.zadd(dailyKey, onlyIfGreater, (double) score, playerId).toCompletableFuture(),
        async.zadd(weeklyKey, onlyIfGreater, (double) score, playerId).toCompletableFuture()
    ).thenRun(() -> {
        // Set TTL on time-windowed keys — idempotent, safe to call on every submit
        async.expire(dailyKey, 604800L); // 7 days
        async.expire(weeklyKey, 2592000L); // 30 days
    });
}
Top-100 JSON cache refresh (background job):
@Scheduled(fixedDelay = 10_000) // refresh every 10 seconds
public void refreshTop100Cache() {
    List<ScoredValue<String>> top100 = redisSync
        .zrevrangeWithScores("leaderboard:all-time", 0, 99);
    List<Map<String, Object>> ranked = IntStream.range(0, top100.size())
        .mapToObj(i -> Map.<String, Object>of(
            "rank", i + 1,
            "player_id", top100.get(i).getValue(),
            "score", (long) top100.get(i).getScore()
        ))
        .toList();
    try {
        String json = objectMapper.writeValueAsString(ranked);
        // Cache for 15 seconds — slightly longer than the refresh interval as a safety net
        redisSync.setex("leaderboard:top100:all-time:cache", 15, json);
    } catch (JsonProcessingException e) {
        // Log and skip this cycle; the previous cached value stays valid until its TTL
    }
}
For a full deep-dive on Redis Cluster consistent hashing, slot migration, and EVCache-style multi-region topology, see the Distributed Cache Platform HLD.
Lessons Learned
- Never use SQL ORDER BY for rank queries at scale. Even with a B-Tree covering index, computing rank requires counting rows above the target player — O(R) where R equals the player's rank. Median players cause half-table scans.
- ZADD GT is the correct semantic for gaming leaderboards. Plain ZADD allows score regressions. ZADD NX locks after the first entry. Only ZADD GT gives "update only on improvement" — and it is also safe to replay from Kafka without idempotency guards.
- Use separate sorted sets per time window, not timestamp filters. leaderboard:daily:2026-03-28 with TTL is self-cleaning and preserves O(log N) rank semantics. A single sorted set with timestamp fields requires a secondary index and cannot serve rank queries without scanning.
- The top-100 JSON cache eliminates almost all ZREVRANGE load. Even though ZREVRANGE 0 99 is fast in isolation, running it 500K times per second is wasteful. A 10-second pre-serialized string cache cuts this to ~6 sorted set reads per minute.
- Kafka is both the ingestion buffer and the disaster-recovery mechanism. 7-day retention means any Redis data loss can be replayed and reconstructed without touching MySQL or corrupted RDB snapshots.
- Score update end-to-end latency is dominated by Kafka consumer lag, not ZADD time. Monitor consumer lag as the primary freshness SLO, not Redis write time.
TLDR: Summary & Key Takeaways
- Redis Sorted Set is the correct data structure for leaderboards at scale. ZADD, ZREVRANK, and ZREVRANGE are all O(log N) — approximately 23 operations for 10M players, completing in under 100 microseconds.
- SQL ORDER BY rank computation is O(R) for a player ranked R — catastrophic at scale. Even with a B-Tree index, counting rows above a target score grows linearly with the player's rank.
- Use ZADD GT for score submissions. It ensures scores can only improve, handles out-of-order Kafka delivery gracefully, and is safe to replay for disaster recovery.
- One sorted set per time window. Key pattern leaderboard:daily:YYYY-MM-DD with TTL auto-cleans old windows. Never timestamp-filter a single shared sorted set.
- Cache the top-100 list as a pre-serialized JSON string. A 10-second SETEX cache reduces 500K/sec sorted set scans to 6 per minute for the highest-traffic query.
- MySQL stores score history; Redis holds current best scores. Kafka retains events for 7 days as the reconstruction buffer after any Redis failure.
- Rule of thumb: if the rank query budget is < 10ms and the player count is > 1M, use Redis Sorted Sets. Below 100K players, a PostgreSQL covering index on (score DESC, player_id) is sufficient.
Practice Quiz
Why does Redis ZREVRANK outperform SELECT COUNT(*) FROM scores WHERE score > ? for a player ranked 5 millionth out of 10 million?
- A) Redis is always faster than SQL because it uses in-memory storage
- B) Redis ZREVRANK traverses the skip list accumulating span counts in O(log N) — about 23 operations regardless of rank; the SQL query must scan ~5 million index entries above the target score, making it O(R) where R is the player's rank
- C) Redis pre-computes all ranks at write time so reads are O(1)
Correct Answer: B
A new engineer proposes storing all leaderboard data in one sorted set and computing the daily leaderboard by filtering entries with submitted_at >= today_start. What is the fundamental flaw with this design?
- A) Redis does not support storing timestamps alongside scores
- B) A sorted set is ordered by score only — filtering by timestamp requires scanning members and checking each entry's timestamp, an O(N) operation that destroys the O(log N) rank guarantee; a separate sorted set per day preserves O(log N) semantics natively
- C) Redis would consume more memory with one large sorted set than with multiple smaller daily ones
Correct Answer: B
Which ZADD flag is correct for a gaming leaderboard where a player's rank should reflect their best score ever, and a lower score from a later match must never overwrite it?
- A) ZADD NX — only insert if the player has no existing entry; never updates
- B) ZADD GT — only update the score if the new value is strictly greater than the current score
- C) ZADD LT — only update if the new score is less than the current score
Correct Answer: B
The top-100 leaderboard page is loaded 500,000 times per second. ZREVRANGE 0 99 completes in 200 microseconds. What is the problem, and what is the correct mitigation?
- A) No problem — 500K × 200µs = 100 seconds total, well within Redis capacity
- B) 500K × 200µs = 100 CPU-seconds per wall-clock second — far beyond a single Redis node's throughput; cache the result as a pre-serialized JSON string with a 10-second TTL so 500K requests/sec hit a single O(1) GET instead of triggering 500K sorted set scans
- C) Increase the number of Redis replicas to distribute ZREVRANGE reads across them
Correct Answer: B
Open-ended challenge: Your leaderboard has grown to 100M players and a single Redis node is at 90% memory utilisation. You are evaluating two sharding strategies: (a) shard by hash(player_id) % 4, and (b) shard by score tier (bronze/silver/gold). For each strategy, describe in detail how you would compute a player's exact global rank across shards. Then explain which strategy you would choose for a product where per-player rank queries are 10× more frequent than global top-100 list requests, and justify your choice.
Written by
Abstract Algorithms
@abstractalgorithms