
The Role of Data in Precise Capacity Estimations for System Design

Don't guess. Calculate. We explain how to estimate QPS, Storage, and Bandwidth for your system de...

Abstract Algorithms · 5 min read

TLDR: Capacity estimation is the skill of back-of-the-envelope math that tells you whether your system design will survive its traffic before you write a line of code. Four numbers do most of the work: DAU, QPS, Storage/day, and Bandwidth/day.


📖 The Restaurant Napkin Math

Before opening a restaurant, you estimate:

  • Expected customers per day.
  • Average orders per customer.
  • Average order size (for kitchen capacity).
  • Peak hours (for staffing).

Software capacity estimation follows the same logic. You are sizing the kitchen before building the restaurant.


🔢 The Four-Step Estimation Pipeline

Every system design capacity estimation follows this flow:

```mermaid
flowchart LR
    DAU["Daily Active Users\n(e.g., 10M)"] --> QPS["Convert to QPS\n(requests/sec)"]
    QPS --> Storage["Storage/day\n(data written)"]
    QPS --> Bandwidth["Bandwidth/day\n(data read/transferred)"]
    Storage --> Total["Total infra sizing\n(servers, DB, cache)"]
    Bandwidth --> Total
```

Step 1: DAU to QPS

$$\text{QPS} = \frac{\text{DAU} \times \text{requests per user per day}}{86400 \text{ seconds}}$$

Example: Twitter scale

  • 100M DAU, each user generating 10 requests/day (timelines, searches, posts).
  • $QPS = (100M \times 10) / 86400 \approx 11{,}600~\text{RPS}$
  • Peak is typically 2–3× average: ~35,000 RPS at peak.
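The conversion above can be sketched as a small helper. This is a minimal sketch: the function names and the 3× default peak factor are our choices for illustration, not a standard library API.

```python
SECONDS_PER_DAY = 86_400

def average_qps(dau: int, requests_per_user_per_day: float) -> float:
    """Average requests per second implied by daily active users."""
    return dau * requests_per_user_per_day / SECONDS_PER_DAY

def peak_qps(avg: float, peak_factor: float = 3.0) -> float:
    """Peak is typically 2-3x average; default to the conservative 3x."""
    return avg * peak_factor

# Twitter-scale example from the text: 100M DAU, 10 requests/user/day
avg = average_qps(100_000_000, 10)
print(round(avg))            # ≈ 11574 average RPS
print(round(peak_qps(avg)))  # ≈ 34722 peak RPS
```

Note that napkin math rounds aggressively: 11,574 becomes "about 11,600", and the peak becomes "about 35,000".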

Step 2: Storage per Day

$$\text{Storage/day} = \text{write QPS} \times \text{record size} \times 86400$$

Example: URL Shortener

  • 1,000 new URLs/day, each record ≈ 500 bytes.
  • $1000 \times 500 = 500 \text{ KB/day}$
  • Over 5 years: $500 \text{ KB} \times 365 \times 5 \approx 900 \text{ MB}$, which fits comfortably in a single DB.

Example: Image Platform (Instagram scale)

  • 1M uploads/day, average image = 1 MB.
  • $1M \times 1 \text{ MB} = 1 \text{ TB/day}$ → 365 TB/year. That calls for object storage (S3), not a relational DB.
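Both storage examples are one multiplication; a quick sketch makes the contrast in scale obvious (function name is illustrative):

```python
def storage_per_day_bytes(writes_per_day: int, record_size_bytes: int) -> int:
    """Daily storage growth from write volume and record size."""
    return writes_per_day * record_size_bytes

# URL shortener from the text: 1,000 URLs/day at ~500 bytes each
shortener_day = storage_per_day_bytes(1_000, 500)         # 500 KB/day
shortener_5y = shortener_day * 365 * 5                    # ≈ 900 MB over 5 years

# Image platform from the text: 1M uploads/day at 1 MB each
images_day = storage_per_day_bytes(1_000_000, 1_000_000)  # 1 TB/day

print(shortener_day, shortener_5y, images_day)
```

The two systems differ by six orders of magnitude per day, which is why they land on completely different storage backends.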

Step 3: Bandwidth

$$\text{Read Bandwidth} = \text{read QPS} \times \text{average response size}$$

If the read:write ratio is 100:1 (typical of a social media timeline):

  • Write QPS = 1,000/sec at 100 bytes each → 100 KB/s of writes.
  • Read QPS = 100,000/sec at 10 KB each → 1 GB/s of reads, so a CDN is mandatory.
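The bandwidth formula is the same multiplication applied to reads and writes separately. A minimal sketch, using the 100:1 example above (function name is ours):

```python
def bandwidth_bytes_per_sec(qps: float, avg_payload_bytes: float) -> float:
    """Sustained bandwidth from request rate and average payload size."""
    return qps * avg_payload_bytes

# 100:1 read:write timeline example from the text
write_bw = bandwidth_bytes_per_sec(1_000, 100)      # 100 KB/s of writes
read_bw = bandwidth_bytes_per_sec(100_000, 10_000)  # 1 GB/s of reads

print(write_bw, read_bw)
```

The asymmetry is the point: write bandwidth is trivial while read bandwidth dominates, which is what pushes reads out to a CDN.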

โš™๏ธ Reference Numbers to Memorize

| Quantity | Approximate Value |
| --- | --- |
| Seconds in a day | 86,400 (~10^5) |
| Bytes in 1 MB | 10^6 |
| Bytes in 1 GB | 10^9 |
| Bytes in 1 TB | 10^12 |
| Average SSD latency | 1 ms |
| Average DB query (indexed) | 1–10 ms |
| Average network request (same DC) | 1–5 ms |
| Typical API response size | 1–50 KB |
| Typical image size (compressed) | 200 KB – 2 MB |
| Video (1080p, 1 hour) | ~1.5 GB |
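These constants are worth keeping in a small module for napkin math. The values below are the rough reference figures from the table, not measurements, and the 500 MB/s disk throughput in the example is an assumed round number:

```python
# Rough orders of magnitude for back-of-the-envelope math
SECONDS_PER_DAY = 86_400  # ~10^5
MB = 10**6
GB = 10**9
TB = 10**12

# Rough latency figures from the table (milliseconds)
SSD_LATENCY_MS = 1
INDEXED_DB_QUERY_MS = (1, 10)
SAME_DC_NETWORK_MS = (1, 5)

# Example napkin question: how long to scan 1 TB sequentially
# at an assumed 500 MB/s of disk throughput?
read_throughput = 500 * MB  # bytes/sec (assumption)
print(TB / read_throughput / 60)  # ≈ 33 minutes
```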

🧠 Worked Example: Design a Pastebin

Assumptions:

  • 1M DAU. Read:Write = 10:1. Average paste = 10 KB.
  • Write QPS = (1M × 1 paste/day) / 86,400 ≈ 12 writes/sec
  • Read QPS = 12 × 10 = 120 reads/sec
  • Storage: 12 writes/sec × 10 KB × 86,400 ≈ 10 GB/day → ~10 TB over 3 years.
  • Read bandwidth: 120 reads/sec × 10 KB = 1.2 MB/sec → no CDN needed at this scale.
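The whole napkin calculation fits in a few lines. This is a sketch under the assumptions listed above; all variable names are ours:

```python
SECONDS_PER_DAY = 86_400

# Pastebin assumptions from the text (illustrative values)
dau = 1_000_000
pastes_per_user_per_day = 1
read_write_ratio = 10
paste_size_bytes = 10_000  # 10 KB

write_qps = dau * pastes_per_user_per_day / SECONDS_PER_DAY
read_qps = write_qps * read_write_ratio
storage_per_day = write_qps * paste_size_bytes * SECONDS_PER_DAY
read_bandwidth = read_qps * paste_size_bytes

print(f"write QPS:      {write_qps:.0f}")                  # ~12
print(f"read QPS:       {read_qps:.0f}")                   # ~116
print(f"storage/day:    {storage_per_day / 1e9:.0f} GB")   # 10 GB
print(f"read bandwidth: {read_bandwidth / 1e6:.1f} MB/s")  # ~1.2 MB/s
```

Note the unrounded read QPS is ~116; the text's 120 comes from rounding the write QPS to 12 first, which is exactly the kind of slack napkin math tolerates.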

What this tells you:

  • A single PostgreSQL instance can comfortably handle sub-1,000 writes/sec.
  • The storage backend should be durable object storage to absorb ~10 TB over 3 years.
  • No CDN or caching tier is needed at this scale: 120 RPS fits in a single app instance.

โš–๏ธ Common Estimation Mistakes

| Mistake | Why It Matters |
| --- | --- |
| Ignoring the peak-to-average ratio | Sizing for the average means you can't handle 3× traffic spikes |
| Forgetting replication overhead | A 1 TB DB with 3 replicas = 3 TB stored |
| Treating all writes as equal | Writes to a hot row (stock ticker, popular post) create hotspots |
| Not accounting for growth | A system sized for today will be undersized in 12 months; plan for 3–5× |
| Ignoring Pareto: 1% of users drive 90% of traffic | A few power users can dominate the system |

📌 Summary

  • DAU → QPS → Storage → Bandwidth is the standard four-step pipeline.
  • Peak QPS = 2–3× average; always design for peak.
  • 10^5 seconds/day is the key constant: it converts daily user behavior into per-second rates.
  • Compare storage requirements early: 1 GB/day fits a relational DB; 1 TB/day calls for object storage.
  • High read bandwidth → CDN. Low bandwidth → a single server is fine.

๐Ÿ“ Practice Quiz

  1. A social network has 50M DAU. Each user reads 20 posts per day. What is the approximate read QPS?

    • A) ~580 RPS
    • B) ~11,600 RPS
    • C) ~1,000,000 RPS
      Answer: B ((50M × 20) / 86,400 ≈ 11,574 ≈ 11,600 RPS)
  2. Your system writes 100 bytes per transaction at 1,000 writes/sec. How much DB storage do you need per year?

    • A) ~3 GB
    • B) ~3 TB
    • C) ~3 PB
      Answer: B (100 bytes × 1,000/sec × 86,400 × 365 ≈ 3.15 × 10^12 bytes ≈ 3 TB)
  3. Your service has a read:write ratio of 1000:1. Writes are 10 RPS, each response is 50 KB. What is the read bandwidth in GB/s?

    • A) 0.5 GB/s
    • B) 500 GB/s
    • C) 5 GB/s
      Answer: A (10 × 1,000 reads/sec × 50 KB = 500,000 KB/s = 500 MB/s ≈ 0.5 GB/s)

Written by Abstract Algorithms (@abstractalgorithms)