System Design Core Concepts: Scalability, CAP, and Consistency
TL;DR
This post covers: how to scale, how to measure performance, and how to choose between consistency and availability.

1. Scalability: The Art of Growing
Scalability is the ability of a system to handle increased load without crashing. There are two main ways to scale.
Vertical Scaling (Scale Up)
This means adding more power (CPU, RAM, Storage) to your existing server.
- Analogy: You have a car that can't drive fast enough. You swap the engine for a Ferrari engine.
- Pros: Simple. No code changes required.
- Cons:
- Hard Limit: There is a maximum amount of RAM you can buy.
- Single Point of Failure: If that one monster server dies, you are down.
- Use Case: Small databases, early-stage startups.
Horizontal Scaling (Scale Out)
This means adding more servers to your pool.
- Analogy: You have a car that can't carry enough people. Instead of making the car bigger, you buy 10 more cars.
- Pros:
- Near-Infinite Scale: Just keep adding servers (Google runs millions).
- Resilience: If one server dies, the others take over.
- Cons: Complex. You need Load Balancers, distributed databases, and complex management.
- Use Case: Web servers, Cassandra, MongoDB, large-scale apps.
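The "buy 10 more cars" idea can be sketched in a few lines. Below is a toy round-robin load balancer that spreads requests across a pool of servers; the server names and request IDs are illustrative, not from any real system:

```python
from itertools import cycle

class RoundRobinBalancer:
    """Distributes incoming requests across a pool of servers in turn."""

    def __init__(self, servers):
        self._pool = cycle(servers)  # endless rotation over the pool

    def route(self, request):
        server = next(self._pool)
        return f"{server} handled {request}"

lb = RoundRobinBalancer(["web-1", "web-2", "web-3"])
results = [lb.route(f"req-{i}") for i in range(6)]
# Each server receives an equal share of the traffic; adding a fourth
# server to the list is all it takes to grow capacity.
```

Real load balancers (NGINX, HAProxy, AWS ELB) add health checks and failover on top of this basic rotation, which is how horizontal scaling gets its resilience.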
2. Latency vs. Throughput
These are the two main metrics for performance, but they are often confused.
Latency (Speed)
The time it takes to perform a single action.
- Unit: Milliseconds (ms).
- Example: It takes 200ms for the webpage to load.
- Analogy: How fast a car travels down the highway (100 mph).
Throughput (Capacity)
The number of actions the system can handle per unit of time.
- Unit: Requests per Second (RPS) or Queries per Second (QPS).
- Example: The server handles 10,000 requests per second.
- Analogy: How many cars pass a point on the highway per hour.
The Trade-off: Often, optimizing for throughput hurts latency.
- Batching: If you wait to collect 100 requests before sending them to the DB (to improve throughput), the first request has to wait longer (worse latency).
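The batching trade-off is easy to see with back-of-the-envelope arithmetic. The sketch below uses made-up numbers (a 10 ms fixed cost per DB round trip, 1 ms per item) purely to show the shape of the trade: batching amortizes the fixed trip cost (throughput up) while the first request sits waiting for the batch to fill (latency up):

```python
def batch_tradeoff(batch_size, per_item_ms, per_trip_overhead_ms, arrival_interval_ms):
    """Compare one-request-per-trip against batched trips to the DB."""
    # Unbatched: every request pays the full trip overhead itself.
    solo_latency = per_trip_overhead_ms + per_item_ms
    solo_throughput = 1000 / solo_latency  # requests per second

    # Batched: the first request waits for the batch to fill, then one trip.
    wait_to_fill = (batch_size - 1) * arrival_interval_ms
    trip_time = per_trip_overhead_ms + batch_size * per_item_ms
    first_request_latency = wait_to_fill + trip_time
    batched_throughput = 1000 * batch_size / trip_time

    return first_request_latency, batched_throughput, solo_latency, solo_throughput

first_lat, batched_tp, solo_lat, solo_tp = batch_tradeoff(
    batch_size=100, per_item_ms=1, per_trip_overhead_ms=10, arrival_interval_ms=1
)
# Throughput improves roughly 10x, but the first request in the batch
# waits far longer than it would have on its own.
```

Tuning `batch_size` is exactly the knob systems like Kafka producers expose: bigger batches mean more throughput, at the cost of tail latency.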
3. The CAP Theorem
In a distributed system (a system with multiple nodes), you can simultaneously guarantee at most two of the following three properties:
- Consistency (C): Every read receives the most recent write or an error.
- Scenario: You update your password. The very next second, you try to log in. Consistency guarantees the system knows the new password.
- Availability (A): Every request receives a (non-error) response, without the guarantee that it contains the most recent write.
- Scenario: The database is syncing. You try to log in. The system lets you in (Availability) even if it might check against an old password (loss of Consistency).
- Partition Tolerance (P): The system continues to operate despite an arbitrary number of messages being dropped or delayed by the network between nodes.
The Reality: P is Mandatory
In the real world (internet), networks fail. Cables get cut. Routers crash. Partition Tolerance is non-negotiable. Therefore, you must choose between CP and AP.
CP (Consistency + Partition Tolerance)
- Behavior: If the network splits (partition), the system stops accepting writes to ensure data doesn't get out of sync.
- Result: The system returns an error (Service Unavailable).
- Use Case: Banking, Inventory Management (Don't sell the same seat twice).
- Tech: HBase, MongoDB (default), Redis.
AP (Availability + Partition Tolerance)
- Behavior: If the network splits, the system keeps accepting writes on both sides of the split.
- Result: The nodes will be out of sync (inconsistent) until the network heals.
- Use Case: Social Media Feeds, Likes, Comments. (It's okay if I see 99 likes and you see 100).
- Tech: Cassandra, DynamoDB, CouchDB.
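The CP-vs-AP choice boils down to one question: when the network splits, do you refuse the write or accept divergence? Here is a deliberately tiny two-node simulation of that decision; the class and key names are invented for illustration:

```python
class Node:
    def __init__(self):
        self.data = {}

class TinyCluster:
    """Two-node cluster; 'mode' picks the trade-off taken during a partition."""

    def __init__(self, mode):
        self.mode = mode          # "CP" or "AP"
        self.partitioned = False  # simulate a network split
        self.a, self.b = Node(), Node()

    def write(self, node, key, value):
        if self.partitioned:
            if self.mode == "CP":
                # CP: refuse the write rather than let replicas diverge.
                raise RuntimeError("service unavailable during partition")
            # AP: accept locally; replicas diverge until the network heals.
            node.data[key] = value
        else:
            # Healthy network: replicate to both nodes synchronously.
            self.a.data[key] = value
            self.b.data[key] = value

cp = TinyCluster("CP")
cp.partitioned = True
try:
    cp.write(cp.a, "seat-14A", "alice")
    outcome = "accepted"
except RuntimeError:
    outcome = "rejected"
# The CP cluster rejects the write (no double-sold seat); an AP cluster
# would accept it on one side and reconcile later.
```

This is a caricature, of course; real systems use quorums and conflict resolution rather than an all-or-nothing switch, but the partition-time decision is the same.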
4. Consistency Patterns
If you choose AP (Availability), you need to define how consistent your data will be.
Strong Consistency
- After a write, any subsequent read immediately sees the new value.
- Cost: High latency (must lock data or wait for sync).
Weak Consistency
- After a write, the system makes no guarantee that the read will see the new value.
- Use Case: Video chat (dropped frames), real-time gaming positions.
Eventual Consistency
- The industry standard for high-scale systems.
- After a write, reads will eventually see the new value (usually within milliseconds), but there is a tiny window where they might see old data.
- Example: DNS updates, Email delivery, Amazon reviews.
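Eventual consistency is easiest to see with a primary that acknowledges writes immediately and followers that catch up asynchronously. The sketch below (hypothetical class names, a made-up `likes` key) shows the stale-read window and the moment it closes:

```python
class Replica:
    def __init__(self):
        self.data = {}

class EventuallyConsistentStore:
    """Writes land on the primary at once; followers sync asynchronously."""

    def __init__(self, n_followers=2):
        self.primary = Replica()
        self.followers = [Replica() for _ in range(n_followers)]
        self._pending = []  # replication log not yet applied to followers

    def write(self, key, value):
        self.primary.data[key] = value      # acknowledged immediately
        self._pending.append((key, value))  # replicated later

    def sync(self):
        """Drain the replication log: the 'eventually' in eventual consistency."""
        for key, value in self._pending:
            for follower in self.followers:
                follower.data[key] = value
        self._pending.clear()

store = EventuallyConsistentStore()
store.write("likes", 100)
stale = store.followers[0].data.get("likes")  # follower hasn't synced yet
store.sync()
fresh = store.followers[0].data.get("likes")  # follower caught up
```

In production the replication lag is usually milliseconds, which is why you and I can briefly see different like counts on the same post.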
Summary
- Scale Out (Horizontal) is usually better than Scale Up (Vertical) for massive systems.
- Latency is speed; Throughput is volume.
- CAP Theorem: You can't have it all. Choose CP for money/inventory, AP for social/content.
- Eventual Consistency is the price we pay for high availability.
Next Up: How do we actually connect these servers? We'll explore Networking: DNS, CDNs, and Load Balancers.

Written by
Abstract Algorithms
@abstractalgorithms
