
System Design Core Concepts: Scalability, CAP, and Consistency

The building blocks of distributed systems. Learn about Vertical vs Horizontal scaling, the CAP Theorem, and ACID vs BASE.

Abstract Algorithms · 13 min read

TLDR: 🚀 Scalability, the CAP Theorem, and consistency models are the three concepts that determine whether a distributed system can grow, stay reliable, and deliver correct results. Get these three right and you can reason about any system design question.


📖 Understanding System Design Core Concepts

In 2011, Netflix engineers deployed Chaos Monkey, a tool that randomly terminated production servers to test system resilience. Their infrastructure stayed up. Netflix had deliberately designed for AP (available + partition-tolerant): they accepted that some users might see slightly stale recommendations in exchange for the system never going down. A competitor designed for CP (consistent + partition-tolerant) would have returned errors during the same failures to protect data correctness.

That's the CAP theorem in action, and it explains why your architecture choices have real, permanent engineering consequences.

System design revolves around three foundational questions:

  1. Can the system handle more load? (Scalability)
  2. What happens when a network partition occurs? (CAP Theorem)
  3. When multiple clients write, what does each reader see? (Consistency models)

These aren't independent choices. A bank can't be "highly available during network partitions and always consistent": the CAP Theorem makes that impossible.


🔍 Scalability: Vertical vs. Horizontal

Vertical scaling (scale up) means adding more power to a single machine: more CPU cores, more RAM, faster storage.

Horizontal scaling (scale out) means adding more machines and distributing the load across them.

| Dimension | Vertical Scaling | Horizontal Scaling |
|---|---|---|
| Ceiling | Hard hardware limit | Theoretically unlimited |
| Cost curve | Superlinear (big machines cost disproportionately more) | Linear (add commodity boxes) |
| Complexity | Low (no distribution) | High (coordination, consistency, partitioning) |
| Failure model | Single point of failure | Redundant (one node fails, others continue) |
| Best for | Databases under heavy write lock contention | Stateless services, read-heavy workloads |

In practice: most production systems use both. The database gets a large primary (vertical) plus read replicas (horizontal). The application tier is purely horizontal behind a load balancer.

graph LR
    Client --> LB[Load Balancer]
    LB --> A[App Server 1]
    LB --> B[App Server 2]
    LB --> C[App Server 3]
    A --> Primary[(Primary DB)]
    B --> Replica1[(Read Replica)]
    C --> Replica2[(Read Replica)]
    Primary --> Replica1
    Primary --> Replica2
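The app tier in this diagram scales horizontally precisely because it is stateless: any server can take any request. A minimal round-robin dispatcher, sketched here with hypothetical server names, shows why adding capacity is just adding an entry to a list:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative round-robin load balancer for the stateless app tier above.
// Server names and the class itself are a sketch, not a real LB's API.
public class RoundRobinBalancer {
    private final List<String> servers;
    private final AtomicLong counter = new AtomicLong();

    public RoundRobinBalancer(List<String> servers) {
        this.servers = servers;
    }

    /** Picks the next server in rotation; thread-safe via the atomic counter. */
    public String next() {
        return servers.get((int) (counter.getAndIncrement() % servers.size()));
    }

    public static void main(String[] args) {
        RoundRobinBalancer lb = new RoundRobinBalancer(List.of("app-1", "app-2", "app-3"));
        for (int i = 0; i < 4; i++) {
            System.out.println(lb.next()); // app-1, app-2, app-3, app-1
        }
        // Scaling out = adding another name to the list; no request is pinned to a node.
    }
}
```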

โš™๏ธ How the CAP Theorem Works: Every Distributed System Makes a Choice

The CAP Theorem states that a distributed system can guarantee at most two of three properties simultaneously:

  • Consistency (C): every read returns the most recent write (or an error)
  • Availability (A): every request receives a response (possibly stale)
  • Partition Tolerance (P): the system continues operating despite network partitions

Because network partitions will happen in any real distributed system, P is non-optional. The real choice is: when a partition occurs, do you sacrifice C or A?

| System type | CAP choice | Example systems | Why |
|---|---|---|---|
| CP | Consistency + Partition tolerance | ZooKeeper, etcd, HBase | Correct data is more important than availability |
| AP | Availability + Partition tolerance | Cassandra, CouchDB, DynamoDB | Uptime is more important than perfect consistency |

Practical implication:

  • Banking transaction? Choose CP: you can't have a split-brain account balance.
  • Social media feed? Choose AP: slightly stale posts are acceptable; being down is not.
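Many distributed stores also let you tune this choice per request with quorum reads and writes: with N replicas, a write acknowledged by W nodes and a read that consults R nodes are guaranteed to overlap when R + W > N. A small sketch of that overlap rule (the class name is illustrative, not any database's API):

```java
// Sketch of the quorum overlap rule R + W > N. Not tied to a specific database;
// Cassandra-style consistency levels (ONE, QUORUM, ALL) are one real-world mapping.
public class QuorumCheck {

    /** True if any read quorum must intersect any write quorum (no stale reads). */
    public static boolean guaranteesStrongReads(int n, int w, int r) {
        return r + w > n;
    }

    public static void main(String[] args) {
        // N=3 replicas, QUORUM writes (W=2) + QUORUM reads (R=2): overlap guaranteed
        System.out.println(guaranteesStrongReads(3, 2, 2)); // true
        // N=3, ONE write (W=1) + ONE read (R=1): a read can miss the latest write
        System.out.println(guaranteesStrongReads(3, 1, 1)); // false
    }
}
```

Dialing W and R up trades latency and availability for consistency, which is the CAP trade-off expressed per operation rather than per system.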

🧠 Deep Dive: Consistency Models Unpacked

The CAP theorem frames a binary choice, but in practice the important dimension is how you trade off consistency against availability during normal operation, not just during failures.

| Model | Guarantee | Example system |
|---|---|---|
| Strong consistency | Every read sees the latest write | Single-node RDBMS, ZooKeeper |
| Eventual consistency | Reads converge to the latest write, eventually | DynamoDB, Cassandra |
| Read-your-writes | You always see your own latest write | Session-scoped DB routing |
| Causal consistency | Causally related ops are ordered; others may not be | MongoDB sessions, CockroachDB |

📊 Eventual Consistency Propagation

sequenceDiagram
    participant C1 as Client 1 (Writer)
    participant P as Primary Node
    participant R1 as Replica 1
    participant R2 as Replica 2
    participant C2 as Client 2 (Reader)

    C1->>P: Write: balance = $500
    P-->>C1: ACK (write confirmed)
    Note over R1,R2: Async replication in progress
    C2->>R2: Read: balance?
    R2-->>C2: balance = $400 (stale read)
    P->>R1: Replicate: balance = $500
    P->>R2: Replicate: balance = $500
    Note over R1,R2: Replicas converged
    C2->>R2: Read: balance?
    R2-->>C2: balance = $500 (consistent)

This diagram demonstrates eventual consistency in action: Client 1 writes a new balance to the Primary node, which immediately acknowledges the write; meanwhile Client 2 reads from Replica 2 and receives a stale value ($400) because asynchronous replication has not yet propagated the update. After both replicas receive the replication message from the Primary, all subsequent reads return the correct value ($500). The key takeaway is that in an AP system there is a window of time after a write, potentially seconds or longer, during which different readers may see different values, and application logic must be designed to tolerate this.
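The same sequence can be reproduced with a toy in-memory model. The Node class below is purely illustrative: it stands in for a primary and one replica, with "replication" applied by hand so the stale-read window is explicit:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the sequence diagram above: a primary that ACKs immediately
// and a replica that only converges when replication is applied. Illustrative only.
public class EventualConsistencyDemo {

    static class Node {
        final Map<String, Integer> data = new HashMap<>();
        Integer read(String key) { return data.get(key); }
        void apply(String key, int value) { data.put(key, value); }
    }

    public static void main(String[] args) {
        Node primary = new Node();
        Node replica = new Node();
        primary.apply("balance", 400);
        replica.apply("balance", 400);           // both start converged at $400

        primary.apply("balance", 500);           // Client 1 writes; primary ACKs at once
        System.out.println(replica.read("balance")); // 400 -- stale read window

        replica.apply("balance", primary.read("balance")); // async replication arrives
        System.out.println(replica.read("balance")); // 500 -- replicas converged
    }
}
```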

📊 Consistency Levels: State Transitions

stateDiagram-v2
    [*] --> Strong : CP system (ZooKeeper)
    [*] --> Eventual : AP system (Cassandra)
    [*] --> Weak : Cache only (Redis TTL)

    Strong --> Eventual : Relax for throughput
    Eventual --> Strong : Require linearizability
    Eventual --> ReadYourWrites : Scope to session
    ReadYourWrites --> Causal : Add causal ordering
    Weak --> Eventual : Add replication sync

This state diagram maps the consistency model spectrum and the architectural decisions that transition between levels. A CP system starts in Strong consistency while an AP system starts in Eventual; the labeled arrows show how relaxing or tightening guarantees moves you between states: for example, scoping reads to a session turns Eventual into Read-Your-Writes, and adding causal ordering upgrades that to Causal. The key takeaway is that consistency is not a binary on/off choice: you can tune the level per operation or per session rather than applying one model uniformly across your entire system.
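One of those transitions, upgrading Eventual to Read-Your-Writes, is often implemented as session-aware read routing: reads of keys this session has written go to the primary, while everything else may hit a possibly stale replica. A hedged sketch with hypothetical names:

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative session-scoped read-your-writes router. The class, method names,
// and "primary"/"replica" targets are a sketch, not a real driver's API.
public class SessionRouter {

    private final Set<String> writtenThisSession = new HashSet<>();

    /** Called whenever this session performs a write. */
    public void recordWrite(String key) {
        writtenThisSession.add(key);
    }

    /** Keys this session wrote must be read from the primary; others may be stale. */
    public String routeRead(String key) {
        return writtenThisSession.contains(key) ? "primary" : "replica";
    }

    public static void main(String[] args) {
        SessionRouter session = new SessionRouter();
        session.recordWrite("user:42:avatar");
        System.out.println(session.routeRead("user:42:avatar")); // primary
        System.out.println(session.routeRead("user:7:avatar"));  // replica
    }
}
```

The guarantee is deliberately narrow: only this session's own writes are protected, which is exactly why it is cheaper than global strong consistency.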

📊 System Design Decision Flow: Choosing the Right Architecture

When designing a distributed system, the sequence of decisions follows a consistent pattern. Use this flow to reason through any system design problem systematically.

graph TD
    A[Define consistency requirement] --> B{Strong consistency required?}
    B -->|Yes| C[Choose CP system: etcd, Zookeeper, PostgreSQL]
    B -->|No| D{High availability critical?}
    D -->|Yes| E[Choose AP system: Cassandra, DynamoDB]
    D -->|No| F[Choose simple single-node or CA architecture]
    C --> G[Add replication + failover]
    E --> H[Add conflict resolution + eventual sync]
    F --> I[Scale vertically first, then revisit]

This decision framework applies to every data store, cache, and message queue in your architecture. The same three-question pattern (consistency need, availability requirement, partition tolerance) governs each component choice independently.


🧮 ACID vs. BASE: Two Ways to Think About Data Correctness

ACID (traditional relational databases):

  • Atomic: all operations in a transaction succeed or all fail
  • Consistent: the database moves from one valid state to another valid state
  • Isolated: concurrent transactions don't interfere with each other
  • Durable: committed data survives crashes

BASE (most NoSQL / distributed systems):

  • Basically Available: availability is prioritized; partial failures are acceptable
  • Soft state: data can be stale; state may change without input (replication catching up)
  • Eventual consistency: given no new writes, all replicas will eventually converge to the same value

| Property | ACID | BASE |
|---|---|---|
| Consistency model | Strong / serializable | Eventual / weak |
| Availability during partition | May refuse requests | Always responds |
| Use case | Financial transactions, inventory | User preferences, cache, social data |
| Typical tech | PostgreSQL, MySQL | Cassandra, DynamoDB, Redis |

๐ŸŒ Real-World Applications: Applying These Concepts: E-Commerce Checkout Example

An e-commerce checkout flow actually uses multiple consistency models simultaneously:

| Component | Consistency need | Choice | Why |
|---|---|---|---|
| Inventory decrement | Strong (no overselling) | ACID + CP | Two checkouts can't both buy the last item |
| Product catalog | Eventual (stale by seconds is fine) | BASE + AP | Faster reads; slightly stale prices are acceptable |
| Session cart | Eventual (user-scoped) | AP | Cart doesn't need global consistency |
| Payment processing | Strong (exact, auditable) | ACID + CP | Regulatory and financial correctness |

This is the key insight: you don't choose one consistency model for your whole system. You partition your data by its consistency requirements.


โš–๏ธ Trade-offs & Failure Modes: When Things Go Wrong: Partition Scenarios

A network partition means some nodes can't communicate with others. What happens next depends on your CAP choice:

CP system during partition: Nodes that can't confirm they hold the latest data refuse to serve requests. Clients see errors or timeouts. Data is always correct when available.

AP system during partition: All nodes continue serving. Clients on different sides of the partition may see different values. When the partition heals, the system runs a reconciliation protocol (last-write-wins, vector clocks, CRDTs) to converge.

Common failure modes:

  • Split-brain: two nodes both think they're the primary and accept conflicting writes (CP systems prevent this; AP systems must handle reconciliation)
  • Stale reads: a read reaches a replica that hasn't received the latest replication update (expected in AP systems; needs careful application-level handling)
  • Replication lag: the primary accepts a write but it hasn't propagated to all replicas before the primary crashes (mitigated by synchronous replication or quorum writes)
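Of the reconciliation strategies above, last-write-wins is the simplest to sketch: when the partition heals, keep the write with the later timestamp. The record type and merge rule below are illustrative; real implementations also have to contend with clock skew, which can make LWW silently drop updates:

```java
// Hedged sketch of last-write-wins (LWW) reconciliation after a partition heals.
// Illustrative only: production systems (e.g. Cassandra cells) add tie-breakers
// and still lose concurrent updates under clock skew -- vector clocks or CRDTs
// are the alternatives when that is unacceptable.
public class LastWriteWins {

    record Versioned(String value, long timestampMillis) {}

    /** Keep the later write; break exact ties deterministically by value order. */
    public static Versioned merge(Versioned a, Versioned b) {
        if (a.timestampMillis() != b.timestampMillis()) {
            return a.timestampMillis() > b.timestampMillis() ? a : b;
        }
        return a.value().compareTo(b.value()) >= 0 ? a : b;
    }

    public static void main(String[] args) {
        // Conflicting writes accepted on opposite sides of a partition:
        Versioned sideA = new Versioned("shipped", 1_000L);
        Versioned sideB = new Versioned("cancelled", 2_000L);
        System.out.println(merge(sideA, sideB).value()); // cancelled -- later write wins
    }
}
```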

🧭 Decision Guide: Choosing the Right Consistency Level for Your Use Case

| Situation | CAP preference | Consistency model | Reasoning |
|---|---|---|---|
| Financial transaction | CP | Strong (serializable) | Correctness is non-negotiable |
| Real-time leaderboard | AP | Eventual | Slightly stale rankings are fine |
| Collaborative document editing | CP with CRDT | Causal | Ordering matters; CRDTs enable merge-friendly writes |
| User profile reads | AP | Read-your-writes | User should see their own updates; global consistency unnecessary |
| Inventory system | CP | Linearizable | Prevent overselling with quorum writes |

🧪 Little's Law: Back-of-Envelope Capacity Math

This section applies Little's Law to back-of-envelope capacity estimation, one of the most practical calculations in both system design interviews and real production capacity planning. It was chosen because it provides a quantitative bridge between the scalability concepts in this post and a concrete number: given your target request rate and latency budget, how much concurrency must your system support? As you work through the example, focus on how the three variables (concurrency L, arrival rate λ, latency W) constrain each other; improving any one of them directly affects the resources you need to provision.

$$L = \lambda \cdot W$$

Where:

  • $L$ = average number of requests in the system (concurrency)
  • $\lambda$ = arrival rate (requests per second)
  • $W$ = average time spent in the system (latency in seconds)

Example: if your service handles 500 req/s and average latency is 200 ms:

$$L = 500 \times 0.2 = 100 \text{ concurrent requests}$$

This tells you how many concurrent connections your server needs to handle. Multiply by average memory per request to estimate RAM needs.
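As a sketch of that arithmetic (the 50 MB-per-request memory figure is an assumed illustration, not a number from this post):

```java
// Back-of-envelope capacity math via Little's Law: L = lambda * W.
// The memory-per-request figure below is an assumed example value.
public class LittlesLaw {

    /** Average requests in flight given arrival rate (req/s) and latency (s). */
    public static double concurrency(double arrivalRatePerSec, double latencySec) {
        return arrivalRatePerSec * latencySec;
    }

    public static void main(String[] args) {
        double l = concurrency(500, 0.2);        // the example from the post
        System.out.println(l);                   // 100.0 concurrent requests

        double memPerRequestMb = 50;             // assumed per-request footprint
        System.out.println(l * memPerRequestMb); // 5000.0 MB, i.e. ~5 GB of RAM
    }
}
```

Note how the law cuts both ways: halving latency W halves the concurrency (and RAM) you must provision at the same arrival rate.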


๐Ÿ› ๏ธ Spring Boot + Spring Data Redis: CAP-Aware Caching in Java

Spring Boot is the dominant Java framework for building production microservices: its @Cacheable, spring-data-redis, and spring-data-jpa integrations let you implement the AP/CP architectural patterns from this post without writing low-level connection management code.

The e-commerce example in this post (strong consistency for payments, eventual consistency for the product catalog) maps directly to using Spring Data JPA (for PostgreSQL ACID transactions) alongside Spring Data Redis (for the AP cache layer):

// build.gradle dependencies:
// implementation 'org.springframework.boot:spring-boot-starter-data-jpa'
// implementation 'org.springframework.boot:spring-boot-starter-data-redis'
// implementation 'org.springframework.boot:spring-boot-starter-cache'

import org.springframework.cache.annotation.Cacheable;
import org.springframework.cache.annotation.CacheEvict;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

// --- CP path: ACID inventory decrement (PostgreSQL via JPA) ---
@Service
public class InventoryService {

    private final InventoryRepository repo;

    public InventoryService(InventoryRepository repo) { // constructor injection
        this.repo = repo;
    }

    @Transactional  // ACID: rolls back if the decrement fails
    public void decrementStock(Long productId, int qty) {
        Inventory inv = repo.findByIdForUpdate(productId)  // SELECT ... FOR UPDATE (row lock)
            .orElseThrow(() -> new IllegalArgumentException("Product not found"));
        if (inv.getStock() < qty) throw new IllegalStateException("Insufficient stock");
        inv.setStock(inv.getStock() - qty);
        repo.save(inv);
    }
}

// --- AP path: eventually consistent catalog cache (Redis) ---
@Service
public class ProductCatalogService {

    private final ProductRepository productRepo;

    public ProductCatalogService(ProductRepository productRepo) {
        this.productRepo = productRepo;
    }

    @Cacheable(value = "catalog", key = "#productId",
               unless = "#result == null")          // AP: stale by up to the TTL
    public ProductDto getProduct(Long productId) {
        return productRepo.findById(productId)
            .map(ProductDto::from)
            .orElse(null);
    }

    @CacheEvict(value = "catalog", key = "#productId") // invalidate on update
    public void updateProduct(Long productId, ProductDto dto) {
        // update persisted to PostgreSQL, then the cache entry is evicted
    }
}

Configure Redis TTL in application.yml:

spring:
  cache:
    redis:
      time-to-live: 30s       # AP: catalog allowed to be stale up to 30 seconds
  data:
    redis:
      host: localhost
      port: 6379

@Transactional on the inventory path enforces ACID semantics; @Cacheable with a TTL on the catalog path provides the BASE/AP behaviour: exactly the per-component consistency partitioning described in the e-commerce example above.

For a full deep-dive on Spring Data Redis and CAP-aware Spring architectures, a dedicated follow-up post is planned.


📚 Key Lessons for System Designers

These lessons come from systems that succeeded, and from the many that didn't, in production environments.

Consistency is not free. Every guarantee of consistency requires coordination among nodes, which adds latency. Before choosing strong consistency, verify that the use case actually requires it. Many systems choose strong consistency out of habit and pay an unnecessary availability tax.

Design for failure, not for success. The CAP Theorem forces you to explicitly decide what your system does when things go wrong, not just when everything works. Define your system's behavior under partition before you write the first line of code.

Start with known tradeoffs. When building a new service, pick an existing well-understood data store (PostgreSQL for ACID, Cassandra for AP, Redis for low-latency cache) and document why you made that choice. Unknown tradeoffs from custom solutions accumulate as technical debt.

Monitor tail latency, not averages. Average latency hides the worst-case experience. p99 latency, the slowest 1% of requests, is what users actually notice. Instrument your system for percentile-based metrics from day one.
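A quick way to see why averages mislead is to compute both on a sample set with a few slow outliers. The nearest-rank percentile below is one common definition; production systems typically use streaming estimators such as HDR histograms rather than sorting raw samples:

```java
import java.util.Arrays;

// Nearest-rank percentile sketch: the value at position ceil(p/100 * n) in the
// sorted samples. Illustrative only; real monitoring uses streaming estimators.
public class TailLatency {

    public static double percentile(double[] latenciesMs, double p) {
        double[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }

    public static void main(String[] args) {
        double[] samples = new double[100];
        Arrays.fill(samples, 10.0);   // 98 fast requests at 10 ms...
        samples[98] = 900.0;          // ...and two slow outliers
        samples[99] = 900.0;

        System.out.println(Arrays.stream(samples).average().orElse(0)); // 27.8 -- looks fine
        System.out.println(percentile(samples, 99.0));                  // 900.0 -- the real story
    }
}
```

The average barely moves, while p99 lands squarely on the outliers that users actually experience.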


📌 TLDR: Summary & Key Takeaways

  • Vertical scaling has a hard ceiling; horizontal scaling requires coordination but is theoretically unlimited.
  • CAP Theorem forces a binary choice during partitions: consistency or availability. Partition tolerance is mandatory.
  • ACID databases guarantee correctness; BASE systems trade consistency for availability and throughput.
  • Real systems use both: CP for critical paths (payments, inventory), AP for non-critical paths (catalog, preferences).
  • Tail latency (p95/p99), not average latency, is the metric that actually determines whether your system feels fast to users.

📝 Practice Quiz

  1. A banking transaction service experiences a network partition. Which CAP choice should it make, and why?

    • A) AP: banking needs to stay available even if data might be stale
    • B) CP: money amounts must be correct; it's better to reject requests than return wrong balances
    • C) CA: banks don't experience partitions, so this doesn't apply

    Correct Answer: B. Financial systems must never return or commit incorrect balances. Under a partition, the correct behavior is to reject the request rather than risk split-brain writes to account balances.

  2. Which consistency model does DynamoDB default to, and what is the tradeoff?

    • A) Strong consistency: all reads reflect the latest write, at higher latency and cost
    • B) Eventual consistency: reads may be slightly stale, but throughput is higher and cost is lower
    • C) Serializable isolation: all operations appear to execute sequentially

    Correct Answer: B. DynamoDB defaults to eventually consistent reads; strongly consistent reads are available at higher cost. Choosing between them is a deliberate tradeoff of staleness tolerance vs. performance and cost.

  3. Your product catalog service needs to serve 10M reads/sec. Staleness of up to 2 seconds is acceptable. What is the best choice?

    • A) Single-node PostgreSQL with synchronous replication
    • B) Multi-master NoSQL cluster with eventual consistency and geographically distributed replicas
    • C) Single Redis instance with write-through cache

    Correct Answer: B. At 10M reads/sec, a single-node solution cannot handle the load. A geographically distributed AP cluster scales horizontally, and the 2-second staleness tolerance aligns with eventual consistency semantics.



Written by Abstract Algorithms (@abstractalgorithms)