System Design Core Concepts: Scalability, CAP, and Consistency
The building blocks of distributed systems. Learn about Vertical vs Horizontal scaling, the CAP Theorem, and ACID vs BASE.
Abstract Algorithms
Intermediate
For developers with some experience. Builds on fundamentals.
Estimated read time: 13 min
AI-assisted content. This post may have been written or enhanced with AI tools. Please verify critical information independently.
TLDR: ๐ Scalability, the CAP Theorem, and consistency models are the three concepts that determine whether a distributed system can grow, stay reliable, and deliver correct results. Get these three right and you can reason about any system design question.
๐ Understanding System Design Core Concepts
In 2011, Netflix engineers deployed Chaos Monkey โ a tool that randomly terminated production servers to test system resilience. Their infrastructure stayed up. Netflix had deliberately designed for AP (available + partition-tolerant): they accepted that some users might see slightly stale recommendations in exchange for the system never going down. A competitor designed for CP (consistent + partition-tolerant) would have returned errors during the same failures to protect data correctness.
That's the CAP theorem in action โ and it explains why your architecture choices have real, permanent engineering consequences.
System design revolves around three foundational questions:
- Can the system handle more load? (Scalability)
- What happens when a network partition occurs? (CAP Theorem)
- When multiple clients write, what does each reader see? (Consistency models)
These aren't independent choices. A bank can't be "highly available during network partitions and always consistent" โ the CAP Theorem makes that impossible.
๐ Scalability: Vertical vs. Horizontal
Vertical scaling (scale up) means adding more power to a single machine โ more CPU cores, more RAM, faster storage.
Horizontal scaling (scale out) means adding more machines and distributing the load across them.
| Dimension | Vertical Scaling | Horizontal Scaling |
| Ceiling | Hard hardware limit | Theoretically unlimited |
| Cost curve | Superlinear (big machines cost disproportionately more) | Linear (add commodity boxes) |
| Complexity | Low (no distribution) | High (coordination, consistency, partitioning) |
| Failure model | Single point of failure | Redundant (one node fails, others continue) |
| Best for | Databases under heavy write lock contention | Stateless services, read-heavy workloads |
In practice: most production systems use both. The database gets a large primary (vertical) plus read replicas (horizontal). The application tier is purely horizontal behind a load balancer.
graph LR
Client --> LB[Load Balancer]
LB --> A[App Server 1]
LB --> B[App Server 2]
LB --> C[App Server 3]
A --> Primary[(Primary DB)]
B --> Replica1[(Read Replica)]
C --> Replica2[(Read Replica)]
Primary --> Replica1
Primary --> Replica2
---
โ๏ธ How the CAP Theorem Works: Every Distributed System Makes a Choice
CAP Theorem states that a distributed system can guarantee at most two of three properties simultaneously:
- Consistency (C) โ every read returns the most recent write (or an error)
- Availability (A) โ every request receives a response (possibly stale)
- Partition Tolerance (P) โ the system continues operating despite network partitions
Because network partitions will happen in any real distributed system, P is non-optional. The real choice is: when a partition occurs, do you sacrifice C or A?
| System type | CAP choice | Example systems | Why |
| CP | Consistency + Partition tolerance | Zookeeper, etcd, HBase | Correct data is more important than availability |
| AP | Availability + Partition tolerance | Cassandra, CouchDB, DynamoDB | Uptime is more important than perfect consistency |
Practical implication:
- Banking transaction? Choose CP โ you can't have a split-brain account balance.
- Social media feed? Choose AP โ slightly stale posts are acceptable; being down is not.
๐ง Deep Dive: Consistency Models Unpacked
The CAP theorem frames a binary choice, but in practice the important dimension is how you trade off consistency against availability during normal operationโnot just during failures.
| Model | Guarantee | Example system |
| Strong consistency | Every read sees the latest write | Single-node RDBMS, ZooKeeper |
| Eventual consistency | Reads converge to the latest writeโeventually | DynamoDB, Cassandra |
| Read-your-writes | You always see your own latest write | Session-scoped DB routing |
| Causal consistency | Causally related ops are ordered; others may not be | MongoDB sessions, CockroachDB |
๐ Eventual Consistency Propagation
sequenceDiagram
participant C1 as Client 1 (Writer)
participant P as Primary Node
participant R1 as Replica 1
participant R2 as Replica 2
participant C2 as Client 2 (Reader)
C1->>P: Write: balance = $500
P-->>C1: ACK (write confirmed)
Note over R1,R2: Async replication in progress
C2->>R2: Read: balance?
R2-->>C2: balance = $400 (stale read)
P->>R1: Replicate: balance = $500
P->>R2: Replicate: balance = $500
Note over R1,R2: Replicas converged
C2->>R2: Read: balance?
R2-->>C2: balance = $500 (consistent)
*This diagram demonstrates eventual consistency in action: Client 1 writes a new balance to the Primary node, which immediately acknowledges the write; meanwhile Client 2 reads from Replica 2 and receives a stale value ($400) because asynchronous replication has not yet propagated the update. After both replicas receive the replication message from Primary, all subsequent reads return the correct value ($500). The key takeaway is that in an AP system there is a window of time after a write โ potentially seconds or longer โ during which different readers may see different values, and application logic must be designed to tolerate this.*
๐ Consistency Levels: State Transitions
stateDiagram-v2
[*] --> Strong : CP system (ZooKeeper)
[*] --> Eventual : AP system (Cassandra)
[*] --> Weak : Cache only (Redis TTL)
Strong --> Eventual : Relax for throughput
Eventual --> Strong : Require linearizability
Eventual --> ReadYourWrites : Scope to session
ReadYourWrites --> Causal : Add causal ordering
Weak --> Eventual : Add replication sync
*This state diagram maps the consistency model spectrum and the architectural decisions that transition between levels. A CP system starts in Strong consistency while an AP system starts in Eventual; the labeled arrows show how relaxing or tightening guarantees moves you between states โ for example, scoping reads to a session turns Eventual into Read-Your-Writes, and adding causal ordering upgrades that to Causal. The key takeaway is that consistency is not a binary on/off choice: you can tune the level per-operation or per-session scope rather than applying one model uniformly across your entire system.*
๐ System Design Decision Flow: Choosing the Right Architecture
When designing a distributed system, the sequence of decisions follows a consistent pattern. Use this flow to reason through any system design problem systematically.
graph TD
A[Define consistency requirement] --> B{Strong consistency required?}
B -->|Yes| C[Choose CP system: etcd, Zookeeper, PostgreSQL]
B -->|No| D{High availability critical?}
D -->|Yes| E[Choose AP system: Cassandra, DynamoDB]
D -->|No| F[Choose simple single-node or CA architecture]
C --> G[Add replication + failover]
E --> H[Add conflict resolution + eventual sync]
F --> I[Scale vertically first, then revisit]
This decision framework applies to every data store, cache, and message queue in your architecture. The same three-question pattern โ consistency need, availability requirement, partition tolerance โ governs each component choice independently.
๐งฎ ACID vs. BASE: Two Ways to Think About Data Correctness
ACID (traditional relational databases):
- Atomic โ all operations in a transaction succeed or all fail
- Consistent โ the database moves from one valid state to another valid state
- Isolated โ concurrent transactions don't interfere with each other
- Durable โ committed data survives crashes
BASE (most NoSQL / distributed systems):
- Basically Available โ availability is prioritized; partial failures are acceptable
- Soft state โ data can be stale; state may change without input (replication catching up)
- Eventual consistency โ given no new writes, all replicas will eventually converge to the same value
| Property | ACID | BASE |
| Consistency model | Strong / serializable | Eventual / weak |
| Availability during partition | May refuse | Always responds |
| Use case | Financial transactions, inventory | User preferences, cache, social data |
| Typical tech | PostgreSQL, MySQL | Cassandra, DynamoDB, Redis |
๐ Real-World Applications: Applying These Concepts: E-Commerce Checkout Example
An e-commerce checkout flow actually uses multiple consistency models simultaneously:
| Component | Consistency need | Choice | Why |
| Inventory decrement | Strong (no overselling) | ACID + CP | Two checkouts can't both buy the last item |
| Product catalog | Eventual (stale by seconds is fine) | BASE + AP | Faster reads; slightly stale prices are acceptable |
| Session cart | Eventual (user-scoped) | AP | Cart doesn't need global consistency |
| Payment processing | Strong (exact, auditable) | ACID + CP | Regulatory and financial correctness |
This is the key insight: you don't choose one consistency model for your whole system. You partition your data by its consistency requirements.
โ๏ธ Trade-offs & Failure Modes: When Things Go Wrong: Partition Scenarios
A network partition means some nodes can't communicate with others. What happens next depends on your CAP choice:
CP system during partition: Nodes that can't confirm they hold the latest data refuse to serve requests. Clients see errors or timeouts. Data is always correct when available.
AP system during partition: All nodes continue serving. Clients on different sides of the partition may see different values. When the partition heals, the system runs a reconciliation protocol (last-write-wins, vector clocks, CRDTs) to converge.
Common failure modes:
- Split-brain โ two nodes both think they're the primary and accept conflicting writes (CP systems prevent this; AP systems must handle reconciliation)
- Stale reads โ a read reaches a replica that hasn't received the latest replication update (expected in AP systems, needs careful application-level handling)
- Replication lag โ primary accepts a write but it hasn't propagated to all replicas before the primary crashes (mitigated by synchronous replication or quorum writes)
๐งญ Decision Guide: Choosing the Right Consistency Level for Your Use Case
| Situation | CAP preference | Consistency model | Reasoning |
| Financial transaction | CP | Strong (serializable) | Correctness is non-negotiable |
| Real-time leaderboard | AP | Eventual | Slightly stale rankings are fine |
| Collaborative document editing | CP with CRDT | Causal | Ordering matters; CRDTs enable merge-friendly writes |
| User profile reads | AP | Read-your-writes | User should see their own updates; global consistency unnecessary |
| Inventory system | CP | Linearizable | Prevent overselling with quorum writes |
๐งช Little's Law: Back-of-Envelope Capacity Math
This section applies Little's Law to back-of-envelope capacity estimation โ one of the most practical calculations in both system design interviews and real production capacity planning. It was chosen because it provides a quantitative bridge between the scalability concepts in this post and a concrete number: given your target request rate and latency budget, how much concurrency must your system support? As you work through the example, focus on how the three variables (concurrency L, arrival rate ฮป, latency W) constrain each other โ improving any one of them directly affects the resources you need to provision.
$$L = \lambda \cdot W$$
Where:
- $L$ = average number of requests in the system (concurrency)
- $\lambda$ = arrival rate (requests per second)
- $W$ = average time spent in the system (latency in seconds)
Example: if your service handles 500 req/s and average latency is 200 ms:
$$L = 500 imes 0.2 = 100 ext{ concurrent requests}$$
This tells you how many concurrent connections your server needs to handle. Multiply by average memory per request to estimate RAM needs.
๐ ๏ธ Implementation Approach: CAP-Aware Architecture in Practice
The e-commerce example from this post โ strong consistency for payments, eventual consistency for the product catalog โ reflects a two-layer architecture that most production systems use. The transactional data path (inventory, payments) routes through a relational database with ACID semantics: each inventory decrement or payment record is wrapped in a database transaction that either commits fully or rolls back entirely, preventing split writes like two concurrent checkouts both claiming the last item. Acquiring a row-level lock before reading the current stock count is what prevents the race condition.
The eventual-consistency data path (product catalog, recommendations, user preferences) routes through a distributed cache with a configurable TTL โ typically 30 to 60 seconds. Reads are served from the cache and may be stale by up to that TTL window. On a cache miss the system fetches fresh data from the database and repopulates the cache. When the underlying record changes, the cache entry is explicitly invalidated so the next read reflects the update within the TTL interval.
The key architectural point is that both paths coexist in the same application under fundamentally different consistency guarantees. Choosing which data travels each path โ and being explicit about the staleness window you accept on the AP path โ is the most consequential consistency decision you will make in a distributed system.
๐ Key Lessons for System Designers
These lessons come from systems that succeeded โ and the many that didn't โ in production environments.
Consistency is not free. Every guarantee of consistency requires coordination among nodes, which adds latency. Before choosing strong consistency, verify that the use case actually requires it. Many systems choose strong consistency out of habit and pay an unnecessary availability tax.
Design for failure, not for success. The CAP Theorem forces you to explicitly decide what your system does when things go wrong, not just when everything works. Define your system's behavior under partition before you write the first line of code.
Start with known tradeoffs. When building a new service, pick an existing well-understood data store (PostgreSQL for ACID, Cassandra for AP, Redis for low-latency cache) and document why you made that choice. Unknown tradeoffs from custom solutions accumulate as technical debt.
Monitor tail latency, not averages. Average latency hides the worst-case experience. p99 latency โ the slowest 1% of requests โ is what users actually notice. Instrument your system for percentile-based metrics from day one.
๐ TLDR: Summary & Key Takeaways
- Vertical scaling has a hard ceiling; horizontal scaling requires coordination but is theoretically unlimited.
- CAP Theorem forces a binary choice during partitions: consistency or availability. Partition tolerance is mandatory.
- ACID databases guarantee correctness; BASE systems trade consistency for availability and throughput.
- Real systems use both โ CP for critical paths (payments, inventory), AP for non-critical paths (catalog, preferences).
- Tail latency (p95/p99), not average latency, is the metric that actually determines whether your system feels fast to users.
๐ Related Posts
- System Design: Databases, SQL vs. NoSQL, and Scaling
- System Design: Networking, DNS, CDNs, and Load Balancers
- Consistent Hashing Explained
Test Your Knowledge
Ready to test what you just learned?
AI will generate 4 questions based on this article's content.

Written by
Abstract Algorithms
@abstractalgorithms
More Posts
Stale Reads and Cascading Failures in Distributed Systems
TLDR: Stale reads return superseded data from replicas that haven't yet applied the latest write. Cascading failures turn one overloaded node into a cluster-wide collapse through retry storms and redistributed load. Both are preventable โ stale reads...
Clock Skew and Causality Violations: Why Distributed Clocks Lie
TLDR: Physical clocks on distributed machines cannot be perfectly synchronized. NTP keeps them within tens to hundreds of milliseconds in normal conditions โ but under load, across datacenters, or after a VM pause, the drift can reach seconds. When s...
Split Brain Explained: When Two Nodes Both Think They Are Leader
TLDR: Split brain happens when a network partition causes two nodes to simultaneously believe they are the leader โ each accepting writes the other never sees. Prevent it with quorum consensus (at least โN/2โ+1 nodes must agree before leadership is g...
NoSQL Partitioning: How Cassandra, DynamoDB, and MongoDB Split Data
TLDR: Every NoSQL database hides a partitioning engine behind a deceptively simple API. Cassandra uses a consistent hashing ring where a Murmur3 hash of your partition key selects a node โ virtual nodes (vnodes) make rebalancing smooth. DynamoDB mana...
