
System Design Core Concepts: Scalability, CAP, and Consistency

The building blocks of distributed systems. Learn about Vertical vs Horizontal scaling, the CAP Theorem, and ACID vs BASE.

Abstract Algorithms · 13 min read

TLDR: 🚀 Scalability, the CAP Theorem, and consistency models are the three concepts that determine whether a distributed system can grow, stay reliable, and deliver correct results. Get these three right and you can reason about any system design question.


📖 Understanding System Design Core Concepts

In 2011, Netflix engineers deployed Chaos Monkey, a tool that randomly terminated production servers to test system resilience. Their infrastructure stayed up. Netflix had deliberately designed for AP (available + partition-tolerant): they accepted that some users might see slightly stale recommendations in exchange for the system never going down. A competitor designed for CP (consistent + partition-tolerant) would have returned errors during the same failures to protect data correctness.

That's the CAP theorem in action, and it explains why your architecture choices have real, permanent engineering consequences.

System design revolves around three foundational questions:

  1. Can the system handle more load? (Scalability)
  2. What happens when a network partition occurs? (CAP Theorem)
  3. When multiple clients write, what does each reader see? (Consistency models)

These aren't independent choices. A bank can't be "highly available during network partitions and always consistent": the CAP Theorem makes that impossible.


🔍 Scalability: Vertical vs. Horizontal

Vertical scaling (scale up) means adding more power to a single machine: more CPU cores, more RAM, faster storage.

Horizontal scaling (scale out) means adding more machines and distributing the load across them.

| Dimension | Vertical Scaling | Horizontal Scaling |
|---|---|---|
| Ceiling | Hard hardware limit | Theoretically unlimited |
| Cost curve | Superlinear (big machines cost disproportionately more) | Linear (add commodity boxes) |
| Complexity | Low (no distribution) | High (coordination, consistency, partitioning) |
| Failure model | Single point of failure | Redundant (one node fails, others continue) |
| Best for | Databases under heavy write lock contention | Stateless services, read-heavy workloads |

In practice: most production systems use both. The database gets a large primary (vertical) plus read replicas (horizontal). The application tier is purely horizontal behind a load balancer.

graph LR
    Client --> LB[Load Balancer]
    LB --> A[App Server 1]
    LB --> B[App Server 2]
    LB --> C[App Server 3]
    A --> Primary[(Primary DB)]
    B --> Replica1[(Read Replica)]
    C --> Replica2[(Read Replica)]
    Primary --> Replica1
    Primary --> Replica2
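The app tier in this diagram scales horizontally precisely because it is stateless: any server can take any request. A minimal round-robin dispatcher, sketched here with hypothetical server names, shows why adding capacity is just adding an entry to a list:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative round-robin load balancer for the stateless app tier above.
// Server names and the class itself are a sketch, not a real LB's API.
public class RoundRobinBalancer {
    private final List<String> servers;
    private final AtomicLong counter = new AtomicLong();

    public RoundRobinBalancer(List<String> servers) {
        this.servers = servers;
    }

    /** Picks the next server in rotation; thread-safe via the atomic counter. */
    public String next() {
        return servers.get((int) (counter.getAndIncrement() % servers.size()));
    }

    public static void main(String[] args) {
        RoundRobinBalancer lb = new RoundRobinBalancer(List.of("app-1", "app-2", "app-3"));
        for (int i = 0; i < 4; i++) {
            System.out.println(lb.next()); // app-1, app-2, app-3, app-1
        }
        // Scaling out = adding another name to the list; no request is pinned to a node.
    }
}
```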

โš™๏ธ How the CAP Theorem Works: Every Distributed System Makes a Choice

The CAP Theorem states that a distributed system can guarantee at most two of three properties simultaneously:

  • Consistency (C): every read returns the most recent write (or an error)
  • Availability (A): every request receives a response (possibly stale)
  • Partition Tolerance (P): the system continues operating despite network partitions

Because network partitions will happen in any real distributed system, P is non-optional. The real choice is: when a partition occurs, do you sacrifice C or A?

| System type | CAP choice | Example systems | Why |
|---|---|---|---|
| CP | Consistency + Partition tolerance | ZooKeeper, etcd, HBase | Correct data is more important than availability |
| AP | Availability + Partition tolerance | Cassandra, CouchDB, DynamoDB | Uptime is more important than perfect consistency |

Practical implication:

  • Banking transaction? Choose CP: you can't have a split-brain account balance.
  • Social media feed? Choose AP: slightly stale posts are acceptable; being down is not.
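Many distributed stores also let you tune this choice per request with quorum reads and writes: with N replicas, a write acknowledged by W nodes and a read that consults R nodes are guaranteed to overlap when R + W > N. A small sketch of that overlap rule (the class name is illustrative, not any database's API):

```java
// Sketch of the quorum overlap rule R + W > N. Not tied to a specific database;
// Cassandra-style consistency levels (ONE, QUORUM, ALL) are one real-world mapping.
public class QuorumCheck {

    /** True if any read quorum must intersect any write quorum (no stale reads). */
    public static boolean guaranteesStrongReads(int n, int w, int r) {
        return r + w > n;
    }

    public static void main(String[] args) {
        // N=3 replicas, QUORUM writes (W=2) + QUORUM reads (R=2): overlap guaranteed
        System.out.println(guaranteesStrongReads(3, 2, 2)); // true
        // N=3, ONE write (W=1) + ONE read (R=1): a read can miss the latest write
        System.out.println(guaranteesStrongReads(3, 1, 1)); // false
    }
}
```

Dialing W and R up trades latency and availability for consistency, which is the CAP trade-off expressed per operation rather than per system.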

🧠 Deep Dive: Consistency Models Unpacked

The CAP theorem frames a binary choice, but in practice the important dimension is how you trade off consistency against availability during normal operation, not just during failures.

| Model | Guarantee | Example system |
|---|---|---|
| Strong consistency | Every read sees the latest write | Single-node RDBMS, ZooKeeper |
| Eventual consistency | Reads converge to the latest write, eventually | DynamoDB, Cassandra |
| Read-your-writes | You always see your own latest write | Session-scoped DB routing |
| Causal consistency | Causally related ops are ordered; others may not be | MongoDB sessions, CockroachDB |

📊 Eventual Consistency Propagation

sequenceDiagram
    participant C1 as Client 1 (Writer)
    participant P as Primary Node
    participant R1 as Replica 1
    participant R2 as Replica 2
    participant C2 as Client 2 (Reader)

    C1->>P: Write: balance = $500
    P-->>C1: ACK (write confirmed)
    Note over R1,R2: Async replication in progress
    C2->>R2: Read: balance?
    R2-->>C2: balance = $400 (stale read)
    P->>R1: Replicate: balance = $500
    P->>R2: Replicate: balance = $500
    Note over R1,R2: Replicas converged
    C2->>R2: Read: balance?
    R2-->>C2: balance = $500 (consistent)

This diagram demonstrates eventual consistency in action: Client 1 writes a new balance to the Primary node, which immediately acknowledges the write; meanwhile Client 2 reads from Replica 2 and receives a stale value ($400) because asynchronous replication has not yet propagated the update. After both replicas receive the replication message from the Primary, all subsequent reads return the correct value ($500). The key takeaway is that in an AP system there is a window of time after a write, potentially seconds or longer, during which different readers may see different values, and application logic must be designed to tolerate this.
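The same sequence can be reproduced with a toy in-memory model. The Node class below is purely illustrative: it stands in for a primary and one replica, with "replication" applied by hand so the stale-read window is explicit:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the sequence diagram above: a primary that ACKs immediately
// and a replica that only converges when replication is applied. Illustrative only.
public class EventualConsistencyDemo {

    static class Node {
        final Map<String, Integer> data = new HashMap<>();
        Integer read(String key) { return data.get(key); }
        void apply(String key, int value) { data.put(key, value); }
    }

    public static void main(String[] args) {
        Node primary = new Node();
        Node replica = new Node();
        primary.apply("balance", 400);
        replica.apply("balance", 400);           // both start converged at $400

        primary.apply("balance", 500);           // Client 1 writes; primary ACKs at once
        System.out.println(replica.read("balance")); // 400 -- stale read window

        replica.apply("balance", primary.read("balance")); // async replication arrives
        System.out.println(replica.read("balance")); // 500 -- replicas converged
    }
}
```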

📊 Consistency Levels: State Transitions

stateDiagram-v2
    [*] --> Strong : CP system (ZooKeeper)
    [*] --> Eventual : AP system (Cassandra)
    [*] --> Weak : Cache only (Redis TTL)

    Strong --> Eventual : Relax for throughput
    Eventual --> Strong : Require linearizability
    Eventual --> ReadYourWrites : Scope to session
    ReadYourWrites --> Causal : Add causal ordering
    Weak --> Eventual : Add replication sync

This state diagram maps the consistency model spectrum and the architectural decisions that transition between levels. A CP system starts in Strong consistency while an AP system starts in Eventual; the labeled arrows show how relaxing or tightening guarantees moves you between states: for example, scoping reads to a session turns Eventual into Read-Your-Writes, and adding causal ordering upgrades that to Causal. The key takeaway is that consistency is not a binary on/off choice: you can tune the level per operation or per session rather than applying one model uniformly across your entire system.
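One of those transitions, upgrading Eventual to Read-Your-Writes, is often implemented as session-aware read routing: reads of keys this session has written go to the primary, while everything else may hit a possibly stale replica. A hedged sketch with hypothetical names:

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative session-scoped read-your-writes router. The class, method names,
// and "primary"/"replica" targets are a sketch, not a real driver's API.
public class SessionRouter {

    private final Set<String> writtenThisSession = new HashSet<>();

    /** Called whenever this session performs a write. */
    public void recordWrite(String key) {
        writtenThisSession.add(key);
    }

    /** Keys this session wrote must be read from the primary; others may be stale. */
    public String routeRead(String key) {
        return writtenThisSession.contains(key) ? "primary" : "replica";
    }

    public static void main(String[] args) {
        SessionRouter session = new SessionRouter();
        session.recordWrite("user:42:avatar");
        System.out.println(session.routeRead("user:42:avatar")); // primary
        System.out.println(session.routeRead("user:7:avatar"));  // replica
    }
}
```

The guarantee is deliberately narrow: only this session's own writes are protected, which is exactly why it is cheaper than global strong consistency.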

📊 System Design Decision Flow: Choosing the Right Architecture

When designing a distributed system, the sequence of decisions follows a consistent pattern. Use this flow to reason through any system design problem systematically.

graph TD
    A[Define consistency requirement] --> B{Strong consistency required?}
    B -->|Yes| C[Choose CP system: etcd, Zookeeper, PostgreSQL]
    B -->|No| D{High availability critical?}
    D -->|Yes| E[Choose AP system: Cassandra, DynamoDB]
    D -->|No| F[Choose simple single-node or CA architecture]
    C --> G[Add replication + failover]
    E --> H[Add conflict resolution + eventual sync]
    F --> I[Scale vertically first, then revisit]

This decision framework applies to every data store, cache, and message queue in your architecture. The same three-question pattern (consistency need, availability requirement, partition tolerance) governs each component choice independently.


🧮 ACID vs. BASE: Two Ways to Think About Data Correctness

ACID (traditional relational databases):

  • Atomic: all operations in a transaction succeed or all fail
  • Consistent: the database moves from one valid state to another valid state
  • Isolated: concurrent transactions don't interfere with each other
  • Durable: committed data survives crashes

BASE (most NoSQL / distributed systems):

  • Basically Available: availability is prioritized; partial failures are acceptable
  • Soft state: data can be stale; state may change without input (replication catching up)
  • Eventual consistency: given no new writes, all replicas will eventually converge to the same value

| Property | ACID | BASE |
|---|---|---|
| Consistency model | Strong / serializable | Eventual / weak |
| Availability during partition | May refuse requests | Always responds |
| Use case | Financial transactions, inventory | User preferences, cache, social data |
| Typical tech | PostgreSQL, MySQL | Cassandra, DynamoDB, Redis |

๐ŸŒ Real-World Applications: Applying These Concepts: E-Commerce Checkout Example

An e-commerce checkout flow actually uses multiple consistency models simultaneously:

| Component | Consistency need | Choice | Why |
|---|---|---|---|
| Inventory decrement | Strong (no overselling) | ACID + CP | Two checkouts can't both buy the last item |
| Product catalog | Eventual (stale by seconds is fine) | BASE + AP | Faster reads; slightly stale prices are acceptable |
| Session cart | Eventual (user-scoped) | AP | Cart doesn't need global consistency |
| Payment processing | Strong (exact, auditable) | ACID + CP | Regulatory and financial correctness |

This is the key insight: you don't choose one consistency model for your whole system. You partition your data by its consistency requirements.


โš–๏ธ Trade-offs & Failure Modes: When Things Go Wrong: Partition Scenarios

A network partition means some nodes can't communicate with others. What happens next depends on your CAP choice:

CP system during partition: Nodes that can't confirm they hold the latest data refuse to serve requests. Clients see errors or timeouts. Data is always correct when available.

AP system during partition: All nodes continue serving. Clients on different sides of the partition may see different values. When the partition heals, the system runs a reconciliation protocol (last-write-wins, vector clocks, CRDTs) to converge.

Common failure modes:

  • Split-brain: two nodes both think they're the primary and accept conflicting writes (CP systems prevent this; AP systems must handle reconciliation)
  • Stale reads: a read reaches a replica that hasn't received the latest replication update (expected in AP systems; needs careful application-level handling)
  • Replication lag: the primary accepts a write but it hasn't propagated to all replicas before the primary crashes (mitigated by synchronous replication or quorum writes)
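Of the reconciliation strategies above, last-write-wins is the simplest to sketch: when the partition heals, keep the write with the later timestamp. The record type and merge rule below are illustrative; real implementations also have to contend with clock skew, which can make LWW silently drop updates:

```java
// Hedged sketch of last-write-wins (LWW) reconciliation after a partition heals.
// Illustrative only: production systems (e.g. Cassandra cells) add tie-breakers
// and still lose concurrent updates under clock skew -- vector clocks or CRDTs
// are the alternatives when that is unacceptable.
public class LastWriteWins {

    record Versioned(String value, long timestampMillis) {}

    /** Keep the later write; break exact ties deterministically by value order. */
    public static Versioned merge(Versioned a, Versioned b) {
        if (a.timestampMillis() != b.timestampMillis()) {
            return a.timestampMillis() > b.timestampMillis() ? a : b;
        }
        return a.value().compareTo(b.value()) >= 0 ? a : b;
    }

    public static void main(String[] args) {
        // Conflicting writes accepted on opposite sides of a partition:
        Versioned sideA = new Versioned("shipped", 1_000L);
        Versioned sideB = new Versioned("cancelled", 2_000L);
        System.out.println(merge(sideA, sideB).value()); // cancelled -- later write wins
    }
}
```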

🧭 Decision Guide: Choosing the Right Consistency Level for Your Use Case

| Situation | CAP preference | Consistency model | Reasoning |
|---|---|---|---|
| Financial transaction | CP | Strong (serializable) | Correctness is non-negotiable |
| Real-time leaderboard | AP | Eventual | Slightly stale rankings are fine |
| Collaborative document editing | CP with CRDT | Causal | Ordering matters; CRDTs enable merge-friendly writes |
| User profile reads | AP | Read-your-writes | User should see their own updates; global consistency unnecessary |
| Inventory system | CP | Linearizable | Prevent overselling with quorum writes |

🧪 Little's Law: Back-of-Envelope Capacity Math

This section applies Little's Law to back-of-envelope capacity estimation, one of the most practical calculations in both system design interviews and real production capacity planning. It was chosen because it provides a quantitative bridge between the scalability concepts in this post and a concrete number: given your target request rate and latency budget, how much concurrency must your system support? As you work through the example, focus on how the three variables (concurrency L, arrival rate λ, latency W) constrain each other; improving any one of them directly affects the resources you need to provision.

$$L = \lambda \cdot W$$

Where:

  • $L$ = average number of requests in the system (concurrency)
  • $\lambda$ = arrival rate (requests per second)
  • $W$ = average time spent in the system (latency in seconds)

Example: if your service handles 500 req/s and average latency is 200 ms:

$$L = 500 \times 0.2 = 100 \text{ concurrent requests}$$

This tells you how many concurrent connections your server needs to handle. Multiply by average memory per request to estimate RAM needs.
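As a sketch of that arithmetic (the 50 MB-per-request memory figure is an assumed illustration, not a number from this post):

```java
// Back-of-envelope capacity math via Little's Law: L = lambda * W.
// The memory-per-request figure below is an assumed example value.
public class LittlesLaw {

    /** Average requests in flight given arrival rate (req/s) and latency (s). */
    public static double concurrency(double arrivalRatePerSec, double latencySec) {
        return arrivalRatePerSec * latencySec;
    }

    public static void main(String[] args) {
        double l = concurrency(500, 0.2);        // the example from the post
        System.out.println(l);                   // 100.0 concurrent requests

        double memPerRequestMb = 50;             // assumed per-request footprint
        System.out.println(l * memPerRequestMb); // 5000.0 MB, i.e. ~5 GB of RAM
    }
}
```

Note how the law cuts both ways: halving latency W halves the concurrency (and RAM) you must provision at the same arrival rate.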


๐Ÿ› ๏ธ Spring Boot + Spring Data Redis: CAP-Aware Caching in Java

Spring Boot is the dominant Java framework for building production microservices: its @Cacheable, spring-data-redis, and spring-data-jpa integrations let you implement the AP/CP architectural patterns from this post without writing low-level connection management code.

The e-commerce example in this post (strong consistency for payments, eventual consistency for the product catalog) maps directly to using Spring Data JPA (for PostgreSQL ACID transactions) alongside Spring Data Redis (for the AP cache layer):

// build.gradle dependencies:
// implementation 'org.springframework.boot:spring-boot-starter-data-jpa'
// implementation 'org.springframework.boot:spring-boot-starter-data-redis'
// implementation 'org.springframework.boot:spring-boot-starter-cache'

import org.springframework.cache.annotation.Cacheable;
import org.springframework.cache.annotation.CacheEvict;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

// --- CP path: ACID inventory decrement (PostgreSQL via JPA) ---
@Service
public class InventoryService {

    private final InventoryRepository repo;

    public InventoryService(InventoryRepository repo) { // constructor injection
        this.repo = repo;
    }

    @Transactional  // ACID: rolls back if the decrement fails
    public void decrementStock(Long productId, int qty) {
        Inventory inv = repo.findByIdForUpdate(productId)  // SELECT ... FOR UPDATE (row lock)
            .orElseThrow(() -> new IllegalArgumentException("Product not found"));
        if (inv.getStock() < qty) throw new IllegalStateException("Insufficient stock");
        inv.setStock(inv.getStock() - qty);
        repo.save(inv);
    }
}

// --- AP path: eventually consistent catalog cache (Redis) ---
@Service
public class ProductCatalogService {

    private final ProductRepository productRepo;

    public ProductCatalogService(ProductRepository productRepo) {
        this.productRepo = productRepo;
    }

    @Cacheable(value = "catalog", key = "#productId",
               unless = "#result == null")          // AP: stale by up to the TTL
    public ProductDto getProduct(Long productId) {
        return productRepo.findById(productId)
            .map(ProductDto::from)
            .orElse(null);
    }

    @CacheEvict(value = "catalog", key = "#productId") // invalidate on update
    public void updateProduct(Long productId, ProductDto dto) {
        // update persisted to PostgreSQL, then the cache entry is evicted
    }
}

Configure Redis TTL in application.yml:

spring:
  cache:
    redis:
      time-to-live: 30s       # AP: catalog allowed to be stale up to 30 seconds
  data:
    redis:
      host: localhost
      port: 6379

@Transactional on the inventory path enforces ACID semantics; @Cacheable with a TTL on the catalog path provides the BASE/AP behaviour: exactly the per-component consistency partitioning described in the e-commerce example above.

For a full deep-dive on Spring Data Redis and CAP-aware Spring architectures, a dedicated follow-up post is planned.


📚 Key Lessons for System Designers

These lessons come from systems that succeeded, and from the many that didn't, in production environments.

Consistency is not free. Every guarantee of consistency requires coordination among nodes, which adds latency. Before choosing strong consistency, verify that the use case actually requires it. Many systems choose strong consistency out of habit and pay an unnecessary availability tax.

Design for failure, not for success. The CAP Theorem forces you to explicitly decide what your system does when things go wrong, not just when everything works. Define your system's behavior under partition before you write the first line of code.

Start with known tradeoffs. When building a new service, pick an existing well-understood data store (PostgreSQL for ACID, Cassandra for AP, Redis for low-latency cache) and document why you made that choice. Unknown tradeoffs from custom solutions accumulate as technical debt.

Monitor tail latency, not averages. Average latency hides the worst-case experience. p99 latency, the slowest 1% of requests, is what users actually notice. Instrument your system for percentile-based metrics from day one.
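A quick way to see why averages mislead is to compute both on a sample set with a few slow outliers. The nearest-rank percentile below is one common definition; production systems typically use streaming estimators such as HDR histograms rather than sorting raw samples:

```java
import java.util.Arrays;

// Nearest-rank percentile sketch: the value at position ceil(p/100 * n) in the
// sorted samples. Illustrative only; real monitoring uses streaming estimators.
public class TailLatency {

    public static double percentile(double[] latenciesMs, double p) {
        double[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }

    public static void main(String[] args) {
        double[] samples = new double[100];
        Arrays.fill(samples, 10.0);   // 98 fast requests at 10 ms...
        samples[98] = 900.0;          // ...and two slow outliers
        samples[99] = 900.0;

        System.out.println(Arrays.stream(samples).average().orElse(0)); // 27.8 -- looks fine
        System.out.println(percentile(samples, 99.0));                  // 900.0 -- the real story
    }
}
```

The average barely moves, while p99 lands squarely on the outliers that users actually experience.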


📌 TLDR: Summary & Key Takeaways

  • Vertical scaling has a hard ceiling; horizontal scaling requires coordination but is theoretically unlimited.
  • CAP Theorem forces a binary choice during partitions: consistency or availability. Partition tolerance is mandatory.
  • ACID databases guarantee correctness; BASE systems trade consistency for availability and throughput.
  • Real systems use both: CP for critical paths (payments, inventory), AP for non-critical paths (catalog, preferences).
  • Tail latency (p95/p99), not average latency, is the metric that actually determines whether your system feels fast to users.

📝 Practice Quiz

  1. A banking transaction service experiences a network partition. Which CAP choice should it make, and why?

    • A) AP: banking needs to stay available even if data might be stale
    • B) CP: money amounts must be correct; it's better to reject requests than return wrong balances
    • C) CA: banks don't experience partitions, so this doesn't apply

    Correct Answer: B. Financial systems must never return or commit incorrect balances. Under a partition, the correct behavior is to reject the request rather than risk split-brain writes to account balances.

  2. Which consistency model does DynamoDB default to, and what is the tradeoff?

    • A) Strong consistency: all reads reflect the latest write, at higher latency and cost
    • B) Eventual consistency: reads may be slightly stale, but throughput is higher and cost is lower
    • C) Serializable isolation: all operations appear to execute sequentially

    Correct Answer: B. DynamoDB defaults to eventually consistent reads; strongly consistent reads are available at higher cost. Choosing between them is a deliberate tradeoff of staleness tolerance vs. performance and cost.

  3. Your product catalog service needs to serve 10M reads/sec. Staleness of up to 2 seconds is acceptable. What is the best choice?

    • A) Single-node PostgreSQL with synchronous replication
    • B) Multi-master NoSQL cluster with eventual consistency and geographically distributed replicas
    • C) Single Redis instance with write-through cache

    Correct Answer: B. At 10M reads/sec, a single-node solution cannot handle the load. A geographically distributed AP cluster scales horizontally, and the 2-second staleness tolerance aligns with eventual consistency semantics.



Written by Abstract Algorithms (@abstractalgorithms)