System Design Databases: SQL vs NoSQL and Scaling
The eternal debate: SQL or NoSQL? We break down ACID vs. BASE, sharding vs. replication, and when to use MongoDB vs. PostgreSQL.
Abstract Algorithms
TLDR: SQL gives you ACID guarantees and powerful relational queries; NoSQL gives you horizontal scale and flexible schemas. The real decision is not "which is better" but "which trade-offs align with your workload." Understanding replication, sharding, and the CAP theorem helps you choose deliberately instead of by fashion.
The Library vs. the Drop Box: SQL and NoSQL Intuition
Pinterest started on MySQL. By 2013, they had 10 billion pins, 100 million users, and a 4-second page load time. The fix required database sharding (splitting data across multiple servers by user ID), adding read replicas to absorb query load, and eventually migrating hot write paths to Cassandra for fan-out writes. The lesson isn't "use NoSQL"; it's that the right database choice depends on which scaling levers you'll need to pull in 18 months, not just today.
Think of a SQL database as a meticulously organized library: every book (row) lives on a specific shelf (table) with a fixed classification system (schema). A NoSQL database is more like a book-drop box: anyone can toss a manuscript in any format; you retrieve it quickly by key, but finding relationships between books is harder.
| Feature | SQL (Relational) | NoSQL (Non-Relational) |
| --- | --- | --- |
| Data model | Tables with rows and columns | Documents, key-value, wide-column, graph |
| Schema | Fixed, enforced | Flexible, often schema-less |
| Consistency | Strong (ACID) | Eventual (BASE) or configurable |
| Scaling | Primarily vertical; horizontal via sharding | Horizontal by design |
| Query language | SQL | Varies (MongoDB query language, CQL, etc.) |
| Typical use cases | Finance, ERP, CRM | Social media, IoT, caching |
Neither is universally better. The right choice depends on your consistency requirements, query patterns, and scale.
Key Properties That Define Each Category
Before choosing a database, understand four foundational properties that set SQL and NoSQL apart.
Schema enforcement: SQL requires every row to match the table schema at write time. NoSQL is schema-on-read: the collection stores any document shape, and your application code interprets the structure at query time. This is powerful for rapid iteration but requires data quality enforcement at the application layer.
Normalization vs. denormalization: SQL normalizes data into related tables to eliminate duplication. NoSQL often denormalizes by embedding related data in a single document so a read fetches everything in one operation, eliminating expensive joins.
Transaction scope: SQL provides multi-statement, multi-table ACID transactions natively. Most NoSQL engines offer single-document atomicity, with optional multi-document transactions at additional latency cost.
Query expressiveness: SQL's declarative language can join, aggregate, and filter across any column combination. NoSQL queries are typically bounded to a single collection and pre-defined indexes.
These four properties determine whether SQL or NoSQL fits your access patterns.
SQL and NoSQL Engines: What Is Actually Inside
SQL internals
SQL engines parse → plan → optimize → execute.
- B-Tree indexes for range queries.
- Hash indexes for equality lookups.
- Write-Ahead Log (WAL) guarantees durability: changes are written to the log before being applied to the data files.
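To make the "log before data" ordering concrete, here is a toy sketch of the write-ahead idea. The class name `MiniWal` and the in-memory stand-ins are ours, not any real engine's API; a real WAL is an fsync'd on-disk file with checksums and checkpoints.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal write-ahead-log sketch: every change is appended to the log
// BEFORE the "data pages" are touched, so a crash between the two steps
// can be repaired by replaying the log.
class MiniWal {
    private final List<String[]> log = new ArrayList<>();      // stands in for an fsync'd log file
    private final Map<String, String> pages = new HashMap<>(); // stands in for B-Tree pages

    void put(String key, String value) {
        log.add(new String[]{key, value}); // 1. durable log append (fsync in a real engine)
        pages.put(key, value);             // 2. only then apply to the data pages
    }

    String get(String key) { return pages.get(key); }

    // Crash recovery: rebuild page state by replaying the log in order.
    Map<String, String> recover() {
        Map<String, String> rebuilt = new HashMap<>();
        for (String[] entry : log) rebuilt.put(entry[0], entry[1]);
        return rebuilt;
    }
}
```

Because the log is strictly ordered, replaying it after a crash converges on exactly the state the last committed write produced.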
NoSQL internals: the LSM-tree write path
Many NoSQL engines (Cassandra, RocksDB) use a Log-Structured Merge-tree:
- Write → MemTable (in-memory sorted structure).
- MemTable full → flush to an immutable SSTable on disk.
- Compaction merges overlapping SSTables, discards tombstoned records.
This allows extremely high write throughput at the cost of potential compaction storms during bursts.
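A toy version of this write path might look like the following. Names such as `MiniLsm` are ours; real engines add bloom filters, compaction, and binary on-disk formats, none of which are modeled here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Minimal LSM-tree sketch: writes land in a sorted in-memory MemTable;
// when it fills up it is flushed as a (treated-as-immutable) "SSTable".
// Reads check the MemTable first, then SSTables newest-to-oldest.
class MiniLsm {
    private final int memTableLimit;
    private TreeMap<String, String> memTable = new TreeMap<>();
    private final List<TreeMap<String, String>> ssTables = new ArrayList<>();

    MiniLsm(int memTableLimit) { this.memTableLimit = memTableLimit; }

    void put(String key, String value) {
        memTable.put(key, value);               // fast in-memory write
        if (memTable.size() >= memTableLimit) flush();
    }

    String get(String key) {
        if (memTable.containsKey(key)) return memTable.get(key);
        for (int i = ssTables.size() - 1; i >= 0; i--) { // newest SSTable first
            String v = ssTables.get(i).get(key);
            if (v != null) return v;
        }
        return null;
    }

    private void flush() { // MemTable full -> becomes an on-"disk" SSTable
        ssTables.add(memTable);
        memTable = new TreeMap<>();
    }
}
```

Note how a key can exist in several SSTables at once; reading newest-first resolves that, and compaction is what eventually merges the duplicates away.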
Primary data structures by engine
| System | Primary Index | Write Path |
| --- | --- | --- |
| PostgreSQL | B-Tree (default) | WAL + heap pages |
| MySQL InnoDB | Clustered B-Tree | WAL |
| MongoDB | B-Tree (WiredTiger) | WAL |
| Cassandra | LSM (SSTable) | Commit log + MemTable |
| Redis | Hash table (in-memory) | AOF / RDB snapshot |
Scaling Strategies: Vertical, Horizontal, Replication, Sharding
graph TD
subgraph Master
A[Write Request] --> B[Write-Ahead Log]
B --> C[Apply to MemTable]
C --> D[Flush to SSTable / Commit]
end
subgraph Replicas
D --> E[Replication Stream]
E --> F[Apply to Replica Log]
F --> G[Update Replica State]
end
G --> H[Read Request from Replica]
This diagram traces the replication write path from a primary database node to its read replicas. A write enters the master's Write-Ahead Log, is applied to the in-memory MemTable, flushed to disk as an SSTable, and then propagated via a replication stream to each replica node, which applies the same changes in order. The key insight is that replicas always trail the primary by some replication lag: reads from a replica may return slightly stale data, which is acceptable for social feeds but unacceptable for financial transactions.
| Strategy | What it does | When to use | Caveats |
| --- | --- | --- | --- |
| Vertical (scale-up) | Add CPU/RAM/SSD to one node | Low-moderate traffic, legacy apps | Diminishing returns; single point of failure |
| Horizontal (scale-out) | Add nodes to a cluster | High traffic with sharding | Requires data partitioning |
| Replication | Duplicate data across nodes | Read-heavy workloads, geo-distribution | Replication lag, write conflicts |
| Sharding | Partition data by key across nodes | Write-heavy, massive datasets | Hot-shard risk; cross-shard joins are expensive |
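As a minimal illustration of the sharding row above, here is a naive hash-based shard router. This is a sketch (`ShardRouter` is our name); production systems prefer consistent hashing precisely because naive modulo remaps almost every key when the shard count changes.

```java
// Minimal hash-sharding sketch: a stable hash of the shard key picks the node.
// Math.floorMod keeps the index non-negative even for negative hashCodes.
class ShardRouter {
    private final int numShards;

    ShardRouter(int numShards) { this.numShards = numShards; }

    int shardFor(String userId) {
        return Math.floorMod(userId.hashCode(), numShards);
    }
}
```

The hot-shard caveat from the table shows up here directly: if one `userId` (or one range of keys) dominates traffic, its shard melts while the others idle, regardless of how evenly the hash distributes keys.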
Replication lag is the time difference between master and replica: $$\Delta = t_{\text{master}} - t_{\text{replica}}$$
A small $\Delta$ is acceptable for social feeds; it is catastrophic for financial ledgers.
Deep Dive: How Write-Ahead Logs and LSM-Trees Protect Data
Both SQL and NoSQL engines guard against data loss using append-only logs before modifying data files. SQL engines use a Write-Ahead Log (WAL): every change is written sequentially to a log before being applied to B-Tree pages, enabling crash recovery. NoSQL LSM-tree engines write first to an in-memory MemTable, flush to immutable SSTables on disk, and periodically compact them.
| Engine type | Log mechanism | Trade-off |
| --- | --- | --- |
| SQL (WAL) | Sequential log → apply to B-Tree | Fast recovery; random write amplification |
| NoSQL (LSM) | MemTable → SSTable → compaction | High write throughput; compaction pauses under load |
Decision Flow: Selecting the Right Database
Use this flowchart as a first-pass guide when choosing a database for a new service. Most production systems combine multiple databases: for example, PostgreSQL for transactional data alongside Redis for caching.
flowchart TD
A[Define your workload] --> B{Need ACID transactions?}
B -->|Yes| C[SQL: PostgreSQL / MySQL]
B -->|No| D{Write-heavy at massive scale?}
D -->|Yes| E[Wide-column NoSQL: Cassandra]
D -->|No| F{Schema changes frequently?}
F -->|Yes| G[Document NoSQL: MongoDB]
F -->|No| H{Ultra-fast key lookups needed?}
H -->|Yes| I[Key-Value Store: Redis / DynamoDB]
H -->|No| C
Start here and refine with your specific read/write ratio and consistency requirements.
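The flowchart can also be encoded as a small first-pass helper. This is just the diagram's branches transcribed into code; the class and parameter names are ours.

```java
// The decision flowchart above, encoded as a first-pass helper.
// The categories and example engines come straight from the diagram.
class DbChooser {
    static String recommend(boolean needsAcid,
                            boolean writeHeavyAtScale,
                            boolean schemaChangesOften,
                            boolean needsFastKeyLookups) {
        if (needsAcid) return "SQL: PostgreSQL / MySQL";
        if (writeHeavyAtScale) return "Wide-column NoSQL: Cassandra";
        if (schemaChangesOften) return "Document NoSQL: MongoDB";
        if (needsFastKeyLookups) return "Key-Value Store: Redis / DynamoDB";
        return "SQL: PostgreSQL / MySQL"; // default branch in the flowchart
    }
}
```

Note that the questions are ordered: ACID dominates everything else, which mirrors the "start relational" advice later in this post.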
Read-Your-Writes Consistency in Replicated Databases
sequenceDiagram
participant App as Application
participant P as Primary DB
participant R as Read Replica
App->>P: WRITE email = "new@example.com"
P-->>App: OK (write committed)
App->>R: READ email (routed to replica)
R-->>App: "old@example.com" (replication lag)
Note over App,R: Stale read violates RYW guarantee
App->>P: READ email (route to primary)
P-->>App: "new@example.com" (consistent)
Note over P,R: Replica catches up asynchronously
R-->>App: "new@example.com" (lag resolved)
This sequence diagram illustrates the read-your-writes consistency hazard in a replicated database. After the application writes a new email address to the primary, immediately routing the confirming read to a replica returns stale data because replication is asynchronous. The takeaway: for strict read-your-writes guarantees, route the confirming read back to the primary, or use synchronous replication, accepting that writes will block until the replica acknowledges the change.
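One common application-side mitigation is to pin reads to the primary for a short grace window after a write. A toy sketch follows (names like `RywRouter` are ours; production routers track actual replica lag rather than using a fixed window):

```java
import java.util.HashMap;
import java.util.Map;

// Minimal read-your-writes router sketch: after a write, reads for that key
// are pinned to the primary for a grace window longer than typical
// replication lag; all other reads are offloaded to the replica.
class RywRouter {
    private final long graceWindowMillis;
    private final Map<String, Long> lastWriteAt = new HashMap<>();

    RywRouter(long graceWindowMillis) { this.graceWindowMillis = graceWindowMillis; }

    void recordWrite(String key, long nowMillis) {
        lastWriteAt.put(key, nowMillis);
    }

    String routeRead(String key, long nowMillis) {
        Long written = lastWriteAt.get(key);
        boolean recentlyWritten = written != null
                && nowMillis - written < graceWindowMillis;
        return recentlyWritten ? "primary" : "replica";
    }
}
```

This trades a little extra primary load for the guarantee that the sequence diagram's stale read ("old@example.com") never reaches the user who just wrote the new value.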
Choosing Your Database: A Worked Example
This example models an orders entity in an e-commerce system side by side in PostgreSQL and MongoDB to make the SQL-vs-NoSQL trade-off concrete in real code. The orders entity is the canonical choice because it sits at the boundary: it demands multi-table joins and ACID guarantees for accurate reporting (SQL's strengths), yet benefits from document embedding when every read needs the complete order with all its items (NoSQL's strength). As you read, note how the SQL version normalizes items into a separate table, enabling flexible queries at the cost of joins, while the MongoDB version embeds items directly in the order document, enabling a single-read fetch at the cost of query flexibility.
SQL representation (PostgreSQL):
-- Normalized: items live in a separate order_items table
SELECT u.name, o.total, i.sku, i.qty
FROM users u
JOIN orders o ON u.id = o.user_id
JOIN order_items i ON i.order_id = o.id
WHERE o.created_at > '2024-01-01';
PostgreSQL's query planner uses cost estimation: $$\text{Cost} = \sum_{i=1}^{k} \left( \alpha_i \cdot \text{IO}_i + \beta_i \cdot \text{CPU}_i \right)$$
to choose between Hash Join and Merge Join based on row estimates.
NoSQL representation (MongoDB):
{
  "order_id": 101,
  "customer_id": 42,
  "items": [{"sku": "A1", "qty": 2}, {"sku": "B3", "qty": 1}],
  "total": 59.97,
  "created_at": "2023-11-01T09:15:00Z"
}
The entire document is one read; no join required. This is why MongoDB excels for catalog and content workloads where you always read the whole entity.
Real-World Applications by Workload Type
| Workload | Best fit | Why |
| --- | --- | --- |
| Banking ledger / ERP | SQL (PostgreSQL, Oracle) | ACID: money cannot disappear mid-transaction |
| Flexible or evolving schema (MVP) | Document NoSQL (MongoDB) | Schema-less, fast iteration |
| Write-heavy telemetry at scale | Wide-column NoSQL (Cassandra) | Linear write scalability, tunable consistency |
| Ultra-fast read cache | Key-value (Redis, DynamoDB) | O(1) hash lookups |
| Relationship-heavy data | Graph (Neo4j) | Native edge traversal |
| Complex analytical queries | Columnar SQL (ClickHouse, Redshift) | Vectorized execution, column pruning |
Trade-offs and Failure Modes
| Dimension | SQL | NoSQL |
| --- | --- | --- |
| Correctness vs availability | Prioritizes correctness (CP in CAP) | Prioritizes availability (AP), tolerates stale reads |
| Performance | Strong consistency adds locking and coordination overhead | Relaxed consistency keeps latency low; extra replicas needed for durability |
| Operational complexity | Schema migrations can be painful | Easier schema iteration; sharding and compaction add ops complexity |
| Failure modes | Deadlocks, lock timeouts, single-master failover | Replication lag, split-brain, hot-shard overload |
| Mitigation | Connection pooling, read replicas, Patroni | Consistent hashing, quorum reads, anti-entropy repair |
Split-brain scenario (master-master setup):
- Network partition isolates two masters.
- Both accept writes → divergent data.
- When the partition heals, conflict resolution (last-write-wins, custom merge) must reconcile.
Mitigation: Use a consensus protocol (Raft via etcd, CockroachDB) to elect a single leader and prevent split-brain.
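For intuition on why last-write-wins is the simplest (and lossiest) reconciliation, here is a minimal sketch. The names are ours; systems like Cassandra apply this per column, and clock skew between nodes can make the "last" write the wrong one, which is exactly why consensus-based leader election is the stronger mitigation.

```java
// Last-write-wins merge sketch for healing a split-brain divergence:
// each side's value carries a timestamp; the newer one survives,
// and the older write is silently discarded.
record Versioned(String value, long timestampMillis) {}

class LwwMerge {
    static Versioned merge(Versioned a, Versioned b) {
        return a.timestampMillis() >= b.timestampMillis() ? a : b;
    }
}
```

The merge is symmetric, so both healed masters converge on the same value regardless of which side runs the reconciliation first.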
Decision Guide: SQL vs. NoSQL at a Glance
| Situation | Recommendation | Reason |
| --- | --- | --- |
| Strong transactional guarantees (banking) | SQL (PostgreSQL, MySQL) | ACID ensures money never vanishes |
| Rapidly evolving schema (MVP) | Document NoSQL (MongoDB) | Flexible JSON, no migrations |
| Write-heavy telemetry at massive scale | Wide-column NoSQL (Cassandra) | Linear write scalability |
| Complex analytical queries | Columnar SQL (Redshift, ClickHouse) | Vectorized execution |
| Social graph or recommendations | Graph DB (Neo4j) | Native relationship traversal |
What to Learn Next
- System Design Core Concepts: Scalability, CAP, and Consistency
- System Design Networking: DNS, CDNs, and Load Balancers
- The Ultimate Guide to Acing the System Design Interview
Spring Data JPA: ACID-Compliant SQL Access in Java
Spring Data JPA is an open-source Spring module that provides repository interfaces, @Transactional semantics, and query derivation on top of JPA/Hibernate, letting you implement the SQL ACID patterns from this post (normalized schema, multi-table transactions, connection pooling) with minimal boilerplate.
The orders-entity example from this post maps directly to a JPA entity with a @OneToMany relationship, executed as a single ACID transaction via @Transactional:
// dependencies: spring-boot-starter-data-jpa, postgresql driver, HikariCP (auto-configured)
import jakarta.persistence.*;
import org.springframework.data.jpa.repository.*;
import org.springframework.data.repository.query.Param;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;
import java.math.BigDecimal;
import java.time.LocalDateTime;
import java.util.List;
@Entity @Table(name = "orders")
public class Order {
    @Id @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;
    private Long customerId;
    private BigDecimal total;
    private LocalDateTime createdAt;
    @OneToMany(mappedBy = "order", cascade = CascadeType.ALL, fetch = FetchType.LAZY)
    private List<OrderItem> items;

    protected Order() {} // JPA requires a no-arg constructor

    public Order(Long customerId, BigDecimal total) {
        this.customerId = customerId;
        this.total = total;
        this.createdAt = LocalDateTime.now();
    }
    // getters/setters omitted
}
// Spring Data repository: query derivation from method names
public interface OrderRepository extends JpaRepository<Order, Long> {
    List<Order> findByCustomerIdAndCreatedAtAfter(Long customerId, LocalDateTime since);

    // Equivalent to the SQL JOIN from the worked example
    @Query("SELECT o FROM Order o JOIN FETCH o.items WHERE o.customerId = :cid")
    List<Order> findOrdersWithItems(@Param("cid") Long customerId);
}
@Service
public class OrderService {
    private final OrderRepository orderRepo;
    private final InventoryRepository inventoryRepo; // assumed to exist elsewhere in the codebase

    public OrderService(OrderRepository orderRepo, InventoryRepository inventoryRepo) {
        this.orderRepo = orderRepo; // constructor injection
        this.inventoryRepo = inventoryRepo;
    }

    // ACID: both the order insert AND the inventory decrement succeed, or both roll back
    @Transactional
    public Order placeOrder(Long customerId, List<OrderItemDto> items) {
        items.forEach(item -> inventoryRepo.decrementStock(item.sku(), item.qty()));
        Order order = new Order(customerId, items.stream()
                .map(i -> i.price().multiply(BigDecimal.valueOf(i.qty())))
                .reduce(BigDecimal.ZERO, BigDecimal::add));
        return orderRepo.save(order);
    }
}
Spring's @Transactional wraps the method in a database transaction: if decrementStock throws a RuntimeException, the entire operation rolls back, guaranteeing the "no partial writes" ACID property that prevents overselling.
For a full deep-dive on Spring Data JPA, a dedicated follow-up post is planned.
Spring Data MongoDB: Schema-Flexible NoSQL Access in Java
Spring Data MongoDB is an open-source Spring module that maps Java objects to MongoDB documents via @Document annotations and provides MongoRepository interfaces with the same query-derivation pattern as Spring Data JPA, making it straightforward to implement the denormalized document model from this post in a production Java service.
// dependency: spring-boot-starter-data-mongodb
import org.springframework.data.annotation.Id;
import org.springframework.data.mongodb.core.mapping.Document;
import org.springframework.data.mongodb.repository.MongoRepository;
import java.math.BigDecimal;
import java.time.Instant;
import java.util.List;

// Denormalized document: items embedded, single read (no JOIN)
@Document(collection = "orders")
public record Order(
        @Id String orderId,
        Long customerId,
        List<OrderItem> items, // embedded subdocuments: no JOIN required at read time
        BigDecimal total,
        Instant createdAt
) {}

public record OrderItem(String sku, int qty, BigDecimal unitPrice) {}

// Schema evolution: add a promoCode field to the record above and zero migration is required.
// MongoDB writes the new field automatically on next save; no ALTER TABLE, no downtime.
public interface OrderMongoRepository extends MongoRepository<Order, String> {
    List<Order> findByCustomerIdAndCreatedAtAfter(Long customerId, Instant since);
}
Configure in application.yml:
spring:
  data:
    mongodb:
      uri: mongodb://localhost:27017/ecommerce
      auto-index-creation: true # creates @Indexed fields automatically in dev
MongoRepository.save() maps to an upsert: it inserts the document if it is new or replaces it if it already exists, implementing the application-enforced schema pattern described in the key-properties section, where the application rather than the database defines document shape.
For a full deep-dive on Spring Data MongoDB, a dedicated follow-up post is planned.
Key Lessons from the Field
Teams that have scaled databases in high-traffic production systems converge on the same lessons.
Start relational: most early-stage systems benefit from PostgreSQL's consistency guarantees and mature ecosystem. Premature optimization to NoSQL creates operational complexity before the need arises.
Introduce NoSQL at a specific bottleneck: the typical trigger is write throughput beyond what a relational primary can sustain, or a schema that changes faster than migrations can safely track.
Replication is not a backup strategy: replicas protect against node failure, not accidental deletes or corrupted writes. Maintain separate point-in-time backups.
Monitor replication lag: in read-replica setups, queries routed to lagging replicas return stale data. Track Seconds_Behind_Master (MySQL, from SHOW SLAVE STATUS) or the replay_lag column of the pg_stat_replication view (PostgreSQL) and alert when lag exceeds your staleness budget, commonly around one second.
Cache early: before reaching for a NoSQL engine, add a Redis cache layer in front of your SQL database. Many workloads that appear to need NoSQL are actually just read-heavy and benefit from caching alone.
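The cache-aside pattern that paragraph describes can be sketched in a few lines. This is a toy: a HashMap stands in for Redis and the loader function stands in for the SQL query; real code adds TTLs, invalidation on writes, and a client such as Jedis or Lettuce.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Cache-aside sketch: check the cache first; on a miss, load from the
// database and populate the cache so the next read is served in memory.
class CacheAside {
    private final Map<String, String> cache = new HashMap<>(); // stand-in for Redis
    private final Function<String, String> dbLoader;           // stand-in for a SQL query
    int dbHits = 0; // counts how often we fell through to the database

    CacheAside(Function<String, String> dbLoader) { this.dbLoader = dbLoader; }

    String get(String key) {
        String cached = cache.get(key);
        if (cached != null) return cached;          // cache hit: no database round trip
        dbHits++;
        String fromDb = dbLoader.apply(key);
        if (fromDb != null) cache.put(key, fromDb); // populate for next time
        return fromDb;
    }
}
```

For read-heavy workloads, the hit rate of this one pattern often removes the pressure that made NoSQL look necessary in the first place.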
TLDR: Summary & Key Takeaways
- SQL databases provide ACID guarantees and powerful relational queries; NoSQL databases provide horizontal scale and schema flexibility.
- The CAP theorem says a distributed system cannot guarantee consistency, availability, and partition tolerance simultaneously; since partitions do happen, the real choice is between consistency and availability during a partition.
- LSM-trees give NoSQL engines high write throughput but can cause compaction storms under heavy load.
- Hot-shard risk and cross-shard joins are the main operational hazards in sharded systems.
- For most new systems: start with SQL, add a caching layer (Redis) for hot reads, and only introduce NoSQL when a specific workload requires it.
Practice Quiz
Q1: What does ACID stand for and why does it matter for banking?
- A) Atomicity, Consistency, Isolation, Durability: ensures partial transactions never corrupt data
- B) Availability, Consistency, Integrity, Durability: ensures replicas stay in sync
- C) Atomicity, Consistency, Indexing, Durability: speeds up queries
Correct Answer: A
Q2: Why do LSM-tree databases achieve higher write throughput than B-tree databases?
- A) They skip durability guarantees entirely
- B) They batch writes to an in-memory MemTable before flushing sequentially to disk
- C) They store data in RAM permanently
Correct Answer: B
Q3: What is "split-brain" in a master-master replication setup?
- A) When the primary node runs out of memory
- B) When a network partition causes two masters to accept conflicting writes independently
- C) When sharding keys overlap
Correct Answer: B
Related Posts
- System Design Core Concepts: Scalability, CAP, and Consistency
- System Design Protocols: REST, RPC, and TCP/UDP
- The Ultimate Guide to Acing the System Design Interview

Written by
Abstract Algorithms
@abstractalgorithms