System Design Databases: SQL vs NoSQL and Scaling
The eternal debate: SQL or NoSQL? We break down ACID vs BASE, Sharding vs Replication, and when to use MongoDB vs PostgreSQL.
Abstract Algorithms
Intermediate
For developers with some experience. Builds on fundamentals.
Estimated read time: 14 min
AI-assisted content. This post may have been written or enhanced with AI tools. Please verify critical information independently.
TLDR: SQL gives you ACID guarantees and powerful relational queries; NoSQL gives you horizontal scale and flexible schemas. The real decision is not "which is better" β it is "which trade-offs align with your workload." Understanding replication, sharding, and the CAP theorem helps you pick correctly every time.
π The Library vs. the Drop Box: SQL and NoSQL Intuition
Pinterest started on MySQL. By 2013, they had 10 billion pins, 100 million users, and a 4-second page load time. The fix required database sharding (splitting data across multiple servers by user ID), adding read replicas to absorb query load, and eventually migrating hot write paths to Cassandra for fan-out writes. The lesson isn't "use NoSQL" β it's that the right database choice depends on which scaling levers you'll need to pull in 18 months, not just today.
Think of a SQL database as a meticulously organized library: every book (row) lives on a specific shelf (table) with a fixed classification system (schema). A NoSQL database is more like a book-drop box: anyone can toss a manuscript in any format; you retrieve it quickly by key, but finding relationships between books is harder.
| Feature | SQL (Relational) | NoSQL (Non-Relational) |
| Data model | Tables with rows and columns | Documents, key-value, wide-column, graph |
| Schema | Fixed, enforced | Flexible, often schema-less |
| Consistency | Strong (ACID) | Eventual (BASE) or configurable |
| Scaling | Primarily vertical; horizontal via sharding | Horizontal by design |
| Query language | SQL | Varies (MongoDB query, CQL, etc.) |
| Typical use cases | Finance, ERP, CRM | Social media, IoT, caching |
Neither is universally better. The right choice depends on your consistency requirements, query patterns, and scale.
π Key Properties That Define Each Category
Before choosing a database, understand four foundational properties that set SQL and NoSQL apart.
Schema enforcement: SQL requires every row to match the table schema at write time. NoSQL is schema-on-read β the collection stores any document shape, and your application code interprets the structure at query time. This is powerful for rapid iteration but requires data quality enforcement at the application layer.
Normalization vs. denormalization: SQL normalizes data into related tables to eliminate duplication. NoSQL often denormalizes by embedding related data in a single document so a read fetches everything in one operation, eliminating expensive joins.
Transaction scope: SQL provides multi-statement, multi-table ACID transactions natively. Most NoSQL engines offer single-document atomicity, with optional multi-document transactions at additional latency cost.
Query expressiveness: SQL's declarative language can join, aggregate, and filter across any column combination. NoSQL queries are typically bounded to a single collection and pre-defined indexes.
These four properties determine whether SQL or NoSQL fits your access patterns.
π’ SQL and NoSQL Engines: What Is Actually Inside
SQL internals
SQL engines parse β plan β optimize β execute.
B-Tree indexes for range queries.
Hash indexes for equality lookups.
Write-Ahead Log (WAL) guarantees durability β changes are written to the log before being applied to the data files.
NoSQL internals: the LSM-tree write path
Many NoSQL engines (Cassandra, RocksDB) use a Log-Structured Merge-tree:
Write β MemTable (in-memory sorted structure).
MemTable full β flush to an immutable SSTable on disk.
Compaction merges overlapping SSTables, discards tombstoned records.
This allows extremely high write throughput at the cost of potential compaction storms during bursts.
Primary data structures by engine
| System | Primary Index | Write Path |
| PostgreSQL | B-Tree (default) | WAL + heap pages |
| MySQL InnoDB | Clustered B-Tree | WAL |
| MongoDB | B-Tree (WiredTiger) | WAL |
| Cassandra | LSM (SSTable) | Commit log + MemTable |
| Redis | Hash table (in-memory) | AOF / RDB snapshot |
βοΈ Scaling Strategies: Vertical, Horizontal, Replication, Sharding
graph TD
subgraph Master
A[Write Request] --> B[Write-Ahead Log]
B --> C[Apply to MemTable]
C --> D[Flush to SSTable / Commit]
end
subgraph Replicas
D --> E[Replication Stream]
E --> F[Apply to Replica Log]
F --> G[Update Replica State]
end
G --> H[Read Request from Replica]
This diagram traces the replication write path from a primary database node to its read replicas. A write enters the master's Write-Ahead Log, is applied to the in-memory MemTable, flushed to disk as an SSTable, and then propagated via a replication stream to each replica node, which applies the same changes in order. The key insight is that replicas always trail the primary by some replication lag β reads from a replica may return slightly stale data, which is acceptable for social feeds but unacceptable for financial transactions.
| Strategy | What it does | When to use | Caveats |
| Vertical (scale-up) | Add CPU/RAM/SSD to one node | Low-moderate traffic, legacy apps | Diminishing returns; single point of failure |
| Horizontal (scale-out) | Add nodes to a cluster | High traffic with sharding | Requires data partitioning |
| Replication | Duplicate data across nodes | Read-heavy workloads, geo-distribution | Replication lag, write conflicts |
| Sharding | Partition data by key across nodes | Write-heavy, massive datasets | Hot-shard risk; cross-shard joins are expensive |
Replication lag is the time difference between master and replica: Ξ=textmasterβtextreplica
A small $\Delta$ is acceptable for social feeds; it is catastrophic for financial ledgers.
π§ Deep Dive: How Write-Ahead Logs and LSM-Trees Protect Data
Both SQL and NoSQL engines guard against data loss using append-only logs before modifying data files. SQL engines use a Write-Ahead Log (WAL): every change is written sequentially to a log before being applied to B-Tree pages, enabling crash recovery. NoSQL LSM-tree engines write first to an in-memory MemTable, flush to immutable SSTables on disk, and periodically compact them.
| Engine type | Log mechanism | Trade-off |
| SQL (WAL) | Sequential log β apply to B-Tree | Fast recovery; random write amplification |
| NoSQL (LSM) | MemTable β SSTable β compaction | High write throughput; compaction pauses under load |
π Decision Flow: Selecting the Right Database
Use this flowchart as a first-pass guide when choosing a database for a new service. Most production systems combine multiple databases β for example, PostgreSQL for transactional data alongside Redis for caching.
Start here and refine with your specific read/write ratio and consistency requirements.
π Read-Your-Writes Consistency in Replicated DB
This sequence diagram illustrates the read-your-writes consistency hazard in a replicated database. After the application writes a new email address to the primary, immediately routing the confirming read to a replica returns stale data because replication is asynchronous. The takeaway: for strict read-your-writes guarantees, route the confirming read back to the primary, or use synchronous replication β accepting that writes will block until the replica acknowledges the change.
π§ͺ Choosing Your Database: A Worked Example
This example models an orders entity in an e-commerce system side-by-side in PostgreSQL and MongoDB to make the SQL-vs-NoSQL trade-off concrete and visible in real code. The orders entity is the canonical choice because it sits at the boundary: it demands multi-table joins and ACID guarantees for accurate reporting (SQL's strengths) yet benefits from document embedding when every read needs the complete order with all its items (NoSQL's strength). As you read, focus on how the SQL version normalises items into a separate table β enabling flexible queries at the cost of joins β while the MongoDB version embeds items directly in the order document, enabling a single-read fetch at the cost of query flexibility.
SQL representation (PostgreSQL):
In the normalized SQL model, customer names and order totals live in separate tables linked by a foreign key on user_id. Fetching a report of recent orders requires joining the users and orders tables and filtering by the created_at column. PostgreSQL's query planner uses cost estimation: Math input errorMath input error
to choose between Hash Join and Merge Join based on row estimates.
NoSQL representation (MongoDB):
In MongoDB, a single order document embeds the customer reference, an array of line-item sub-documents (each containing SKU and quantity), the order total, and the creation timestamp β all in one nested structure. Fetching a complete order requires a single document read with no joins. This is why document databases excel for catalog and content workloads where the entire entity is always read together. The trade-off is that querying across items (for example, "all orders containing SKU A1") requires an index on the nested field and cannot leverage relational query optimizations.
π Real-World Applications by Workload Type
| Workload | Best fit | Why |
| Banking ledger / ERP | SQL (PostgreSQL, Oracle) | ACID β money cannot disappear mid-transaction |
| Flexible or evolving schema (MVP) | Document NoSQL (MongoDB) | Schema-less, fast iteration |
| Write-heavy telemetry at scale | Wide-column NoSQL (Cassandra) | Linear write scalability, tunable consistency |
| Ultra-fast read cache | Key-value (Redis, DynamoDB) | O(1) hash lookups |
| Relationship-heavy data | Graph (Neo4j) | Native edge traversal |
| Complex analytical queries | Columnar SQL (ClickHouse, Redshift) | Vectorized execution, column pruning |
βοΈ Trade-offs & Failure Modes: Trade-offs and Failure Modes
| Dimension | SQL | NoSQL |
| Correctness vs availability | Prioritizes correctness (CP in CAP) | Prioritizes availability (AP), tolerates stale reads |
| Performance | Strong consistency adds lock overhead | Cheap commodity nodes; extra replicas for durability |
| Operational complexity | Schema migrations can be painful | Easier schema iteration; sharding and compaction add ops complexity |
| Failure modes | Deadlocks, lock timeouts, single-master failover | Replication lag, split-brain, hot-shard overload |
| Mitigation | Connection pooling, read replicas, Patroni | Consistent hashing, quorum reads, anti-entropy repair |
Split-brain scenario (master-master setup):
Network partition isolates two masters.
Both accept writes β divergent data.
When the partition heals, conflict resolution (last-write-wins, custom merge) must reconcile.
Mitigation: Use a consensus protocol (Raft via etcd, CockroachDB) to elect a single leader and prevent split-brain.
π§ Decision Guide: SQL vs. NoSQL at a Glance
| Situation | Recommendation | Reason |
| Strong transactional guarantees (banking) | SQL (PostgreSQL, MySQL) | ACID ensures money never vanishes |
| Rapidly evolving schema (MVP) | Document NoSQL (MongoDB) | Flexible JSON, no migrations |
| Write-heavy telemetry at massive scale | Wide-column NoSQL (Cassandra) | Linear write scalability |
| Complex analytical queries | Columnar SQL (Redshift, ClickHouse) | Vectorized execution |
| Social graph or recommendations | Graph DB (Neo4j) | Native relationship traversal |
π― What to Learn Next
π οΈ Implementing ACID and BASE Patterns in Practice
SQL databases enforce ACID through transactional boundaries: every related write β inserting an order and decrementing inventory β is wrapped in a single transaction that either fully commits or fully rolls back. If any step fails, the database returns to its last consistent state automatically. Connection pooling (managed by libraries like HikariCP) ensures that multiple application threads share a bounded set of database connections without exhausting resources.
Document databases enforce atomicity at the document level rather than across tables. An operation that updates a single order document β including all its embedded items β is atomic by default. For operations spanning multiple documents, most modern document databases offer multi-document transactions, though these carry additional latency compared to the single-document case.
The key design decision is aligning your transaction boundary with your consistency requirement: use SQL when a business operation must atomically update data across multiple entities; use a document database when each entity's data is self-contained and can be read or written in a single round trip.
π οΈ Schema Evolution: SQL vs. NoSQL in Practice
One of the most tangible day-to-day differences between SQL and NoSQL is how they handle schema changes as product requirements evolve.
SQL schema evolution requires explicit migration scripts. Adding a column means running an ALTER TABLE statement, which in large tables can lock rows or require online migration tooling (such as pt-online-schema-change for MySQL or pg_repack for PostgreSQL) to avoid downtime. The upside is that the schema is the source of truth: every row is guaranteed to have the defined structure.
Document NoSQL schema evolution is additive by default. Adding a new field to the document model requires no migration β the next write simply includes the new field, while existing documents without the field are handled gracefully in application code. The trade-off is that schema consistency is entirely the application's responsibility: two versions of the application can exist simultaneously with different document shapes, requiring careful versioning.
| Scenario | SQL approach | Document NoSQL approach |
| Add optional field | ALTER TABLE ADD COLUMN col DEFAULT NULL | Write new field on next save; old docs lack field |
| Rename field | Migration: copy data, drop old column | Dual-read logic during transition; backfill async |
| Remove field | ALTER TABLE DROP COLUMN | Stop writing; old docs retain field (garbage) |
| Change data type | Migration with data cast | Application-layer coercion at read time |
The right choice depends on your team's migration discipline and how frequently your schema changes. Teams with high iteration velocity often prefer document databases for their schema flexibility; teams with strict compliance or audit requirements prefer SQL for its enforced structure.
π Key Lessons from the Field
Teams that have scaled databases in high-traffic production systems converge on the same lessons.
Start relational: most early-stage systems benefit from PostgreSQL's consistency guarantees and mature ecosystem. Premature optimization to NoSQL creates operational complexity before the need arises.
Introduce NoSQL at a specific bottleneck: the typical trigger is write throughput beyond what a relational primary can sustain, or a schema that changes faster than migrations can safely track.
Replication is not a backup strategy: replicas protect against node failure, not accidental deletes or corrupted writes. Maintain separate point-in-time backups.
Monitor replication lag: in read-replica setups, queries routed to lagging replicas return stale data. Track seconds_behind_master (MySQL) or pg_replication_lag (PostgreSQL) and alert on anomalies above one second.
Cache early: before reaching for a NoSQL engine, add a Redis cache layer in front of your SQL database. Many workloads that appear to need NoSQL are actually just read-heavy and benefit from caching alone.
π TLDR: Summary & Key Takeaways
SQL databases provide ACID guarantees and powerful relational queries; NoSQL databases provide horizontal scale and schema flexibility.
The CAP theorem forces every distributed system to trade off between consistency, availability, and partition tolerance β pick two.
LSM-trees give NoSQL engines high write throughput but can cause compaction storms under heavy load.
Hot-shard risk and cross-shard joins are the main operational hazards in sharded systems.
For most new systems: start with SQL, add a caching layer (Redis) for hot reads, and only introduce NoSQL when a specific workload requires it.
π Related Posts
Test Your Knowledge
Ready to test what you just learned?
AI will generate 4 questions based on this article's content.

Written by
Abstract Algorithms
@abstractalgorithms
More Posts
Stale Reads and Cascading Failures in Distributed Systems
TLDR: Stale reads return superseded data from replicas that haven't yet applied the latest write. Cascading failures turn one overloaded node into a cluster-wide collapse through retry storms and redistributed load. Both are preventable β stale reads...
Clock Skew and Causality Violations: Why Distributed Clocks Lie
TLDR: Physical clocks on distributed machines cannot be perfectly synchronized. NTP keeps them within tens to hundreds of milliseconds in normal conditions β but under load, across datacenters, or after a VM pause, the drift can reach seconds. When s...
Split Brain Explained: When Two Nodes Both Think They Are Leader
TLDR: Split brain happens when a network partition causes two nodes to simultaneously believe they are the leader β each accepting writes the other never sees. Prevent it with quorum consensus (at least βN/2β+1 nodes must agree before leadership is g...
NoSQL Partitioning: How Cassandra, DynamoDB, and MongoDB Split Data
TLDR: Every NoSQL database hides a partitioning engine behind a deceptively simple API. Cassandra uses a consistent hashing ring where a Murmur3 hash of your partition key selects a node β virtual nodes (vnodes) make rebalancing smooth. DynamoDB mana...
