All Posts

System Design Databases: SQL vs NoSQL and Scaling

Abstract AlgorithmsAbstract Algorithms
··6 min read

TL;DR

Maximize efficiency by selecting the right database: SQL vs. NoSQL insights and scaling tips.

Cover Image for System Design Databases: SQL vs NoSQL and Scaling

TLDR: Choosing the wrong database is the most expensive mistake you can make. If you need strict consistency (Banking), use SQL. If you need massive scale (Instagram Likes), use NoSQL. This guide breaks down the decision framework, the CAP theorem, and how to scale when one server isn't enough.


SQL vs. NoSQL (The "No-Jargon" Explanation)

Imagine you are organizing a Library.

  • SQL (Relational): This is the Dewey Decimal System. Every book has a specific place. You have a strict catalog (Schema). If you want to add a "DVD", but your catalog only has fields for "Pages" and "Author", you have to redesign the whole catalog. It's rigid but perfectly organized.

  • NoSQL (Non-Relational): This is a Giant Bucket. You can throw a Book, a DVD, or a T-shirt in there. You just stick a label on it (Key) and throw it in. It's messy, but you can store anything instantly without asking for permission.


1. The Decision Framework: How to Choose?

Don't just pick what's trendy. Use this flowchart logic.

Step 1: Do you need ACID?

  • ACID (Atomicity, Consistency, Isolation, Durability): If a transaction fails, nothing changes.

  • Scenario: A Bank Transfer. If money leaves Alice's account but the server crashes before reaching Bob's, the money must reappear in Alice's account.

  • Choice: SQL (Relational).

Step 2: Is your data structure stable?

  • Scenario: An E-commerce Order. An order always has a Customer, Items, and Total.

  • Choice: SQL.

Step 3: Do you need massive write throughput or flexible data?

  • Scenario: Storing logs from 1 million IoT sensors every second. The data format might change tomorrow.

  • Choice: NoSQL.


2. Deep Dive: The NoSQL Landscape

"NoSQL" isn't one thing. It's a family of 4 distinct tools.

TypeBest ForExampleConsistency PatternCAP Theorem Position
Key-ValueCaching, User Sessions, Shopping CartsRedis, DynamoDBEventual (usually)AP (Available, Partition Tolerant)
DocumentCMS, Catalogs, User ProfilesMongoDBTunable (Strong to Eventual)CP (Consistent, Partition Tolerant)
Column-FamilyAnalytics, Time-Series, Massive WritesCassandra, HBaseTunable (Eventual default)AP (Available, Partition Tolerant)
GraphSocial Networks, Recommendation EnginesNeo4jStrong (ACID within graph)CA (Consistent, Available)

The CAP Theorem (The Trade-off)

You can only have 2 out of 3:

  1. Consistency (C): Every read receives the most recent write or an error.

  2. Availability (A): Every request receives a (non-error) response, without the guarantee that it contains the most recent write.

  3. Partition Tolerance (P): The system continues to operate despite an arbitrary number of messages being dropped (network failure).

  • SQL (MySQL/Postgres): Usually CA (Consistent + Available). If the network breaks, the DB stops accepting writes to prevent data corruption.

  • Cassandra/DynamoDB: Usually AP (Available + Partition Tolerant). If the network breaks, it keeps accepting writes, but Alice and Bob might see different data for a few seconds (Eventual Consistency).


3. Scaling: When One Server Isn't Enough

A. Vertical Scaling (The "Lazy" Way)

  • What: Buy a bigger computer (More RAM, faster CPU).

  • Limit: There is a physical limit to how big a single machine can be.

B. Horizontal Scaling (The "Hard" Way)

  • What: Add more computers.

Technique 1: Replication (Read Scaling)

  • Concept: One Master (Writes), Multiple Slaves (Reads).

  • Flow:

    1. App writes to Master.

    2. Master copies data to Slaves.

    3. App reads from Slaves.

  • Pros: Great for "Read-Heavy" apps (Twitter, News sites).

  • Cons: Replication Lag. You might post a comment, refresh the page, and not see it yet because the Slave hasn't updated.

Technique 2: Sharding (Write Scaling)

  • Concept: Split the data across multiple servers based on a key.

  • Toy Example: User Database

User IDNameShard (Server)
1AliceServer A (IDs 1-100)
101BobServer B (IDs 101-200)
205CharlieServer C (IDs 201-300)
  • Pros: Infinite write scaling.

  • Cons:

    • Joins are impossible: You can't join a table on Server A with a table on Server B easily.

    • Hotspots: If Justin Bieber (User 1) gets 1 million comments, Server A melts while Server B is idle.


Deep Dive: Consistent Hashing (How to Shard Properly)

If we just use User_ID % Number_Of_Servers, adding a new server breaks everything (all keys move). We use Consistent Hashing.

The Concept: Imagine a Ring (0 to 360 degrees).

  1. Place Servers at random points on the ring.

  2. Place Data (Keys) on the ring.

  3. Rule: A Key belongs to the first Server it finds moving clockwise.

Toy Scenario:

  • Ring Size: 0-100.

  • Servers:

    • Server A at position 20.

    • Server B at position 60.

    • Server C at position 90.

Data Placement:

Key (Hash)PositionAssigned Server (Clockwise)
User 110Server A (20)
User 240Server B (60)
User 380Server C (90)
User 495Server A (Wraps around to 20)

The Magic: If we add Server D at position 50:

  • Only keys between 20 and 50 (User 2) move to Server D.

  • User 1, User 3, and User 4 stay put.

  • Result: Minimal data movement when scaling up/down.


Summary & Key Takeaways

  • SQL: Use for Financial/Critical data. Strict Schema. ACID.

  • NoSQL: Use for Big Data/High Throughput. Flexible Schema. BASE.

  • CAP Theorem: You can't have it all. Choose Consistency (CP) or Availability (AP).

  • Sharding: The only way to scale writes indefinitely, but it kills Joins.


Practice Quiz: Test Your Design Skills

  1. Scenario: You are building a "Likes" counter for a viral video platform. It needs to handle 100,000 writes per second. It's okay if the count is off by a few numbers for a second.

    • A) PostgreSQL (SQL)

    • B) Redis (Key-Value NoSQL)

    • C) Neo4j (Graph)

  2. Scenario: You are building a Billing System for a hospital. Every transaction must be accurate and permanent immediately.

    • A) MongoDB (Document)

    • B) Cassandra (Column-Family)

    • C) PostgreSQL (SQL)

(Answers: 1-B, 2-C)


What's Next?

We have the data, but reading from a disk is slow. In the next post, we explore Caching (RAM is King) and Asynchronism (Message Queues) to make our system lightning fast.

Abstract Algorithms

Written by

Abstract Algorithms

@abstractalgorithms