Category: databases
15 articles across 9 sub-topics
Sharding Approaches in SQL and NoSQL: Range, Hash, and Directory-Based Strategies Compared
TLDR: Sharding splits your database across multiple physical nodes so no single machine carries all the data or absorbs all the writes. The strategy you choose — range, hash, consistent hashing, or directory — determines whether range queries stay ch...
Key Terms in Distributed Systems: The Definitive Glossary
TLDR: Distributed systems vocabulary is precise for a reason. Mixing up read skew and write skew costs you an interview. Confusing Snapshot Isolation with Serializable costs you a production outage. This glossary organises every critical term into co...
System Design Sharding Strategy: Choosing Keys, Avoiding Hot Spots, and Resharding Safely
TLDR: Sharding means splitting one logical dataset across multiple physical databases so no single node carries all the data and traffic. The hard part is not adding more nodes. The hard part is choosing a shard key that keeps data balanced and queri...
System Design Replication and Failover: Keep Services Alive When a Primary Dies
TLDR: Replication means keeping multiple copies of your data so the system can survive machine, process, or availability-zone failures. Failover is the coordinated act of promoting a healthy replica, rerouting traffic, and recovering without corrupti...
Elasticsearch vs Time-Series DB: Key Differences Explained
TLDR: Elasticsearch is built for search — full-text log queries, fuzzy matching, and relevance ranking via an inverted index. InfluxDB and Prometheus are built for metrics — numeric time series with aggressive compression. Picking the wrong one waste...
Change Data Capture Pattern: Log-Based Data Movement Without Full Reloads
TLDR: Change data capture moves committed database changes into downstream systems without full reloads. It is most useful when freshness matters, replay matters, and the source database must remain the system of record. CDC becomes production-...
Understanding Consistency Patterns: An In-Depth Analysis
TLDR: Consistency is about whether all nodes in a distributed system show the same data at the same time. Strong consistency gives correctness but costs latency. Eventual consistency gives speed but requires tolerance for briefly stale reads. C...
Data Warehouse vs Data Lake vs Data Lakehouse: Which One to Choose?
TLDR: Warehouse = structured, clean data for BI and SQL dashboards (Snowflake, BigQuery). Lake = raw, messy data for ML and data science (S3, HDFS). Lakehouse = open table formats (Delta Lake, Iceberg) that bring SQL performance to raw storage — the ...
Partitioning Approaches in SQL and NoSQL: Horizontal, Vertical, Range, Hash, and List Partitioning
TLDR: Partitioning splits one logical table into smaller physical pieces called partitions. The database planner skips irrelevant partitions entirely — turning a 30-second full-table scan into a 200ms single-partition read. Range partitioning is best...
Isolation Levels in Databases: Read Committed, Repeatable Read, Snapshot, and Serializable Explained
TLDR: Isolation levels control which concurrency anomalies a transaction can see. Read Committed (PostgreSQL and Oracle's default) prevents dirty reads but still silently allows non-repeatable reads, write skew, and lost updates. Repeatable Read adds...
Database Anomalies: How SQL and NoSQL Handle Dirty Reads, Phantom Reads, and Write Skew
TLDR: Database anomalies are the predictable side-effects of concurrent transactions — dirty reads, phantom reads, write skew, and lost updates. SQL databases use MVCC and isolation levels to prevent them; PostgreSQL's Serializable Snapshot Isolation...
Probabilistic Data Structures Explained: Bloom Filters, HyperLogLog, and Count-Min Sketch
TLDR: Probabilistic data structures — Bloom Filters, Count-Min Sketch, HyperLogLog, and Cuckoo Filters — trade a small, bounded probability of being wrong for orders-of-magnitude better memory efficiency and O(1) speed. Bloom filters answer "definite...
How CDC Works Across Databases: PostgreSQL, MySQL, MongoDB, and Beyond
A data engineering team at a fintech company built what they believed was a robust Change Data Capture pipeline: three source databases (PostgreSQL, MongoDB, and Cassandra), Debezium connectors wired to Kafka, and a downstream data warehouse receivin...
BASE Theorem Explained: How it Stands Against ACID
TLDR: ACID (Atomicity, Consistency, Isolation, Durability) is the gold standard for banking. BASE (Basically Available, Soft state, Eventual consistency) is the standard for social media. BASE intentionally sacrifices instant accuracy in exchan...
