Abstract Algorithms

Articles

Engineering deep dives for understanding systems.

Distributed systems, AI infrastructure, data structures, and system design explained with calm, production-minded depth.


Continue exploring

Pick up a systems thread

LLM Skills vs Tools: The Missing Layer in Agent Design

TLDR: A tool is a single callable capability (search, SQL, calculator). A skill is a reusable mini-workflow that coordinates multiple tool calls with policy, guardrails, retries, and output structure.
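To make the tool/skill split concrete, here is a minimal Python sketch under stated assumptions. `search` and `fetch` are hypothetical tool callables standing in for real integrations. The policies are illustrative choices, not a prescribed design: retry only on `TimeoutError`, cap the fan-out, and skip sources that keep failing.

```python
from dataclasses import dataclass, field
from typing import Callable, TypeVar

T = TypeVar("T")

# Hypothetical tool signatures: each tool is one callable capability.
SearchTool = Callable[[str], list[str]]  # query -> list of source URLs
FetchTool = Callable[[str], str]         # URL -> document text


@dataclass
class ResearchResult:
    """Structured skill output: callers always get the same shape back."""
    question: str
    sources: list[str] = field(default_factory=list)
    documents: list[str] = field(default_factory=list)


def with_retries(call: Callable[[], T], retries: int) -> T:
    """Assumed retry policy: re-invoke the tool on TimeoutError, up to `retries` extra times."""
    for attempt in range(retries + 1):
        try:
            return call()
        except TimeoutError:
            if attempt == retries:
                raise
    raise ValueError("retries must be >= 0")


def research_skill(question: str, search: SearchTool, fetch: FetchTool,
                   max_docs: int = 3, retries: int = 2) -> ResearchResult:
    """A skill: a reusable mini-workflow that coordinates several tool calls."""
    question = question.strip()
    if not question:  # guardrail: refuse before spending any tool calls
        raise ValueError("question must not be empty")
    result = ResearchResult(question=question)
    urls = with_retries(lambda: search(question), retries)
    for url in urls[:max_docs]:  # policy: cap the fan-out
        try:
            text = with_retries(lambda: fetch(url), retries)
        except TimeoutError:
            continue  # policy: skip a source that keeps timing out
        result.sources.append(url)
        result.documents.append(text)
    return result
```

The individual tools stay dumb and single-purpose; everything that makes the behavior predictable (input checks, retries, fan-out limits, output shape) lives in the skill layer.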

Little's Law: The Secret Formula for System Performance

TLDR: Little's Law (\(L = \lambda W\)) connects three metrics every system designer measures: \(L\) = concurrent requests in flight, \(\lambda\) = throughput (RPS), \(W\) = average response time. If l…
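Little's Law is easy to check by hand. The sketch below encodes \(L = \lambda W\) and its rearrangement \(\lambda = L / W\). The traffic numbers are illustrative, not measurements from any real system.

```python
def requests_in_flight(throughput_rps: float, avg_response_s: float) -> float:
    """Little's Law: L = lambda * W."""
    return throughput_rps * avg_response_s


def max_throughput(concurrency_limit: float, avg_response_s: float) -> float:
    """Rearranged as lambda = L / W: the throughput a fixed concurrency cap can sustain."""
    return concurrency_limit / avg_response_s


# 400 RPS at a 250 ms average response time keeps about 100 requests in flight.
print(requests_in_flight(400, 0.25))  # 100.0
# A pool capped at 100 concurrent requests, at 250 ms each, tops out near 400 RPS.
print(max_throughput(100, 0.25))      # 400.0
```

The law relates long-run averages of a stable system. It tells you how big a thread or connection pool must be on average, but nothing about tail latency.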

Fine-Tuning LLMs: The Complete Engineer's Guide to SFT, LoRA, and RLHF

TLDR: A pretrained LLM is a generalist. Fine-tuning makes it a specialist. Supervised Fine-Tuning (SFT) teaches it your domain's language through labeled examples. LoRA does the same with 99% fewer tr…
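Savings on the order of "99% fewer" parameters follow from LoRA's low-rank factorization. Instead of updating a full \(d \times k\) weight matrix, it trains two factors \(B \in \mathbb{R}^{d \times r}\) and \(A \in \mathbb{R}^{r \times k}\) with a small rank \(r\), i.e. \(r(d + k)\) parameters. Here is a back-of-the-envelope check. It assumes a single 4096 × 4096 projection and rank 8, which are illustrative values and not figures from the article.

```python
def full_params(d: int, k: int) -> int:
    """Trainable parameters when fine-tuning a d x k weight matrix directly."""
    return d * k


def lora_params(d: int, k: int, r: int) -> int:
    """LoRA trains B (d x r) and A (r x k) instead: r * (d + k) parameters."""
    return r * (d + k)


d = k = 4096
r = 8
full = full_params(d, k)     # 16,777,216
lora = lora_params(d, k, r)  # 65,536
print(f"{lora / full:.2%} of the full update")  # 0.39% of the full update
```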

Concept collections

Curated ways into the archive

System Design

OWASP Credential Stuffing Key Terms Explained with Practical Examples
NoSQL Partitioning: How Cassandra, DynamoDB, and MongoDB Split Data
HyperLogLog Explained: Counting Billions of Unique Items with 12 KB

Backend Systems

NoSQL Partitioning: How Cassandra, DynamoDB, and MongoDB Split Data
SQL Partitioning: Range, Hash, List, and Composite Strategies Explained
Build vs Buy: Deploying Your Own LLM vs Using ChatGPT, Gemini, and Claude APIs

AI Infrastructure

Data Lineage Explained: Tracking Data Flow Across Your Organization
OWASP Credential Stuffing Key Terms Explained with Practical Examples
Softmax Function Explained: From Raw Scores to Probabilities

Distributed Systems

HyperLogLog Explained: Counting Billions of Unique Items with 12 KB
Count-Min Sketch Explained: Frequency Estimation at Streaming Scale
Clock Skew and Causality Violations: Why Distributed Clocks Lie

Tradeoff Reasoning

Clock Skew and Causality Violations: Why Distributed Clocks Lie
Stale Reads and Cascading Failures in Distributed Systems
Split Brain Explained: When Two Nodes Both Think They Are Leader

Recent deep dives

Editorial reading rhythm

282 articles


Featured deep dive

Data Lineage Explained: Tracking Data Flow Across Your Organization

TLDR: 📊 Data lineage is the complete genealogy of your data: where it comes from, how it's transformed, and where it ends up. It's critical for debugging pipelines, proving compliance, and understan…
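One common way to model lineage is as a directed graph from source datasets to the datasets derived from them. Answering "where does this table come from?" then becomes an upstream graph traversal. Below is a minimal sketch with hypothetical dataset names.

```python
from collections import deque

# Hypothetical lineage edges: each derived dataset -> the datasets it is built from.
UPSTREAM = {
    "dashboard.revenue": ["mart.daily_revenue"],
    "mart.daily_revenue": ["staging.orders", "staging.fx_rates"],
    "staging.orders": ["raw.orders_db"],
    "staging.fx_rates": ["raw.fx_api"],
}


def upstream_of(dataset: str, graph: dict[str, list[str]]) -> list[str]:
    """Breadth-first walk to every ancestor of `dataset`, each listed once."""
    seen: set[str] = set()
    order: list[str] = []
    queue = deque(graph.get(dataset, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        queue.extend(graph.get(node, []))
    return order


print(upstream_of("dashboard.revenue", UPSTREAM))
# ['mart.daily_revenue', 'staging.orders', 'staging.fx_rates', 'raw.orders_db', 'raw.fx_api']
```

Running the same walk over reversed edges answers the impact question: which downstream tables and dashboards are affected if a source changes.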


Data Governance Essentials: Framework and Best Practices

TLDR: 📋 Data governance is the framework that answers "who owns this data, who can access it, and what quality standards must it meet?" Without governance, data pipelines become chaotic. Implement it…

9 min

OWASP Credential Stuffing Key Terms Explained with Practical Examples

TLDR: Credential-stuffing defense works only when you treat login as a layered, risk-adaptive system: detect attack shape, add step-up authentication, combine bot and fingerprint signals, prevent user…

15 min

Softmax Function Explained: From Raw Scores to Probabilities

TLDR: Softmax converts a vector of raw scores (logits) into a valid probability distribution by exponentiating each value and dividing by the total. Subtracting the max before exponentiating prevents …

23 min
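Here is a minimal pure-Python sketch of the max-subtraction trick. Production code would normally use a numerical library's built-in softmax, but the arithmetic is the same.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    if not logits:
        raise ValueError("softmax of an empty vector is undefined")
    m = max(logits)
    # Shifting every logit by m cancels out in the ratio below, but keeps every
    # exponent <= 0, so math.exp never overflows.
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([1000.0, 1001.0, 1002.0])  # naive math.exp(1000.0) raises OverflowError
print([round(p, 4) for p in probs])  # [0.09, 0.2447, 0.6652]
```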

NoSQL Partitioning: How Cassandra, DynamoDB, and MongoDB Split Data

TLDR: Every NoSQL database hides a partitioning engine behind a deceptively simple API. Cassandra uses a consistent hashing ring where a Murmur3 hash of your partition key selects a node; virtual nod…

24 min
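A consistent hashing ring with virtual nodes can be sketched in a few lines. To stay within the standard library, this version hashes with MD5 rather than Murmur3, so its tokens will not match Cassandra's. The choice of 8 virtual nodes per physical node is also arbitrary.

```python
import bisect
import hashlib


def token(key: str) -> int:
    # Stand-in hash: first 8 bytes of MD5 (Cassandra's default partitioner uses Murmur3).
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent hashing ring where each physical node owns several virtual nodes."""

    def __init__(self, nodes: list[str], vnodes_per_node: int = 8) -> None:
        self._ring = sorted(
            (token(f"{node}#vnode{i}"), node)
            for node in nodes
            for i in range(vnodes_per_node)
        )
        self._tokens = [t for t, _ in self._ring]

    def node_for(self, partition_key: str) -> str:
        # Walk clockwise: the first vnode token >= the key's token owns the key,
        # wrapping back to the start of the ring past the largest token.
        idx = bisect.bisect_left(self._tokens, token(partition_key))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.node_for("user:42"))  # one of the three nodes, stable across runs
```

Removing a node only remaps the keys it owned. Every other key still finds the same first token clockwise, which is the property that makes the ring cheap to rebalance.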

Systems exploration

Follow the concept continuity


Browse the full archive


Java 21 to 25: Virtual Threads, Pattern Matching, and Structured Concurrency

TLDR: Java 21 LTS makes virtual threads a production-ready replacement for bounded thread pools: your newFixedThreadPool(200) can become newVirtualThreadPerTaskExecutor() and handle 10× the concurren…

22 min read

Java 14 to 17: Records, Sealed Classes, Text Blocks, and Pattern Matching

TLDR: Java 14–17 ran a deliberate four-release preview-to-stable conveyor belt. Records replaced 50-line POJOs with one line. Text blocks ended escape-sequence chaos in multi-line strings. Sealed clas…

25 min read

HyperLogLog Explained: Counting Billions of Unique Items with 12 KB

TLDR: HyperLogLog estimates the number of distinct elements in a dataset using ~12 KB of memory regardless of cardinality, with a standard error of about 0.81%. The insight: if you hash every element to a random bit st…

18 min read
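With \(p = 14\), a HyperLogLog keeps \(2^{14} = 16{,}384\) registers. Packed at 6 bits each, that is \(16{,}384 \times 6 / 8 = 12{,}288\) bytes, which accounts for the ~12 KB figure. The standard error \(1.04 / \sqrt{16{,}384} \approx 0.81\%\) matches the quoted accuracy. Below is a minimal sketch of the core algorithm, with small-range correction only. It uses SHA-1 from the standard library as a stand-in 64-bit hash and stores registers in a plain Python list, which is not bit-packed.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog with 2^p registers (p=14 gives 16,384 registers)."""

    def __init__(self, p: int = 14) -> None:
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m
        # Bias-correction constant from the HyperLogLog paper, valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item: str) -> None:
        # 64-bit hash from SHA-1 (a stand-in; production code uses faster hashes).
        x = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        index = x >> (64 - self.p)             # top p bits choose a register
        rest = x & ((1 << (64 - self.p)) - 1)  # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in `rest`.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[index]:
            self.registers[index] = rank

    def count(self) -> float:
        harmonic = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / harmonic
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            estimate = self.m * math.log(self.m / zeros)
        return estimate


hll = HyperLogLog()
for i in range(100_000):
    hll.add(f"user-{i}")
print(round(hll.count()))  # roughly 100,000; the exact value depends on the hash
```

Adding the same element twice never changes a register, which is why the estimate counts distinct items rather than total insertions.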

Dot Product in Machine Learning: The Engine Behind Similarity, Attention, and Neural Networks

TLDR: The dot product multiplies corresponding elements of two vectors and sums the results. In machine learning it does three critical jobs: it scores semantic similarity between embeddings, computes…

22 min read
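Here is a small sketch of the dot product and of cosine similarity, the normalized form commonly used to score embeddings. The vectors are toy values; real embeddings have hundreds or thousands of dimensions.

```python
import math
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Multiply corresponding elements and sum the results."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product divided by both vector lengths, so only direction matters."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


print(dot([1, 2, 3], [4, 5, 6]))                  # 32
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```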

Count-Min Sketch Explained: Frequency Estimation at Streaming Scale

TLDR: Count-Min Sketch (CMS) is a fixed-size d × w counter matrix that estimates how often any element has appeared in a stream. Insert: hash the element with each of the d hash functions to get one c…

22 min read
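Here is a minimal sketch of the insert and query paths. It uses SHA-256 salted with the row index as a stand-in for d independent hash functions. The default size (depth 4, width 2048) is an arbitrary choice for the example.

```python
import hashlib


class CountMinSketch:
    """d x w counter matrix; estimates are never below the true count."""

    def __init__(self, depth: int = 4, width: int = 2048) -> None:
        self.depth = depth
        self.width = width
        self.table = [[0] * width for _ in range(depth)]

    def _column(self, row: int, item: str) -> int:
        # Stand-in for d independent hash functions: SHA-256 salted with the row index.
        digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item: str, count: int = 1) -> None:
        # Insert: bump exactly one counter in every row.
        for row in range(self.depth):
            self.table[row][self._column(row, item)] += count

    def estimate(self, item: str) -> int:
        # Query: collisions only inflate counters, so the row minimum is the tightest bound.
        return min(self.table[row][self._column(row, item)] for row in range(self.depth))


cms = CountMinSketch()
for request in ["GET /home"] * 50 + ["GET /login"] * 7 + ["POST /cart"] * 3:
    cms.add(request)
print(cms.estimate("GET /home"))  # never below 50
```

Because counters only ever go up, the sketch can overestimate (when an item collides with others in every row) but never underestimates.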

Clock Skew and Causality Violations: Why Distributed Clocks Lie

TLDR: Physical clocks on distributed machines cannot be perfectly synchronized. NTP keeps them within tens to hundreds of milliseconds in normal conditions, but under load, across datacenters, or aft…

19 min read

Bloom Filters Explained: Membership Testing with Zero False Negatives

TLDR: A Bloom filter is a bit array of m bits plus k independent hash functions that sets k bits on insert and checks those same k bits on lookup. If any checked bit is 0, the element is definitely not i…

19 min read
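Below is a minimal sketch of the insert and lookup paths. Instead of k truly independent hash functions, it derives the k positions by double hashing (the Kirsch-Mitzenmacher technique) from one SHA-256 digest, a common substitute. The sizes m = 8192 and k = 5 are arbitrary. For n inserted items, the false-positive rate is approximately \((1 - e^{-kn/m})^k\).

```python
import hashlib


class BloomFilter:
    """m-bit array with k probe positions per item: no false negatives."""

    def __init__(self, m: int = 8192, k: int = 5) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray((m + 7) // 8)

    def _positions(self, item: str) -> list[int]:
        # Double hashing: g_i = h1 + i * h2 (mod m) stands in for k independent hashes.
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step: distinct positions when m is a power of two
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # Any 0 bit means definitely absent; all 1 bits means probably present.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


bf = BloomFilter()
for user in ["alice", "bob", "carol"]:
    bf.add(user)
print(bf.might_contain("alice"))    # True
print(bf.might_contain("mallory"))  # almost certainly False
```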

…
