Abstract Algorithms

Chapters

Engineering chapters for understanding systems.

Distributed systems, AI infrastructure, data structures, and system design explained with calm, production-minded depth.


Continue reading

Pick up a chapter

LLM Skills vs Tools: The Missing Layer in Agent Design


TLDR: A tool is a single callable capability (search, SQL, calculator). A skill is a reusable mini-workflow that coordinates multiple tool calls with policy, guardrails, retries, and output structure.
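To make the distinction concrete, here is a minimal Python sketch under loose assumptions: `search` and `summarize` are hypothetical stand-ins for real tools, and `research_skill` is an illustrative skill, not an API from any agent framework. Each tool is one callable. The skill wraps them with an input guardrail, a bounded retry policy, and a structured result.

```python
from dataclasses import dataclass
from typing import Callable, List


# Hypothetical tools: each one is a single callable capability.
def search(query: str) -> List[str]:
    return [f"doc about {query}"]


def summarize(docs: List[str]) -> str:
    return " | ".join(docs)


@dataclass
class SkillResult:
    answer: str
    sources: List[str]
    attempts: int


def research_skill(
    question: str,
    max_retries: int = 2,
    search_tool: Callable[[str], List[str]] = search,
    summarize_tool: Callable[[List[str]], str] = summarize,
) -> SkillResult:
    # Guardrail: reject empty input before spending any tool calls.
    if not question.strip():
        raise ValueError("question must not be empty")

    docs: List[str] = []
    attempts = 0
    # Retry policy: one initial call plus up to max_retries retries
    # whenever the search tool raises or returns nothing.
    while attempts <= max_retries:
        attempts += 1
        try:
            docs = search_tool(question)
        except Exception:
            continue
        if docs:
            break

    # Structured output: the answer always travels with its sources.
    if not docs:
        return SkillResult(answer="no sources found", sources=[], attempts=attempts)
    return SkillResult(answer=summarize_tool(docs), sources=docs, attempts=attempts)
```

Passing the tools in as parameters keeps the skill testable: a flaky or empty search tool can be swapped in to exercise the retry path without touching the workflow itself.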

Little's Law: The Secret Formula for System Performance


TLDR: Little's Law (\(L = \lambda W\)) connects three metrics every system designer measures: \(L\) = average number of requests in flight (concurrency), \(\lambda\) = throughput (RPS), \(W\) = average response time. If l…
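A quick worked example, with illustrative numbers rather than measurements: a service sustaining \(\lambda = 200\) RPS at \(W = 50\) ms holds on average \(L = 200 \times 0.05 = 10\) requests in flight. Rearranged as \(\lambda = L / W\), the same law says a worker pool capped at 100 concurrent requests with 250 ms average latency tops out near 400 RPS. The helper names below are hypothetical.

```python
def concurrency(throughput_rps: float, avg_latency_s: float) -> float:
    """Little's Law, L = lambda * W: average number of requests in flight."""
    return throughput_rps * avg_latency_s


def max_throughput(concurrency_limit: float, avg_latency_s: float) -> float:
    """Rearranged, lambda = L / W: throughput a fixed concurrency cap can sustain."""
    return concurrency_limit / avg_latency_s


print(concurrency(200, 0.05))      # about 10 requests in flight
print(max_throughput(100, 0.25))   # 400.0 requests per second
```

Latency must be in seconds when throughput is in requests per second; mixing milliseconds and seconds is the most common way to get an answer that is off by 1000×.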

Fine-Tuning LLMs: The Complete Engineer's Guide to SFT, LoRA, and RLHF


TLDR: A pretrained LLM is a generalist. Fine-tuning makes it a specialist. Supervised Fine-Tuning (SFT) teaches it your domain's language through labeled examples. LoRA does the same with 99% fewer tr…
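The parameter savings come from LoRA's low-rank update: instead of training a full \(d \times k\) weight update, it trains two factors of shapes \(d \times r\) and \(r \times k\). A back-of-the-envelope count for a single matrix, assuming an illustrative \(4096 \times 4096\) projection and rank \(r = 8\) (the function name is hypothetical):

```python
from typing import Tuple


def lora_param_counts(d: int, k: int, r: int) -> Tuple[int, int]:
    # Full fine-tuning updates every entry of the d x k weight matrix.
    full = d * k
    # LoRA trains B (d x r) and A (r x k) instead, i.e. r * (d + k) parameters.
    lora = r * (d + k)
    return full, lora


full, lora = lora_param_counts(4096, 4096, 8)
print(full, lora, f"{lora / full:.2%}")   # prints: 16777216 65536 0.39%
```

That works out to roughly 99.6% fewer trainable parameters for this one matrix; the fraction for a whole model depends on which layers receive adapters and on the chosen rank.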

Discovery shortcuts


System Design

ANN Index Types Explained: When to Choose Flat, HNSW, IVF, or IVF-PQ
OWASP Credential Stuffing Key Terms Explained with Practical Examples
NoSQL Partitioning: How Cassandra, DynamoDB, and MongoDB Split Data

Backend Systems

ANN Index Types Explained: When to Choose Flat, HNSW, IVF, or IVF-PQ
NoSQL Partitioning: How Cassandra, DynamoDB, and MongoDB Split Data
SQL Partitioning: Range, Hash, List, and Composite Strategies Explained

AI Infrastructure

ANN Index Types Explained: When to Choose Flat, HNSW, IVF, or IVF-PQ
Data Lineage Explained: Tracking Data Flow Across Your Organization
OWASP Credential Stuffing Key Terms Explained with Practical Examples

Distributed Systems

HyperLogLog Explained: Counting Billions of Unique Items with 12 KB
Count-Min Sketch Explained: Frequency Estimation at Streaming Scale
Clock Skew and Causality Violations: Why Distributed Clocks Lie

Tradeoff Reasoning

ANN Index Types Explained: When to Choose Flat, HNSW, IVF, or IVF-PQ
Clock Skew and Causality Violations: Why Distributed Clocks Lie
Stale Reads and Cascading Failures in Distributed Systems

Recent chapters

Editorial reading rhythm

283 chapters

Featured chapter

ANN Index Types Explained: When to Choose Flat, HNSW, IVF, or IVF-PQ

TLDR: If your dataset is small and correctness is critical, use Flat. If you need high recall with low latency and enough RAM, use HNSW. If your corpus is huge and memory is your bottleneck, use IVF-PQ. … (14 min read)


Data Lineage Explained: Tracking Data Flow Across Your Organization

TLDR: 📊 Data lineage is the complete genealogy of your data: where it comes from, how it's transformed, and where it ends up. It's critical for debugging pipelines, proving compliance, and understan… (12 min read)

Data Governance Essentials: Framework and Best Practices

TLDR: 📋 Data governance is the framework that answers "who owns this data, who can access it, and what quality standards must it meet?" Without governance, data pipelines become chaotic. Implement it… (9 min read)

OWASP Credential Stuffing Key Terms Explained with Practical Examples

TLDR: Credential-stuffing defense works only when you treat login as a layered, risk-adaptive system: detect attack shape, add step-up authentication, combine bot and fingerprint signals, prevent user… (15 min read)

Softmax Function Explained: From Raw Scores to Probabilities

TLDR: Softmax converts a vector of raw scores (logits) into a valid probability distribution by exponentiating each value and dividing by the total. Subtracting the max before exponentiating prevents… (23 min read)
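A minimal Python sketch of the numerically stable form: subtracting the maximum logit makes every exponent zero or negative, so `math.exp` cannot overflow, and because softmax is unchanged when the same constant is added to every logit, the result equals the naive formula whenever the naive one can be computed.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    # Shift by the max so the largest exponent is exp(0) = 1 and none can overflow.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    # Normalize so the outputs are non-negative and sum to 1.
    return [e / total for e in exps]


print(softmax([1000.0, 1000.0]))   # [0.5, 0.5]
```

With logits like `[1000.0, 1000.0]`, the unshifted version would call `math.exp(1000.0)` and raise `OverflowError`; the shifted version returns `[0.5, 0.5]`.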


Browse all chapters


NoSQL Partitioning: How Cassandra, DynamoDB, and MongoDB Split Data

TLDR: Every NoSQL database hides a partitioning engine behind a deceptively simple API. Cassandra uses a consistent hashing ring where a Murmur3 hash of your partition key selects a node; virtual nod… (24 min read)
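A toy consistent-hashing ring in Python, as a sketch only: it uses MD5 from the standard library as a stand-in for Cassandra's Murmur3 partitioner, and the `HashRing` class and the 16 virtual nodes per physical node are illustrative assumptions, not Cassandra's implementation or defaults.

```python
import bisect
import hashlib
from typing import List, Tuple


def token(key: str) -> int:
    # Stand-in for Murmur3: first 8 bytes of MD5 read as an unsigned 64-bit integer.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes: List[str], vnodes_per_node: int = 16):
        ring: List[Tuple[int, str]] = []
        for node in nodes:
            for i in range(vnodes_per_node):
                # Each virtual node places the same physical node at another ring position.
                ring.append((token(f"{node}#{i}"), node))
        ring.sort()
        self._tokens = [t for t, _ in ring]
        self._owners = [n for _, n in ring]

    def node_for(self, partition_key: str) -> str:
        # Walk clockwise to the first token >= hash(key), wrapping past the end of the ring.
        idx = bisect.bisect_left(self._tokens, token(partition_key)) % len(self._tokens)
        return self._owners[idx]
```

The property worth noticing is that adding a node only moves keys onto the new node; every other key keeps its owner, which is why consistent hashing avoids a full reshuffle when the cluster grows.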

Java 21 to 25: Virtual Threads, Pattern Matching, and Structured Concurrency

TLDR: Java 21 LTS makes virtual threads a production-ready replacement for bounded thread pools: your newFixedThreadPool(200) can become newVirtualThreadPerTaskExecutor() and handle 10× the concurren… (22 min read)

Java 14 to 17: Records, Sealed Classes, Text Blocks, and Pattern Matching

TLDR: Java 14–17 ran a deliberate four-release preview-to-stable conveyor belt. Records replaced 50-line POJOs with one line. Text blocks ended escape-sequence chaos in multi-line strings. Sealed clas… (25 min read)

HyperLogLog Explained: Counting Billions of Unique Items with 12 KB

TLDR: HyperLogLog estimates the number of distinct elements in a dataset using ~12 KB of memory regardless of cardinality, with a standard error of about 0.81%. The insight: if you hash every element to a random bit st… (18 min read)
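A compact HyperLogLog sketch in Python, assuming SHA-1 truncated to 64 bits as the hash and precision \(p = 14\). With \(m = 2^{14} = 16384\) registers of 6 bits each, a packed implementation needs 12 KB, and the standard error \(1.04/\sqrt{m} \approx 0.81\%\) lines up with the figures in the summary. This version keeps registers in a plain Python list for readability, so it does not achieve that memory footprint, and it omits the empirical bias corrections that production variants such as HyperLogLog++ apply.

```python
import hashlib
import math


class HyperLogLog:
    def __init__(self, p: int = 14):
        self.p = p
        self.m = 1 << p                      # number of registers
        self.registers = [0] * self.m
        # Bias-correction constant from the HyperLogLog paper (valid for m >= 128).
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item: str) -> None:
        # 64-bit hash taken from the first 8 bytes of SHA-1 (an illustrative choice).
        x = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        idx = x >> (64 - self.p)                  # top p bits pick a register
        rest = x & ((1 << (64 - self.p)) - 1)     # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self) -> float:
        # Sum of 2^-register; m^2 / z is m times the harmonic mean of 2^register.
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # Small-range correction: linear counting over the empty registers.
            estimate = self.m * math.log(self.m / zeros)
        return estimate
```

Adding a duplicate never changes a register, because the same item always hashes to the same register with the same rank; that is what lets the structure count distinct items rather than total items.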

Dot Product in Machine Learning: The Engine Behind Similarity, Attention, and Neural Networks

TLDR: The dot product multiplies corresponding elements of two vectors and sums the results. In machine learning it does three critical jobs: it scores semantic similarity between embeddings, computes… (22 min read)
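A minimal Python illustration of the definition and of the first job listed, similarity scoring. Cosine similarity is the dot product of two vectors divided by the product of their lengths, so it compares direction only. The vectors used here are toy values, not real embeddings.

```python
import math
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    # Multiply corresponding elements and sum the products.
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    # Dot product normalized by both vector lengths, so only direction matters.
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


print(dot([1, 2, 3], [4, 5, 6]))   # 32
```

Parallel vectors score close to 1 regardless of their magnitudes, and orthogonal vectors score 0, which is why normalized dot products are a common way to rank embedding matches.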

Count-Min Sketch Explained: Frequency Estimation at Streaming Scale

TLDR: Count-Min Sketch (CMS) is a fixed-size d × w counter matrix that estimates how often any element has appeared in a stream. Insert: hash the element with each of the d hash functions to get one c… (22 min read)
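A minimal Python sketch of the insert and query paths, assuming the standard Count-Min Sketch design: each of the d rows has its own hash, an insert increments one counter per row, and a query returns the minimum of those d counters. Deriving the d hash functions by prefixing the row index before hashing with SHA-256 is an illustrative choice, not a requirement of the technique.

```python
import hashlib


class CountMinSketch:
    def __init__(self, d: int = 4, w: int = 1024):
        self.d, self.w = d, w
        self.table = [[0] * w for _ in range(d)]   # d rows of w counters

    def _columns(self, item: str):
        # One hash per row, derived by prefixing the row index (illustrative choice).
        for row in range(self.d):
            digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.w

    def add(self, item: str, count: int = 1) -> None:
        # Insert: bump exactly one counter in every row.
        for row, col in self._columns(item):
            self.table[row][col] += count

    def estimate(self, item: str) -> int:
        # Query: the smallest of the d counters is the least-collided estimate.
        return min(self.table[row][col] for row, col in self._columns(item))
```

Collisions can only add to a counter, so the estimate never undercounts; widening w reduces how much it can overcount, and adding rows (larger d) makes a large overcount less likely.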

Clock Skew and Causality Violations: Why Distributed Clocks Lie

TLDR: Physical clocks on distributed machines cannot be perfectly synchronized. NTP keeps them within tens to hundreds of milliseconds in normal conditions, but under load, across datacenters, or aft… (19 min read)

…


Read one chapter, then follow the next related system.