Abstract Algorithms

performance

12 articles across 7 sub-topics

Shuffles in Spark: Why groupBy Kills Performance

TLDR: A Spark shuffle is typically the most expensive operation in a distributed Spark job: it moves records across the network so that all records sharing a key land in the same partition, writes temporary sorted files to disk, and forces a hard synchronization barrier between every upstream and downstream stag...

Apr 19, 2026 • 29 min read

Partitioning in Spark: HashPartitioner, RangePartitioner, and Custom Strategies

TLDR: Spark's partition count and partitioning strategy are the two levers that determine whether a job scales linearly or crumbles under data growth. HashPartitioner assigns each key to partition hash(key) mod numPartitions: fast and uniform for well-distributed keys, catas...

Apr 19, 2026 • 23 min read
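The hash-modulo rule from the Partitioning teaser can be illustrated with a small Python sketch. It is not Spark's HashPartitioner: it assumes `zlib.crc32` as a stand-in for Java's `hashCode()` (Python's built-in `hash()` is randomized per process for strings), and the helper names `partition_for` and `partition_sizes` are hypothetical. It shows why the strategy is uniform for well-distributed keys but piles a single hot key into one partition.

```python
import zlib
from collections import Counter


def partition_for(key: str, num_partitions: int) -> int:
    """Hypothetical helper: partition index = hash(key) mod num_partitions.

    Assumption: zlib.crc32 stands in for Java's hashCode(). crc32 is unsigned
    in Python 3; Java hash codes can be negative, which is why a JVM hash
    partitioner needs a non-negative modulo.
    """
    h = zlib.crc32(key.encode("utf-8"))
    return h % num_partitions


def partition_sizes(keys, num_partitions: int) -> Counter:
    """Count how many records land in each partition index."""
    return Counter(partition_for(k, num_partitions) for k in keys)


if __name__ == "__main__":
    # Well-distributed keys spread roughly evenly across 8 partitions.
    uniform = [f"user-{i}" for i in range(10_000)]
    # One hot key: every one of its records hashes to the same partition.
    skewed = ["hot-key"] * 9_000 + [f"user-{i}" for i in range(1_000)]
    print("uniform:", sorted(partition_sizes(uniform, 8).values()))
    print("skewed: ", sorted(partition_sizes(skewed, 8).values()))
```

In the skewed run, one partition receives at least the 9,000 hot-key records, so the task processing it dominates the stage no matter how many partitions are configured.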

Caching and Persistence in Spark: Storage Levels and When to Use Them

TLDR: Calling cache() or persist() does not immediately store anything — Spark caches lazily at the first action, partition by partition, managed by a per-executor BlockManager. When memory fills up, LRU eviction silently drops or spills partitions. ...

Apr 19, 2026 • 22 min read

Broadcast Joins vs Sort-Merge Joins in Spark

📖 The 45-Minute Join Stage That Became 90 Seconds A data engineering team at a retail company was running a nightly Spark job that joined their 500 GB transaction fact table against a 50 MB product dimension table. The job had been in production for...

Apr 19, 2026 • 23 min read

Spark Architecture: Driver, Executors, DAG Scheduler, and Task Scheduler Explained

TLDR: Spark's architecture is a precise chain of responsibility. The Driver converts user code into a DAG, the DAGScheduler breaks it into stages at shuffle boundaries, the TaskScheduler dispatches tasks to Executors respecting data locality, and the...

Apr 19, 2026 • 26 min read

Spark Adaptive Query Execution: Dynamic Coalescing, Pruning, and Skew Handling

TLDR: Before AQE, Spark compiled your entire query into a static physical plan using size estimates that were frequently wrong — and a wrong estimate at planning time meant a skewed join, 800 small tasks, or a missed broadcast opportunity that no amo...

Apr 19, 2026 • 34 min read

Databases (1)

Partitioning Approaches in SQL and NoSQL: Horizontal, Vertical, Range, Hash, and List Partitioning

TLDR: Partitioning splits one logical table into smaller physical pieces called partitions. The database planner skips irrelevant partitions entirely — turning a 30-second full-table scan into a 200ms single-partition read. Range partitioning is best...

Apr 12, 2026 • 37 min read
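The partition pruning described in the SQL and NoSQL partitioning teaser can be sketched in a few lines of Python. This is a simplified model, not any database's planner: it assumes hypothetical monthly range partitions for 2026 (the last one unbounded above) and hypothetical helpers `partition_index` and `partitions_to_scan`. A query whose date range falls inside one month touches exactly one partition.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical monthly range partitions: each entry is the inclusive lower
# bound of a partition, sorted ascending. The last partition is unbounded above.
BOUNDS = [date(2026, m, 1) for m in range(1, 13)]


def partition_index(d: date) -> int:
    """Range partitioning: a row belongs to the last partition whose lower bound is <= d."""
    i = bisect_right(BOUNDS, d) - 1
    if i < 0:
        raise ValueError(f"{d} falls before the first partition")
    return i


def partitions_to_scan(lo: date, hi: date) -> list[int]:
    """Partition pruning: only partitions overlapping [lo, hi] need to be read."""
    return list(range(partition_index(lo), partition_index(hi) + 1))


if __name__ == "__main__":
    print(partitions_to_scan(date(2026, 3, 5), date(2026, 3, 20)))  # → [2]
    print(partitions_to_scan(date(2026, 2, 10), date(2026, 4, 2)))  # → [1, 2, 3]
```

Every partition outside the returned list is skipped entirely, which is what turns a full-table scan into a single-partition read.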

Garbage Collection (1)

How JVM Garbage Collection Works: Types, Memory Impact, and Tuning

TLDR: JVM garbage collection automatically reclaims unused heap memory, but every algorithm makes a different trade-off between throughput, latency, and memory footprint. The default collector, G1GC, targets a 200 ms pause-time goal and works well for most services. Fo...

Apr 10, 2026 • 23 min read

Concurrency (1)

Adapting to Virtual Threads for Spring Developers

TLDR: Platform threads (one OS thread per request) typically top out at a few hundred concurrent I/O-bound requests. Virtual threads (JDK 21+) scale to millions, because a virtual thread blocked on I/O releases its underlying OS carrier thread instead of holding it. Spring Boot 3.2 enables them with a single property. Avoid synchronized...

Apr 5, 2026 • 17 min read

Cost Optimization (1)

LLM Model Selection Guide: GPT-4o vs Claude vs Llama vs Mistral — When to Use Which

TLDR: 🧠 Choosing the right LLM can save you 80% on costs while maintaining quality. This guide provides a decision framework, cost comparison, and practical examples to help engineering teams select between GPT-4o, Claude, Llama, and Mistral based o...

Mar 29, 2026 • 21 min read

Caching (1)

System Design: Complete Guide to Caching — Patterns, Eviction, and Distributed Strategies

TLDR: Caching is the single highest-leverage performance tool in distributed systems. This guide covers every read/write pattern (Cache-Aside through Refresh-Ahead), every eviction policy (LRU through ARC), cache invalidation pitfalls, thundering her...

Mar 9, 2026 • 29 min read
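LRU, the first eviction policy named in the caching guide teaser, can be sketched with Python's `collections.OrderedDict`. The `LRUCache` class below is a hypothetical, single-threaded illustration of the policy only; it does not model a distributed cache, TTLs, or any particular product's eviction internals.

```python
from collections import OrderedDict


class LRUCache:
    """Hypothetical minimal LRU cache: evicts the least recently used entry when full."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._data = OrderedDict()  # ordered from least to most recently used

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # a read marks the key as most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # drop the least recently used entry


if __name__ == "__main__":
    cache = LRUCache(2)
    cache.put("a", 1)
    cache.put("b", 2)
    cache.get("a")       # "a" is now more recent than "b"
    cache.put("c", 3)    # over capacity, so "b" is evicted
    print(cache.get("b"))  # → None
```

`OrderedDict` keeps both lookup and the recency update at O(1), which is why it is a common building block for LRU sketches.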

Architecture (1)

Little's Law: The Secret Formula for System Performance

TLDR: Little's Law ($L = \lambda W$) connects three metrics every system designer measures: $L$ = average number of requests in flight, $\lambda$ = average arrival rate (throughput, in RPS), $W$ = average response time. If latency spikes, your concurrency requirement explodes with ...

Mar 9, 2026 • 12 min read
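The Little's Law teaser can be made concrete with a short calculation. The traffic figures below (1,000 RPS, 200 ms rising to 2 s, a pool of 200 workers) are hypothetical numbers chosen for illustration, and the helper names are likewise hypothetical. At a fixed arrival rate, a 10x latency spike means 10x as many requests in flight; with a fixed concurrency limit, the same spike caps sustainable throughput.

```python
def concurrency(throughput_rps: float, latency_s: float) -> float:
    """Little's Law, L = lambda * W: average number of requests in flight."""
    return throughput_rps * latency_s


def max_throughput(concurrency_limit: float, latency_s: float) -> float:
    """Rearranged, lambda = L / W: the throughput a fixed-size pool can sustain."""
    return concurrency_limit / latency_s


if __name__ == "__main__":
    # Hypothetical service: 1,000 RPS at a 200 ms average response time.
    print(concurrency(1_000, 0.2))    # about 200 requests in flight
    # Same arrival rate, but latency spikes to 2 s.
    print(concurrency(1_000, 2.0))    # about 2,000 requests in flight
    # A pool capped at 200 concurrent requests, at 2 s per request.
    print(max_throughput(200, 2.0))   # about 100 RPS
```

The second and third lines are the two faces of the same problem: either concurrency must grow to absorb the spike, or throughput drops to whatever the fixed pool can sustain.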