
Performance

Learn Performance as a connected topic across chapters, concepts, simulations, and interview reasoning.

10 Concepts · 21 Articles · 8h


How this topic helps

#Apache Spark
Intermediate
Interview Prep
System Design


Articles

21 articles; the first six are listed below.

Article 1 · Shuffles in Spark: Why groupBy Kills Performance · 31 min
TLDR: A Spark shuffle is the most expensive operation in any distributed job: it moves every matching key across the network, writes temporary sorted files to disk, and forces a hard synchronization…

Article 2 · Little's Law: The Secret Formula for System Performance · 9 min
TLDR: Little's Law (\(L = \lambda W\)) connects three metrics every system designer measures: \(L\) = concurrent requests in flight, \(\lambda\) = throughput (RPS), \(W\) = average response time. If…

Article 3 · SQL Partitioning: Range, Hash, List, and Composite Strategies Explained · 25 min
TLDR: SQL partitioning divides one logical table into smaller physical child tables, all accessed through the parent table name. The query optimizer skips irrelevant child tables entirely, a process…

Article 4 · Partitioning in Spark: HashPartitioner, RangePartitioner, and Custom Strategies · 26 min
TLDR: Spark's partition count and partitioning strategy are the two levers that determine whether a job scales linearly or crumbles under data growth. HashPartitioner distributes keys by hash modulo…

Article 5 · Caching and Persistence in Spark: Storage Levels and When to Use Them · 24 min
TLDR: Calling cache() or persist() does not immediately store anything. Spark caches lazily at the first action, partition by partition, managed by a per-executor BlockManager. When memory fills up…

Article 6 · Broadcast Joins vs Sort-Merge Joins in Spark · 26 min
📖 The 45-Minute Join Stage That Became 90 Seconds: A data engineering team at a retail company was running a nightly Spark job that joined their 500 GB transaction fact table against a 50 MB product…
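The Little's Law teaser packs a formula that is easy to check by hand. Below is a minimal Python sketch, assuming a system in steady state and consistent units (requests per second for \(\lambda\), seconds for \(W\)). The function names `in_flight` and `max_throughput` are hypothetical, chosen only for illustration.

```python
def in_flight(throughput_rps: float, avg_latency_s: float) -> float:
    """Little's Law, L = lambda * W: average number of requests in the system."""
    if throughput_rps < 0 or avg_latency_s < 0:
        raise ValueError("throughput and latency must be non-negative")
    return throughput_rps * avg_latency_s


def max_throughput(concurrency_limit: float, avg_latency_s: float) -> float:
    """Rearranged as lambda = L / W: the throughput a fixed concurrency cap can sustain."""
    if avg_latency_s <= 0:
        raise ValueError("latency must be positive")
    return concurrency_limit / avg_latency_s


# 400 requests/s, each taking 250 ms on average -> 100 requests in flight.
print(in_flight(400, 0.25))      # 100.0
# A pool capped at 50 concurrent requests, 250 ms each -> at most 200 requests/s.
print(max_throughput(50, 0.25))  # 200.0
```

Reading the formula in both directions is the practical point. Given measured throughput and latency, it tells you how much concurrency (threads, connections, pool slots) the system must hold. Given a hard concurrency cap, it tells you the throughput ceiling at the current latency.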
