Abstract Algorithms

Topic

shuffle

1 article

Shuffles in Spark: Why groupBy Kills Performance

TLDR: A Spark shuffle is the most expensive operation in any distributed job — it moves every matching key across the network, writes temporary sorted files to disk, and forces a hard synchronization

Apr 19, 2026 · 31 min read

Abstract Algorithms · © 2026 · Engineering learning lab