Abstract Algorithms

  • Home
  • All Posts
  • All Series
  • About

Category

shuffle

1 article

Shuffles in Spark: Why groupBy Kills Performance

TLDR: A Spark shuffle is typically the most expensive operation in a distributed job: it moves records with matching keys across the network, writes temporary sorted files to disk, and forces a hard synchronization barrier between every upstream and downstream stag...

Apr 19, 2026 • 32 min read

Abstract Algorithms

Exploring the fascinating world of algorithms, data structures, and software engineering through clear explanations and practical examples.

Series

  • Apache Spark Engineering
  • Python Programming

Popular Topics

  • apache-spark
  • intermediate
  • performance
  • Python
  • Structured Streaming
  • big data

Author

Abstract Algorithms

@abstractalgorithms

© 2026 Abstract Algorithms. All rights reserved.
