Abstract Algorithms

Topic

intermediate

15 articles across 2 sub-topics

Sub-topic

#Apache-spark

9 articles

Watermarking and Late Data Handling in Spark Structured Streaming

TLDR: A watermark tells Spark Structured Streaming: "I will accept events up to N minutes late, and then I am done waiting." Spark tracks the maximum event time seen per partition, takes the global mi

• 27 min read

Spark Structured Streaming: Micro-Batch vs Continuous Processing

📖 The 15-Minute Gap: How a Fraud Team Discovered They Needed Real-Time Streaming A fintech team runs payment fraud detection with a well-tuned Spark batch job. Every 15 minutes it reads a day's worth

• 27 min read

Shuffles in Spark: Why groupBy Kills Performance

TLDR: A Spark shuffle is the most expensive operation in any distributed job: it moves every matching key across the network, writes temporary sorted files to disk, and forces a hard synchronization

• 31 min read

Reading and Writing Data in Spark: Parquet, Delta, JSON, and JDBC

TLDR: Parquet's columnar layout with row-group statistics enables predicate pushdown that can reduce a 500 GB scan to 8 GB. Delta Lake wraps Parquet with a JSON transaction log to add ACID semantics a

• 34 min read

Partitioning in Spark: HashPartitioner, RangePartitioner, and Custom Strategies

TLDR: Spark's partition count and partitioning strategy are the two levers that determine whether a job scales linearly or crumbles under data growth. HashPartitioner distributes keys by hash modulo

• 26 min read

Kafka and Spark Structured Streaming: Building a Production Pipeline

📖 The 500K-Event Problem: When a Naive Kafka Consumer Falls Apart An analytics platform at a mid-sized fintech company needs to process 500,000 payment events per second from a Kafka cluster. The tea

• 23 min read

Sub-topic

Python

6 articles

Pythonic Code: Idioms Every Developer Should Know

TLDR: Writing for i in range(len(arr)): works, but Python veterans will flag it in your first code review. Idiomatic Python uses enumerate, zip, comprehensions, context managers, unpacking, the walrus

• 27 min read

Python OOP: Classes, Dataclasses, and Dunder Methods

📖 Why Every Java Developer Writes Un-Pythonic Classes on Day One Imagine a developer (let's call him Daniel) who has written Java for six years. He sits down to write his first Python class and pro

• 22 min read

List Comprehensions, Generators, and Lazy Evaluation in Python

📖 The MemoryError That Launched a Thousand Generators Meet Priya. She is a data engineer at a logistics company, tasked with crunching a 10 GB CSV of shipping events. She opens her laptop, writes wha

• 24 min read

Functional Python: map, filter, itertools, and functools

📖 The Nested-Loop Tax: When Five Stages of ETL Collapse Under Their Own Weight Picture this task. You receive a batch of raw order records from a sales API. Your pipeline must: (1) skip cancelled ord

• 29 min read

Decorators Explained: From Functions to Frameworks

📖 The Copy-Paste Crisis: When Timing Code Invades Twenty Functions Sofia is three months into her first Python backend role. The team runs a performance review and discovers the data-processing API i

• 24 min read

Async Python: asyncio, Coroutines, and Event Loops Without the Confusion

📖 The 500-Second Problem: What Cooperative Multitasking Actually Fixes Suppose your monitoring pipeline checks the health endpoint of 1,000 internal microservices. Each HTTP call takes about 500 mill

• 27 min read

Abstract Algorithms · © 2026 · Engineering learning lab