Abstract Algorithms

Home

Topic

intermediate

15 articles across 2 sub-topics

Sub-topic

#Apache-spark

9 articles

Watermarking and Late Data Handling in Spark Structured Streaming

TLDR: A watermark tells Spark Structured Streaming: "I will accept events up to N minutes late, and then I am done waiting." Spark tracks the maximum event time seen per partition, takes the global mi

Apr 19, 2026•27 min read

Spark Structured Streaming: Micro-Batch vs Continuous Processing

📖 The 15-Minute Gap: How a Fraud Team Discovered They Needed Real-Time Streaming A fintech team runs payment fraud detection with a well-tuned Spark batch job. Every 15 minutes it reads a day's worth

Apr 19, 2026•27 min read

Shuffles in Spark: Why groupBy Kills Performance

TLDR: A Spark shuffle is the most expensive operation in any distributed job — it moves every matching key across the network, writes temporary sorted files to disk, and forces a hard synchronization

Apr 19, 2026•31 min read

Reading and Writing Data in Spark: Parquet, Delta, JSON, and JDBC

TLDR: Parquet's columnar layout with row-group statistics enables predicate pushdown that can reduce a 500 GB scan to 8 GB. Delta Lake wraps Parquet with a JSON transaction log to add ACID semantics a

Apr 19, 2026•34 min read

Partitioning in Spark: HashPartitioner, RangePartitioner, and Custom Strategies

TLDR: Spark's partition count and partitioning strategy are the two levers that determine whether a job scales linearly or crumbles under data growth. HashPartitioner distributes keys by hash modulo —

Apr 19, 2026•26 min read

Kafka and Spark Structured Streaming: Building a Production Pipeline

📖 The 500K-Event Problem: When a Naive Kafka Consumer Falls Apart An analytics platform at a mid-sized fintech company needs to process 500,000 payment events per second from a Kafka cluster. The tea

Apr 19, 2026•23 min read

Sub-topic

Python

6 articles

Pythonic Code: Idioms Every Developer Should Know

TLDR: Writing for i in range(len(arr)): works, but Python veterans will flag it in your first code review. Idiomatic Python uses enumerate, zip, comprehensions, context managers, unpacking, the walrus

Apr 19, 2026•27 min read

Python OOP: Classes, Dataclasses, and Dunder Methods

📖 Why Every Java Developer Writes Un-Pythonic Classes on Day One Imagine a developer — let's call him Daniel — who has written Java for six years. He sits down to write his first Python class and pro

Apr 19, 2026•22 min read

List Comprehensions, Generators, and Lazy Evaluation in Python

📖 The MemoryError That Launched a Thousand Generators Meet Priya. She is a data engineer at a logistics company, tasked with crunching a 10 GB CSV of shipping events. She opens her laptop, writes wha

Apr 19, 2026•24 min read

Functional Python: map, filter, itertools, and functools

📖 The Nested-Loop Tax: When Five Stages of ETL Collapse Under Their Own Weight Picture this task. You receive a batch of raw order records from a sales API. Your pipeline must: (1) skip cancelled ord

Apr 19, 2026•29 min read

Decorators Explained: From Functions to Frameworks

📖 The Copy-Paste Crisis: When Timing Code Invades Twenty Functions Sofia is three months into her first Python backend role. The team runs a performance review and discovers the data-processing API i

Apr 19, 2026•24 min read

Async Python: asyncio, Coroutines, and Event Loops Without the Confusion

📖 The 500-Second Problem: What Cooperative Multitasking Actually Fixes Suppose your monitoring pipeline checks the health endpoint of 1,000 internal microservices. Each HTTP call takes about 500 mill

Apr 19, 2026•27 min read