Abstract Algorithms

Topic

structured streaming

4 articles

Watermarking and Late Data Handling in Spark Structured Streaming

TLDR: A watermark tells Spark Structured Streaming: "I will accept events up to N minutes late, and then I am done waiting." Spark tracks the maximum event time seen per partition, takes the global mi

Apr 19, 2026 • 27 min read

Spark Structured Streaming: Micro-Batch vs Continuous Processing

📖 The 15-Minute Gap: How a Fraud Team Discovered They Needed Real-Time Streaming
A fintech team runs payment fraud detection with a well-tuned Spark batch job. Every 15 minutes it reads a day's worth

Apr 19, 2026 • 27 min read

Stateful Aggregations in Spark Structured Streaming: mapGroupsWithState

TLDR: mapGroupsWithState gives each streaming key its own mutable state object, persisted in a fault-tolerant state store that checkpoints to object storage on every micro-batch. Where window aggregat

Apr 19, 2026 • 28 min read

Kafka and Spark Structured Streaming: Building a Production Pipeline

📖 The 500K-Event Problem: When a Naive Kafka Consumer Falls Apart
An analytics platform at a mid-sized fintech company needs to process 500,000 payment events per second from a Kafka cluster. The tea

Apr 19, 2026 • 23 min read