Topic

Streaming

Learn Streaming as a connected topic across chapters, concepts, simulations, and interview reasoning.

10 Concepts · 12 Articles · 4h 14m

How this topic helps

Apache Spark
Structured Streaming
System Design
Architecture

Learning Path in this Topic

Series that contain articles from Streaming.

Articles

12 matched articles

Article 1: Count-Min Sketch Explained: Frequency Estimation at Streaming Scale (22 min)
TLDR: Count-Min Sketch (CMS) is a fixed-size d × w counter matrix that estimates how often any element has appeared in a stream. Insert: hash the element with each of the d hash functions to get one c…

Article 2: Spark Structured Streaming: Micro-Batch vs Continuous Processing (27 min)
📖 The 15-Minute Gap: How a Fraud Team Discovered They Needed Real-Time Streaming. A fintech team runs payment fraud detection with a well-tuned Spark batch job. Every 15 minutes it reads a day's worth…

Article 3: How Kafka Works: The Log That Never Forgets (13 min)
TLDR: Kafka is a distributed event store. Unlike a traditional queue (RabbitMQ) where messages disappear after reading, Kafka stores them in a persistent Log. This allows multiple consumers to read th…

Article 4: Watermarking and Late Data Handling in Spark Structured Streaming (27 min)
TLDR: A watermark tells Spark Structured Streaming: "I will accept events up to N minutes late, and then I am done waiting." Spark tracks the maximum event time seen per partition, takes the global mi…

Article 5: Stateful Aggregations in Spark Structured Streaming: mapGroupsWithState (28 min)
TLDR: mapGroupsWithState gives each streaming key its own mutable state object, persisted in a fault-tolerant state store that checkpoints to object storage on every micro-batch. Where window aggregat…

Article 6: Kafka and Spark Structured Streaming: Building a Production Pipeline (23 min)
📖 The 500K-Event Problem: When a Naive Kafka Consumer Falls Apart. An analytics platform at a mid-sized fintech company needs to process 500,000 payment events per second from a Kafka cluster. The tea…

Page 1 of 2