Topic

Big Data

Learn Big Data as a connected topic across chapters, concepts, simulations, and interview reasoning.

10 Concepts · 17 Articles · 5h 59m

Overview


How this topic helps

Data Engineering
Architecture
System Design
#Apache Spark

Learning Path in this Topic

Series that contain articles from Big Data.

Articles

17 matched articles

Article 1
Big Data 101: The 5 Vs, Ecosystem, and Why Scale Breaks Everything
TLDR: Traditional databases fail at big data scale for three concrete reasons: storage saturation, compute bottleneck, and write-lock contention. The 5 Vs (Volume, Velocity, Variety, Veracity, Value)…
21 min

Article 2
Big Data Architecture Patterns: Lambda, Kappa, CDC, Medallion, and Data Mesh
TLDR: A serious data platform is defined less by where files are stored and more by how changes enter the system, how serving layers are materialized, and who owns quality over time. Lambda, Kappa, CD…
17 min

Article 3
Spark on Kubernetes: Operator, Dynamic Allocation, and Production Monitoring
TLDR: Running Spark on Kubernetes replaces YARN's static queue model with a container-native, elastically scaled execution environment. The Kubeflow Spark Operator manages SparkApplication CRDs throug…
36 min

Article 4
Spark Executor Sizing: Memory Model, Core Tuning, and GC Strategy
TLDR: Spark executor OOMs are almost never caused by insufficient total cluster RAM; they are caused by misallocating memory across five distinct JVM regions while ignoring GC behavior and memoryOver…
37 min

Article 5
Spark Architecture: Driver, Executors, DAG Scheduler, and Task Scheduler Explained
TLDR: Spark's architecture is a precise chain of responsibility. The Driver converts user code into a DAG, the DAGScheduler breaks it into stages at shuffle boundaries, the TaskScheduler dispatches ta…
28 min

Article 6
Spark Adaptive Query Execution: Dynamic Coalescing, Pruning, and Skew Handling
TLDR: Before AQE, Spark compiled your entire query into a static physical plan using size estimates that were frequently wrong, and a wrong estimate at planning time meant a skewed join, 800 small ta…
39 min
