Big Data Engineering

12 articles·~224 min total·Updated
  1. 01

    Data Warehouse vs Data Lake vs Data Lakehouse: Which One to Choose?

    Snowflake, Databricks, or S3? We explain the evolution of data storage. Learn when to use a struc...

    15 min read
  2. 02

    Big Data Architecture Patterns: Lambda, Kappa, CDC, Medallion, and Data Mesh

    Choose ingestion, serving, and ownership patterns deliberately when data platforms start to scale.

    16 min read
  3. 03

    Data Pipeline Orchestration Pattern: DAG Scheduling, Retries, and Recovery

    Orchestrate dependent data jobs with backfills, idempotent tasks, and lineage-aware operations.

    14 min read
  4. 04

    Dimensional Modeling and SCD Patterns: Building Stable Analytics Warehouses

    Design fact tables, dimensions, and SCD strategies that keep BI metrics historically correct.

    14 min read
  5. 05

    Lambda Architecture Pattern: Balancing Batch Accuracy with Streaming Freshness

    Combine speed and batch layers when both low latency and deterministic recompute are mandatory.

    13 min read
  6. 06

    Stream Processing Pipeline Pattern: Stateful Real-Time Data Products

    Build low-latency pipelines with windowing, state stores, and exactly-once outcomes.

    16 min read
  7. 08

    Big Data 101: The 5 Vs, Ecosystem, and Why Scale Breaks Everything

    Why traditional databases fail at scale, what the 5 Vs really mean, and a practical map of the big data ecosystem from ingestion to insights.

    21 min read
  8. 09

    Kappa Architecture: Streaming-First Data Pipelines

    Eliminate the batch layer entirely: how Kappa architecture uses a single streaming pipeline for both real-time and historical processing.

    24 min read
  9. 10

    Medallion Architecture: Bronze, Silver, and Gold Layers in Practice

    Structure your data lake with progressive refinement: raw Bronze ingestion, cleaned and conformed Silver, and business-ready Gold aggregates.

    26 min read
  10. 07

    Apache Spark for Data Engineers: RDDs, DataFrames, and Structured Streaming

    Process petabytes with Python: understand Spark's execution model, DataFrame transformations, partitioning strategy, and real-time streaming.

    20 min read
  11. 11

    Modern Table Formats: Delta Lake vs Apache Iceberg vs Apache Hudi

    Parquet alone breaks at scale: Delta Lake, Iceberg, and Hudi bring ACID, time travel, and schema evolution to object storage.

    27 min read
  12. 12

    Big Data Engineering: Your Complete Learning Roadmap

    4 Phases, 11 Posts, The Right Order

    18 min read