
Data Engineering

Learn Data Engineering as a connected topic across chapters, concepts, simulations, and interview reasoning.

10 Concepts · 17 Articles · 5h 26m


How this topic helps

Architecture
Big Data
System Design
Apache Spark


Articles

17 articles

Article 1: Data Lineage Explained: Tracking Data Flow Across Your Organization (12 min)
TLDR: 📊 Data lineage is the complete genealogy of your data — where it comes from, how it's transformed, and where it ends up. It's critical for debugging pipelines, proving compliance, and understan…

Article 2: Data Governance Essentials: Framework and Best Practices (9 min)
TLDR: 📋 Data governance is the framework that answers "who owns this data, who can access it, and what quality standards must it meet?" Without governance, data pipelines become chaotic. Implement it…

Article 3: How CDC Works Across Databases: PostgreSQL, MySQL, MongoDB, and Beyond (37 min)
A data engineering team at a fintech company built what they believed was a robust Change Data Capture pipeline: three source databases (PostgreSQL, MongoDB, and Cassandra), Debezium connectors wired…

Article 4: Modern Table Formats: Delta Lake vs Apache Iceberg vs Apache Hudi (24 min)
TLDR: Delta Lake, Apache Iceberg, and Apache Hudi are open table formats that wrap Parquet files with a transaction log (or snapshot tree) to deliver ACID guarantees, time travel, schema evolution, an…

Article 5: Medallion Architecture: Bronze, Silver, and Gold Layers in Practice (23 min)
TLDR: Medallion Architecture solves the "data swamp" problem by organizing a data lake into three progressively refined zones — Bronze (raw, immutable), Silver (cleaned, conformed), Gold (aggregated, …

Article 6: Kappa Architecture: Streaming-First Data Pipelines (21 min)
TLDR: Kappa architecture replaces Lambda's batch + speed dual codebases with a single streaming pipeline backed by a replayable Kafka log. Reprocessing becomes replaying from offset 0. One codebase, n…
