Abstract Algorithms


Topic

Data Engineering

Learn Data Engineering as a connected topic across chapters, concepts, simulations, and interview reasoning.


10 Concepts · 17 Articles · 5h 26m



How this topic helps

Architecture

Big Data

System Design

#Apache Spark

Learning Path in this Topic

Series that contain articles from Data Engineering.

Articles

17 matched articles

Article 1: Data Lineage Explained: Tracking Data Flow Across Your Organization
TLDR: 📊 Data lineage is the complete genealogy of your data: where it comes from, how it's transformed, and where it ends up. It's critical for debugging pipelines, proving compliance, and understan…
12 min

Article 2: Data Governance Essentials: Framework and Best Practices
TLDR: 📋 Data governance is the framework that answers "who owns this data, who can access it, and what quality standards must it meet?" Without governance, data pipelines become chaotic. Implement it…
9 min

Article 3: How CDC Works Across Databases: PostgreSQL, MySQL, MongoDB, and Beyond
A data engineering team at a fintech company built what they believed was a robust Change Data Capture pipeline: three source databases (PostgreSQL, MongoDB, and Cassandra), Debezium connectors wired…
37 min

Article 4: Modern Table Formats: Delta Lake vs Apache Iceberg vs Apache Hudi
TLDR: Delta Lake, Apache Iceberg, and Apache Hudi are open table formats that wrap Parquet files with a transaction log (or snapshot tree) to deliver ACID guarantees, time travel, schema evolution, an…
24 min

Article 5: Medallion Architecture: Bronze, Silver, and Gold Layers in Practice
TLDR: Medallion Architecture solves the "data swamp" problem by organizing a data lake into three progressively refined zones: Bronze (raw, immutable), Silver (cleaned, conformed), Gold (aggregated, …
23 min

Article 6: Kappa Architecture: Streaming-First Data Pipelines
TLDR: Kappa architecture replaces Lambda's batch + speed dual codebases with a single streaming pipeline backed by a replayable Kafka log. Reprocessing becomes replaying from offset 0. One codebase, n…
21 min

Page 1 of 3