How CDC Works Across Databases: PostgreSQL, MySQL, MongoDB, and Beyond
Why log-based capture succeeds in some databases and silently fails in others — and what to do when your database doesn't expose a log.
A data engineering team at a fintech company built what they believed was a robust Change Data Capture pipeline: three source databases (PostgreSQL, MongoDB, and Cassandra), Debezium connectors wired to Kafka, and a downstream data warehouse receiving every insert, update, and delete. Six months into production they discovered three silent failures. Their Cassandra CDC had been converting deletes into nothing, because Cassandra's commitlog CDC does not emit delete notifications in the way application developers expect. Their MongoDB connector had never captured changes on one of the clusters, because that cluster was a standalone instance rather than a replica set, and MongoDB CDC simply does not work on standalone deployments. And their PostgreSQL replication slot had accumulated 80 GB of unprocessed WAL during a weekend consumer outage, threatening to fill the disk entirely. The pipeline looked healthy on every dashboard. Every connector showed green. The problems were deeper — inside each database's own CDC mechanism.
TLDR: CDC is not a single technology — it is a different internal mechanism in every database engine. PostgreSQL uses logical decoding on the WAL. MySQL requires row-based binlog mode. MongoDB CDC requires a replica set and can be lost if the capped oplog rolls over. Cassandra CDC accumulates log segments on disk and cannot reliably express deletes. Oracle, SQL Server, and DynamoDB each have their own trade-offs and configuration requirements. Warehouses like BigQuery and Redshift expose no log at all. Choosing the right CDC approach means understanding what your specific database actually exposes — and knowing where it silently fails.
If you want the pipeline side (Debezium, Kafka, snapshot handoff, schema evolution), see Change Data Capture Pattern.
📖 Why the Database Log Is the Only Reliable Source for CDC
Before CDC, teams used three approaches to detect what changed in a database: polling with a watermark column (checking updated_at > last_run_time), database triggers that write change rows to a side table, or application-layer dual-writes that send events alongside every database write.
Each of these breaks in a different and often silent way.
Watermark polling misses hard deletes entirely — a deleted row has no updated_at because it no longer exists. It also misses rows where the updated_at column was not updated (bulk imports, legacy inserts, or direct SQL that bypasses the application layer). High-frequency tables that receive thousands of writes per second require increasingly large polling windows and cause query load spikes on the source database.
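A toy simulation (plain Python, no database involved) makes the delete gap concrete:

```python
from datetime import datetime

# A toy in-memory "table" with an updated_at column (illustrative only).
rows = {
    1: {"id": 1, "name": "alice", "updated_at": datetime(2024, 1, 1, 10, 0)},
    2: {"id": 2, "name": "bob",   "updated_at": datetime(2024, 1, 1, 10, 5)},
}

def poll_changes(table, last_run_time):
    """Watermark polling: SELECT * WHERE updated_at > last_run_time."""
    return [r for r in table.values() if r["updated_at"] > last_run_time]

last_run = datetime(2024, 1, 1, 10, 2)

# The poller sees bob's recent update...
assert [r["id"] for r in poll_changes(rows, last_run)] == [2]

# ...but a hard delete leaves nothing behind to poll:
del rows[1]
changed = poll_changes(rows, last_run)
# Row 1's deletion is invisible — it matches no predicate because it no longer exists.
assert all(r["id"] != 1 for r in changed)
```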
Trigger-based CDC works but adds overhead to every write. Every insert, update, and delete fires a synchronous trigger that writes a row to a side table. On a high-traffic table this can add 20–40% write latency. Triggers also break when developers execute bulk operations that explicitly bypass triggers (using COPY in PostgreSQL, for example, which skips row-level triggers entirely).
Application dual-write is the most brittle: it requires every write path in the application to explicitly emit an event. One missed code path — a background job, a migration script, a stored procedure — creates a permanent gap in the change stream. There is no recovery path for events that were never emitted.
The database's own write-ahead log, redo log, or oplog is different. Every committed change must pass through the database's own durability mechanism before it is acknowledged to the writer. The log is written before the change is applied to the data files. This means the log is the authoritative, complete, ordered record of everything that changed — including deletes. CDC that reads from the log is called log-based CDC, and it is the only approach that is both complete and non-intrusive.
The challenge is that every database exposes this log differently, with different requirements, different failure modes, and different limitations.
⚙️ How Log-Based CDC Moves from Write to Consumer: The Universal Pattern
Regardless of the database, log-based CDC follows the same four-step pattern. Understanding this pattern makes the per-database differences easier to reason about.
Step 1 — The write lands in the log. When a transaction commits, the database engine writes the change to its durability log (WAL, binlog, oplog, commitlog, redo log) before acknowledging the commit to the client. The log entry carries the operation type (INSERT, UPDATE, DELETE), the row's before and/or after image (an insert has no before image; a delete has no after image), the timestamp, and the transaction identifier.
Step 2 — The CDC reader tails the log. A CDC connector (like a Debezium source connector) opens a persistent read position on the log and continuously reads new entries. In PostgreSQL this is a replication slot. In MySQL it is a binlog position or GTID set. In MongoDB it is a change stream cursor backed by the oplog.
Step 3 — The reader emits structured events. Each log entry is translated into a structured change event — typically a JSON or Avro record with operation type, table name, before/after row images, and metadata. These events are published to a message bus (usually Kafka).
Step 4 — Consumers process events at their own pace. Downstream systems — a search index, a data warehouse, a cache invalidation service — consume events from the bus and apply them. The CDC reader tracks its position in the log so that if the consumer falls behind, the log can be re-read from the last checkpoint.
The diagram below shows this flow from a committed write through the log to a downstream consumer. The key observation is that the CDC reader is entirely outside the write path — it reads the log asynchronously without modifying the source database or adding latency to writes.
```mermaid
flowchart TD
A["Application Write (INSERT / UPDATE / DELETE)"] --> B["Database Engine Write Path"]
B --> C["Durability Log (WAL / Binlog / Oplog / Redo Log)"]
C --> B2["Data Files (Tables / Documents / SSTables)"]
C --> D["CDC Reader (Debezium / LogMiner / Change Stream Cursor)"]
D --> E["Structured Change Events (JSON / Avro with before + after image)"]
E --> F["Message Bus (Kafka Topic)"]
F --> G1["Data Warehouse (Snowflake / BigQuery)"]
F --> G2["Search Index (Elasticsearch)"]
F --> G3["Cache Invalidation (Redis)"]
```
The log accumulates new entries continuously. The CDC reader's position in the log — sometimes called a slot, a cursor, or an offset — determines what has been read and what is still pending. If the reader stops or falls behind, the log must retain entries until the reader catches up. This retention requirement is the source of most of the operational risks in log-based CDC.
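Step 3's output can be sketched as a small data structure. The field names below loosely follow Debezium's event envelope (`op`, `before`, `after`, `ts_ms`) but are illustrative, not a wire-format contract:

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class ChangeEvent:
    """One row-level change, as emitted in Step 3 (loosely modeled on
    Debezium's envelope — field names are illustrative)."""
    op: str                  # "c" (create), "u" (update), "d" (delete)
    source_table: str
    before: Optional[dict]   # row image before the change (None for inserts)
    after: Optional[dict]    # row image after the change (None for deletes)
    ts_ms: int               # commit timestamp from the log entry
    tx_id: str               # transaction identifier from the log entry

event = ChangeEvent(
    op="u",
    source_table="public.accounts",
    before={"id": 42, "balance": 100},
    after={"id": 42, "balance": 75},
    ts_ms=1700000000000,
    tx_id="561:9123",
)

# Step 3 output: one serializable record per row-level change.
print(json.dumps(asdict(event)))
```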
🔍 PostgreSQL: WAL and Logical Decoding Slots
PostgreSQL's Write-Ahead Log (WAL) records every change to the database in a sequence of binary files called WAL segments, each typically 16 MB. WAL was designed for crash recovery and streaming replication — CDC is a secondary use case enabled by a feature called logical decoding.
Logical decoding converts the physical WAL records (which are internal page-level changes) into a logical stream of row-level changes that applications can consume. To use logical decoding, the database must be configured with wal_level = logical. The default is replica, which captures enough for physical replication but not enough for logical decoding.
A replication slot is a named cursor into the WAL stream. When a CDC connector creates a replication slot (e.g., debezium_slot), PostgreSQL begins retaining all WAL segments from that slot's last confirmed position forward. The slot remembers where the connector last confirmed consuming up to, and PostgreSQL will not delete any WAL that the slot has not yet confirmed.
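The retention contract a slot imposes can be modeled in a few lines. This is a toy sketch of the bookkeeping rule, not the replication protocol:

```python
class ReplicationSlot:
    """Toy model of the retention contract described above: the database
    may only discard log entries the slot has confirmed."""

    def __init__(self) -> None:
        self.confirmed = 0  # last log position the consumer confirmed

    def confirm(self, position: int) -> None:
        self.confirmed = max(self.confirmed, position)


class Wal:
    """Append-only log; positions are monotonically increasing integers
    (real WAL positions are LSNs, but the contract is the same)."""

    def __init__(self) -> None:
        self.entries: list[tuple[int, str]] = []
        self.next_pos = 1

    def append(self, payload: str) -> None:
        self.entries.append((self.next_pos, payload))
        self.next_pos += 1

    def discardable(self, slot: ReplicationSlot) -> list[tuple[int, str]]:
        """Entries at or before the slot's confirmed position may be freed."""
        return [e for e in self.entries if e[0] <= slot.confirmed]


wal, slot = Wal(), ReplicationSlot()
for change in ("a", "b", "c"):
    wal.append(change)

# Consumer stalled: nothing confirmed, so nothing can be discarded —
# this is exactly how WAL piles up behind a stopped connector.
assert wal.discardable(slot) == []

slot.confirm(2)
assert [pos for pos, _ in wal.discardable(slot)] == [1, 2]
```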
Two output plugins translate WAL entries into change events: pgoutput (built into PostgreSQL 10+, the default for Debezium 1.x+) and decoderbufs (protobuf-based, requires a separate plugin installation). Both expose the same logical change stream; the difference is encoding format and installation requirements.
The replication slot lag danger is the most operationally critical aspect of PostgreSQL CDC. If the CDC connector stops — due to a connector restart, a network partition, or a slow consumer — the replication slot stops advancing. PostgreSQL continues writing new WAL for normal database operations, but it cannot discard any WAL from before the slot's position. On a high-traffic database, WAL can accumulate at hundreds of megabytes per minute. A weekend outage without alerting on slot lag can easily produce 50–200 GB of retained WAL. If the disk fills, PostgreSQL shuts down entirely. Monitoring replication slot lag (pg_replication_slots.confirmed_flush_lsn vs current WAL position) is mandatory for any production PostgreSQL CDC deployment.
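Slot lag can be computed from the two LSNs just mentioned. A PostgreSQL LSN such as `16/B374D848` encodes a 64-bit byte position (the part before the slash is the high 32 bits, the part after is the low 32 bits, both hexadecimal), so lag in bytes is a subtraction. A sketch of the alerting arithmetic (example LSN values, not from a live system):

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert a PostgreSQL LSN like '16/B374D848' to an absolute byte
    position: high 32 bits before the slash, low 32 bits after."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)

def slot_lag_bytes(current_wal_lsn: str, confirmed_flush_lsn: str) -> int:
    """Bytes of WAL PostgreSQL must retain for a replication slot —
    the quantity to alert on (compare against free disk space)."""
    return lsn_to_bytes(current_wal_lsn) - lsn_to_bytes(confirmed_flush_lsn)

# Values as returned by pg_current_wal_lsn() and pg_replication_slots:
lag = slot_lag_bytes("17/0000A000", "16/B374D848")
print(f"slot lag: {lag / 1024**2:.1f} MiB")
```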
🔍 MySQL and MariaDB: Binlog and the Statement-Mode Trap
MySQL's durability log is the binary log (binlog), a sequential log of all changes to the database written before each transaction commits. The binlog has three formats: STATEMENT, ROW, and MIXED.
Statement-based binlog records the SQL statement that was executed. This breaks CDC in two ways. First, non-deterministic functions like NOW() or RAND() in a statement produce different results when replayed, making the binlog unreliable for replication and useless for CDC. Second, UPDATE statements affect a set of rows — the binlog records the statement but not which specific rows changed or what their before/after values were. A CDC reader cannot emit a row-level change event from a statement-mode binlog entry.
Row-based binlog records the before and after image of every row affected by every statement. This is what CDC requires. The MySQL configuration binlog_format = ROW enables this mode. Additionally, binlog_row_image = FULL ensures that complete before and after images are captured. The MINIMAL setting, by contrast, logs only the columns needed to identify the row (typically the primary key) in the before image and only the changed columns in the after image — enough for replication, but it leaves gaps for CDC use cases that need full before-images.
GTIDs (Global Transaction Identifiers) provide a unique identifier for every committed transaction across all servers in a replication topology. GTID-based replication (gtid_mode = ON, enforce_gtid_consistency = ON) replaces file-and-position binlog coordinates with globally unique IDs, making it much easier for a CDC connector to resume reliably after a restart and to handle failover to a replica without losing its position.
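A preflight check for these settings can be written as a pure function over the output of `SHOW GLOBAL VARIABLES`. A sketch (the variable names are the real MySQL ones; the checks follow the requirements above):

```python
def check_mysql_cdc_config(variables: dict) -> list[str]:
    """Validate the server variables a binlog-based CDC reader needs.
    `variables` mirrors SHOW GLOBAL VARIABLES output as a name -> value dict."""
    problems = []
    if variables.get("log_bin", "OFF") != "ON":
        problems.append("binary logging is disabled (log_bin != ON)")
    if variables.get("binlog_format") != "ROW":
        problems.append("binlog_format must be ROW — STATEMENT/MIXED cannot "
                        "yield row-level before/after images")
    if variables.get("binlog_row_image", "FULL") != "FULL":
        problems.append("binlog_row_image should be FULL for complete "
                        "before-images")
    if variables.get("gtid_mode") != "ON":
        problems.append("gtid_mode=ON recommended for reliable resume and failover")
    return problems

# Example: a server still in a MIXED-format, non-GTID configuration.
issues = check_mysql_cdc_config({
    "log_bin": "ON", "binlog_format": "MIXED",
    "binlog_row_image": "FULL", "gtid_mode": "OFF",
})
for issue in issues:
    print("-", issue)
```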
MariaDB follows the same binlog structure but with its own GTID implementation (gtid_strict_mode). Debezium supports both MySQL and MariaDB with separate connectors due to these GTID differences.
🔍 MongoDB: Oplog, Change Streams, and the Replica Set Requirement
MongoDB's CDC mechanism is fundamentally different from that of relational databases. MongoDB does not have a WAL in the same sense. Instead, it has the oplog — short for operations log — a special capped collection in the local database on every replica set member. Every write operation that modifies data is recorded in the oplog in an idempotent form that secondaries replay to stay in sync.
The oplog is a capped collection. Capped collections have a fixed maximum size, and when that size is reached, the oldest entries are automatically overwritten by new entries. This is the single most dangerous property of the oplog for CDC: if a CDC connector falls behind the oplog tail — because it restarted, or because the oplog is too small for the write volume — the connector's last read position no longer exists in the oplog. The cursor is invalidated. The connector must fall back to a full snapshot of the source collection before it can resume incremental capture. Sizing the oplog appropriately (--oplogSize in MB, configured at replica set initialization or via replSetResizeOplog) is therefore a prerequisite for reliable MongoDB CDC.
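Oplog sizing reduces to simple arithmetic: headroom is oplog capacity divided by the rate at which oplog entries are written. A sketch, assuming a steady write rate (real sizing should use peak rates, not averages):

```python
def oplog_headroom_hours(oplog_size_mb: float,
                         oplog_write_rate_mb_per_hour: float) -> float:
    """Hours a stopped consumer can stay down before its resume position
    rolls out of the capped oplog. Assumes a steady write rate."""
    return oplog_size_mb / oplog_write_rate_mb_per_hour

def required_oplog_mb(target_outage_hours: float,
                      oplog_write_rate_mb_per_hour: float) -> float:
    """Invert the same relation: how large the oplog must be to survive
    a consumer outage of the target duration."""
    return target_outage_hours * oplog_write_rate_mb_per_hour

# Example: a 10 GB oplog on a cluster writing ~2 GB of oplog per hour
# survives roughly a 5-hour consumer outage before the cursor is lost.
print(f"{oplog_headroom_hours(10_240, 2_048):.1f} hours")
```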
Change Streams (MongoDB 3.6+) provide a higher-level abstraction over the oplog. A change stream is a cursor that returns structured change events from the oplog with a stable resume token. The resume token allows a connector to reconnect and resume from exactly where it left off, as long as that position still exists in the oplog. Change streams support filtering, projection, and full-document lookup.
The standalone instance trap. The oplog exists only on replica set members. MongoDB CDC does not work on standalone instances at all — a standalone instance has no oplog. This is a deployment requirement, not a configuration option. A CDC connector pointed at a standalone MongoDB instance will fail immediately. The fix is to convert the standalone to a single-member replica set, which adds the oplog without requiring additional servers.
Pre-images and post-images (enabled per collection via the changeStreamPreAndPostImages option, MongoDB 6.0+) allow change stream events to include the full document state before and after the change. Without this configuration, update events include only the changed fields (plus an optional post-image via fullDocument lookup), and there is no before-image at all. Applications that need to know what a document looked like before a deletion or a complex update must enable this feature per collection.
Multi-document transactions in MongoDB (4.0+) produce oplog entries that span multiple operations. A CDC consumer that tails the oplog directly (rather than using change streams) may observe partial transaction effects. Change streams surface a transaction's operations as individual per-document events only after the transaction commits, but connectors must still be transaction-aware to avoid publishing partial states.
🔍 Oracle: LogMiner, XStream, and the Cost of Redo Log Parsing
Oracle uses the Redo Log for all committed change durability. CDC on Oracle is available through two mechanisms: LogMiner and XStream.
LogMiner is a built-in Oracle utility that reads redo log files and translates the internal block-level changes into SQL-level row change records. It requires supplemental logging to be enabled — without supplemental logging, the redo log does not capture enough column data for CDC to reconstruct full row images. Minimal supplemental logging adds the primary key to every change. All-column supplemental logging adds every column's before and after value. CDC tooling typically requires at least primary-key supplemental logging; full before-image capture requires all-column or identified-column supplemental logging.
LogMiner operates by SQL query against the redo log files. This means it competes with normal database I/O for CPU and memory. On high-transaction databases, LogMiner parsing can consume 10–20% of CPU capacity. The latency between a committed transaction and the LogMiner output is typically seconds to tens of seconds, depending on redo log segment sizes and flush frequency.
XStream is Oracle's streaming CDC API, available with the GoldenGate option license. XStream provides a persistent streaming interface with significantly lower latency than LogMiner and higher throughput, but it requires the GoldenGate license which is separate and expensive. Most open-source CDC tools (including Debezium's Oracle connector) use LogMiner rather than XStream due to the licensing requirement.
🔍 SQL Server: Built-in CDC vs Change Tracking
SQL Server is the only major relational database that ships with two separate built-in change capture features: Change Data Capture (CDC) and Change Tracking (CT).
SQL Server CDC captures row-level change data into change tables in the cdc schema of the tracked database. When enabled on a table, SQL Server asynchronously reads the transaction log and writes insert, update, and delete events into a corresponding cdc.<capture_instance>_CT table (the default capture instance name is <schema>_<table>). Consumers query these tables using the cdc.fn_cdc_get_all_changes_* and cdc.fn_cdc_get_net_changes_* functions. The before and after images of changed rows are stored, supporting full delta replay.
Change Tracking is fundamentally different: it records only which rows changed (by primary key), not what the row values were. Change Tracking is optimized for synchronization scenarios where the consumer will re-fetch the current row state — it answers "which rows are different from version N?" but not "what did the row look like before the change?" For audit, replication, and analytical CDC scenarios, Change Tracking is insufficient because it cannot provide delete data with full before-images.
SQL Server CDC requires SQL Server Agent to be running. The CDC capture and cleanup jobs run as SQL Server Agent jobs. If SQL Server Agent is stopped, CDC capture stops. This is a frequently missed operational dependency in containerized and cloud-managed SQL Server deployments.
The retention window danger. SQL Server CDC stores change data in system tables for a configurable retention period (default: 3 days, controlled by @retention in the CDC configuration). After the retention period expires, old change rows are cleaned up. If a CDC consumer falls behind by more than the retention period, it will miss changes with no recovery path other than a full re-snapshot. Unlike PostgreSQL's replication slot (which retains WAL until the consumer advances), SQL Server CDC silently discards old change data based on wall-clock time.
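The wall-clock expiry check is simple enough to encode directly. A sketch of the alerting logic, using the 3-day default from above:

```python
from datetime import datetime, timedelta

def cdc_gap_status(last_consumed_at: datetime,
                   now: datetime,
                   retention: timedelta = timedelta(days=3)) -> str:
    """SQL Server CDC discards change rows on wall-clock age. If the
    consumer's last-read time is older than the retention window, the
    gap is unrecoverable without a full re-snapshot (3-day default,
    per the @retention setting described above)."""
    behind = now - last_consumed_at
    if behind >= retention:
        return "UNRECOVERABLE: changes expired — full re-snapshot required"
    remaining = retention - behind
    return f"recoverable: {remaining.total_seconds() / 3600:.0f}h of headroom left"

now = datetime(2024, 1, 8, 12, 0)
print(cdc_gap_status(datetime(2024, 1, 7, 12, 0), now))   # 1 day behind
print(cdc_gap_status(datetime(2024, 1, 4, 12, 0), now))   # 4 days behind
```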
🔍 Cassandra: Commitlog CDC and the Silent Tombstone Problem
Apache Cassandra uses a commitlog as its durability mechanism — all writes are appended to the commitlog before being acknowledged. Since Cassandra 3.8, a CDC feature can copy commitlog segments to a configurable CDC output directory (cdc_raw_directory) for external consumers to process.
Disk fill danger. Cassandra's CDC mechanism works by accumulating commitlog segments in the CDC directory until a consumer processes and deletes them. If the consumer stops or falls behind, segments accumulate until the configured CDC space limit (cdc_total_space_in_mb) is reached. What happens next depends on version and configuration: by default, Cassandra rejects writes to CDC-enabled tables until space is freed — backpressure that surfaces as application write errors — while Cassandra 4.1+ can instead be configured (cdc_block_writes: false) to discard the oldest segments, which is silent CDC data loss. Either way, a slow consumer is an operational incident: it either breaks application writes or drops change data.
Tombstones and the delete problem. In Cassandra's data model, a delete is not a physical row deletion — it is a tombstone: a special marker that says "this row (or cell, or range) was deleted at timestamp T." In the CDC commitlog, a delete appears as a tombstone write, not as a DELETE operation with a before-image of the row. This means Cassandra CDC consumers receive tombstones, not structured delete events. Reconstructing "which row was deleted" from a tombstone requires matching the tombstone's partition key and clustering columns against the consumer's own view of the current data. There is no before-image of the deleted row in the CDC log.
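The reconstruction described above — joining a tombstone against the consumer's own state to recover a delete's before-image — can be sketched as follows. This is illustrative only: real commitlog readers parse binary segment files, and the field names here are invented:

```python
def tombstone_to_delete_event(tombstone: dict, current_view: dict) -> dict:
    """Turn a Cassandra tombstone (partition key + clustering columns +
    deletion timestamp — no row data) into a structured delete event by
    joining against the consumer's own materialized view of the table."""
    key = (tombstone["partition_key"], tombstone["clustering_key"])
    before_image = current_view.pop(key, None)  # None if we never saw the row
    return {
        "op": "DELETE",
        "key": key,
        "before": before_image,         # only available from our own state,
        "deleted_at": tombstone["timestamp"],  # never from the CDC log itself
    }

# The consumer's view of current data, built from earlier CDC events:
view = {("user:42", "2024-01-01"): {"amount": 99, "status": "paid"}}
event = tombstone_to_delete_event(
    {"partition_key": "user:42", "clustering_key": "2024-01-01",
     "timestamp": 1700000000},
    view,
)
print(event["before"])   # the before-image came from the consumer, not the log
```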
No update semantics. Cassandra uses an upsert model — there is no distinct UPDATE operation at the storage layer. Every write is a set of timestamped cell values, and an "update" is simply a newer write to the same cells. Reconstructing whether a commitlog entry represents a brand-new row or a modification of an existing one requires the consumer to track the previous state of the row. The commitlog never expresses "this row existed before and the value changed from X to Y."
🔍 DynamoDB Streams: Managed CDC with a 24-Hour Window
DynamoDB Streams is Amazon's managed CDC mechanism for DynamoDB tables. When enabled on a table, DynamoDB Streams captures a time-ordered log of every item-level modification for up to 24 hours.
Stream record types. Each stream record describes a single item modification and includes the operation type (INSERT, MODIFY, REMOVE) and the item data. The StreamViewType setting controls what data is included:
| StreamViewType | What it includes |
| --- | --- |
| KEYS_ONLY | Only the primary key attributes of the modified item |
| NEW_IMAGE | The entire item after the modification |
| OLD_IMAGE | The entire item before the modification |
| NEW_AND_OLD_IMAGES | Both the before and after state of the item |
For CDC use cases that require full change semantics (before/after images, delete payloads), NEW_AND_OLD_IMAGES is the only useful option.
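Extracting before/after images from a stream record is then straightforward. A sketch — eventName, OldImage, and NewImage are real Streams record fields, but attribute values are shown pre-deserialized for brevity (real records wrap each value in a type descriptor like {'S': ...} or {'N': ...}):

```python
def parse_stream_record(record: dict) -> dict:
    """Extract operation type and before/after images from a DynamoDB
    Streams record (simplified: real attribute values carry type
    descriptors that must be deserialized first)."""
    body = record["dynamodb"]
    return {
        "op": record["eventName"],       # INSERT | MODIFY | REMOVE
        "before": body.get("OldImage"),  # present only with OLD_IMAGE /
        "after": body.get("NewImage"),   # NEW_AND_OLD_IMAGES view types
    }

# A REMOVE under NEW_AND_OLD_IMAGES still carries the deleted item:
event = parse_stream_record({
    "eventName": "REMOVE",
    "dynamodb": {"OldImage": {"pk": "user#42", "balance": 100}},
})
assert event["op"] == "REMOVE" and event["after"] is None
print(event["before"])
```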
Shard-based delivery. DynamoDB Streams is organized into shards, where each shard covers a subset of the table's partition key space. Shards are created and closed as the table scales. A CDC consumer must continuously poll open shards using the Streams API, manage shard iterators, and handle shard splits and closures. AWS Lambda event source mappings can automate this, but custom consumers must implement their own shard management.
The 24-hour retention window. Unlike Kafka (configurable retention, potentially unlimited) or PostgreSQL replication slots (retained until consumed), DynamoDB Streams records expire after exactly 24 hours. A consumer that stops for longer than 24 hours loses all changes from the outage window with no recovery mechanism other than a full table scan.
Table reconstruction problem. If a DynamoDB table is deleted and recreated (for example, during a disaster recovery scenario), the new table has a different Stream ARN. Any CDC consumer tracking the old Stream ARN will not automatically migrate to the new stream. This is a subtle but critical operational gap in DR scenarios.
🧠 Deep Dive: How Slot Lag, Oplog Rollover, and Retention Expiry Break CDC Pipelines
The most dangerous CDC failure modes are not connector crashes or network errors — those are visible. The most dangerous failures are the ones that look healthy on the dashboard but have been silently accumulating a problem that eventually becomes unrecoverable.
The Internals of CDC Consumer Lag and Log Retention
Every log-based CDC mechanism has a two-sided tension: the log producer (the database) writes new entries as fast as the application generates writes, while the log consumer (the CDC reader) reads and confirms entries. The log must retain entries from the oldest unconfirmed consumer position forward.
In PostgreSQL, the replication slot's restart_lsn tracks the oldest WAL position the slot needs. The difference between current_wal_lsn and confirmed_flush_lsn — the slot lag — is the volume of WAL that cannot be discarded. A 10,000-writes-per-second table on PostgreSQL generates approximately 50–200 MB of WAL per minute. A 2-hour consumer outage generates 6–24 GB of retained WAL. A 48-hour weekend outage generates 144–576 GB. On a 500 GB disk, the database itself may shut down.
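This arithmetic is worth encoding in capacity planning: retained WAL is simply rate times duration (decimal GB, matching the figures above):

```python
def retained_wal_gb(wal_rate_mb_per_min: float, outage_hours: float) -> float:
    """Back-of-envelope WAL retention during a consumer outage:
    rate x duration, in decimal GB (1 GB = 1000 MB)."""
    return wal_rate_mb_per_min * outage_hours * 60 / 1000

# Reproduce the ranges from the text: 50-200 MB/min over 2h and 48h outages.
for rate in (50, 200):
    for hours in (2, 48):
        print(f"{rate} MB/min x {hours}h -> {retained_wal_gb(rate, hours):.0f} GB")
```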
In MongoDB, the oplog is a capped collection with a fixed maximum size (configurable, default varies by MongoDB version and deployment). The oplog rolls over when it reaches capacity — newer entries overwrite the oldest entries. A CDC connector that falls behind the oplog tail and whose last resume token no longer exists in the oplog cannot resume incrementally. It must perform a new full collection snapshot, which may itself take hours on large collections, during which writes continue and the oplog continues to roll.
In SQL Server CDC and DynamoDB Streams, the retention is time-based rather than consumption-based. The database discards change data after a fixed window regardless of whether any consumer has read it. A consumer that is offline for 3+ days (SQL Server default) or 24 hours (DynamoDB Streams) returns to find its position is no longer valid. The only recovery is a full re-snapshot of the source.
Performance Analysis: Throughput, Latency, and the Log Reading Cost
The throughput a CDC connector can sustain is bounded by the rate at which it can read and parse log entries, not by the rate at which the source database writes them. Log reading is typically I/O-bound on disk or network (reading from a replica log file or through a replication protocol).
| Database | CDC Read Mechanism | Typical Latency | Throughput Ceiling | Primary Bottleneck |
| --- | --- | --- | --- | --- |
| PostgreSQL | Logical replication protocol (streaming) | < 1 second | Hundreds of MB/s | Network to replica |
| MySQL | Binlog streaming over MySQL replication protocol | < 1 second | Hundreds of MB/s | Network to replica |
| MongoDB | Change stream cursor (oplog tailing) | < 1 second | Depends on oplog read throughput | Oplog read IOPS |
| Oracle LogMiner | SQL query against redo log files | 5–30 seconds | ~10–50k rows/second | CPU for redo log parsing |
| SQL Server CDC | SQL query against CDC system tables | 1–10 seconds | ~50–100k rows/second | SQL Server Agent job frequency |
| Cassandra | File system copy from commitlog | 1–60 seconds | Segment flush frequency | CDC directory I/O |
| DynamoDB Streams | HTTP API polling | < 1 second | Per-shard read limits | Number of open shards |
Oracle's LogMiner stands out as the highest-latency, highest-CPU-cost mechanism. The redo log parsing that LogMiner performs is computationally intensive — it must decode Oracle's internal change vector format into SQL-level row change representations. On busy OLTP databases, LogMiner can introduce measurable database-level CPU contention.
❌ Where CDC Doesn't Work: Warehouses, Serverless, and Blocked Managed Databases
Not every database exposes a log that external readers can access. The following categories either lack a log-based CDC mechanism entirely or expose it only within the database's own ecosystem.
BigQuery. Google BigQuery is an analytical data warehouse with no transaction log exposed to users. There is no streaming log of row-level changes. Workarounds include scheduled queries that compare partition snapshots, INFORMATION_SCHEMA.PARTITIONS watermarks to detect new data arrivals, or using BigQuery's built-in CHANGE_HISTORY in certain configurations. None of these deliver the sub-second latency or delete semantics of true log-based CDC.
Amazon Redshift. Redshift does not expose its WAL to external consumers. CDC into Redshift (moving data from another database into Redshift) is common, but CDC out of Redshift is not natively supported. Teams that need to capture changes in Redshift typically use a watermark-based polling approach (querying a staging table with increasing sequence numbers) or use S3 as an intermediate staging layer. Neither approach reliably captures deletes.
Snowflake Streams. Snowflake has a feature called Streams that tracks row-level changes to a table using an internal change-tracking mechanism. It is available for consumption inside Snowflake pipelines (via SELECT * FROM my_stream within Snowflake tasks). However, Snowflake Streams cannot be consumed externally as a real-time event stream — they cannot be read by Debezium, Kafka Connect, or any external CDC tool. They are a within-Snowflake change tracking facility, not a log-based external CDC interface.
Serverless and HTTP-only databases. PlanetScale (serverless MySQL) previously exposed binlog-based CDC through its PlanetScale Connect feature, but this was available only on certain plans and has changed over time. Fauna and Firestore do not expose a CDC log to external consumers — Firestore offers real-time listeners (SDK-based change subscriptions), but these are not a durable log and cannot be consumed by standard CDC tooling. The pattern for these databases is either application-layer event emission or polling.
Tightly managed SaaS databases. Some SaaS and shared database offerings block log access entirely for security or multi-tenancy isolation reasons. AWS RDS for MySQL and PostgreSQL do expose binlog and logical decoding (with specific parameter group configurations), and Aurora MySQL exposes the binlog. However, some fully managed or shared-tier database products — particularly those built on proprietary infrastructure — do not expose their logs. Before choosing a CDC approach, verifying that the specific managed service allows the required log access (and at what tier) is essential.
Columnar append-only stores. Apache Parquet files in a data lake, ClickHouse append-only tables, and similar column-store systems that only permit inserts trivially express "new data" through append detection — but they cannot represent updates or deletes because those operations do not exist in their data model. CDC in these systems reduces to new-file notification.
📊 From Write to Consumer: Visualizing the CDC Failure Boundaries
The diagram below maps the log retention lifecycle for three representative databases side by side, showing where each mechanism's failure boundary lies. Understanding these boundaries is the first step to building operational alerting that catches problems before they become unrecoverable.
```mermaid
flowchart LR
subgraph PostgreSQL["PostgreSQL — Slot-Based Retention"]
direction TB
PW["Write → WAL"] --> PS["Replication Slot (retains WAL until confirmed)"]
PS -->|"Slot lag > disk capacity → DB shuts down"| PD["Disk Full"]
PS -->|"Connector running"| PC["Consumer (advances slot)"]
end
subgraph MongoDB["MongoDB — Size-Based Oplog Rollover"]
direction TB
MW["Write → Oplog"] --> MO["Capped Oplog (fixed size, rolls over)"]
MO -->|"Consumer lag > oplog size → cursor invalidated"| MF["Must Re-Snapshot"]
MO -->|"Consumer within window"| MC["Consumer (resume token valid)"]
end
subgraph SQLServer["SQL Server — Wall-Clock Retention"]
direction TB
SW["Write → Transaction Log"] --> SC["CDC Change Tables (time-limited: default 3 days)"]
SC -->|"Consumer offline > 3 days → changes discarded silently"| SF["Silent Data Loss"]
SC -->|"Consumer within retention window"| SS["Consumer (reads change table)"]
end
```
Each database's failure mode has a different trigger and a different recovery path. PostgreSQL fails visibly (disk full, database stops). MongoDB fails silently but detectably (cursor invalidated, connector logs an error). SQL Server fails silently and permanently (old change rows deleted, no error, no recovery without re-snapshot). The operational implication is that monitoring lag is not optional — it is the difference between a recoverable situation and a data gap.
🌍 Where These CDC Patterns Power Production Systems at Scale
Understanding the mechanisms abstractly is useful, but seeing where these patterns appear in real production systems reveals why the configuration details matter.
LinkedIn's Databus (origin of modern CDC) was built to capture changes from Oracle databases and feed them to search indexes, social graph caches, and recommendation systems. Databus exposed the problem of Oracle LogMiner latency at scale: the redo log parsing latency meant that profile updates could be invisible to search for 15–30 seconds. LinkedIn ultimately moved portions of their stack to MySQL binlog-based CDC to reduce latency.
Shopify runs Debezium on over 200 PostgreSQL and MySQL tables. Their CDC pipeline feeds Kafka topics consumed by analytics, search, and fraud detection systems. Their primary operational challenge has been replication slot lag management — maintaining dedicated monitoring that alerts if any slot falls more than 5 minutes behind, and implementing a circuit breaker that pauses writes to the CDC output topic if a downstream system is unhealthy to prevent slot accumulation.
Airbnb uses MongoDB change streams to capture listing and booking mutations and propagate them to their Elasticsearch search index. Their documented operational challenge was oplog sizing — early deployments used default oplog sizes on clusters with high write throughput, leading to oplog rollover events during peak traffic. They increased oplog sizes to cover multiple hours of write volume at peak load.
Netflix uses a combination of MySQL CDC (via their Flink-based DBLog framework) and DynamoDB Streams for different parts of their data platform. Their DBLog system specifically addresses the snapshot-to-stream handoff problem: it coordinates a full table dump and the start of incremental log consumption so that no changes are missed and no changes are duplicated during the transition.
⚖️ Database CDC Capability Comparison
The table below summarizes the CDC mechanism, key requirements, and critical operational gotchas for each database covered in this post.
| Database | CDC Mechanism | Log Type | Supports Deletes? | External Tool Required | Managed Cloud Ready | Key Gotcha |
| --- | --- | --- | --- | --- | --- | --- |
| PostgreSQL | Logical Decoding | WAL | ✅ Yes (tombstone event) | Debezium / pglogical | ✅ Yes (RDS, Aurora, Cloud SQL) | Replication slot accumulates WAL — disk fill risk |
| MySQL / MariaDB | Binlog Streaming | Binlog | ✅ Yes | Debezium | ✅ Yes (RDS, Aurora MySQL) | binlog_format = ROW required; statement mode silently breaks CDC |
| MongoDB | Change Streams | Oplog (capped) | ✅ Yes (REMOVE event) | Debezium / Atlas Streams | ✅ Yes (Atlas) | Requires replica set; oplog rollover invalidates cursor |
| Oracle | LogMiner / XStream | Redo Log | ✅ Yes | Debezium (LogMiner) / GoldenGate | Partial (LogMiner only on some RDS configs) | Supplemental logging required; high CPU cost |
| SQL Server | Built-in CDC | Transaction Log | ✅ Yes | Debezium / JDBC | ✅ Yes (RDS SQL Server, Azure SQL) | SQL Server Agent required; 3-day retention window — silent expiry |
| Cassandra | Commitlog CDC | Commitlog Segments | ⚠️ Tombstones only | Custom reader | ❌ Limited | Disk fill if consumer stops; no structured delete events |
| DynamoDB | DynamoDB Streams | Proprietary Streams | ✅ Yes (REMOVE event) | AWS Lambda / KCL | ✅ Managed by AWS | 24-hour retention — silent loss after outage |
| BigQuery | None (native) | None exposed | ❌ No | Watermark polling workaround | N/A | No log API; scheduled queries are the only near-CDC option |
| Redshift | None (native) | None exposed | ❌ No | S3 staging / watermark | N/A | CDC out of Redshift is not natively supported |
| Snowflake | Snowflake Streams | Internal change tracking | ✅ Within Snowflake only | N/A (internal only) | N/A | Streams are internal to Snowflake — cannot be consumed externally |
🧭 Choosing Your CDC Approach: A Decision Guide by Database and Constraint
The right CDC approach is determined by four constraints: the source database engine, whether you are running on a managed cloud service (and which tier), how much lag is acceptable, and whether you need full delete semantics.
Start with your database engine. The database engine determines which log-based mechanisms are available. If your database is not in the left column of the comparison table above, log-based CDC is not available and you are limited to watermark polling or application-layer event emission.
Verify log access on your specific managed tier. AWS RDS for PostgreSQL requires setting rds.logical_replication = 1 in the parameter group and granting the rds_replication role to the CDC user. AWS RDS for MySQL requires enabling the binlog with binlog_format = ROW. Aurora MySQL exposes the binlog. Aurora PostgreSQL supports logical replication (including Debezium). Some shared-tier managed services do not expose these settings. Verify before committing to an architecture.
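These checks can be run directly in a database session before any connector is attached. The queries below are a sketch; `cdc_user` is a hypothetical role name standing in for whatever account your connector will use.

```sql
-- PostgreSQL: confirm logical decoding is available for a CDC connector
SHOW wal_level;                       -- must return 'logical'
SELECT rolname, rolreplication
FROM pg_roles
WHERE rolname = 'cdc_user';           -- replication privilege for the CDC user

-- MySQL: confirm row-based binlog before attaching a binlog reader
SHOW VARIABLES LIKE 'binlog_format';     -- must be 'ROW'
SHOW VARIABLES LIKE 'binlog_row_image';  -- 'FULL' recommended for before-images
```

On RDS, `rolreplication` stays false even for working CDC users — membership in the `rds_replication` role is what matters there, so check `pg_auth_members` instead on that platform.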
Match retention to your consumer reliability requirements. If your CDC consumer is not highly available, the log retention mechanism must be longer than your maximum acceptable downtime. PostgreSQL replication slots hold WAL indefinitely (but fill disk), so the risk is disk capacity. MongoDB oplog size must be large enough to cover the consumer's maximum downtime at peak write volume. SQL Server CDC retention must exceed the consumer's maximum downtime or you must accept that a long outage requires a re-snapshot.
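The sizing rule above is simple arithmetic, but teams routinely skip it. A minimal sketch, assuming you have measured your peak change-log generation rate (WAL, binlog, or oplog) rather than guessed it:

```python
def required_retention_mb(peak_write_rate_mb_per_min: float,
                          max_consumer_downtime_hours: float,
                          safety_factor: float = 2.0) -> float:
    """Log/oplog capacity (in MB) needed to survive a consumer outage.

    peak_write_rate_mb_per_min: peak rate at which the database generates
        change-log data, measured at peak traffic.
    safety_factor: headroom for traffic spikes during the outage window.
    """
    minutes = max_consumer_downtime_hours * 60
    return peak_write_rate_mb_per_min * minutes * safety_factor

# Example: ~60 MB/min of WAL and an 8-hour weekend outage budget
# requires roughly 28 GB without headroom, ~56 GB with 2x safety factor.
headroom_gb = required_retention_mb(60, 8) / 1024
```

For MongoDB this number informs `replSetResizeOplog`; for PostgreSQL it tells you how much free disk a stalled slot will consume before the alert must fire.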
Check delete requirements. If your use case requires delete propagation (search index remove, warehouse soft-delete, audit trail), rule out Cassandra CDC (no structured deletes), BigQuery (no CDC), and Redshift (no CDC). For Snowflake, internal Streams support delete tracking within Snowflake pipelines but cannot feed external systems.
The flowchart below is a condensed decision path for the most common scenarios:
```mermaid
flowchart TD
    A["What is your source database?"] --> B{"Relational or document?"}
    B -->|"PostgreSQL"| C["Enable wal_level=logical<br/>Create replication slot<br/>Monitor slot lag daily"]
    B -->|"MySQL / MariaDB"| D["Set binlog_format=ROW<br/>binlog_row_image=FULL<br/>Enable GTIDs"]
    B -->|"MongoDB"| E{"Replica set or standalone?"}
    E -->|"Standalone"| F["Convert to single-member replica set first"]
    E -->|"Replica set"| G["Use Change Streams<br/>Size oplog for max consumer lag<br/>Configure pre/post images"]
    B -->|"SQL Server"| H["Enable CDC on target tables<br/>Start SQL Server Agent<br/>Set retention > consumer SLA"]
    B -->|"Cassandra"| I["Use commitlog CDC<br/>Budget extra disk for CDC dir<br/>Do NOT rely on delete events"]
    B -->|"DynamoDB"| J["Enable Streams with NEW_AND_OLD_IMAGES<br/>Build consumer within 24h SLA"]
    B -->|"BigQuery / Redshift / Snowflake"| K["Log-based CDC unavailable<br/>Use watermark polling or application-layer events"]
```
🧪 Worked Example: Three Databases, Three Silent Failures, One Pipeline
This example revisits the fintech team from the opening — their PostgreSQL, MongoDB, and Cassandra pipeline — and traces exactly where each failure originated and how it was diagnosed.
The scenario. The team's pipeline processed account ledger entries (PostgreSQL), user activity documents (MongoDB), and session key-value data (Cassandra). Their Debezium connectors ran on Kafka Connect, with three source connectors publishing to three separate Kafka topics. They monitored connector status via the Kafka Connect REST API — all three connectors showed RUNNING.
What the team was investigating. A downstream fraud detection service reported that deleted user sessions (Cassandra) were still appearing in its working set days after deletion. A user account document updated in MongoDB during a scheduled maintenance window was not reflected in the search index. And the data ops team received a PagerDuty alert: PostgreSQL disk was at 92% on a Saturday morning after the Kafka Connect cluster had an unrelated restart Friday night.
Diagnosing the PostgreSQL problem. The Friday night Kafka Connect restart caused all three connectors to stop and restart. The PostgreSQL connector reconnected and caught up within minutes. But during the outage, the replication slot had accumulated 8 hours of WAL at the database's transaction rate (~60 MB/min = ~28 GB). The disk alert was the replication slot lag materializing as retained WAL. The fix: the team added a Prometheus metric on pg_replication_slots.confirmed_flush_lsn lag with a 5-minute alert threshold.
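The alert the team added boils down to one query. This is a sketch of the underlying check (PostgreSQL 10+); how it is scraped into Prometheus is left out:

```sql
-- Retained WAL per replication slot, in bytes. A large, growing value on
-- an inactive slot is exactly the weekend failure mode described above.
SELECT slot_name,
       active,
       pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS retained_wal_bytes
FROM pg_replication_slots
ORDER BY retained_wal_bytes DESC;
```

Alert on `retained_wal_bytes` (or its time-equivalent) rather than on connector status: the query catches a stalled consumer even while Kafka Connect reports RUNNING.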
Diagnosing the MongoDB problem. The maintenance window MongoDB updates were missing from the search index because that specific MongoDB cluster — hosting a subset of user documents — had never been converted from standalone to replica set. The Debezium MongoDB connector had been configured to connect to it but was silently failing on startup (no replica set → no oplog → connector initialization failed and entered a restart loop). The Kafka Connect status API showed the connector as RUNNING because it was in the middle of a retry cycle. The fix: the cluster was converted to a single-member replica set. The team added an explicit health check that verified the connector had actually produced at least one offset record, not just that its status was RUNNING.
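The "has it actually produced an offset" check can be expressed as a pure decision function. The status dict below mirrors the Kafka Connect status API shape; how you count committed offsets (e.g. by reading the connect offsets topic) is an assumption left to the caller.

```python
def connector_is_healthy(status: dict, committed_offsets: int) -> bool:
    """RUNNING alone is not health: a connector stuck in a retry loop can
    report RUNNING between failed attempts. Require evidence that it has
    actually committed at least one source offset."""
    state = status.get("connector", {}).get("state")
    tasks_ok = all(t.get("state") == "RUNNING" for t in status.get("tasks", []))
    return state == "RUNNING" and tasks_ok and committed_offsets > 0

# A connector mid-retry-loop: status looks fine, but nothing was ever produced.
looks_fine = {"connector": {"state": "RUNNING"},
              "tasks": [{"id": 0, "state": "RUNNING"}]}
assert connector_is_healthy(looks_fine, committed_offsets=0) is False
assert connector_is_healthy(looks_fine, committed_offsets=42) is True
```

The same predicate generalizes to any CDC stack: liveness of the process plus evidence of forward progress, never liveness alone.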
Diagnosing the Cassandra problem. The Cassandra CDC commitlog contains tombstones, not DELETE events. The Debezium Cassandra connector translates tombstone writes into delete events in the Kafka topic, but only if the consumer is using a version of the connector and configuration that correctly interprets the tombstone's partition key and clustering columns as a row identifier. The team's connector configuration was missing the primary key column mapping that would allow the tombstone to be matched to the corresponding session record. The result: delete events were published to Kafka but with incomplete key information, causing the fraud detection service's idempotency logic to treat them as new records rather than deletions. The fix: a correct primary key column definition in the Cassandra connector configuration, plus a backfill job to reprocess the previous 7 days of events.
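The consumer-side logic the team was missing amounts to mapping a tombstone's partition key and clustering columns onto a row identity. A minimal sketch, assuming a sessions table keyed by (`user_id`, `session_id`); the event field names here are illustrative, not the actual Debezium Cassandra event schema:

```python
# Primary key definition the connector configuration was missing:
# partition key + clustering columns, in order.
PRIMARY_KEY = ("user_id", "session_id")

def row_identity(event: dict) -> tuple:
    """Build the identity a tombstone must be matched against."""
    return tuple(event["key"][col] for col in PRIMARY_KEY)

def apply_event(state: dict, event: dict) -> None:
    """Apply one change event to the consumer's working set, idempotently."""
    ident = row_identity(event)
    if event["op"] == "delete":        # tombstone: remove from working set
        state.pop(ident, None)
    else:                              # insert/update: upsert the row
        state[ident] = event["value"]

working_set = {}
apply_event(working_set, {"op": "insert",
                          "key": {"user_id": "u1", "session_id": "s1"},
                          "value": {"status": "active"}})
apply_event(working_set, {"op": "delete",
                          "key": {"user_id": "u1", "session_id": "s1"},
                          "value": None})
assert working_set == {}               # the delete actually deleted
```

Without the key mapping, `row_identity` cannot be built, and a delete degrades into an unmatchable event — exactly the fraud-detection bug above.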
🛠️ Debezium: The Open-Source Engine That Reads Database Logs
Debezium is the primary open-source CDC framework used in production across all the database types covered in this post. It implements log-based CDC connectors for PostgreSQL (logical decoding), MySQL (binlog streaming), MongoDB (change streams), Oracle (LogMiner), SQL Server (built-in CDC API), and Cassandra (commitlog), among others. Debezium connectors run as Kafka Connect source connectors and publish change events to Kafka topics.
Each Debezium connector handles the database-specific protocol differences described in this post: creating and managing replication slots for PostgreSQL, managing binlog position and GTID sets for MySQL, managing change stream cursors and resume tokens for MongoDB, invoking LogMiner sessions for Oracle, and reading CDC change table functions for SQL Server. Debezium also manages the initial snapshot — the full read of the source table before incremental log capture begins — and the handoff from snapshot to log-based capture.
Debezium's primary operational surface is Kafka Connect configuration — connector configuration JSON or YAML specifying the database connection, the slot or binlog position, the tables to capture, the output topic naming strategy, and the schema registry for Avro serialization. The framework does not abstract away the database-specific requirements described in this post: wal_level = logical is still required in PostgreSQL, binlog_format = ROW is still required in MySQL, and a replica set is still required in MongoDB.
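For concreteness, a minimal PostgreSQL source connector configuration might look like the sketch below. Hostname, database, slot, user, and topic names are placeholders, and the exact property set varies by Debezium version (for example, `topic.prefix` replaced `database.server.name` in Debezium 2.x):

```json
{
  "name": "ledger-postgres-source",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "pg.internal.example",
    "database.port": "5432",
    "database.user": "cdc_user",
    "database.password": "${secrets:pg-cdc-password}",
    "database.dbname": "ledger",
    "plugin.name": "pgoutput",
    "slot.name": "debezium_ledger",
    "table.include.list": "public.ledger_entries",
    "topic.prefix": "fintech.ledger"
  }
}
```

Note that `slot.name` names the replication slot Debezium creates — the same slot whose lag must be monitored, as discussed throughout this post.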
For a full deep-dive on Debezium configuration, snapshot handoff, schema evolution, and failure recovery, see Change Data Capture Pattern.
📚 Lessons Learned from Database CDC in Production
Lesson 1: Monitor log retention lag, not just connector status. Every database-level CDC mechanism has a retention boundary. Connector status (RUNNING) tells you the software is running; it does not tell you whether the consumer is keeping up. Replication slot lag (PostgreSQL), oplog cursor validity (MongoDB), and change table age (SQL Server) must be independently monitored.
Lesson 2: The managed cloud tier determines what CDC is possible. AWS RDS, Aurora, Cloud SQL, and Azure Database all support log-based CDC, but require specific parameter group settings that must be configured before any CDC connector is attached. Verify log access before designing the pipeline, not after.
Lesson 3: Deletes are always the hardest part. Every CDC mechanism handles deletes differently. PostgreSQL and MySQL provide clean DELETE events with before-images. MongoDB provides REMOVE events. SQL Server provides delete rows in change tables. Cassandra provides tombstones. BigQuery and Redshift provide nothing. Before choosing a CDC approach, know exactly what delete semantics your downstream consumers require.
Lesson 4: Standalone MongoDB is a CDC dead end. MongoDB CDC requires a replica set. A standalone instance has no oplog. If a team is deploying MongoDB and may add CDC later, they should initialize it as a replica set from day one — converting a standalone later requires a restart with a replica set configuration plus `rs.initiate()`, operational steps that are easily deferred and forgotten.
Lesson 5: Log-based CDC is non-intrusive to writes but not free. Log-based CDC does not add latency to writes and does not require triggers or polling queries. But it does consume I/O, network, and compute resources to read and parse the log. For PostgreSQL, replication slot retention consumes disk. For Oracle LogMiner, log parsing consumes CPU. For Cassandra, the CDC directory consumes disk proportional to consumer lag.
Lesson 6: Never trust a CDC pipeline that has never been tested with a delete. In every production CDC deployment, the first thing to test after initial setup is whether a delete in the source database produces the expected tombstone or delete event in the downstream consumer. Deletes are the most commonly broken piece — and the most silently broken.
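Lesson 6 can be operationalized as a smoke test. This sketch abstracts the source writes and the sink read behind callables so it can run against any pipeline; all names are hypothetical and the wiring to a real database and consumer is up to you.

```python
import time

def delete_smoke_test(write_row, delete_row, read_sink, key, timeout_s=60):
    """Insert a canary row, delete it, and verify the delete propagates.

    write_row/delete_row act on the source database; read_sink returns the
    downstream view for `key` (or None if absent). All three are supplied
    by the pipeline under test.
    """
    write_row(key)
    delete_row(key)
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if read_sink(key) is None:     # the delete event arrived downstream
            return True
        time.sleep(1)
    return False                       # delete never propagated: broken CDC
```

Run this once after initial setup and again after every connector upgrade — it is the single cheapest guard against the silent delete failures described in the worked example.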
📌 TLDR
Change Data Capture is not a single technology — it is a different internal mechanism in every database engine, each with its own requirements and failure modes.
- PostgreSQL uses logical decoding slots on the WAL. Requires `wal_level = logical`. Replication slot lag fills disk — mandatory monitoring.
- MySQL / MariaDB uses binlog streaming. Requires `binlog_format = ROW` and `binlog_row_image = FULL`. Statement mode silently breaks CDC.
- MongoDB uses the oplog via Change Streams. Requires a replica set (standalone does not work). Capped oplog can roll over, invalidating the consumer cursor.
- Oracle uses LogMiner (high CPU cost, supplemental logging required) or XStream (requires GoldenGate license).
- SQL Server has built-in CDC (not Change Tracking) that requires SQL Server Agent and has a time-based retention window.
- Cassandra CDC accumulates log segments on disk. Deletes appear as tombstones, not structured DELETE events.
- DynamoDB Streams provides managed CDC with a strict 24-hour retention window.
- Warehouses (BigQuery, Redshift) and Snowflake Streams do not support external log-based CDC.
- Monitoring lag — not connector status — is the difference between a recoverable outage and permanent data loss.
📝 Practice Quiz: Database-Specific CDC Knowledge
Question 1: A PostgreSQL CDC pipeline shows `RUNNING` status in Kafka Connect, but your data operations team alerts that disk usage on the PostgreSQL server is growing rapidly over the weekend. What is the most likely cause and how would you diagnose it?

- A) The Kafka Connect worker is writing debug logs to the PostgreSQL data directory
- B) The replication slot has accumulated unprocessed WAL because the consumer was not advancing the slot
- C) PostgreSQL's autovacuum is generating large temporary files
- D) The logical decoding plugin is creating a copy of every WAL segment

Correct Answer: B — Replication slot lag causes PostgreSQL to retain all WAL from the slot's last confirmed position. If the connector stopped advancing the slot (due to a consumer outage, connector restart, or downstream backpressure), WAL accumulates. Diagnose by querying `pg_replication_slots` and comparing `confirmed_flush_lsn` to the current WAL write position.

Question 2: A MySQL CDC connector is configured and running, but your downstream system never receives DELETE events for rows you know have been deleted. The connector shows no errors. What is the single most likely root cause?

- A) The Debezium MySQL connector does not support DELETE events
- B) MySQL's `binlog_format` is set to `STATEMENT` instead of `ROW`
- C) The downstream Kafka consumer is filtering DELETE events
- D) MySQL requires supplemental logging to be enabled for DELETE capture

Correct Answer: B — Statement-based binlog records the SQL statement, not the row-level change. CDC readers cannot extract delete events with row identities from statement-mode binlog entries. Setting `binlog_format = ROW` enables per-row change records including DELETE before-images.

Question 3: A MongoDB CDC pipeline has been running successfully for 3 weeks when it suddenly reports a `ChangeStreamHistoryLost` error and stops consuming events. What does this error indicate, and what is the recovery path?

- A) The MongoDB Atlas free tier has paused the cluster; restart the cluster and the cursor will resume
- B) The change stream cursor's resume token points to an oplog entry that has been overwritten by the capped oplog rollover; recovery requires a new full snapshot followed by incremental capture
- C) The MongoDB user account used by the connector has lost the `changeStream` privilege
- D) The Kafka Connect worker ran out of heap memory and corrupted the connector offset store

Correct Answer: B — `ChangeStreamHistoryLost` means the oplog has rolled over past the position of the connector's last resume token. Since that oplog entry no longer exists, incremental CDC cannot resume from the last checkpoint. The connector must perform a new full snapshot of the collection(s) and then resume incremental capture from the new oplog position established at snapshot completion.

Question 4: Your team is evaluating whether to use Cassandra's built-in CDC feature to propagate session deletions to a downstream fraud detection service that needs to know when sessions expire. What is the primary architectural concern with relying on Cassandra CDC for this use case?

- A) Cassandra CDC has a maximum throughput of 1,000 events per second, which may be insufficient for high-traffic session tables
- B) Cassandra CDC requires a separate Kafka cluster and cannot publish directly to the existing event bus
- C) Cassandra CDC represents deletes as tombstones rather than structured DELETE events, making it unreliable for delete detection without additional consumer-side logic to interpret tombstone partition and clustering key metadata
- D) Cassandra CDC is only available in Cassandra Enterprise Edition and requires a license

Correct Answer: C — Cassandra's data model represents deletes as tombstones: markers that indicate a row or cell was deleted at a given timestamp. The CDC commitlog contains these tombstones, but they do not carry a full before-image of the deleted row. A consumer receiving a tombstone must match its partition key and clustering columns against its own state to determine which record was deleted. This is substantially more complex than receiving a structured DELETE event.

Question 5 (open-ended challenge): A team is designing a CDC pipeline that needs to capture changes from three sources: a PostgreSQL OLTP database (inserts/updates/deletes), a DynamoDB table (inserts and updates only — the table uses soft deletes via a `status` flag rather than hard deletes), and a Snowflake data warehouse (for capturing aggregated metric changes). Describe the CDC approach for each source, the specific prerequisites you would verify before deploying, and the key operational risk for each. For the Snowflake source, explain why standard log-based CDC is unavailable and propose the most practical alternative.

Correct Answer: No single correct answer. A strong response should cover: PostgreSQL — logical decoding slot with `wal_level = logical`, a replication user with the `REPLICATION` attribute, and slot lag monitoring; DynamoDB — DynamoDB Streams with `NEW_AND_OLD_IMAGES`, a consumer that stays within the 24-hour retention window, and shard management; Snowflake — Snowflake Streams are internal only and cannot be consumed externally; practical alternatives include scheduled queries using Snowflake Streams to write change rows to a staging table consumed by an external connector, or a watermark-based polling approach querying for rows where `updated_at > last_checkpoint`. The response should also note that soft deletes in DynamoDB (status flag changes) are captured as MODIFY events with both old and new images, and that the downstream consumer must interpret `status = 'deleted'` in the new image as a logical delete.
🔗 Related Posts
- Change Data Capture Pattern — The pipeline side: Debezium, Kafka, snapshot handoff, schema evolution, and production failure modes
- How Kafka Works — Kafka internals: topics, partitions, offsets, and consumer groups — the event bus that CDC pipelines publish to
- BASE vs ACID — Consistency guarantees in distributed databases — understanding why CDC consumers need idempotent write semantics

Written by
Abstract Algorithms
@abstractalgorithms