
Dirty Write Explained: When Uncommitted Data Gets Overwritten

How uncommitted row overwrites corrupt cross-table state — and why row locks are the critical safeguard in every relational database.

Abstract Algorithms · 33 min read

TLDR: A dirty write occurs when Transaction B overwrites data that Transaction A has written but not yet committed. The result is not a rollback or an error — it is silently inconsistent committed data: one table reflects Transaction B's intent, another still reflects Transaction A's. Every mainstream relational database prevents dirty writes automatically via row-level write locks held until commit. The danger re-emerges in distributed and multi-master systems where those locks no longer span the entire write path.


🚨 One Car, Two Owners: The Auction Failure That Should Never Happen

It is 14:32:01 on a Saturday afternoon. A live car auction is in its final seconds. The prize: a 2019 BMW 3 Series, auction ID CAR-4521, with a winning bid of $28,500.

Two buyers submit their winning confirmations within 84 milliseconds of each other. Alice's browser hits the auction endpoint at 14:32:01.103. Bob's hits at 14:32:01.187. The auction service processes both in parallel threads.

Here is what happens in the database:

14:32:01.103 — Transaction T1 (Alice) begins. It writes cars.owner_id = 'alice_007' for CAR-4521. This write is not yet committed. T1 is still in progress.

14:32:01.187 — Transaction T2 (Bob) begins. It writes cars.owner_id = 'bob_042' for the same car CAR-4521 — directly overwriting Alice's uncommitted value. This is the dirty write: T2 has overwritten data that T1 has not yet committed.

14:32:01.201 — T1 continues and writes invoices.buyer_id = 'alice_007' with amount_usd = 28500 to the invoices table for INV-9901.

14:32:01.210 — T1 commits. At this point, cars.owner_id = 'bob_042' (T2's overwrite is still sitting there) and invoices.buyer_id = 'alice_007'.

14:32:01.312 — T2 commits. Final state: cars.owner_id = 'bob_042', invoices.buyer_id = 'alice_007'.

Bob owns the car. Alice's invoice says she paid $28,500 for it. Both transactions committed without error. No exception was thrown. No rollback occurred. The database is fully "consistent" by its own integrity checks — the foreign keys are valid, no NOT NULL constraints were violated. Yet the real-world state is completely wrong: Alice is billed for a car that belongs to Bob.

This is a dirty write. And it is the reason every production database acquires an exclusive row lock on the first write of a transaction and holds it until commit.
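The interleaving above can be replayed as a minimal sketch, assuming plain in-memory dicts with no locking so the dirty write goes through exactly as in the timeline. Table and column names mirror the article's example; this is an illustration of the anomaly, not of any real database engine.

```python
# Replaying the auction timeline against unlocked dicts.
cars = {"CAR-4521": {"owner_id": None}}
invoices = {"INV-9901": {"buyer_id": None, "amount_usd": 0}}

# 14:32:01.103 -- T1 (Alice) writes cars; not yet committed
cars["CAR-4521"]["owner_id"] = "alice_007"

# 14:32:01.187 -- T2 (Bob) overwrites the uncommitted value: the dirty write
cars["CAR-4521"]["owner_id"] = "bob_042"

# 14:32:01.201 -- T1, unaware, finishes its cross-table work and commits
invoices["INV-9901"] = {"buyer_id": "alice_007", "amount_usd": 28500}

# 14:32:01.312 -- T2 commits. Both transactions "succeeded", yet:
print(cars["CAR-4521"]["owner_id"])      # bob_042
print(invoices["INV-9901"]["buyer_id"])  # alice_007
assert cars["CAR-4521"]["owner_id"] != invoices["INV-9901"]["buyer_id"]
```

No step in this replay raises an error, which is precisely the point: the corruption is only visible by comparing the two tables.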


📖 What Makes a Write "Dirty": Uncommitted Data in the Crossfire

The word "dirty" in database terminology always means uncommitted. Data that has been written by a transaction but not yet committed exists in a state of uncertainty — it may be rolled back, it may be committed, but it has no permanent reality yet.

A dirty read occurs when Transaction B reads that uncertain, uncommitted data. It is looking at something that may never have existed.

A dirty write occurs when Transaction B overwrites that uncertain, uncommitted data. It is not reading something ephemeral — it is building on top of it, replacing the intermediate state before the first writer even gets a chance to finish.

The distinction matters because of what survives after commit. In a dirty read scenario, if T1 rolls back, T2 has acted on phantom data — but at least T2 can potentially detect and retry. In a dirty write scenario, both transactions can commit successfully. No one detects an error. The inconsistency is permanent, durable, and invisible to any error-handling path.

The practical difference from a normal update collision: if T1 commits first and T2 then overwrites the committed value (typically based on a read taken before T1's write), that is a lost update anomaly. The timeline there is at least clean: one committed write replaces another committed write. In a dirty write, T2 overwrites T1's in-progress write, meaning the final state is a mix of two incomplete transactions. The cars table reflects T2's intent; the invoices table reflects T1's intent. No single transaction's view of the world is coherent.


⚙️ How Dirty Writes Corrupt Two Tables at Once: The Step-by-Step Mechanics

The sequence below shows exactly how the auction failure unfolds at the database level, without row-level locking protection. Each arrow is a database operation; the Notes show the intermediate state of the cars row.

The key moment is the second UPDATE by T2 Bob: it hits the cars row while T1 Alice's write is still open, overwriting it before T1 has any chance to commit or roll back. From that point on, T1 is writing to the invoices table based on the assumption that it owns the car — but the car row no longer reflects T1's intent.

```mermaid
sequenceDiagram
    participant T1 as T1 Alice
    participant DB as Database
    participant T2 as T2 Bob

    T1->>DB: BEGIN
    T1->>DB: UPDATE cars SET owner=alice WHERE id=CAR-4521
    Note over DB: cars row: owner=alice, uncommitted. T1 still open.
    T2->>DB: BEGIN
    T2->>DB: UPDATE cars SET owner=bob WHERE id=CAR-4521
    Note over DB: Dirty write. Bob overwrites Alice uncommitted row.
    T1->>DB: UPDATE invoices SET buyer=alice, amount=28500
    T1->>DB: COMMIT
    Note over DB: cars.owner=bob, invoices.buyer=alice. Both committed.
    T2->>DB: COMMIT
    Note over DB: Final state is permanently inconsistent across tables.
```

After both transactions commit, the database contains two rows that reflect contradictory business intent: the cars table says Bob owns CAR-4521, while the invoices table says Alice paid $28,500 for it. Neither rollback occurred. Neither transaction errored. The inconsistency is not detectable by querying either table in isolation — only a join across both tables reveals the contradiction.


🧠 Deep Dive: Why Cross-Table Dirty Writes Create Permanently Inconsistent State

The auction example involves two tables: cars and invoices. This is not incidental — it is what makes dirty writes particularly devastating in practice.

When a transaction spans multiple tables, its business logic depends on all writes succeeding or failing together. T1's entire purpose is to atomically assign ownership in cars and record the corresponding invoice in invoices. The transaction is the unit of consistency: both writes should reflect the same buyer, or neither should.

A dirty write breaks this atomicity at the inter-transaction level. T2 seizes the cars row mid-transaction, replacing T1's in-flight write. T1 never sees this — it has no visibility into T2's uncommitted write (just as T2 had none into T1's). T1 completes its own invoice write and commits, genuinely believing it has completed a consistent cross-table operation. T2 also commits, believing it has successfully taken ownership of the car.

The result is a commit-order mismatch: the cars row is dominated by the last writer (T2), while the invoices row is committed by T1. Their logical intents — "Alice wins" versus "Bob wins" — are split across tables, with no transaction's coherent view surviving in the final state.

This is the critical difference from anomalies that affect a single row. A phantom read or non-repeatable read affects what a transaction sees. A dirty write corrupts what the database stores — permanently, across tables, without any signal.

The Internals: How Two-Phase Locking Keeps Cross-Table Writes Serializable

Under two-phase locking (2PL), the database engine acquires an exclusive (X) lock on the row at the first write of a transaction — recorded in the heap tuple header in PostgreSQL, or taken as a record lock on the clustered index entry in MySQL InnoDB. This lock is held until the transaction either commits or rolls back. No second transaction can write the same row while the lock is held.

When a business transaction writes to two tables — first cars, then invoices — it holds X locks on rows in both tables simultaneously until commit. This is the 2PL growing phase: locks accumulate and are never released early. The consistent cross-table state is guaranteed because no other transaction can write to either locked row until the full business transaction finishes.

The lock acquisition order matters for deadlock prevention. If T1 locks cars then invoices, and T2 locks invoices then cars, both may block each other in a deadlock cycle. The database detects this via a wait-for graph and rolls back the younger transaction. Consistent lock ordering across all transactions that touch these two tables eliminates the deadlock risk — a pattern called resource hierarchy locking.
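A minimal sketch of both ideas — locks held until "commit" (2PL) and acquired in one fixed global order to rule out deadlock — can be written with plain threading primitives. All names here (`buy_car`, `row_locks`) are illustrative, not from the article; real engines do this inside the lock manager.

```python
import threading

cars = {"CAR-4521": {"owner_id": None}}
invoices = {"INV-9901": {"buyer_id": None}}
row_locks = {"cars:CAR-4521": threading.Lock(),
             "invoices:INV-9901": threading.Lock()}

def buy_car(buyer_id):
    # Growing phase: take every needed row lock, always in sorted key
    # order, so two concurrent buyers can never deadlock on each other.
    keys = sorted(["cars:CAR-4521", "invoices:INV-9901"])
    for k in keys:
        row_locks[k].acquire()
    try:
        # Both cross-table writes happen under the locks.
        cars["CAR-4521"]["owner_id"] = buyer_id
        invoices["INV-9901"]["buyer_id"] = buyer_id
    finally:
        # Shrinking phase: release everything only at "commit".
        for k in reversed(keys):
            row_locks[k].release()

t1 = threading.Thread(target=buy_car, args=("alice_007",))
t2 = threading.Thread(target=buy_car, args=("bob_042",))
t1.start(); t2.start(); t1.join(); t2.join()

# Whoever committed last owns the car AND the invoice -- coherent state.
assert cars["CAR-4521"]["owner_id"] == invoices["INV-9901"]["buyer_id"]
```

Whichever transaction wins the race, the final state is internally consistent: one buyer's intent survives in both tables, which is exactly what the unlocked version of the scenario fails to guarantee.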

Performance Analysis: The Throughput Cost of Write Contention on Hot Rows

Write serialization via 2PL has a quantifiable throughput ceiling on high-contention rows. If the cars.owner_id field is written by every bid confirmation in the final seconds of an auction, competing transactions queue behind the current lock holder. Throughput on that single row is capped at one write per average transaction duration — if each transaction takes 8ms end-to-end, no more than approximately 125 writes per second can succeed on that row regardless of hardware capacity.
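The arithmetic behind that ceiling is worth making explicit: a hot row admits at most one write per lock-hold window, so throughput is simply the reciprocal of the average transaction duration.

```python
def hot_row_ceiling(txn_duration_ms: float) -> float:
    """Max sustainable writes/sec on one row when writes are serialized."""
    return 1000.0 / txn_duration_ms

print(hot_row_ceiling(8))   # 125.0 -- the 8 ms figure from the text
print(hot_row_ceiling(50))  # 20.0  -- a 50 ms transaction caps the row at 20/sec
```

The second call shows why long-running transactions are so damaging: stretching the lock window from 8 ms to 50 ms cuts the per-row ceiling by more than 6x, regardless of hardware.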

This ceiling matters in three practical scenarios. First, hot rows such as auction car records, account balances, or shared counters become serialization bottlenecks under simultaneous write pressure. Second, long-running transactions that hold locks across network round trips or external API calls starve competing writers for far longer than necessary. Third, deadlocks introduce transaction retry overhead that compounds under high concurrency, reducing effective throughput further.

The engineering countermeasure is to minimize lock hold time: acquire locks as late as possible within the transaction, complete all writes atomically, and commit immediately. Deferring validation reads until inside the transaction boundary — rather than performing a read, doing application logic, then starting a transaction — keeps the write-lock window as short as possible.


🏗️ Concurrent Buyer Requests, One Database Row: The System Architecture View

The following diagram shows how the auction service's two concurrent request threads interact with the cars and invoices databases. The two UPDATE arrows converging on the Car DB node represent the dirty write — both threads are writing to the same row without serialization, producing the inconsistent committed state at the bottom.

This architecture-level view makes clear why application-layer validation is insufficient: both threads passed the business validation step (both confirmed the car was available), and both proceeded to write. The race condition lives entirely within the database write path.

```mermaid
graph TD
    A[Buyer A Browser] -->|POST /auction/bid at T=103ms| B[Auction Service]
    C[Buyer B Browser] -->|POST /auction/bid at T=187ms| B
    B -->|T1: UPDATE cars SET owner=alice| D[Car DB - CAR-4521 row]
    B -->|T2: overwrite owner=bob before T1 commits| D
    B -->|T1: UPDATE invoices SET buyer=alice| E[Invoice DB - INV-9901 row]
    D -->|After both commits| F[cars.owner = bob]
    E -->|After T1 commit| G[invoices.buyer = alice]
    F --> H[Permanently inconsistent state across tables]
    G --> H
```

The architecture diagram exposes two root causes acting simultaneously. First, the auction service makes no attempt to serialize the two bid confirmations — both threads proceed independently. Second, the database has no row-level locking enforced (in this hypothetical scenario), so T2's write succeeds even though T1 has an open, uncommitted write on the same row. Fixing either layer breaks the race condition: proper database locking would block T2 until T1 commits, and application-level serialization would prevent both threads from reaching the DB write path simultaneously.


🔒 Why Every SQL Database Blocks Dirty Writes by Default

The dirty write scenario above cannot happen in PostgreSQL, MySQL InnoDB, Oracle, or SQL Server at any isolation level — not even Read Uncommitted, which is the default in exactly zero production databases.

The mechanism is straightforward: exclusive row-level write locks.

When a transaction executes its first UPDATE on a row, the database engine acquires an exclusive (X) lock on that specific row. This lock is held for the entire duration of the transaction — it is not released when the write completes, only when the transaction commits or rolls back. This is two-phase locking (2PL): locks are acquired during the growing phase, and released only at the shrinking phase (commit/rollback).

When T2 attempts to write the same row that T1 already holds an X lock on, the database does not overwrite T1's uncommitted data. It blocks: T2's write sits in the lock wait queue until T1 either commits (releasing the lock, at which point T2 proceeds to overwrite the now-committed value) or rolls back (releasing the lock, at which point T2 writes to the original pre-T1 state).

In either case, T2 never touches uncommitted data. The dirty write is structurally impossible.

This is why the ANSI SQL standard's definition of Read Uncommitted is somewhat misleading: the standard allows dirty reads at this level, but real databases still acquire write locks, meaning dirty writes are prevented even at the lowest isolation level in all major engines. Read Uncommitted is a theoretical level that exists for audit/reporting workloads that explicitly tolerate stale reads — it has never been a license for dirty writes.


📊 Which Isolation Levels Prevent Dirty Writes — and Which Anomalies They Still Allow

Dirty writes sit at the absolute bottom of the anomaly severity scale: they are prevented by every practical isolation level. The table below shows the full picture across all standard levels.

| Isolation Level | Dirty Write | Dirty Read | Non-Repeatable Read | Phantom Read | Read Skew | Write Skew | Lost Update |
|---|---|---|---|---|---|---|---|
| Read Uncommitted | Possible* | Possible | Possible | Possible | Possible | Possible | Possible |
| Read Committed | Prevented | Prevented | Possible | Possible | Possible | Possible | Possible |
| Repeatable Read | Prevented | Prevented | Prevented | Possible† | Possible | Possible | Prevented† |
| Snapshot Isolation | Prevented | Prevented | Prevented | Prevented | Prevented | Possible | Prevented |
| Serializable | Prevented | Prevented | Prevented | Prevented | Prevented | Prevented | Prevented |

* In theory. In practice, even Read Uncommitted in PostgreSQL, MySQL, and SQL Server acquires write locks, making dirty writes impossible regardless of stated isolation level.

† MySQL InnoDB Repeatable Read prevents phantom reads via gap locks. PostgreSQL's Repeatable Read uses MVCC, which prevents lost updates via first-writer-wins conflict detection.

The takeaway: if you are reasoning about dirty writes in a standard relational database, the answer is always "prevented." The more interesting engineering question is which anomalies above dirty writes — write skew, phantom reads, lost updates — your application still needs to guard against.


⚖️ Choosing Write Isolation: Consistency Guarantees vs. Concurrent Throughput

Preventing dirty writes is not free. Row-level exclusive locks introduce contention, and choosing a higher isolation level compounds that cost. The table above shows what each level prevents; this section explains what each level costs in throughput terms.

Read Committed (default for PostgreSQL, Oracle, SQL Server). Write locks are held until commit, but reads take no long-lived locks — MVCC engines such as PostgreSQL serve them from snapshots and take no read locks at all. The lightest practical isolation: under moderate write concurrency, lock contention stays low because each transaction holds write locks only on the rows it actually modifies. A dirty write cannot occur — T2 blocks on T1's uncommitted row — but write skew and lost updates remain possible. The right default for most OLTP workloads where dirty-write prevention alone is the goal.

Repeatable Read (default for MySQL InnoDB). Read locks are held for the full transaction duration. Phantom reads are additionally prevented in MySQL via gap locks on index ranges. Lock contention is higher because read locks prevent other transactions from modifying rows your transaction has already read. Use when your business logic depends on values remaining stable between a read and a subsequent write in the same transaction — for example, checking a seat count and then booking it.

Serializable. The database behaves as if all transactions ran one at a time. Implemented via predicate locking plus conflict detection (PostgreSQL SSI) or strict two-phase locking with range locks (classic 2PL). Under high write contention, transaction abort rates rise: PostgreSQL's Serializable Snapshot Isolation aborts transactions that would violate serializability, and the application must retry them. The right choice when preventing write skew — two concurrent bookings that both checked availability and both decided to book — is non-negotiable.

The distributed trade-off. In eventually consistent stores (Cassandra, DynamoDB without CAS), all isolation guarantees are relaxed in exchange for availability and partition tolerance under the CAP theorem. You gain write throughput by accepting that two concurrent writes to the same key may coexist — but you must add CAS guards to restore dirty-write prevention for the records where correctness is required.

| Isolation Level | Dirty Write | Write Skew | Lost Update | Throughput Impact |
|---|---|---|---|---|
| Read Committed | Prevented | Possible | Possible | Minimal — write locks only |
| Repeatable Read | Prevented | Possible | Prevented (MySQL) | Moderate — read locks held |
| Snapshot Isolation | Prevented | Possible | Prevented | Low — MVCC, no read locks |
| Serializable | Prevented | Prevented | Prevented | Highest — predicate locks or retries |
| Eventual Consistency (no CAS) | Possible | Possible | Possible | Maximum — no locking overhead |

🌍 When Dirty Writes Re-Emerge in the Real World: Distributed System Edge Cases

Within a single relational database, dirty writes are a solved problem. The challenge reappears when data is distributed across nodes, services, or stores that lack a unified lock manager.

Multi-master replication. In a multi-master setup (CouchDB, MySQL Group Replication in multi-primary mode, Cassandra with multiple coordinators), two nodes can accept conflicting writes to the same logical record concurrently. Each node acquires its local row lock, preventing dirty writes on that node, but there is no global lock across nodes. Both writes commit locally and are replicated. The conflict is detected at replication time and resolved via a last-write-wins rule, a vector clock merge, or application-level conflict resolution. The result is semantically equivalent to a dirty write at the business logic level: two partial updates coexist in the replicated state.

Cassandra and Lightweight Transactions. Cassandra's default writes are not serializable. Two concurrent writes to the same partition key with the same clustering columns will be merged by the last-write-wins rule based on client-supplied timestamps. If two coordinators accept concurrent writes within the same microsecond window, the outcome depends on timestamp ordering, not business logic ordering. Cassandra's Lightweight Transactions (LWT) use a Paxos-based compare-and-swap protocol to serialize writes to a single partition — preventing this class of dirty-write-like conflict. LWT is opt-in and carries a 4-round-trip overhead per write.

DynamoDB conditional writes. DynamoDB is eventually consistent by default. Without a ConditionExpression, two concurrent writes to the same item will silently overwrite each other — last write wins, determined by internal timestamp. A ConditionExpression adds CAS semantics: the write only succeeds if the current attribute value matches the expected value, preventing blind overwrites of intermediate state.
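The CAS semantics of a conditional write can be modeled in a few lines. This is a toy in-memory model of a DynamoDB-style `attribute_not_exists` guard, assuming the store applies each conditional write atomically (here, the `table_lock` stands in for that internal atomicity); all names are illustrative.

```python
import threading

class ConditionalCheckFailed(Exception):
    """Stands in for DynamoDB's ConditionalCheckFailedException."""

table = {"CAR-4521": {}}
table_lock = threading.Lock()  # models the store's per-item atomicity

def claim_if_unowned(item_id: str, buyer_id: str) -> None:
    """Conditional write: succeed only if owner_id is still absent."""
    with table_lock:
        if "owner_id" in table[item_id]:  # attribute_not_exists fails
            raise ConditionalCheckFailed(buyer_id)
        table[item_id]["owner_id"] = buyer_id

claim_if_unowned("CAR-4521", "alice_007")
try:
    claim_if_unowned("CAR-4521", "bob_042")
except ConditionalCheckFailed:
    print("second writer rejected; no blind overwrite")

assert table["CAR-4521"]["owner_id"] == "alice_007"
```

The second writer gets an explicit, catchable failure instead of silently clobbering the first write — which is exactly the signal the auction service needs to tell Bob he lost.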

Microservices with separate databases. When the auction service writes to a cars database and an invoices database independently — each transaction locally ACID, but no distributed transaction coordinating them — a service crash between the two writes leaves the system in a state where one write committed and the other did not. This is not a dirty write in the strict sense, but it produces the same cross-service inconsistency. The dual-write problem requires patterns like the Outbox Pattern or Change Data Capture to guarantee atomic cross-service state updates.
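The core move of the Outbox Pattern can be sketched with SQLite standing in for the service's local database. Under the stated assumption that a relay process (not shown) later reads the outbox table and publishes its events, the business write and the event record share one local transaction, so a crash can never separate them.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE cars (car_id TEXT PRIMARY KEY, owner_id TEXT)")
db.execute("CREATE TABLE outbox (id INTEGER PRIMARY KEY, payload TEXT)")
db.execute("INSERT INTO cars VALUES ('CAR-4521', NULL)")
db.commit()

with db:  # one atomic local transaction: both rows commit, or neither does
    db.execute("UPDATE cars SET owner_id='alice_007' WHERE car_id='CAR-4521'")
    db.execute(
        "INSERT INTO outbox (payload) VALUES (?)",
        (json.dumps({"event": "create_invoice",
                     "buyer_id": "alice_007",
                     "amount_usd": 28500}),),
    )

# The outbox row is the durable to-do item that drives the invoice service.
events = db.execute("SELECT payload FROM outbox").fetchall()
assert len(events) == 1
```

Because the ownership update and the pending invoice event are committed together, the cross-service invariant ("every owner change produces exactly one invoice") survives crashes between the write and the publish.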

The common thread: wherever there is no unified lock manager or compare-and-swap primitive guarding the full write path, some variant of dirty-write-like inconsistency can emerge.


🧭 All Seven Concurrency Anomalies: Where Dirty Write Fits in the Map

Understanding dirty writes fully requires placing them in the context of the complete anomaly taxonomy. The table below maps every known concurrency anomaly across the four dimensions that determine where it sits in the prevention hierarchy.

| Anomaly | Same Row? | Involves Uncommitted Data? | Write Involved? | Minimum Level to Prevent |
|---|---|---|---|---|
| Dirty Read | Yes | Reads uncommitted data | No write by reader | Read Committed |
| Dirty Write | Yes | Overwrites uncommitted data | Yes — overwrites in-flight write | Read Committed |
| Non-Repeatable Read | Yes | No — both reads hit committed data | No | Repeatable Read |
| Lost Update | Yes | No — both writes hit committed data | Yes — second write silently discards first | Snapshot Isolation / MVCC Repeatable Read, or atomic CAS |
| Read Skew | Multi-object | No | No | Snapshot Isolation |
| Write Skew | Different rows | No | Yes — each transaction writes a different row | Serializable |
| Phantom Read | No — range | No | No | Serializable |

Dirty write and dirty read are the only two anomalies that involve uncommitted data. They are prevented earliest in the isolation hierarchy (Read Committed). Every other anomaly involves only committed data but arises because of inadequate inter-transaction coordination during concurrent reads and writes. Write skew and lost update are the most insidious at the application layer because they require serializable isolation (or explicit CAS) to prevent, and most databases default to Read Committed or Repeatable Read.


🔍 Spotting Dirty-Write Symptoms Before They Become Production Incidents

In a well-configured relational database, true dirty writes do not occur. What you are more likely to encounter in production is one of three dirty-write-like symptoms that suggest either a missing isolation control or a distributed consistency gap:

Cross-table foreign key mismatches. A cars row references a buyer that does not match the invoices row for the same transaction. This is the direct fingerprint of the auction failure scenario. A scheduled reconciliation job that joins cars and invoices on transaction_id and checks cars.owner_id = invoices.buyer_id will surface these quickly.
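The reconciliation join described above is runnable against a toy schema. SQLite is used here purely for illustration, and the `transaction_id` column linking the two tables is an assumption of the sketch, as it is in the article's scenario.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE cars (car_id TEXT, transaction_id TEXT, owner_id TEXT);
CREATE TABLE invoices (invoice_id TEXT, transaction_id TEXT, buyer_id TEXT);
-- Seed the corrupted state from the auction incident:
INSERT INTO cars VALUES ('CAR-4521', 'TXN-1', 'bob_042');
INSERT INTO invoices VALUES ('INV-9901', 'TXN-1', 'alice_007');
""")

# The reconciliation query: any row returned is a cross-table mismatch.
mismatches = db.execute("""
    SELECT c.car_id, c.owner_id, i.invoice_id, i.buyer_id
    FROM cars c JOIN invoices i USING (transaction_id)
    WHERE c.owner_id <> i.buyer_id
""").fetchall()

assert mismatches == [("CAR-4521", "bob_042", "INV-9901", "alice_007")]
```

Run on a schedule, a query of this shape turns the invisible corruption into an alertable metric: zero rows is healthy, anything else is an incident.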

Audit log causality violations. If your system writes to an audit log table as part of the same transaction, an inconsistency between the audit entry and the business table it audits is a strong signal of a concurrent write that violated expected ordering. The audit row was written by T1 but the business row reflects T2's overwrite.

Stale or conflicting reads in distributed stores. In Cassandra or DynamoDB, a read after a write sometimes returns the pre-write value, or a value that reflects one write but not a companion write to a related item. This is consistent with the replication-layer dirty-write-like behavior described in the distributed systems section above.

Application-level constraint exceptions on retry. If your retry logic reports that a record that "should not exist" was found, or that a conditional update failed unexpectedly, a concurrent write may have modified the row between your read and your write — a dirty-write-adjacent scenario that your CAS guard successfully caught.

Monitoring for these symptoms proactively — via reconciliation queries, structured audit log analysis, and alerting on unexpected conditional write failures — turns invisible data corruption into a detectable signal.


🧪 Practical Walkthrough: Investigating a Cross-Table Inconsistency in Production

The following walkthrough traces how a post-incident investigation would unfold after the car auction data corruption surfaces. It demonstrates the diagnostic pattern for any dirty-write-like cross-table mismatch — not a single-database edge case but a repeatable investigation procedure.

Observation phase: what the on-call engineer sees.

The billing team raises a ticket: "Invoice INV-9901 shows buyer_id = alice_007, but the car record CAR-4521 shows owner_id = bob_042. Both entries were created during the same auction event at 14:32:01." No error logs. No rollbacks in the database audit trail. Both transactions committed successfully with 200 OK responses at the application layer.

Hypothesis phase: narrowing the root cause.

A cross-table mismatch without any rollback or exception is a strong signal of a concurrent write race rather than a data corruption bug or application logic error. The engineer looks for three conditions that confirm the dirty-write pattern: (1) concurrent transaction timestamps within the same sub-second window, (2) a shared row touched by both transactions — the cars row for CAR-4521 — and (3) evidence that one transaction's write was overwritten before the first transaction committed.

Timeline reconstruction using database and application logs.

PostgreSQL's pg_stat_activity and slow query log, MySQL's general query log, and distributed tracing systems such as Jaeger or Zipkin can reconstruct the per-transaction timeline. The critical data points are: when each transaction began, which specific rows each transaction wrote, and when each transaction committed relative to the other. A timeline showing T2's UPDATE cars executing before T1's COMMIT — within the same sub-second window — is the definitive fingerprint of the dirty-write race.

Root cause classification.

Three distinct root causes produce this fingerprint in production: (1) a write issued outside proper transaction management — a migration script or bulk-load job that bypassed the application's transaction wrapper and issued bare writes, allowing T2 to write without waiting for T1's lock; (2) a read from an eventually consistent secondary (replica, materialized view, distributed cache) that served both T1 and T2 a stale "car available" state, allowing both to enter the write path simultaneously; (3) a distributed write across two separate services where each service's local database enforced isolation independently, but no cross-service lock or CAS guard coordinated the two-service write path.

Remediation verification.

After applying the fix — enabling row-level locking in the correct transaction scope, adding a CAS conditional write, or introducing the Outbox Pattern for cross-service coordination — the reconciliation check across cars and invoices should return zero mismatched rows. Monitoring the conditional write failure rate (for CAS-based fixes) serves as a leading indicator under load: a spike in CAS failures during bid storms confirms the fix is working and catching the races that previously produced silent corruption.


🛠️ PostgreSQL, MySQL InnoDB, and DynamoDB: How Each Database Blocks Dirty Writes

Understanding the locking mechanism at the database level — not just the isolation level name — helps you reason about edge cases and configure distributed systems correctly.

PostgreSQL: Exclusive Row Locks via 2PL

PostgreSQL acquires an ExclusiveLock at the tuple (row) level on the first UPDATE or DELETE within a transaction. This lock is held until COMMIT or ROLLBACK. Any second transaction attempting to write the same tuple blocks immediately and waits in the lock queue.

The following transaction pair shows the blocking behavior. T2's UPDATE will not execute until T1 commits:

```sql
-- T1: acquires ExclusiveLock on CAR-4521 at first UPDATE
BEGIN;
UPDATE cars SET owner_id = 'alice_007' WHERE car_id = 'CAR-4521';

-- T2 (concurrent session): blocks here until T1 commits or rolls back
-- UPDATE cars SET owner_id = 'bob_042' WHERE car_id = 'CAR-4521';

-- T1 continues with invoice write
UPDATE invoices
  SET buyer_id = 'alice_007', amount_usd = 28500
WHERE invoice_id = 'INV-9901';

COMMIT;
-- ExclusiveLock released. T2 unblocks and sees owner_id = 'alice_007' (committed).
```

You can inspect active row locks at any point with:

```sql
SELECT pid, relation::regclass, mode, granted
FROM pg_locks
WHERE relation = 'cars'::regclass;
```

PostgreSQL's lock manager grants the lock to T2 only after T1's commit. T2 then writes over a committed value — which is the normal, safe last-write-wins update, not a dirty write.

MySQL InnoDB: Record Locks on the Clustered Index

InnoDB acquires an exclusive record lock (X lock) on the clustered index entry (primary key) for every row touched by a write statement. Like PostgreSQL, this lock is held for the full transaction duration.

```sql
-- Set isolation level explicitly (READ COMMITTED is sufficient to prevent dirty writes)
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- T1: InnoDB acquires X lock on primary key CAR-4521
START TRANSACTION;
UPDATE cars SET owner_id = 'alice_007' WHERE car_id = 'CAR-4521';

-- T2 (concurrent session): LOCK WAIT — InnoDB places T2 in the lock wait queue
-- SHOW ENGINE INNODB STATUS will show:
-- TRANSACTION ... LOCK WAIT 1 lock struct(s) ...
-- waiting for this lock to be granted: RECORD LOCKS ... index PRIMARY of table cars

COMMIT;
-- T2 proceeds after T1 commits. No dirty write.
```

InnoDB's innodb_lock_wait_timeout (default 50 seconds) controls how long T2 waits before returning a lock timeout error. For high-concurrency auction systems, tuning this value — and adding retry logic at the application layer — is an important operational consideration.
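A retry wrapper of the kind mentioned above might look like the following sketch. `TimeoutError` is a stand-in for your driver's actual lock-wait exception (MySQL reports error 1205, "Lock wait timeout exceeded"); the backoff parameters are illustrative.

```python
import random
import time

def with_retries(txn_fn, attempts=3, base_delay=0.05):
    """Run txn_fn, retrying on lock-wait timeouts with jittered backoff."""
    for attempt in range(attempts):
        try:
            return txn_fn()
        except TimeoutError:  # stand-in for the driver's lock wait timeout
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Exponential backoff with jitter de-synchronizes competing
            # retriers so they do not stampede the hot row again in unison.
            time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))

# Demo: a transaction that times out twice, then commits.
calls = {"n": 0}
def flaky_txn():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("lock wait timeout exceeded")
    return "committed"

assert with_retries(flaky_txn) == "committed"
assert calls["n"] == 3
```

The jitter matters more than the exact delays: without it, every transaction that timed out on the same hot row retries at the same instant and collides again.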

DynamoDB: Conditional Writes for Compare-and-Swap

DynamoDB does not use traditional row-level locks. By default, two concurrent PutItem or UpdateItem calls on the same primary key are both accepted; the last one wins based on internal timestamp ordering. To prevent a dirty-write-like blind overwrite, use a ConditionExpression that enforces a compare-and-swap (CAS) guard:

-- First buyer: acquire ownership only if the item has no owner yet
ConditionExpression: "attribute_not_exists(owner_id)"

-- Subsequent buyer: the condition fails; DynamoDB returns ConditionalCheckFailedException
-- The application retries or returns an error — no silent overwrite occurs

For the auction scenario where you need to guarantee that exactly one buyer wins, the attribute_not_exists(owner_id) condition ensures that the second concurrent write fails atomically, preventing both the dirty-write-like overwrite and the cross-table inconsistency it would cause.

DynamoDB Transactions (TransactWriteItems) extend this to the cross-table case: you can atomically update the cars item and the invoices item in a single transaction with coordinated condition checks, giving you the equivalent of the relational database's two-table atomic commit.


📚 Lessons Learned from Real Dirty-Write Incidents

1. Default isolation levels in relational databases silently prevent dirty writes — but that protection ends at the database boundary. Every SQL database at Read Committed or above blocks dirty writes. The failure mode engineers miss is not within the database — it is when writes span two services, two databases, or two nodes without a unified lock manager.

2. "Both transactions committed successfully" is not proof of consistency. The car auction scenario ends with two successful COMMITs and permanent data corruption. Error-free execution does not mean correct execution when concurrent writes overlap across tables.

3. Application-layer validation cannot substitute for database-level write serialization. Both auction service threads checked car.available == true before writing. Both saw true. The race condition lives after the check, in the window between the read and the write. Only the database's lock-based serialization closes that window.

4. Multi-master replication trades dirty-write prevention for availability. When you enable multi-master replication or use an eventually consistent store without CAS, you are explicitly accepting a weaker consistency model. The trade-off is valid for some workloads (analytics, logging, catalog reads) and catastrophic for others (financial transactions, inventory reservations, auction ownership).

5. CAS (Compare-and-Swap) is the distributed equivalent of a row-level write lock. In any system without a global lock manager — DynamoDB, Cassandra LWT, Redis WATCH/MULTI, etcd transactions — CAS operations are the mechanism that prevents dirty-write-like concurrent overwrites. Adopting a distributed store without understanding its CAS offering is the most common path to re-introducing the anomaly that relational databases solved decades ago.

6. Cross-table consistency requires explicit coordination, not assumed atomicity. Writing to cars and then to invoices in two separate statements inside one transaction works because both writes share the same transaction boundary. As soon as those tables live in separate services or databases, that atomicity guarantee disappears. The Saga, Outbox, and CDC patterns exist specifically to restore it.
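Lessons 3 and 5 can be demonstrated in a few lines. The sketch below (illustrative schema, using SQLite for portability) replaces the racy check-then-write with an application-level CAS: the availability check lives inside the UPDATE's WHERE clause, so check and write are one atomic statement and the loser sees a row count of 0 instead of silently overwriting the winner.

```python
import sqlite3

# Application-level CAS via a conditional UPDATE (illustrative schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cars (car_id TEXT PRIMARY KEY, owner_id TEXT)")
conn.execute("INSERT INTO cars VALUES ('CAR-4521', NULL)")
conn.commit()

def claim(buyer_id: str) -> bool:
    # The WHERE clause is the compare; the SET is the swap.
    cur = conn.execute(
        "UPDATE cars SET owner_id = ? "
        "WHERE car_id = 'CAR-4521' AND owner_id IS NULL",
        (buyer_id,),
    )
    conn.commit()
    return cur.rowcount == 1  # True only for the first successful claimant

alice_won = claim("alice_007")  # row was unowned: claim succeeds
bob_won = claim("bob_042")      # condition fails: no silent overwrite
```

The same shape appears as `ConditionExpression` in DynamoDB, `IF` clauses in Cassandra LWT, and compare operations in etcd transactions.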


📌 Summary & Key Takeaways

A dirty write occurs when Transaction B overwrites data that Transaction A has written but not yet committed. Because both transactions ultimately commit, the resulting inconsistency is permanent: different tables reflect the intents of different transactions, with no error, no rollback, and no automatic detection.

Every mainstream relational database prevents dirty writes through exclusive row-level locks, acquired at a transaction's first write to a row and held until commit or rollback. This protection is active at every practical isolation level: even Read Uncommitted, which exists in the SQL standard but is not the default in any production engine, does not permit dirty writes in mainstream implementations.

The anomaly re-emerges in distributed systems: multi-master replication conflicts, eventually consistent stores without CAS, and microservice architectures that cross database boundaries without distributed transaction coordination. In these contexts, the solution is always some form of CAS semantics — whether via DynamoDB's ConditionExpression, Cassandra's Lightweight Transactions, or the Outbox Pattern for cross-service writes.
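For the cross-service case, a minimal Outbox Pattern sketch looks like this (illustrative schema and event shape, again in SQLite): the business write and the event record share one local transaction, so a separate relay can later deliver the event to the invoices service without losing it or publishing a phantom.

```python
import sqlite3, json

# Outbox Pattern sketch: one atomic local transaction covers both the
# state change and the event that announces it (illustrative schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cars (car_id TEXT PRIMARY KEY, owner_id TEXT);
    CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT,
                         payload TEXT NOT NULL,
                         published INTEGER NOT NULL DEFAULT 0);
    INSERT INTO cars VALUES ('CAR-4521', NULL);
""")

with conn:  # both writes commit together, or neither does
    conn.execute(
        "UPDATE cars SET owner_id = 'alice_007' WHERE car_id = 'CAR-4521'"
    )
    conn.execute(
        "INSERT INTO outbox (payload) VALUES (?)",
        (json.dumps({"event": "CarSold", "car_id": "CAR-4521",
                     "buyer_id": "alice_007", "amount_usd": 28500}),),
    )

# A relay process would poll unpublished rows, deliver them to the
# invoices service, and mark them published on success.
pending = conn.execute(
    "SELECT payload FROM outbox WHERE published = 0"
).fetchall()
```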

When debugging data inconsistencies in production, the signature of a dirty-write-like problem is a committed cross-table mismatch: two rows that should agree on the same business fact (the auction winner) but do not. Proactive reconciliation queries and conditional write monitoring are the detection layer; the right isolation level and locking strategy are the prevention layer.
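A reconciliation query for that signature is straightforward. The sketch below seeds the corrupted state from the auction scenario (illustrative schema) and finds every business fact the two tables disagree on:

```python
import sqlite3

# Detection-layer sketch: find committed cross-table mismatches.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cars (car_id TEXT PRIMARY KEY, owner_id TEXT);
    CREATE TABLE invoices (invoice_id TEXT PRIMARY KEY,
                           car_id TEXT, buyer_id TEXT);
    -- Corrupted state from the scenario: Bob owns the car,
    -- Alice holds the invoice.
    INSERT INTO cars VALUES ('CAR-4521', 'bob_042');
    INSERT INTO invoices VALUES ('INV-9901', 'CAR-4521', 'alice_007');
""")

mismatches = conn.execute("""
    SELECT c.car_id, c.owner_id, i.buyer_id
    FROM cars AS c
    JOIN invoices AS i ON i.car_id = c.car_id
    WHERE c.owner_id <> i.buyer_id
""").fetchall()
# Each returned row is a car whose recorded owner and invoiced buyer differ.
```

Run periodically (or as a scheduled job against a replica), a query like this turns a silent corruption into an alert.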


📝 Practice Quiz

Test your understanding of dirty writes, locking mechanics, and distributed consistency.


  1. What is the defining characteristic of a dirty write?

    • A) Transaction B reads data that Transaction A has not yet committed
    • B) Transaction B overwrites data that Transaction A has written but not yet committed
    • C) Two transactions commit in the wrong order, reversing business logic
    • D) A transaction reads a stale snapshot from before a checkpoint

    Correct Answer: B


  2. In the car auction scenario, which operation triggers the dirty write?

    • A) T1 Alice issuing a COMMIT statement
    • B) T2 Bob's UPDATE overwriting Alice's uncommitted row on cars
    • C) T1 Alice's SELECT confirming the car was available
    • D) T1 Alice writing to the invoices table

    Correct Answer: B


  3. What is the minimum isolation level that prevents dirty writes in the ANSI SQL standard?

    • A) Read Uncommitted
    • B) Read Committed
    • C) Repeatable Read
    • D) Serializable

    Correct Answer: B


  4. How does a row-level exclusive lock prevent a dirty write from occurring?

    • A) It prevents any transaction from reading the locked row until it commits
    • B) It forces the second writer to wait in a lock queue until the first transaction commits or rolls back
    • C) It automatically rolls back the conflicting transaction with a deadlock error
    • D) It copies the row to a shadow buffer so both writes can coexist safely

    Correct Answer: B


  5. What makes a dirty write more dangerous than a dirty read in terms of data durability?

    • A) Dirty writes require serializable isolation to prevent; dirty reads only need Read Uncommitted disabled
    • B) Dirty writes produce permanently inconsistent committed state across tables; dirty reads can be discarded on rollback
    • C) Dirty writes only affect single rows; dirty reads affect entire table scans
    • D) Dirty reads cause index corruption; dirty writes only affect transaction logs

    Correct Answer: B


  6. Looking at the seven-anomaly comparison table, which anomaly shares the most properties with a dirty write?

    • A) Phantom Read — also involves uncommitted data across a range of rows
    • B) Dirty Read — also involves uncommitted data on the same row, but with a read instead of a write
    • C) Write Skew — also involves two writes to different rows without serialization
    • D) Read Skew — also involves multi-object reads of inconsistent state

    Correct Answer: B


  7. A distributed auction system uses Cassandra as its primary store. By default, without additional configuration, which behavior is Cassandra most likely to exhibit under concurrent writes to the same car record?

    • A) Blocking the second writer in a lock queue until the first write is acknowledged
    • B) Applying last-write-wins based on client timestamp, potentially overwriting the first write silently
    • C) Throwing a ConditionalCheckFailedException and rejecting the second write
    • D) Applying a Paxos-based consensus round to serialize the writes automatically

    Correct Answer: B


  8. Which DynamoDB mechanism provides the closest functional equivalent to a relational database's row-level exclusive write lock?

    • A) DynamoDB Streams, which capture all writes for downstream processing
    • B) A ConditionExpression that checks the current attribute value before allowing the write to proceed
    • C) DynamoDB's partition key uniqueness constraint, which prevents duplicate primary keys
    • D) Provisioned throughput capacity units, which throttle concurrent writes automatically

    Correct Answer: B


  9. A team migrates from a single PostgreSQL database to a microservices architecture with two separate PostgreSQL databases: one for cars and one for invoices. Each individual database is configured at Serializable isolation. Which statement is most accurate?

    • A) Serializable isolation across both databases automatically prevents dirty-write-like cross-service inconsistency
    • B) Serializable isolation prevents anomalies within each database, but cross-service writes still lack a unified transaction boundary and require the Outbox Pattern or Saga to stay consistent
    • C) The foreign key constraint between cars.owner_id and invoices.buyer_id enforces cross-database consistency automatically
    • D) The two-phase commit (2PC) protocol is built into PostgreSQL and activates automatically when two separate databases are involved in the same application request

    Correct Answer: B


  10. (Open-Ended) A ride-sharing platform uses a microservices architecture with separate Trip Service and Driver Service, each backed by its own PostgreSQL database at Read Committed isolation. A driver accepts a trip request at the same moment the Trip Service marks the trip as cancelled. Describe the conditions under which a dirty-write-like cross-service inconsistency could result in a permanently inconsistent state — where the Trip Service records the trip as cancelled but the Driver Service records the driver as assigned to it. What would the cross-table symptom look like, and which architectural pattern would you recommend to prevent it?
