Database Anomalies: How SQL and NoSQL Handle Dirty Reads, Phantom Reads, and Write Skew
A deep comparison of how relational and NoSQL databases detect, prevent, and accept different classes of data anomalies
TL;DR: Database anomalies are the predictable side effects of concurrent transactions: dirty reads, phantom reads, write skew, and lost updates. SQL databases use MVCC and isolation levels to prevent them; PostgreSQL's Serializable Snapshot Isolation (SSI) stops even write skew. NoSQL databases (Cassandra, DynamoDB, MongoDB, Cosmos DB) deliberately accept some anomalies in exchange for throughput and availability. The right choice is not "which database is safer" but "which anomalies can your workload afford to accept."
The Banking Scenario That Exposes Every Hidden Database Assumption
Imagine a bank account with a balance of $200. Two wire transfers are initiated at the same moment: Alice wants to send $150 to her landlord, and Bob's payroll system is deducting $100 for a loan payment. Both transactions read the balance, both see $200, and both decide there is enough money. Both proceed. Alice's transfer writes the new balance as $50. Bob's transfer writes it as $100. The database commits both. The account ends up at $100, yet $250 has left it. Under any serial execution, whichever transfer ran second would have been rejected for insufficient funds, leaving the balance at $50 or $100 with only one transfer completed.
No error was thrown. No hardware failed. Both transactions followed correct application logic. The problem is a database anomaly: an undesired outcome caused not by a bug in any single transaction, but by the interaction between two concurrent ones.
This specific scenario is called a lost update. It is one of seven well-defined anomaly classes that emerge when databases allow multiple transactions to run simultaneously without coordinating their reads and writes. These anomalies range from inconvenient (reading a slightly stale value) to catastrophic (a hotel booking system that allows two guests to occupy the same room on the same night).
Every database, relational or NoSQL, single-node or globally distributed, makes explicit or implicit choices about which anomalies it will prevent and which it will allow in exchange for performance or availability. Understanding those choices is one of the most practically useful skills in system design. It tells you when to reach for a serializable isolation level, when a lower isolation level is safe, and when a NoSQL database's "eventual consistency" is perfectly acceptable for your workload.
This post defines all seven anomaly classes, shows exactly what each major database guarantees, and gives you a decision framework for choosing the right tool.
ACID and Isolation: The Contract That Keeps Concurrent Transactions Honest
Before cataloguing anomalies, we need to understand the contract that prevents them: ACID.
- Atomicity: A transaction either commits all its writes or rolls all of them back. There is no partial commit.
- Consistency: A transaction transitions the database from one valid state to another. Application-defined invariants, such as "account balance must never go negative", must hold before and after every transaction.
- Isolation: Concurrent transactions behave as if they ran serially. One transaction's intermediate state is not visible to another.
- Durability: Once a transaction commits, its writes survive node failures.
The isolation property is where anomalies live. Full isolation (meaning every transaction sees only committed data and runs as if it had exclusive access to the database) is called serializability. It is the strongest guarantee, and it is expensive. Databases offer weaker isolation levels that trade anomaly safety for concurrency and throughput.
The SQL standard defines four isolation levels from weakest to strongest:
| Isolation Level | What changes between consecutive reads? |
| --- | --- |
| Read Uncommitted | A transaction can read another transaction's uncommitted writes |
| Read Committed | A transaction reads only committed data, but the data can change between two reads |
| Repeatable Read | A row read twice within a transaction returns the same value, but new rows can appear |
| Serializable | Full isolation; all anomalies prevented |
NoSQL databases often operate outside this four-level framework entirely. They may offer transactional APIs (DynamoDB's TransactWriteItems), tunable consistency tiers (Cosmos DB's five levels), or per-partition guarantees (Cassandra's Lightweight Transactions) that map only loosely onto SQL isolation levels. Understanding what those guarantees actually mean for your data requires a bottom-up look at the anomalies themselves.
The Seven Core Database Anomalies That Break Production Systems
Each anomaly below has a precise definition, a timeline showing how it occurs, and a concrete consequence in a real system.
Dirty Read
What happens: Transaction T2 reads a row that Transaction T1 has written but not yet committed. If T1 then rolls back, T2 has acted on data that was never durably stored: data that never officially existed.
Concrete scenario:
The following sequence diagram shows T1 writing an account balance that T2 reads before the commit. T1 then rolls back, leaving T2 holding a value that no longer exists in the database.
```mermaid
sequenceDiagram
    participant T1 as Transaction 1
    participant DB as Database
    participant T2 as Transaction 2
    T1->>DB: BEGIN
    T1->>DB: UPDATE account SET balance=50 WHERE id=1
    Note over DB: Row written, not yet committed
    T2->>DB: BEGIN
    T2->>DB: SELECT balance FROM account WHERE id=1
    DB-->>T2: Returns 50 (dirty read)
    T1->>DB: ROLLBACK
    Note over DB: Balance restored to original 100
    Note over T2: T2 acted on value 50 which never committed
```
T2 read the balance as 50, perhaps approved a second wire transfer of $40 based on that, and is now holding the wrong view of the world. The actual committed balance was never 50.
Production impact: Dirty reads can cause money to be transferred based on rolled-back debits, inventory to be reserved against cancelled orders, or approval decisions to be made on data that was never durably written.
Non-Repeatable Read
What happens: Transaction T1 reads a row, then reads it again. Between those two reads, T2 updates and commits the row. T1 gets a different value on its second read even though it is still in the same transaction.
Production impact: Audit logs that read a record twice in one transaction end up with inconsistent entries. Price calculations that read a product price at the start of checkout and again at payment confirmation can yield different totals if a price update commits in between.
Phantom Read
What happens: T1 queries a set of rows matching a predicate, say, all orders placed today. T2 inserts a new order and commits. T1 re-runs the same query and now gets a different count. No existing row changed; a new row appeared.
Production impact: A report counting active sessions or pending orders produces different totals within a single batch operation. An access-control check that lists all users with admin privileges can miss a newly inserted admin.
Read Skew
What happens: T1 reads Account A (balance 100), then reads Account B (balance 200). Between those two reads, T2 moves $50 from A to B (setting A to 50 and B to 250) and commits. T1 has now seen A's old value and B's new value. Its combined view shows a total of 300, but it is reading two different moments in time: A from before the transfer and B from after.
This matches the read skew example in Martin Kleppmann's Designing Data-Intensive Applications, where Alice checks both of her bank accounts while a transfer between them is in flight and briefly sees money vanish. Each individual read returns committed data, but no single consistent snapshot of the system as a whole was ever observed.
Production impact: Financial reports showing an incorrect total balance. Inventory reconciliation that counts items across multiple tables and gets an inconsistent sum. Any cross-entity aggregate that reads from multiple rows or tables in sequence.
Write Skew
What happens: T1 and T2 both read a shared condition, say, "how many doctors are currently on call?" Both see that the invariant is satisfied (at least one doctor on call). Both decide to take an action that modifies a different row: T1 marks Alice as off-call, T2 marks Bob as off-call. Both commit. The invariant (at least one doctor must always be on call) is now broken, even though every individual write was locally valid.
The following diagram shows this scenario with two concurrent transactions, each reading a safe state and writing a different row, arriving at an unsafe combined state.
```mermaid
sequenceDiagram
    participant A as Alice Transaction
    participant DB as Database
    participant B as Bob Transaction
    A->>DB: BEGIN
    B->>DB: BEGIN
    A->>DB: SELECT count(*) FROM doctors WHERE on_call=true
    DB-->>A: Returns 2 (both on call)
    B->>DB: SELECT count(*) FROM doctors WHERE on_call=true
    DB-->>B: Returns 2 (both on call)
    A->>DB: UPDATE doctors SET on_call=false WHERE name='Alice'
    B->>DB: UPDATE doctors SET on_call=false WHERE name='Bob'
    A->>DB: COMMIT
    B->>DB: COMMIT
    Note over DB: Invariant broken: 0 doctors on call
```
Write skew is the most insidious anomaly because it cannot be prevented by row-level locks alone. The conflict is on a predicate: the aggregate count of rows satisfying a condition. T1 and T2 write to different rows, so they never conflict at the row level. Prevention requires either serializable isolation or an explicit predicate lock covering the range query.
Production impact: Meeting room bookings where two people both check that the room is free and both proceed to book it. Double-spending in a wallet system where two withdrawals both check the balance and both execute. Any "check then act" pattern on a shared aggregate.
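To make the interleaving concrete, here is a small, deterministic pure-Python simulation of the check-then-act race. The dict stands in for the doctors table; no real database or concurrency machinery is involved, only the problematic ordering of reads and writes:

```python
# Deterministic simulation of write skew: both "transactions" take their reads
# before either one writes, so each local check passes even though the combined
# outcome breaks the invariant ("at least one doctor on call").

doctors = {"alice": True, "bob": True}  # on_call flag per doctor

def on_call_count(table):
    return sum(1 for on_call in table.values() if on_call)

# Step 1: both transactions read the shared condition first (the race window).
alice_sees = on_call_count(doctors)  # 2
bob_sees = on_call_count(doctors)    # 2

# Step 2: each check passes locally, so each transaction writes a DIFFERENT row.
if alice_sees >= 2:
    doctors["alice"] = False
if bob_sees >= 2:
    doctors["bob"] = False

print(on_call_count(doctors))  # 0: the invariant is broken, yet no write conflicted
```

Note that no row was written by both transactions, which is exactly why row-level locks cannot catch this.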
Lost Update
What happens: T1 and T2 both read a counter (value: 10). T1 computes 11 and writes it. T2 also computes 11 and writes it. One increment is lost: the counter should be 12 but is 11.
Prevention options: SELECT FOR UPDATE (acquires a row lock so T2 must wait), atomic increment operations (UPDATE counter SET value = value + 1), or optimistic locking via compare-and-swap (UPDATE counter SET value=11 WHERE value=10).
Production impact: Like counts, inventory decrement, concurrent seat bookings. Any read-modify-write operation where two transactions race.
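The race and the compare-and-swap fix can both be sketched in a few lines of Python. A dict stands in for the database row, and the cas helper is illustrative (it models the WHERE value=10 guard above, not any real driver API):

```python
# A lost update, then the compare-and-swap (optimistic locking) fix.
counter = {"value": 10}

def cas(row, expected, new):
    """Compare-and-swap: write only if the current value still matches."""
    if row["value"] == expected:
        row["value"] = new
        return True
    return False

# Lost update: both "transactions" read 10, both blindly write 11.
t1_read = counter["value"]
t2_read = counter["value"]
counter["value"] = t1_read + 1   # T1 writes 11
counter["value"] = t2_read + 1   # T2 also writes 11; one increment is lost
assert counter["value"] == 11    # should have been 12

# CAS prevents it: T2's stale write fails and must retry with a fresh read.
counter = {"value": 10}
t1_read = counter["value"]
t2_read = counter["value"]
assert cas(counter, t1_read, t1_read + 1)      # T1 succeeds: value is 11
assert not cas(counter, t2_read, t2_read + 1)  # T2 fails: value is no longer 10
assert cas(counter, counter["value"], counter["value"] + 1)  # retry: 12
print(counter["value"])  # 12
```

The failed CAS is the signal that forces T2 to re-read before retrying, which is what preserves both increments.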
Dirty Write
What happens: T1 writes to a row (not yet committed). T2 also writes to the same row and commits. T1 then rolls back, and the rollback restores the row to its value from before T1's write, overwriting T2's committed value. T2's committed change is silently lost.
Production impact: Dirty writes can corrupt application-level invariants that span multiple rows. If a purchase transaction writes both a billing record and an order record, a dirty write on the billing record during another transaction's rollback can leave an order record with no corresponding billing entry.
Deep Dive: How Databases Detect and Stop Anomalies Before They Corrupt Data
The Internals: MVCC Row Versioning, Snapshot Reads, and Lock Acquisition
The dominant technique for preventing read anomalies without blocking all concurrent reads is Multi-Version Concurrency Control (MVCC). Instead of storing a single current version of each row, MVCC stores multiple versions, each tagged with the transaction that created or deleted it.
In PostgreSQL, every row has two hidden system columns:
- xmin: the transaction ID that inserted this version of the row
- xmax: the transaction ID that deleted or updated this row (zero if the row is still current)
When a transaction begins, it receives a transaction ID (XID) and takes a snapshot of which XIDs are currently in-flight (started but not committed). The visibility rule is: a row version is visible to a transaction if its xmin is committed and less than the transaction's XID, and its xmax is either zero or not yet committed when the snapshot was taken.
This means reads never block writes and writes never block reads: each transaction is reading from its own consistent snapshot of the past. The consequences for anomaly prevention are direct:
- Dirty reads are impossible because an uncommitted xmin is never considered visible.
- Non-repeatable reads depend on when the snapshot is taken. Under Read Committed, each statement gets a fresh snapshot, so a row committed by T2 between two reads in T1 becomes visible on the second read. Under Repeatable Read, the snapshot is taken once at transaction start, so the same row is always read from the same snapshot; non-repeatable reads and phantom reads are both eliminated.
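The visibility rule can be sketched as a small Python function. This is a deliberately simplified model of PostgreSQL's real rule (no subtransactions, no hint bits, no command IDs), but it captures why uncommitted and deleted versions are invisible:

```python
# Toy model of MVCC visibility: each row version carries xmin/xmax, and a
# snapshot records which XIDs were committed when the transaction started.

def visible(version, snapshot_committed, my_xid):
    """A version is visible if its creator committed before our snapshot
    (or is us), and it has not been deleted by a committed transaction."""
    created_ok = version["xmin"] in snapshot_committed or version["xmin"] == my_xid
    deleted = version["xmax"] != 0 and version["xmax"] in snapshot_committed
    return created_ok and not deleted

committed = {100, 101}  # XIDs already committed when our snapshot was taken
versions = [
    {"xmin": 100, "xmax": 0,   "balance": 100},  # live, committed row
    {"xmin": 102, "xmax": 0,   "balance": 50},   # written by in-flight XID 102
    {"xmin": 100, "xmax": 101, "balance": 75},   # deleted by committed XID 101
]

seen = [v["balance"] for v in versions if visible(v, committed, my_xid=103)]
print(seen)  # [100]: the uncommitted write and the deleted version are invisible
```

The second version is exactly a would-be dirty read: its xmin is not committed, so the visibility check filters it out without any locking.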
Phantom reads in range queries deserve a closer look. A snapshot read prevents you from seeing new rows that committed after your snapshot, which is how PostgreSQL's Repeatable Read prevents phantoms. MySQL InnoDB's Repeatable Read instead adds gap locks and next-key locks (a row lock plus a lock on the gap preceding the row) so that phantoms are prevented even for locking reads, by blocking concurrent inserts into the scanned range.
Write skew is fundamentally different. It arises from a read-write dependency that MVCC alone cannot detect. Transaction T1 reads a predicate (e.g., a count), makes a decision based on that read, and writes to a row that is not the row T2 also read and wrote. Because the rows involved in T1's read and T2's write are different, there is no version conflict to detect. MVCC sees no conflict. Both transactions commit. The invariant is broken.
Preventing write skew requires detecting the read-write anti-dependency cycle: T1 read data that T2's write invalidates, T2 read data that T1's write invalidates, and both committed. PostgreSQL's Serializable Snapshot Isolation tracks these dependencies using SIReadLock structures and aborts one transaction in the cycle when a cycle is detected.
Performance Analysis: The Cost of Moving Up the Isolation Ladder
Stronger isolation costs more. The performance trade-offs are real and must be understood before choosing an isolation level:
| Isolation Level | Concurrency overhead | When it stalls |
| --- | --- | --- |
| Read Committed | Near-zero for reads | Writers briefly block conflicting writers |
| Repeatable Read (Postgres) | Snapshot held in memory | Transaction aborts on write-write conflict |
| Repeatable Read (MySQL) | Gap locks + next-key locks | Range queries lock gaps, blocking insertions |
| Serializable (Postgres SSI) | SIReadLock tracking + cycle detection | Abort and retry on rw-anti-dependency cycle |
| Serializable (MySQL) | All reads become LOCK IN SHARE MODE | High read-lock contention under heavy concurrency |
PostgreSQL's SSI is notably more efficient than traditional two-phase locking (2PL) because it does not hold blocking locks for the duration of the transaction; it tracks read sets and detects dangerous cycles only at commit time. Under low-conflict workloads, SSI's overhead compared to Snapshot Isolation (Repeatable Read) is often under 10%. Under workloads with heavy write skew, abort rates climb and throughput degrades.
The practical rule: default to Read Committed for most OLTP workloads. Move to Repeatable Read when you need consistent within-transaction aggregates. Move to Serializable only when your invariant spans multiple rows and write skew is a real risk for your access pattern.
SQL Isolation Levels vs. Anomaly Prevention: PostgreSQL and MySQL InnoDB
The table below maps each isolation level to the anomalies it prevents or allows for the two dominant SQL MVCC implementations. "Possible" means the anomaly can occur at that level. "Prevented" means the database guarantees it cannot occur.
| Anomaly | PG Read Committed | PG Repeatable Read | PG Serializable | MySQL RR (InnoDB) | MySQL Serializable |
| --- | --- | --- | --- | --- | --- |
| Dirty Read | Prevented | Prevented | Prevented | Prevented | Prevented |
| Non-Repeatable Read | Possible | Prevented | Prevented | Prevented | Prevented |
| Phantom Read | Possible | Prevented | Prevented | Prevented (gap lock) | Prevented |
| Read Skew | Possible | Prevented | Prevented | Prevented | Prevented |
| Write Skew | Possible | Possible | Prevented (SSI) | Possible | Prevented |
| Lost Update | Possible | Prevented (first-updater-wins abort) | Prevented | Possible | Prevented |
| Dirty Write | Prevented | Prevented | Prevented | Prevented | Prevented |
A few details worth understanding:
PostgreSQL Repeatable Read vs. MySQL Repeatable Read: Both use MVCC snapshots, but MySQL InnoDB's Repeatable Read adds gap locks to prevent phantoms that PostgreSQL Repeatable Read already prevents via snapshot visibility. MySQL's approach introduces more blocking under concurrent inserts into a locked range.
PostgreSQL Serializable uses SSI (not 2PL): PostgreSQL does not implement serializability by acquiring explicit read locks on every row. Instead, it tracks read-write dependencies in memory (SIReadLock) and detects dangerous cycles at commit time. This means a serializable transaction in PostgreSQL may run at full speed and only abort at the last moment if a cycle is detected. It also means serializable workloads that have no actual write skew risk pay very little overhead.
MySQL Serializable uses 2PL: Under Serializable isolation, MySQL converts every SELECT into SELECT ... LOCK IN SHARE MODE. This acquires shared read locks immediately and holds them until the transaction ends. Under heavy read concurrency, this can severely limit write throughput.
The following flowchart shows the decision path for choosing a PostgreSQL isolation level:
```mermaid
flowchart TD
    A[Starting a transaction] --> B{Do you need to prevent dirty reads?}
    B -->|Already prevented at all levels| C{Do reads need to be stable within the transaction?}
    C -->|No - reads can see new commits| D[Use Read Committed - default]
    C -->|Yes - same row must return same value| E{Does the transaction check a cross-row invariant?}
    E -->|No - single-entity reads only| F[Use Repeatable Read]
    E -->|Yes - aggregate or predicate check drives a write| G[Use Serializable - SSI]
    G --> H[Handle serialization failure with retry]
```
Read the flowchart from top to bottom. Most application transactions land on Read Committed. Serializable is reserved for workflows that explicitly check a shared condition and write based on it: the exact pattern that enables write skew.
How MongoDB, Cassandra, DynamoDB, and Cosmos DB Approach Anomaly Prevention
NoSQL databases were built to scale horizontally across commodity hardware and to favour availability over strict consistency. That design choice has direct consequences for anomaly prevention.
MongoDB: Document Atomicity With Optional Multi-Document Transactions
MongoDB always guarantees atomic operations on a single document, even without transactions. A document update that modifies multiple fields is either fully applied or not applied at all. This eliminates dirty writes and lost updates within a single document.
For operations that span multiple documents (or multiple collections), MongoDB v4.0 introduced multi-document transactions with Read Committed and Snapshot isolation levels. Using { readConcern: { level: 'snapshot' } } inside a session gives you point-in-time consistency across documents:
```javascript
const session = client.startSession();
session.startTransaction({
  readConcern: { level: 'snapshot' },
  writeConcern: { w: 'majority' }
});
// All reads in this session see the same snapshot
// All writes are all-or-nothing
await session.commitTransaction();
```
What MongoDB prevents by default (single documents): dirty reads (with w: majority), dirty writes, lost updates.
What requires explicit transactions: read skew (reading across two documents), write skew (two transactions both read a shared aggregate and write to different documents), phantom reads in multi-document queries.
What MongoDB does not offer: predicate locks for range queries. A find({ status: 'pending' }) followed by an update to a matching document can race with another insert of a new pending document in a way that MongoDB snapshot isolation does not prevent.
Cassandra: Throughput First, Transactions by Exception
Cassandra's base data model has no multi-partition transactions; this is a deliberate design choice. Each write operation targets a single partition and is applied by the coordinator node using a last-write-wins (LWW) model. This means:
- Dirty writes: Prevented within a partition because Cassandra uses quorum writes and a last-write-wins policy; there is no concept of an uncommitted write.
- Read skew: Accepted. A read across two partitions has no snapshot guarantee. The two partitions can reflect writes from different points in time.
- Write skew: Accepted. There is no mechanism to atomically read a cross-partition condition and conditionally write based on it.
- Phantom reads: Accepted. Range queries across partitions have no predicate lock.
- Lost updates: Accepted at the row level unless you use Lightweight Transactions.
Cassandra's Lightweight Transactions (LWT) use the Paxos consensus algorithm to provide compare-and-set semantics on a single partition:
```sql
INSERT INTO seats (id, holder) VALUES (5, 'alice') IF NOT EXISTS;
UPDATE accounts SET balance = 150 WHERE id = 1 IF balance = 200;
```
LWT prevents lost updates and double-booking within a single partition. It is Cassandra's only mechanism for conditional writes, and it comes at significant latency cost: a round of Paxos adds 2 to 4 network round trips. For most high-throughput Cassandra use cases, LWT is used sparingly for idempotency guarantees on critical paths, not as a general transaction mechanism.
The Cassandra documentation explicitly states that "Cassandra does not support multi-partition transactions. LWT is limited to single-partition operations." This is not an oversight; it is the architectural trade-off that enables Cassandra's linear write scalability across hundreds of nodes.
DynamoDB: Single-Item Atomicity Plus Transactional API
DynamoDB always guarantees atomic operations on a single item. For multi-item operations, DynamoDB's TransactWriteItems provides serializable isolation across up to 100 items in a single transaction (the limit was 25 when the API launched):
```json
{
  "TransactItems": [
    {
      "Update": {
        "TableName": "Accounts",
        "Key": { "id": { "S": "account-1" } },
        "UpdateExpression": "SET balance = balance - :amount",
        "ConditionExpression": "balance >= :amount",
        "ExpressionAttributeValues": { ":amount": { "N": "100" } }
      }
    },
    {
      "Update": {
        "TableName": "Accounts",
        "Key": { "id": { "S": "account-2" } },
        "UpdateExpression": "SET balance = balance + :amount",
        "ExpressionAttributeValues": { ":amount": { "N": "100" } }
      }
    }
  ]
}
```
The ConditionExpression on the debit prevents a lost update: if the balance is too low, the entire transaction fails atomically. The pair of updates is serializable: no other transaction sees account-1 debited without account-2 credited.
What DynamoDB prevents with TransactWriteItems: dirty reads, dirty writes, lost updates, read skew within the transaction scope.
What DynamoDB accepts: Phantom reads in Scan operations without transactions. Read skew across non-transactional reads. DynamoDB has no predicate locking: a Scan on items matching a filter and a concurrent Put of a new matching item can produce phantom reads outside of a transaction.
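The all-or-nothing semantics of the transactional write can be modeled in a few lines of Python. The transact_write function below mimics the behavior (check every condition first, then apply all writes or none); it is a simulation, not the AWS API, and all names are illustrative:

```python
# Pure-Python model of TransactWriteItems-style atomicity.

def transact_write(table, updates):
    """updates: list of (key, delta, condition); condition receives the item.
    If any condition fails, nothing is written."""
    for key, _delta, condition in updates:
        if not condition(table[key]):
            return False  # whole transaction rejected
    for key, delta, _condition in updates:
        table[key]["balance"] += delta
    return True

accounts = {"account-1": {"balance": 50}, "account-2": {"balance": 0}}

# Debit of 100 fails its condition (balance >= 100), so the paired
# credit is not applied either.
ok = transact_write(accounts, [
    ("account-1", -100, lambda item: item["balance"] >= 100),
    ("account-2", +100, lambda item: True),
])
print(ok, accounts["account-1"]["balance"], accounts["account-2"]["balance"])
# False 50 0

# A transfer that satisfies its condition applies both sides atomically.
ok2 = transact_write(accounts, [
    ("account-1", -30, lambda item: item["balance"] >= 30),
    ("account-2", +30, lambda item: True),
])
print(ok2, accounts["account-1"]["balance"], accounts["account-2"]["balance"])
# True 20 30
```

The key property mirrored here is that a failed condition rejects the whole batch before any write lands, so no partial transfer is ever observable.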
Cosmos DB: Five Consistency Levels That Map Directly to Anomaly Exposure
Cosmos DB is unique in that it exposes its consistency model as an explicit configuration choice with five levels. The level you choose determines exactly which anomalies are possible.
| Consistency Level | Anomaly Exposure |
| --- | --- |
| Strong | No anomalies. Linearizable reads. Highest latency and cost. |
| Bounded Staleness | Reads may see data up to K versions or T seconds old. Phantoms and read skew possible across the staleness window. |
| Session | Within your session: read-your-writes, monotonic reads. Cross-session: read skew and phantoms possible. |
| Consistent Prefix | Writes are seen in order, but reads may lag. No dirty reads. Stale reads, read skew, and phantoms possible. |
| Eventual | All anomalies except dirty writes are possible. Maximum throughput. |
Multi-document ACID transactions in Cosmos DB are supported within a single logical partition, either via stored procedures (JavaScript running atomically on the server) or via the SDK's transactional batch API. Cross-partition transactions are not natively supported; you must implement the Saga pattern or use an application-level distributed transaction protocol.
Full Anomaly Comparison Across All Six Systems
| Anomaly | PostgreSQL Serializable | PostgreSQL RR | MySQL Serializable | MongoDB (Snapshot Tx) | Cassandra (LWT) | DynamoDB (TransactWrite) | Cosmos DB Strong | Cosmos DB Eventual |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Dirty Read | Prevented | Prevented | Prevented | Prevented | Prevented | Prevented | Prevented | Prevented |
| Non-Repeatable Read | Prevented | Prevented | Prevented | Prevented | Possible | Prevented | Prevented | Possible |
| Phantom Read | Prevented | Prevented | Prevented | Partial | Possible | Possible (scan) | Prevented | Possible |
| Read Skew | Prevented | Prevented | Prevented | Prevented (in tx) | Possible | Prevented (in tx) | Prevented | Possible |
| Write Skew | Prevented | Possible | Prevented | Possible | Possible | Prevented (in tx) | Prevented | Possible |
| Lost Update | Prevented | Possible | Prevented | Prevented (in tx) | Partial (LWT) | Prevented (CAS) | Prevented | Possible |
"Partial" means the anomaly is prevented within certain boundaries (single partition, single document, single transaction) but not universally.
Consistency vs. Throughput: What Every NoSQL Trade-off Actually Costs You
Understanding the anomaly table above is only half the picture. The other half is understanding why NoSQL databases accept these anomalies and what they gain in exchange.
Cassandra's trade-off is availability under network partition. Under the CAP theorem, Cassandra chooses AP: it remains available and accepts writes on both sides of a network partition, reconciling data via read repair and Merkle trees after the partition heals. This means there is no single point of truth during a partition: two coordinators can accept concurrent writes to the same partition and resolve them with last-write-wins semantics. Preventing write skew in this environment would require a consensus round that is incompatible with Cassandra's availability guarantee.
DynamoDB's trade-off is operational simplicity at global scale. DynamoDB is a multi-region key-value store that provides consistent single-item operations and optional multi-item transactions, but only across items that fit within the 100-item transaction limit and within the throughput capacity of the target partition group. Cross-table transactions exist but incur additional latency and 2x the write capacity unit cost. The design reflects a deliberate choice: make the common case (single-item atomic writes) extremely fast and cheap, and make the exceptional case (multi-item transactions) available but priced to discourage overuse.
MongoDB's trade-off is flexibility. The default document model encourages denormalization, embedding related data in a single document to make single-document atomicity sufficient for most operations. Multi-document transactions were added for cases where denormalization is not feasible. Their existence does not change the fact that using them widely in a schema that was not designed for it produces poor performance: MongoDB's transaction overhead is higher than PostgreSQL's because it requires distributed coordination across replica sets.
Cosmos DB's trade-off is user-configurable. By exposing five consistency levels, Cosmos DB lets you make the trade-off explicitly at the operation level. A shopping cart read can use Session consistency for fast, read-your-writes behavior. A payment ledger read can use Strong consistency to prevent all anomalies. The cost is paying for both the global replication infrastructure and the higher per-operation latency when Strong is selected.
Choosing the Right Database for Your Anomaly Tolerance
The right database choice depends on answering two questions first: which anomalies does your workload generate? And which anomalies can your workload afford to accept?
| Use Case | Anomaly Risk | Recommended Approach |
| --- | --- | --- |
| Financial ledger, double-spend prevention | Write skew, lost update | PostgreSQL Serializable or DynamoDB TransactWriteItems |
| On-call scheduling, seat booking | Write skew | PostgreSQL Serializable or Cosmos DB Strong within a partition |
| User session state, shopping cart | Non-repeatable read acceptable | PostgreSQL Read Committed, MongoDB default, DynamoDB single-item |
| Event log, time-series append | No reads during write; last-write-wins acceptable | Cassandra, DynamoDB |
| Product catalog, read-heavy | Stale reads acceptable | Cosmos DB Session or Eventual, DynamoDB eventually consistent read |
| Inventory reservation | Lost update risk | SELECT FOR UPDATE, DynamoDB CAS ConditionExpression, MongoDB session |
The following flowchart guides the database decision for workloads where anomaly prevention is a primary concern:
```mermaid
flowchart TD
    A[What anomaly risk does your workload have?] --> B{Can your workload tolerate write skew?}
    B -->|No - invariant must hold across rows| C{Multi-region required?}
    B -->|Yes - single-row or LWW is fine| D{High write throughput required?}
    C -->|Yes| E[Cosmos DB Strong or Google Spanner]
    C -->|No| F[PostgreSQL Serializable with retry on abort]
    D -->|Yes - millions of writes per second| G[Cassandra with LWT on critical paths]
    D -->|No - thousands of writes per second| H{Need multi-item atomicity?}
    H -->|Yes - up to 100 items| I[DynamoDB TransactWriteItems]
    H -->|No - single document sufficient| J[MongoDB default or DynamoDB single-item]
```
Follow the flowchart from top to bottom. The left branch captures workloads that cannot tolerate write skew: financial systems, booking systems, any domain with a shared invariant across multiple rows. The right branch captures the much larger space of workloads where some anomaly exposure is acceptable in exchange for throughput and simplicity.
Write Skew Under the Microscope: The On-Call Doctors Scenario Across Databases
The write skew scenario with doctors is the clearest illustration of how different databases handle the same logical invariant differently. The invariant: at least one doctor must always be on call. The failure mode: two doctors simultaneously decide to go off-call.
In PostgreSQL at Read Committed: Both transactions read count(*) = 2, both proceed, invariant breaks. No protection.
In PostgreSQL at Repeatable Read: Both transactions read the same snapshot (count(*) = 2), both proceed, invariant breaks. MVCC snapshot prevents read anomalies but does not detect write skew because the two transactions write to different rows.
In PostgreSQL at Serializable (SSI): PostgreSQL detects the rw-anti-dependency cycle. Both transactions read the on-call count, and each writes a different doctor row. The SSI tracker sees that T1 read data that T2's write invalidates and T2 read data that T1's write invalidates: a dangerous cycle. One transaction is aborted with SQLSTATE 40001 (serialization failure) and must be retried. After the retry, the system sees that only one doctor is available and correctly rejects the second request to go off-call.
In MongoDB (no explicit transaction): Two concurrent updateOne operations on different documents complete independently. The invariant breaks. To prevent this, the application must use a multi-document session transaction with Snapshot isolation level and check the aggregate count inside the transaction β which MongoDB does not perform automatically.
In Cassandra: There is no mechanism to prevent write skew across two partitions (one per doctor). Lightweight Transactions only help if the entire invariant lives in a single partition row. The application would need to serialize doctor schedule updates through a single partition or implement an application-level locking scheme.
In DynamoDB with TransactWriteItems: The application can use ConditionExpression on a shared counter or use a single item that holds the count of on-call doctors, which the transaction checks and decrements atomically. This prevents write skew when designed for it, but it requires the schema to encode the shared invariant as a checkable condition on a DynamoDB item.
The lesson: write skew prevention is a database-schema-and-code design problem, not just a database selection problem. PostgreSQL Serializable is the only system in this list that prevents write skew automatically without application-level schema accommodations.
PostgreSQL SSI: How Serializable Snapshot Isolation Prevents Write Skew Without Locking Rows
PostgreSQL's Serializable Snapshot Isolation (SSI), introduced in PostgreSQL 9.1, is one of the most significant advances in practical database concurrency control. It delivers full serializability – preventing all seven anomaly classes – without the blocking overhead of two-phase locking (2PL).
SSI vs. 2PL: Traditional serializable isolation using 2PL acquires a shared lock on every row read and an exclusive lock on every row written. Lock contention under concurrent read-heavy workloads is severe. SSI takes a different approach: it tracks read-write dependencies (rw-antidependencies) between transactions and only intervenes when a cycle is detected.
How SIReadLocks work: When a transaction reads a row or a predicate range under SSI, PostgreSQL records a SIReadLock – not a blocking lock, but a memory structure tracking "this transaction has a read dependency on this data." When a second transaction writes data that overlaps with the first transaction's read, PostgreSQL creates an rw-antidependency edge between them. When a cycle of two or more such edges forms (T1 read data that T2 then wrote, and T2 read data that T1 then wrote), PostgreSQL aborts one of the transactions in the cycle.
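A toy version of this bookkeeping might look like the sketch below. The `si_reads` and `rw_edges` structures are invented stand-ins for PostgreSQL's predicate-lock table, which in reality tracks predicate ranges and pages rather than simple keys:

```python
# Minimal sketch of SSI-style dependency tracking: record non-blocking
# read markers per transaction, add an rw-antidependency edge when a
# transaction writes data a peer has read, and flag the dangerous cycle.
from collections import defaultdict

si_reads = defaultdict(set)   # txn -> keys it has read (SIRead "locks")
rw_edges = defaultdict(set)   # reader_txn -> {writer_txn, ...}

def record_read(txn, key):
    si_reads[txn].add(key)

def record_write(txn, key):
    # Any peer that read this key now has an rw-antidependency on us.
    for reader, keys in si_reads.items():
        if reader != txn and key in keys:
            rw_edges[reader].add(txn)

def has_cycle(a, b):
    # The dangerous structure in this two-transaction sketch: a -> b and b -> a.
    return b in rw_edges[a] and a in rw_edges[b]

# T1 and T2 both read the on-call rows, then each writes a different row.
record_read("T1", "alice"); record_read("T1", "bob")
record_read("T2", "alice"); record_read("T2", "bob")
record_write("T1", "alice")   # T2 read 'alice' -> edge T2 -> T1
record_write("T2", "bob")     # T1 read 'bob'   -> edge T1 -> T2

print(has_cycle("T1", "T2"))  # True -- one transaction must abort (40001)
```

Note that no reader or writer ever blocks in this scheme; the only intervention is the abort once the cycle is complete.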
Enabling SSI in PostgreSQL and handling retries:
-- Set isolation level for the session or transaction
SET default_transaction_isolation = 'serializable';
-- Or per-transaction:
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT count(*) FROM doctors WHERE on_call = true;
UPDATE doctors SET on_call = false WHERE name = 'Alice';
COMMIT;
-- If write skew is detected, PostgreSQL returns:
-- ERROR: could not serialize access due to read/write dependencies
-- SQLSTATE: 40001 (serialization_failure)
-- Application must catch this error and retry the transaction.
The retry loop is a mandatory part of using serializable isolation. A well-written application wraps every serializable transaction in a retry loop that catches SQLSTATE 40001 and re-executes the transaction from the beginning. Under low contention, retries are rare – the performance overhead of SSI is primarily from the in-memory dependency tracking, not from aborts.
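A minimal retry-loop sketch, with a simulated transaction standing in for a real database call (in psycopg the error class to catch would be `psycopg.errors.SerializationFailure`, which maps to SQLSTATE 40001; the `SerializationFailure` class and failure rate here are invented so the example is self-contained):

```python
# Retry-loop pattern for serializable transactions: re-run the whole
# transaction from the beginning whenever SSI aborts it.
import random

class SerializationFailure(Exception):
    """Stand-in for the driver error raised on SQLSTATE 40001."""

random.seed(0)  # make the simulation deterministic

def run_serializable_txn():
    # Simulate a transaction that occasionally loses an SSI conflict.
    if random.random() < 0.5:
        raise SerializationFailure("could not serialize access")
    return "committed"

def with_retries(txn, max_attempts=5):
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()          # re-execute from the BEGINNING
        except SerializationFailure:
            if attempt == max_attempts:
                raise             # give up after max_attempts

result = with_retries(run_serializable_txn)
print(result)  # committed
```

Production variants often add capped exponential backoff between attempts so colliding transactions stop re-colliding on every retry.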
Practical guidance: Enable Serializable isolation for your most sensitive financial or scheduling transactions. Leave Read Committed as the default for everything else. The overhead of Serializable on transactions that have no actual write skew risk is low because SSI is optimistic – it only pays the abort cost when an actual conflict is detected.
For a full deep-dive on PostgreSQL's transaction internals, see the PostgreSQL documentation on Serializable Snapshot Isolation.
Lessons Learned: What Production Systems Actually Teach You About Anomalies
Most systems use Read Committed for 95% of their transactions – and pay for it occasionally. The default isolation level in PostgreSQL and Oracle is Read Committed (MySQL's InnoDB defaults to Repeatable Read). Under Read Committed, non-repeatable reads and read skew are possible. Most OLTP applications tolerate this because the window for these anomalies is small and the business impact is low. When the impact is high (financial reconciliation, booking systems), an explicit upgrade to a higher isolation level is always preferable to application-level workarounds.
Write skew is the anomaly that most developers do not know to look for. Dirty reads and lost updates are intuitive. Write skew is subtle because it involves no conflicting rows – both transactions write to rows the other never touched. It surfaces as a business invariant violation days or weeks after the transaction code was written and reviewed. The pattern to watch for: any "check a shared condition, then write based on the result" logic that spans multiple rows.
NoSQL databases are not "unsafe" – they make their trade-offs explicit. Cassandra's lack of cross-partition transactions is not a bug; it is the architectural commitment that makes linear write scalability possible. DynamoDB's transaction item limit (25 items at launch, since raised to 100) is not arbitrary; it is the boundary within which DynamoDB can provide serializable isolation without a distributed 2PL protocol. Understand the boundary and design your schema to stay within it.
Schema design in NoSQL is anomaly design. In Cassandra and DynamoDB, the partition is the unit of consistency. If your invariant spans multiple partitions, your application is responsible for enforcing it. This means embedding shared invariant state into a single partition where possible – a counter item, a status flag – so that a single atomic operation can check and update it.
Serialization failures are not errors – they are retries. When PostgreSQL aborts a serializable transaction due to an rw-antidependency cycle, it is doing exactly what it is supposed to do. The application's responsibility is to catch SQLSTATE 40001 and retry. Systems that treat serialization failures as unexpected errors and crash instead of retrying are defeating the purpose of the isolation level.
TLDR & Key Takeaways
- Seven anomaly classes exist on a spectrum from dirty reads (reading uncommitted data) to write skew (two transactions both read a shared invariant and independently break it).
- SQL isolation levels map directly to anomaly prevention: Read Committed stops dirty reads; Repeatable Read stops non-repeatable reads (and, in snapshot-based implementations such as PostgreSQL's, phantoms); Serializable stops write skew.
- PostgreSQL SSI prevents write skew without row-level locking by tracking read-write dependency cycles in memory – one of the most efficient serializable implementations in production databases.
- MySQL InnoDB uses gap locks and next-key locks under Repeatable Read to prevent phantoms, and 2PL under Serializable – more blocking than PostgreSQL's SSI under concurrent reads.
- MongoDB is always atomic at the single-document level; multi-document write skew prevention requires explicit session transactions with Snapshot isolation.
- Cassandra accepts read skew, write skew, and phantom reads by design. Lightweight Transactions (Paxos) prevent lost updates and enable compare-and-set within a single partition only.
- DynamoDB prevents all anomalies within a single TransactWriteItems call (up to 100 items; the original limit was 25). Outside transactions, single-item atomicity and ConditionExpressions are your tools.
- Cosmos DB makes the trade-off explicit through five consistency levels. Strong prevents all anomalies; Eventual prevents only dirty writes.
- The practical default: Read Committed for most transactions; Serializable for financial and booking critical paths; NoSQL for high-throughput workloads where the invariant fits in a single partition.
Practice Quiz
A transaction T1 reads an account balance of $500, then reads it again later in the same transaction and gets $480 because T2 committed an update between the two reads. What anomaly is this, and which isolation level prevents it?
Correct Answer: Non-repeatable read. Repeatable Read isolation (or higher) prevents it by taking a snapshot at the transaction's first query and serving every subsequent read from that snapshot.
Two concurrent transactions both read SELECT count(*) FROM seats WHERE occupied = false AND row = 5. Both see 1 available seat. Both insert a booking for that seat. Both commit. What anomaly is this, and why do row-level locks not prevent it?
Correct Answer: Write skew. The two transactions write to different rows (two different booking records), so there is no row-level lock conflict to detect. Only predicate-level locking or Serializable isolation (SSI) can detect the rw-antidependency cycle and abort one transaction.
In PostgreSQL, what does SQLSTATE 40001 indicate, and what should the application do when it receives it?
Correct Answer: SQLSTATE 40001 is a serialization failure – PostgreSQL's SSI detected an rw-antidependency cycle and aborted the transaction to prevent write skew. The application must catch this error and retry the entire transaction from the beginning.
Cassandra's Lightweight Transactions (LWT) use which consensus protocol, and what class of anomaly do they prevent within a single partition?
Correct Answer: LWT uses the Paxos consensus protocol. It provides compare-and-set (CAS) semantics – preventing lost updates – within a single partition. It does not prevent write skew or read skew across multiple partitions.
Open-ended challenge: A hotel booking system uses DynamoDB. A room's availability is tracked in one item, and each booking is a separate item in the same table. How would you design the DynamoDB access pattern to prevent two guests from booking the same room on the same night simultaneously, using only DynamoDB's native features? What are the trade-offs of your approach?
Written by
Abstract Algorithms
@abstractalgorithms