Database Anomalies: How SQL and NoSQL Handle Dirty Reads, Phantom Reads, and Write Skew
A deep comparison of how relational and NoSQL databases detect, prevent, and accept different classes of data anomalies
TL;DR: Database anomalies are the predictable side effects of concurrent transactions: dirty reads, phantom reads, write skew, and lost updates. SQL databases use MVCC and isolation levels to prevent them; PostgreSQL's Serializable Snapshot Isolation (SSI) stops even write skew. NoSQL databases (Cassandra, DynamoDB, MongoDB, Cosmos DB) deliberately accept some anomalies in exchange for throughput and availability. The right choice is not "which database is safer" but "which anomalies can your workload afford to accept."
The Banking Scenario That Exposes Every Hidden Database Assumption
Imagine a bank account with a balance of $200. Two wire transfers are initiated at the same moment: Alice wants to send $150 to her landlord, and Bob's payroll system is deducting $100 for a loan payment. Both transactions read the balance, both see $200, and both decide there is enough money. Both proceed. Alice's transfer writes the new balance as $50. Bob's transfer writes it as $100. The database commits both. The account ends up at $100, yet $250 has left it. Under any serial execution, whichever transfer ran second would have been rejected for insufficient funds, leaving the balance at $50 or $100 with only one transfer completed.
No error was thrown. No hardware failed. Both transactions followed correct application logic. The problem is a database anomaly: an undesired outcome caused not by a bug in any single transaction, but by the interaction between two concurrent ones.
This specific scenario is called a lost update. It is one of seven well-defined anomaly classes that emerge when databases allow multiple transactions to run simultaneously without coordinating their reads and writes. These anomalies range from inconvenient (reading a slightly stale value) to catastrophic (a hotel booking system that allows two guests to occupy the same room on the same night).
Every database, relational or NoSQL, single-node or globally distributed, makes explicit or implicit choices about which anomalies it will prevent and which it will allow in exchange for performance or availability. Understanding those choices is one of the most practically useful skills in system design. It tells you when to reach for a serializable isolation level, when a lower isolation level is safe, and when a NoSQL database's "eventual consistency" is perfectly acceptable for your workload.
This post defines all seven anomaly classes, shows exactly what each major database guarantees, and gives you a decision framework for choosing the right tool.
ACID and Isolation: The Contract That Keeps Concurrent Transactions Honest
Before cataloguing anomalies, we need to understand the contract that prevents them: ACID.
- Atomicity: A transaction either commits all its writes or rolls all of them back. There is no partial commit.
- Consistency: A transaction transitions the database from one valid state to another. Application-defined invariants, such as "account balance must never go negative", must hold before and after every transaction.
- Isolation: Concurrent transactions behave as if they ran serially. One transaction's intermediate state is not visible to another.
- Durability: Once a transaction commits, its writes survive node failures.
The isolation property is where anomalies live. Full isolation (meaning every transaction sees only committed data and runs as if it had exclusive access to the database) is called serializability. It is the strongest guarantee, and it is expensive. Databases offer weaker isolation levels that trade anomaly safety for concurrency and throughput.
The SQL standard defines four isolation levels from weakest to strongest:
| Isolation Level | What changes between consecutive reads? |
| --- | --- |
| Read Uncommitted | A transaction can read another transaction's uncommitted writes |
| Read Committed | A transaction reads only committed data, but the data can change between two reads |
| Repeatable Read | A row read twice within a transaction returns the same value, but new rows can appear |
| Serializable | Full isolation; all anomalies prevented |
NoSQL databases often operate outside this four-level framework entirely. They may offer transactional APIs (DynamoDB's TransactWriteItems), tunable consistency tiers (Cosmos DB's five levels), or per-partition guarantees (Cassandra's Lightweight Transactions) that map only loosely onto SQL isolation levels. Understanding what those guarantees actually mean for your data requires a bottom-up look at the anomalies themselves.
The Seven Core Database Anomalies That Break Production Systems
Each anomaly below has a precise definition, a timeline showing how it occurs, and a concrete consequence in a real system.
Dirty Read
What happens: Transaction T2 reads a row that Transaction T1 has written but not yet committed. If T1 then rolls back, T2 has acted on data that was never durably stored: data that never officially existed.
Concrete scenario:
The following sequence diagram shows T1 writing an account balance that T2 reads before the commit. T1 then rolls back, leaving T2 holding a value that no longer exists in the database.
```mermaid
sequenceDiagram
    participant T1 as Transaction 1
    participant DB as Database
    participant T2 as Transaction 2
    T1->>DB: BEGIN
    T1->>DB: UPDATE account SET balance=50 WHERE id=1
    Note over DB: Row written, not yet committed
    T2->>DB: BEGIN
    T2->>DB: SELECT balance FROM account WHERE id=1
    DB-->>T2: Returns 50 (dirty read)
    T1->>DB: ROLLBACK
    Note over DB: Balance restored to original 100
    Note over T2: T2 acted on value 50 which never committed
```
T2 read the balance as 50, perhaps approved a second wire transfer of $40 based on that, and is now holding the wrong view of the world. The actual committed balance was never 50.
Production impact: Dirty reads can cause money to be transferred based on rolled-back debits, inventory to be reserved against cancelled orders, or approval decisions to be made on data that was never durably written.
Non-Repeatable Read
What happens: Transaction T1 reads a row, then reads it again. Between those two reads, T2 updates and commits the row. T1 gets a different value on its second read even though it is still in the same transaction.
Production impact: Audit logs that read a record twice in one transaction end up with inconsistent entries. Price calculations that read a product price at the start of checkout and again at payment confirmation can yield different totals if a price update commits in between.
Phantom Read
What happens: T1 queries a set of rows matching a predicate, say, all orders placed today. T2 inserts a new order and commits. T1 re-runs the same query and now gets a different count. No existing row changed; a new row appeared.
Production impact: A report counting active sessions or pending orders produces different totals within a single batch operation. An access-control check that lists all users with admin privileges can miss a newly inserted admin.
Read Skew
What happens: T1 reads Account A (balance 100), then reads Account B (balance 200). Between those two reads, T2 moves $50 from A to B (setting A to 50 and B to 250) and commits. T1 has now seen A's old value and B's new value. Its combined view shows a total of 300, but it is reading two different moments in time: A from before the transfer and B from after.
This matches the read skew example in Martin Kleppmann's Designing Data-Intensive Applications, where Alice checks both of her bank accounts while a transfer between them is in flight and briefly sees money vanish. Each individual read returns committed data, but no single consistent snapshot of the system as a whole was ever observed.
Production impact: Financial reports showing an incorrect total balance. Inventory reconciliation that counts items across multiple tables and gets an inconsistent sum. Any cross-entity aggregate that reads from multiple rows or tables in sequence.
Write Skew
What happens: T1 and T2 both read a shared condition, say, "how many doctors are currently on call?" Both see that the invariant is satisfied (at least one doctor on call). Both decide to take an action that modifies a different row: T1 marks Alice as off-call, T2 marks Bob as off-call. Both commit. The invariant (at least one doctor must always be on call) is now broken, even though every individual write was locally valid.
The following diagram shows this scenario with two concurrent transactions, each reading a safe state and writing a different row, arriving at an unsafe combined state.
```mermaid
sequenceDiagram
    participant A as Alice Transaction
    participant DB as Database
    participant B as Bob Transaction
    A->>DB: BEGIN
    B->>DB: BEGIN
    A->>DB: SELECT count(*) FROM doctors WHERE on_call=true
    DB-->>A: Returns 2 (both on call)
    B->>DB: SELECT count(*) FROM doctors WHERE on_call=true
    DB-->>B: Returns 2 (both on call)
    A->>DB: UPDATE doctors SET on_call=false WHERE name='Alice'
    B->>DB: UPDATE doctors SET on_call=false WHERE name='Bob'
    A->>DB: COMMIT
    B->>DB: COMMIT
    Note over DB: Invariant broken: 0 doctors on call
```
Write skew is the most insidious anomaly because it cannot be prevented by row-level locks alone. The conflict is on a predicate: the aggregate count of rows satisfying a condition. T1 and T2 write to different rows, so they never conflict at the row level. Prevention requires either serializable isolation or an explicit predicate lock covering the range query.
Production impact: Meeting room bookings where two people both check that the room is free and both proceed to book it. Double-spending in a wallet system where two withdrawals both check the balance and both execute. Any "check then act" pattern on a shared aggregate.
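To make the interleaving concrete, here is a small, deterministic pure-Python simulation of the check-then-act race. The dict stands in for the doctors table; no real database or concurrency machinery is involved, only the problematic ordering of reads and writes:

```python
# Deterministic simulation of write skew: both "transactions" take their reads
# before either one writes, so each local check passes even though the combined
# outcome breaks the invariant ("at least one doctor on call").

doctors = {"alice": True, "bob": True}  # on_call flag per doctor

def on_call_count(table):
    return sum(1 for on_call in table.values() if on_call)

# Step 1: both transactions read the shared condition first (the race window).
alice_sees = on_call_count(doctors)  # 2
bob_sees = on_call_count(doctors)    # 2

# Step 2: each check passes locally, so each transaction writes a DIFFERENT row.
if alice_sees >= 2:
    doctors["alice"] = False
if bob_sees >= 2:
    doctors["bob"] = False

print(on_call_count(doctors))  # 0: the invariant is broken, yet no write conflicted
```

Note that no row was written by both transactions, which is exactly why row-level locks cannot catch this.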
Lost Update
What happens: T1 and T2 both read a counter (value: 10). T1 computes 11 and writes it. T2 also computes 11 and writes it. One increment is lost: the counter should be 12 but is 11.
Prevention options: SELECT FOR UPDATE (acquires a row lock so T2 must wait), atomic increment operations (UPDATE counter SET value = value + 1), or optimistic locking via compare-and-swap (UPDATE counter SET value=11 WHERE value=10).
Production impact: Like counts, inventory decrement, concurrent seat bookings. Any read-modify-write operation where two transactions race.
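The race and the compare-and-swap fix can both be sketched in a few lines of Python. A dict stands in for the database row, and the cas helper is illustrative (it models the WHERE value=10 guard above, not any real driver API):

```python
# A lost update, then the compare-and-swap (optimistic locking) fix.
counter = {"value": 10}

def cas(row, expected, new):
    """Compare-and-swap: write only if the current value still matches."""
    if row["value"] == expected:
        row["value"] = new
        return True
    return False

# Lost update: both "transactions" read 10, both blindly write 11.
t1_read = counter["value"]
t2_read = counter["value"]
counter["value"] = t1_read + 1   # T1 writes 11
counter["value"] = t2_read + 1   # T2 also writes 11; one increment is lost
assert counter["value"] == 11    # should have been 12

# CAS prevents it: T2's stale write fails and must retry with a fresh read.
counter = {"value": 10}
t1_read = counter["value"]
t2_read = counter["value"]
assert cas(counter, t1_read, t1_read + 1)      # T1 succeeds: value is 11
assert not cas(counter, t2_read, t2_read + 1)  # T2 fails: value is no longer 10
assert cas(counter, counter["value"], counter["value"] + 1)  # retry: 12
print(counter["value"])  # 12
```

The failed CAS is the signal that forces T2 to re-read before retrying, which is what preserves both increments.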
Dirty Write
What happens: T1 writes to a row (not yet committed). T2 also writes to the same row and commits. T1 then rolls back, and the rollback restores the row to its value from before T1's write, overwriting T2's committed value. T2's committed change is silently lost.
Production impact: Dirty writes can corrupt application-level invariants that span multiple rows. If a purchase transaction writes both a billing record and an order record, a dirty write on the billing record during another transaction's rollback can leave an order record with no corresponding billing entry.
Deep Dive: How Databases Detect and Stop Anomalies Before They Corrupt Data
The Internals: MVCC Row Versioning, Snapshot Reads, and Lock Acquisition
The dominant technique for preventing read anomalies without blocking all concurrent reads is Multi-Version Concurrency Control (MVCC). Instead of storing a single current version of each row, MVCC stores multiple versions, each tagged with the transaction that created or deleted it.
In PostgreSQL, every row has two hidden system columns:
- xmin: the transaction ID that inserted this version of the row
- xmax: the transaction ID that deleted or updated this row (zero if the row is still current)
When a transaction begins, it receives a transaction ID (XID) and takes a snapshot of which XIDs are currently in-flight (started but not committed). The visibility rule is: a row version is visible to a transaction if its xmin is committed and less than the transaction's XID, and its xmax is either zero or not yet committed when the snapshot was taken.
This means reads never block writes and writes never block reads: each transaction is reading from its own consistent snapshot of the past. The consequences for anomaly prevention are direct:
- Dirty reads are impossible because an uncommitted xmin is never considered visible.
- Non-repeatable reads depend on when the snapshot is taken. Under Read Committed, each statement gets a fresh snapshot, so a row committed by T2 between two reads in T1 becomes visible on the second read. Under Repeatable Read, the snapshot is taken once at transaction start, so the same row is always read from the same snapshot; non-repeatable reads and phantom reads are both eliminated.
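The visibility rule can be sketched as a small Python function. This is a deliberately simplified model of PostgreSQL's real rule (no subtransactions, no hint bits, no command IDs), but it captures why uncommitted and deleted versions are invisible:

```python
# Toy model of MVCC visibility: each row version carries xmin/xmax, and a
# snapshot records which XIDs were committed when the transaction started.

def visible(version, snapshot_committed, my_xid):
    """A version is visible if its creator committed before our snapshot
    (or is us), and it has not been deleted by a committed transaction."""
    created_ok = version["xmin"] in snapshot_committed or version["xmin"] == my_xid
    deleted = version["xmax"] != 0 and version["xmax"] in snapshot_committed
    return created_ok and not deleted

committed = {100, 101}  # XIDs already committed when our snapshot was taken
versions = [
    {"xmin": 100, "xmax": 0,   "balance": 100},  # live, committed row
    {"xmin": 102, "xmax": 0,   "balance": 50},   # written by in-flight XID 102
    {"xmin": 100, "xmax": 101, "balance": 75},   # deleted by committed XID 101
]

seen = [v["balance"] for v in versions if visible(v, committed, my_xid=103)]
print(seen)  # [100]: the uncommitted write and the deleted version are invisible
```

The second version is exactly a would-be dirty read: its xmin is not committed, so the visibility check filters it out without any locking.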
Phantom reads in range queries deserve a closer look. A snapshot read prevents you from seeing new rows that committed after your snapshot, which is how PostgreSQL's Repeatable Read prevents phantoms. MySQL InnoDB's Repeatable Read instead adds gap locks and next-key locks (a row lock plus a lock on the gap preceding the row) so that phantoms are prevented even for locking reads, by blocking concurrent inserts into the scanned range.
Write skew is fundamentally different. It arises from a read-write dependency that MVCC alone cannot detect. Transaction T1 reads a predicate (e.g., a count), makes a decision based on that read, and writes to a row that is not the row T2 also read and wrote. Because the rows involved in T1's read and T2's write are different, there is no version conflict to detect. MVCC sees no conflict. Both transactions commit. The invariant is broken.
Preventing write skew requires detecting the read-write anti-dependency cycle: T1 read data that T2's write invalidates, T2 read data that T1's write invalidates, and both committed. PostgreSQL's Serializable Snapshot Isolation tracks these dependencies using SIReadLock structures and aborts one transaction in the cycle when a cycle is detected.
Performance Analysis: The Cost of Moving Up the Isolation Ladder
Stronger isolation costs more. The performance trade-offs are real and must be understood before choosing an isolation level:
| Isolation Level | Concurrency overhead | When it stalls |
| --- | --- | --- |
| Read Committed | Near-zero for reads | Writers briefly block conflicting writers |
| Repeatable Read (Postgres) | Snapshot held in memory | Transaction aborts on write-write conflict |
| Repeatable Read (MySQL) | Gap locks + next-key locks | Range queries lock gaps, blocking insertions |
| Serializable (Postgres SSI) | SIReadLock tracking + cycle detection | Abort and retry on rw-anti-dependency cycle |
| Serializable (MySQL) | All reads become LOCK IN SHARE MODE | High read-lock contention under heavy concurrency |
PostgreSQL's SSI is notably more efficient than traditional two-phase locking (2PL) because it does not hold blocking locks for the duration of the transaction; it tracks read sets and detects dangerous cycles only at commit time. Under low-conflict workloads, SSI's overhead compared to Snapshot Isolation (Repeatable Read) is often under 10%. Under workloads with heavy write skew, abort rates climb and throughput degrades.
The practical rule: default to Read Committed for most OLTP workloads. Move to Repeatable Read when you need consistent within-transaction aggregates. Move to Serializable only when your invariant spans multiple rows and write skew is a real risk for your access pattern.
SQL Isolation Levels vs. Anomaly Prevention: PostgreSQL and MySQL InnoDB
The table below maps each isolation level to the anomalies it prevents or allows for the two dominant SQL MVCC implementations. "Possible" means the anomaly can occur at that level. "Prevented" means the database guarantees it cannot occur.
| Anomaly | PG Read Committed | PG Repeatable Read | PG Serializable | MySQL RR (InnoDB) | MySQL Serializable |
| --- | --- | --- | --- | --- | --- |
| Dirty Read | Prevented | Prevented | Prevented | Prevented | Prevented |
| Non-Repeatable Read | Possible | Prevented | Prevented | Prevented | Prevented |
| Phantom Read | Possible | Prevented | Prevented | Prevented (gap lock) | Prevented |
| Read Skew | Possible | Prevented | Prevented | Prevented | Prevented |
| Write Skew | Possible | Possible | Prevented (SSI) | Possible | Prevented |
| Lost Update | Possible | Prevented (first-updater-wins abort) | Prevented | Possible | Prevented |
| Dirty Write | Prevented | Prevented | Prevented | Prevented | Prevented |
A few details worth understanding:
PostgreSQL Repeatable Read vs. MySQL Repeatable Read: Both use MVCC snapshots, but MySQL InnoDB's Repeatable Read adds gap locks to prevent phantoms that PostgreSQL Repeatable Read already prevents via snapshot visibility. MySQL's approach introduces more blocking under concurrent inserts into a locked range.
PostgreSQL Serializable uses SSI (not 2PL): PostgreSQL does not implement serializability by acquiring explicit read locks on every row. Instead, it tracks read-write dependencies in memory (SIReadLock) and detects dangerous cycles at commit time. This means a serializable transaction in PostgreSQL may run at full speed and only abort at the last moment if a cycle is detected. It also means serializable workloads that have no actual write skew risk pay very little overhead.
MySQL Serializable uses 2PL: Under Serializable isolation, MySQL converts every SELECT into SELECT ... LOCK IN SHARE MODE. This acquires shared read locks immediately and holds them until the transaction ends. Under heavy read concurrency, this can severely limit write throughput.
The following flowchart shows the decision path for choosing a PostgreSQL isolation level:
```mermaid
flowchart TD
    A[Starting a transaction] --> B{Do you need to prevent dirty reads?}
    B -->|Already prevented at all levels| C{Do reads need to be stable within the transaction?}
    C -->|No - reads can see new commits| D[Use Read Committed - default]
    C -->|Yes - same row must return same value| E{Does the transaction check a cross-row invariant?}
    E -->|No - single-entity reads only| F[Use Repeatable Read]
    E -->|Yes - aggregate or predicate check drives a write| G[Use Serializable - SSI]
    G --> H[Handle serialization failure with retry]
```
Read the flowchart from top to bottom. Most application transactions land on Read Committed. Serializable is reserved for workflows that explicitly check a shared condition and write based on it: the exact pattern that enables write skew.
How MongoDB, Cassandra, DynamoDB, and Cosmos DB Approach Anomaly Prevention
NoSQL databases were built to scale horizontally across commodity hardware and to favour availability over strict consistency. That design choice has direct consequences for anomaly prevention.
MongoDB: Document Atomicity With Optional Multi-Document Transactions
MongoDB always guarantees atomic operations on a single document, even without transactions. A document update that modifies multiple fields is either fully applied or not applied at all. This eliminates dirty writes and lost updates within a single document.
For operations that span multiple documents (or multiple collections), MongoDB v4.0 introduced multi-document transactions with Read Committed and Snapshot isolation levels. Using { readConcern: { level: 'snapshot' } } inside a session gives you point-in-time consistency across documents:
```javascript
const session = client.startSession();
session.startTransaction({
  readConcern: { level: 'snapshot' },
  writeConcern: { w: 'majority' }
});
// All reads in this session see the same snapshot
// All writes are all-or-nothing
await session.commitTransaction();
```
What MongoDB prevents by default (single documents): dirty reads (with w: majority), dirty writes, lost updates.
What requires explicit transactions: read skew (reading across two documents), write skew (two transactions both read a shared aggregate and write to different documents), phantom reads in multi-document queries.
What MongoDB does not offer: predicate locks for range queries. A find({ status: 'pending' }) followed by an update to a matching document can race with another insert of a new pending document in a way that MongoDB snapshot isolation does not prevent.
Cassandra: Throughput First, Transactions by Exception
Cassandra's base data model has no multi-partition transactions; this is a deliberate design choice. Each write operation targets a single partition and is applied by the coordinator node using a last-write-wins (LWW) model. This means:
- Dirty writes: Prevented within a partition because Cassandra uses quorum writes and a last-write-wins policy; there is no concept of an uncommitted write.
- Read skew: Accepted. A read across two partitions has no snapshot guarantee. The two partitions can reflect writes from different points in time.
- Write skew: Accepted. There is no mechanism to atomically read a cross-partition condition and conditionally write based on it.
- Phantom reads: Accepted. Range queries across partitions have no predicate lock.
- Lost updates: Accepted at the row level unless you use Lightweight Transactions.
Cassandra's Lightweight Transactions (LWT) use the Paxos consensus algorithm to provide compare-and-set semantics on a single partition:
```sql
INSERT INTO seats (id, holder) VALUES (5, 'alice') IF NOT EXISTS;
UPDATE accounts SET balance = 150 WHERE id = 1 IF balance = 200;
```
LWT prevents lost updates and double-booking within a single partition. It is Cassandra's only mechanism for conditional writes, and it comes at significant latency cost: a round of Paxos adds 2 to 4 network round trips. For most high-throughput Cassandra use cases, LWT is used sparingly for idempotency guarantees on critical paths, not as a general transaction mechanism.
The Cassandra documentation explicitly states that "Cassandra does not support multi-partition transactions. LWT is limited to single-partition operations." This is not an oversight; it is the architectural trade-off that enables Cassandra's linear write scalability across hundreds of nodes.
DynamoDB: Single-Item Atomicity Plus Transactional API
DynamoDB always guarantees atomic operations on a single item. For multi-item operations, DynamoDB's TransactWriteItems provides serializable isolation across up to 100 items in a single transaction (the limit was 25 when the API launched):
```json
{
  "TransactItems": [
    {
      "Update": {
        "TableName": "Accounts",
        "Key": { "id": { "S": "account-1" } },
        "UpdateExpression": "SET balance = balance - :amount",
        "ConditionExpression": "balance >= :amount",
        "ExpressionAttributeValues": { ":amount": { "N": "100" } }
      }
    },
    {
      "Update": {
        "TableName": "Accounts",
        "Key": { "id": { "S": "account-2" } },
        "UpdateExpression": "SET balance = balance + :amount",
        "ExpressionAttributeValues": { ":amount": { "N": "100" } }
      }
    }
  ]
}
```
The ConditionExpression on the debit prevents a lost update: if the balance is too low, the entire transaction fails atomically. The pair of updates is serializable: no other transaction sees account-1 debited without account-2 credited.
What DynamoDB prevents with TransactWriteItems: dirty reads, dirty writes, lost updates, read skew within the transaction scope.
What DynamoDB accepts: Phantom reads in Scan operations without transactions. Read skew across non-transactional reads. DynamoDB has no predicate locking: a Scan on items matching a filter and a concurrent Put of a new matching item can produce phantom reads outside of a transaction.
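The all-or-nothing semantics of the transactional write can be modeled in a few lines of Python. The transact_write function below mimics the behavior (check every condition first, then apply all writes or none); it is a simulation, not the AWS API, and all names are illustrative:

```python
# Pure-Python model of TransactWriteItems-style atomicity.

def transact_write(table, updates):
    """updates: list of (key, delta, condition); condition receives the item.
    If any condition fails, nothing is written."""
    for key, _delta, condition in updates:
        if not condition(table[key]):
            return False  # whole transaction rejected
    for key, delta, _condition in updates:
        table[key]["balance"] += delta
    return True

accounts = {"account-1": {"balance": 50}, "account-2": {"balance": 0}}

# Debit of 100 fails its condition (balance >= 100), so the paired
# credit is not applied either.
ok = transact_write(accounts, [
    ("account-1", -100, lambda item: item["balance"] >= 100),
    ("account-2", +100, lambda item: True),
])
print(ok, accounts["account-1"]["balance"], accounts["account-2"]["balance"])
# False 50 0

# A transfer that satisfies its condition applies both sides atomically.
ok2 = transact_write(accounts, [
    ("account-1", -30, lambda item: item["balance"] >= 30),
    ("account-2", +30, lambda item: True),
])
print(ok2, accounts["account-1"]["balance"], accounts["account-2"]["balance"])
# True 20 30
```

The key property mirrored here is that a failed condition rejects the whole batch before any write lands, so no partial transfer is ever observable.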
Cosmos DB: Five Consistency Levels That Map Directly to Anomaly Exposure
Cosmos DB is unique in that it exposes its consistency model as an explicit configuration choice with five levels. The level you choose determines exactly which anomalies are possible.
| Consistency Level | Anomaly Exposure |
| --- | --- |
| Strong | No anomalies. Linearizable reads. Highest latency and cost. |
| Bounded Staleness | Reads may see data up to K versions or T seconds old. Phantoms and read skew possible across the staleness window. |
| Session | Within your session: read-your-writes, monotonic reads. Cross-session: read skew and phantoms possible. |
| Consistent Prefix | Writes are seen in order, but reads may lag. No dirty reads. Stale reads, read skew, and phantoms possible. |
| Eventual | All anomalies except dirty writes are possible. Maximum throughput. |
Multi-document ACID transactions in Cosmos DB are supported within a single logical partition, either via stored procedures (JavaScript running atomically on the server) or via the SDK's transactional batch API. Cross-partition transactions are not natively supported; you must implement the Saga pattern or use an application-level distributed transaction protocol.
Full Anomaly Comparison Across All Six Systems
| Anomaly | PostgreSQL Serializable | PostgreSQL RR | MySQL Serializable | MongoDB (Snapshot Tx) | Cassandra (LWT) | DynamoDB (TransactWrite) | Cosmos DB Strong | Cosmos DB Eventual |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Dirty Read | Prevented | Prevented | Prevented | Prevented | Prevented | Prevented | Prevented | Prevented |
| Non-Repeatable Read | Prevented | Prevented | Prevented | Prevented | Possible | Prevented | Prevented | Possible |
| Phantom Read | Prevented | Prevented | Prevented | Partial | Possible | Possible (scan) | Prevented | Possible |
| Read Skew | Prevented | Prevented | Prevented | Prevented (in tx) | Possible | Prevented (in tx) | Prevented | Possible |
| Write Skew | Prevented | Possible | Prevented | Possible | Possible | Prevented (in tx) | Prevented | Possible |
| Lost Update | Prevented | Possible | Prevented | Prevented (in tx) | Partial (LWT) | Prevented (CAS) | Prevented | Possible |
"Partial" means the anomaly is prevented within certain boundaries (single partition, single document, single transaction) but not universally.
Consistency vs. Throughput: What Every NoSQL Trade-off Actually Costs You
Understanding the anomaly table above is only half the picture. The other half is understanding why NoSQL databases accept these anomalies and what they gain in exchange.
Cassandra's trade-off is availability under network partition. Under the CAP theorem, Cassandra chooses AP: it remains available and accepts writes on both sides of a network partition, reconciling data via read repair and Merkle trees after the partition heals. This means there is no single point of truth during a partition: two coordinators can accept concurrent writes to the same partition and resolve them with last-write-wins semantics. Preventing write skew in this environment would require a consensus round that is incompatible with Cassandra's availability guarantee.
DynamoDB's trade-off is operational simplicity at global scale. DynamoDB is a multi-region key-value store that provides consistent single-item operations and optional multi-item transactions, but only across items that fit within the 100-item transaction limit and within the throughput capacity of the target partition group. Cross-table transactions exist but incur additional latency and 2x the write capacity unit cost. The design reflects a deliberate choice: make the common case (single-item atomic writes) extremely fast and cheap, and make the exceptional case (multi-item transactions) available but priced to discourage overuse.
MongoDB's trade-off is flexibility. The default document model encourages denormalization, embedding related data in a single document to make single-document atomicity sufficient for most operations. Multi-document transactions were added for cases where denormalization is not feasible. Their existence does not change the fact that using them widely in a schema that was not designed for it produces poor performance: MongoDB's transaction overhead is higher than PostgreSQL's because it requires distributed coordination across replica sets.
Cosmos DB's trade-off is user-configurable. By exposing five consistency levels, Cosmos DB lets you make the trade-off explicitly at the operation level. A shopping cart read can use Session consistency for fast, read-your-writes behavior. A payment ledger read can use Strong consistency to prevent all anomalies. The cost is paying for both the global replication infrastructure and the higher per-operation latency when Strong is selected.
Choosing the Right Database for Your Anomaly Tolerance
The right database choice depends on answering two questions first: which anomalies does your workload generate? And which anomalies can your workload afford to accept?
| Use Case | Anomaly Risk | Recommended Approach |
| --- | --- | --- |
| Financial ledger, double-spend prevention | Write skew, lost update | PostgreSQL Serializable or DynamoDB TransactWriteItems |
| On-call scheduling, seat booking | Write skew | PostgreSQL Serializable or Cosmos DB Strong within a partition |
| User session state, shopping cart | Non-repeatable read acceptable | PostgreSQL Read Committed, MongoDB default, DynamoDB single-item |
| Event log, time-series append | No reads during write; last-write-wins acceptable | Cassandra, DynamoDB |
| Product catalog, read-heavy | Stale reads acceptable | Cosmos DB Session or Eventual, DynamoDB eventually consistent read |
| Inventory reservation | Lost update risk | SELECT FOR UPDATE, DynamoDB CAS ConditionExpression, MongoDB session |
The following flowchart guides the database decision for workloads where anomaly prevention is a primary concern:
```mermaid
flowchart TD
    A[What anomaly risk does your workload have?] --> B{Can your workload tolerate write skew?}
    B -->|No - invariant must hold across rows| C{Multi-region required?}
    B -->|Yes - single-row or LWW is fine| D{High write throughput required?}
    C -->|Yes| E[Cosmos DB Strong or Google Spanner]
    C -->|No| F[PostgreSQL Serializable with retry on abort]
    D -->|Yes - millions of writes per second| G[Cassandra with LWT on critical paths]
    D -->|No - thousands of writes per second| H{Need multi-item atomicity?}
    H -->|Yes - up to 100 items| I[DynamoDB TransactWriteItems]
    H -->|No - single document sufficient| J[MongoDB default or DynamoDB single-item]
```
Follow the flowchart from top to bottom. The left branch captures workloads that cannot tolerate write skew: financial systems, booking systems, any domain with a shared invariant across multiple rows. The right branch captures the much larger space of workloads where some anomaly exposure is acceptable in exchange for throughput and simplicity.
Write Skew Under the Microscope: The On-Call Doctors Scenario Across Databases
The write skew scenario with doctors is the clearest illustration of how different databases handle the same logical invariant differently. The invariant: at least one doctor must always be on call. The failure mode: two doctors simultaneously decide to go off-call.
In PostgreSQL at Read Committed: Both transactions read count(*) = 2, both proceed, invariant breaks. No protection.
In PostgreSQL at Repeatable Read: Both transactions read the same snapshot (count(*) = 2), both proceed, invariant breaks. MVCC snapshot prevents read anomalies but does not detect write skew because the two transactions write to different rows.
In PostgreSQL at Serializable (SSI): PostgreSQL detects the rw-anti-dependency cycle. Both transactions read the on-call count, and each writes a different doctor row. The SSI tracker sees that T1 read data that T2's write invalidates and T2 read data that T1's write invalidates: a dangerous cycle. One transaction is aborted with SQLSTATE 40001 (serialization failure) and must be retried. After the retry, the system sees that only one doctor is available and correctly rejects the second request to go off-call.
In MongoDB (no explicit transaction): Two concurrent updateOne operations on different documents complete independently. The invariant breaks. To prevent this, the application must use a multi-document session transaction with Snapshot isolation level and check the aggregate count inside the transaction β which MongoDB does not perform automatically.
In Cassandra: There is no mechanism to prevent write skew across two partitions (one per doctor). Lightweight Transactions only help if the entire invariant lives in a single partition row. The application would need to serialize doctor schedule updates through a single partition or implement an application-level locking scheme.
In DynamoDB with TransactWriteItems: The application can use ConditionExpression on a shared counter or use a single item that holds the count of on-call doctors, which the transaction checks and decrements atomically. This prevents write skew when designed for it, but it requires the schema to encode the shared invariant as a checkable condition on a DynamoDB item.
The lesson: write skew prevention is a database-schema-and-code design problem, not just a database selection problem. PostgreSQL Serializable is the only system in this list that prevents write skew automatically without application-level schema accommodations.
PostgreSQL SSI: How Serializable Snapshot Isolation Prevents Write Skew Without Locking Rows
PostgreSQL's Serializable Snapshot Isolation (SSI), introduced in PostgreSQL 9.1, is one of the most significant advances in practical database concurrency control. It delivers full serializability – preventing all seven anomaly classes – without the blocking overhead of two-phase locking (2PL).
SSI vs. 2PL: Traditional serializable isolation using 2PL acquires a shared lock on every row read and an exclusive lock on every row written. Lock contention under concurrent read-heavy workloads is severe. SSI takes a different approach: it tracks read-write dependencies (rw-antidependencies) between transactions and only intervenes when a cycle is detected.
How SIReadLocks work: When a transaction reads a row or a predicate range under SSI, PostgreSQL records a SIReadLock – not a blocking lock, but a memory structure tracking "this transaction has a read dependency on this data." When a second transaction writes data that overlaps with the first transaction's read, PostgreSQL creates an rw-antidependency edge between them. When a cycle of two or more such edges forms (T1 read data that T2 then wrote, and T2 read data that T1 then wrote), PostgreSQL aborts one of the transactions in the cycle.
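A toy version of this bookkeeping might look like the sketch below. The `si_reads` and `rw_edges` structures are invented stand-ins for PostgreSQL's predicate-lock table, which in reality tracks predicate ranges and pages rather than simple keys:

```python
# Minimal sketch of SSI-style dependency tracking: record non-blocking
# read markers per transaction, add an rw-antidependency edge when a
# transaction writes data a peer has read, and flag the dangerous cycle.
from collections import defaultdict

si_reads = defaultdict(set)   # txn -> keys it has read (SIRead "locks")
rw_edges = defaultdict(set)   # reader_txn -> {writer_txn, ...}

def record_read(txn, key):
    si_reads[txn].add(key)

def record_write(txn, key):
    # Any peer that read this key now has an rw-antidependency on us.
    for reader, keys in si_reads.items():
        if reader != txn and key in keys:
            rw_edges[reader].add(txn)

def has_cycle(a, b):
    # The dangerous structure in this two-transaction sketch: a -> b and b -> a.
    return b in rw_edges[a] and a in rw_edges[b]

# T1 and T2 both read the on-call rows, then each writes a different row.
record_read("T1", "alice"); record_read("T1", "bob")
record_read("T2", "alice"); record_read("T2", "bob")
record_write("T1", "alice")   # T2 read 'alice' -> edge T2 -> T1
record_write("T2", "bob")     # T1 read 'bob'   -> edge T1 -> T2

print(has_cycle("T1", "T2"))  # True -- one transaction must abort (40001)
```

Note that no reader or writer ever blocks in this scheme; the only intervention is the abort once the cycle is complete.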
Enabling SSI in PostgreSQL and handling retries:
-- Set isolation level for the session or transaction
SET default_transaction_isolation = 'serializable';
-- Or per-transaction:
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT count(*) FROM doctors WHERE on_call = true;
UPDATE doctors SET on_call = false WHERE name = 'Alice';
COMMIT;
-- If write skew is detected, PostgreSQL returns:
-- ERROR: could not serialize access due to read/write dependencies
-- SQLSTATE: 40001 (serialization_failure)
-- Application must catch this error and retry the transaction.
The retry loop is a mandatory part of using serializable isolation. A well-written application wraps every serializable transaction in a retry loop that catches SQLSTATE 40001 and re-executes the transaction from the beginning. Under low contention, retries are rare – the performance overhead of SSI is primarily from the in-memory dependency tracking, not from aborts.
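A minimal retry-loop sketch, with a simulated transaction standing in for a real database call (in psycopg the error class to catch would be `psycopg.errors.SerializationFailure`, which maps to SQLSTATE 40001; the `SerializationFailure` class and failure rate here are invented so the example is self-contained):

```python
# Retry-loop pattern for serializable transactions: re-run the whole
# transaction from the beginning whenever SSI aborts it.
import random

class SerializationFailure(Exception):
    """Stand-in for the driver error raised on SQLSTATE 40001."""

random.seed(0)  # make the simulation deterministic

def run_serializable_txn():
    # Simulate a transaction that occasionally loses an SSI conflict.
    if random.random() < 0.5:
        raise SerializationFailure("could not serialize access")
    return "committed"

def with_retries(txn, max_attempts=5):
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()          # re-execute from the BEGINNING
        except SerializationFailure:
            if attempt == max_attempts:
                raise             # give up after max_attempts

result = with_retries(run_serializable_txn)
print(result)  # committed
```

Production variants often add capped exponential backoff between attempts so colliding transactions stop re-colliding on every retry.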
Practical guidance: Enable Serializable isolation for your most sensitive financial or scheduling transactions. Leave Read Committed as the default for everything else. The overhead of Serializable on transactions that have no actual write skew risk is low because SSI is optimistic – it only pays the abort cost when an actual conflict is detected.
For a full deep-dive on PostgreSQL's transaction internals, see the PostgreSQL documentation on Serializable Snapshot Isolation.
Lessons Learned: What Production Systems Actually Teach You About Anomalies
Most systems use Read Committed for 95% of their transactions – and pay for it occasionally. The default isolation level in PostgreSQL and Oracle is Read Committed (MySQL's InnoDB defaults to Repeatable Read). Under Read Committed, non-repeatable reads and read skew are possible. Most OLTP applications tolerate this because the window for these anomalies is small and the business impact is low. When the impact is high (financial reconciliation, booking systems), an explicit upgrade to a higher isolation level is always preferable to application-level workarounds.
Write skew is the anomaly that most developers do not know to look for. Dirty reads and lost updates are intuitive. Write skew is subtle because it involves no conflicting rows – both transactions write to rows the other never touched. It surfaces as a business invariant violation days or weeks after the transaction code was written and reviewed. The pattern to watch for: any "check a shared condition, then write based on the result" logic that spans multiple rows.
NoSQL databases are not "unsafe" – they make their trade-offs explicit. Cassandra's lack of cross-partition transactions is not a bug; it is the architectural commitment that makes linear write scalability possible. DynamoDB's transaction item limit (25 items at launch, since raised to 100) is not arbitrary; it is the boundary within which DynamoDB can provide serializable isolation without a distributed 2PL protocol. Understand the boundary and design your schema to stay within it.
Schema design in NoSQL is anomaly design. In Cassandra and DynamoDB, the partition is the unit of consistency. If your invariant spans multiple partitions, your application is responsible for enforcing it. This means embedding shared invariant state into a single partition where possible – a counter item, a status flag – so that a single atomic operation can check and update it.
Serialization failures are not errors – they are retries. When PostgreSQL aborts a serializable transaction due to an rw-antidependency cycle, it is doing exactly what it is supposed to do. The application's responsibility is to catch SQLSTATE 40001 and retry. Systems that treat serialization failures as unexpected errors and crash instead of retrying are defeating the purpose of the isolation level.
TLDR & Key Takeaways
- Seven anomaly classes exist on a spectrum from dirty reads (reading uncommitted data) to write skew (two transactions both read a shared invariant and independently break it).
- SQL isolation levels map directly to anomaly prevention: Read Committed stops dirty reads; Repeatable Read stops non-repeatable reads (and, in snapshot-based implementations such as PostgreSQL's, phantoms); Serializable stops write skew.
- PostgreSQL SSI prevents write skew without row-level locking by tracking read-write dependency cycles in memory – one of the most efficient serializable implementations in production databases.
- MySQL InnoDB uses gap locks and next-key locks under Repeatable Read to prevent phantoms, and 2PL under Serializable – more blocking than PostgreSQL's SSI under concurrent reads.
- MongoDB is always atomic at the single-document level; multi-document write skew prevention requires explicit session transactions with Snapshot isolation.
- Cassandra accepts read skew, write skew, and phantom reads by design. Lightweight Transactions (Paxos) prevent lost updates and enable compare-and-set within a single partition only.
- DynamoDB prevents all anomalies within a single TransactWriteItems call (up to 100 items; the original limit was 25). Outside transactions, single-item atomicity and ConditionExpressions are your tools.
- Cosmos DB makes the trade-off explicit through five consistency levels. Strong prevents all anomalies; Eventual prevents only dirty writes.
- The practical default: Read Committed for most transactions; Serializable for financial and booking critical paths; NoSQL for high-throughput workloads where the invariant fits in a single partition.
Practice Quiz
A transaction T1 reads an account balance of $500, then reads it again later in the same transaction and gets $480 because T2 committed an update between the two reads. What anomaly is this, and which isolation level prevents it?
Correct Answer: Non-repeatable read. Repeatable Read isolation (or higher) prevents it by taking a snapshot at the transaction's first query and serving every subsequent read from that snapshot.
Two concurrent transactions both read SELECT count(*) FROM seats WHERE occupied = false AND row = 5. Both see 1 available seat. Both insert a booking for that seat. Both commit. What anomaly is this, and why do row-level locks not prevent it?
Correct Answer: Write skew. The two transactions write to different rows (two different booking records), so there is no row-level lock conflict to detect. Only predicate-level locking or Serializable isolation (SSI) can detect the rw-antidependency cycle and abort one transaction.
In PostgreSQL, what does SQLSTATE 40001 indicate, and what should the application do when it receives it?
Correct Answer: SQLSTATE 40001 is a serialization failure – PostgreSQL's SSI detected an rw-antidependency cycle and aborted the transaction to prevent write skew. The application must catch this error and retry the entire transaction from the beginning.
Cassandra's Lightweight Transactions (LWT) use which consensus protocol, and what class of anomaly do they prevent within a single partition?
Correct Answer: LWT uses the Paxos consensus protocol. It provides compare-and-set (CAS) semantics – preventing lost updates – within a single partition. It does not prevent write skew or read skew across multiple partitions.
Open-ended challenge: A hotel booking system uses DynamoDB. A room's availability is tracked in one item, and each booking is a separate item in the same table. How would you design the DynamoDB access pattern to prevent two guests from booking the same room on the same night simultaneously, using only DynamoDB's native features? What are the trade-offs of your approach?
Written by
Abstract Algorithms
@abstractalgorithms