23 min readDatabases Transactions Isolation Levels

Write Skew Explained: The Anomaly That Requires Serializable Isolation

Two transactions check a shared condition and write to different rows — individually correct, collectively wrong. Why Repeatable Read fails.

Abstract Algorithms/Apr 11, 2026/System Design Interview Prep

Executive TLDR

TLDR: Write skew is the hardest concurrency anomaly to reason about: two concurrent transactions each read a shared condition, decide they can safely proceed, and then write to different rows.
Yet together, their writes violate the invariant both checked.
Repeatable Read does not prevent write skew — only Serializable isolation (or explicit predicate locking) can catch it.
If your database runs at Read Committed or Repeatable Read, you may already have this bug in production.

Core mental model

Read this as a system of state, constraints, and failure boundaries.

Two transactions check a shared condition and write to different rows — individually correct, collectively wrong. Why Repeatable Read fails.

Explain simpler Compare tradeoffs

Key systems visualization

The article’s conceptual path

📖 The On-Call Disaster: When Two Correct Decisions Leave Zero Doctors Available

🔍 What Makes Write Skew Different from Lost Updates and Dirty Reads

⚙️ Step-by-Step: How Two Transactions Jointly Break a Shared Invariant

🧠 Deep Dive: How Serializable Snapshot Isolation Detects Write Skew

📊 From HTTP Request to Empty On-Call Roster: The Full System Architecture View

> TLDR: Write skew is the hardest concurrency anomaly to reason about: two concurrent transactions each read a shared condition, decide they can safely proceed, and then write to different rows. No individual operation is wrong. No row was overwritten. Yet together, their writes violate the invariant both checked. Repeatable Read does not prevent write skew — only Serializable isolation (or explicit predicate locking) can catch it. If your database runs at Read Committed or Repeatable Read, you may already have this bug in production.

At 2:47 AM, a patient arrives at City General Hospital in critical condition. The emergency desk pages the on-call doctor. The system returns zero results. There are no on-call doctors.

Six hours earlier, Alice and Bob were both on the roster. Both decided they could safely go off-call. Both ran the same check. Both got the same answer. Both committed their updates. Neither of them made an error. Yet the hospital ended up with nobody on duty — and the on-call scheduling database has no dirty reads, no overwritten rows, no data corruption visible in any log.

This is write skew. It is not the loudest database bug. It is the quietest one.

📖 The On-Call Disaster: When Two Correct Decisions Leave Zero Doctors Available

The hospital enforces a simple invariant: at all times, at least one doctor must have status = 'active' in the on_call table. The rule is checked by application logic before any doctor is allowed to go off-call.

Here is the timeline that breaks everything:

14:00:00 — Alice opens the staff portal and clicks "Go Off Call."
14:00:00 — Bob, in a separate browser tab, clicks "Go Off Call" simultaneously.
14:00:01 — Alice's transaction reads: SELECT COUNT(*) FROM on_call WHERE status = 'active' → returns 2 (Alice and Bob are both active).
14:00:01 — Bob's transaction reads the same query → also returns 2.
14:00:01 — Alice's code evaluates: 2 > 1, so it is safe to go off-call. Her transaction runs UPDATE on_call SET status = 'off_call' WHERE doctor = 'alice' and commits.
14:00:01 — Bob's code evaluates: 2 > 1, safe to go off-call. His transaction runs UPDATE on_call SET status = 'off_call' WHERE doctor = 'bob' and commits.
14:00:02 — The on-call roster now shows 0 active doctors.

Notice what did not happen: Alice did not read Bob's uncommitted data. Bob did not read Alice's uncommitted data. Neither transaction read a value that was later rolled back. Neither overwrote the other's row. Each transaction was perfectly isolated in the traditional sense — and together they produced a state the application explicitly forbade.

The consequence for the patient at 2:47 AM is real: the emergency scheduler queries the database, finds an empty roster, and routes the call to a regional dispatch center 40 minutes away.

🔍 What Makes Write Skew Different from Lost Updates and Dirty Reads

Before going deeper, it is worth pinning down exactly why write skew is its own category — distinct from every other anomaly you may already know.

Anomaly	What Happens	Same Row?	Shared Invariant Broken?
Dirty Read	T1 reads T2's uncommitted write	Yes	No
Non-Repeatable Read	T1 reads a row twice; T2 commits an update in between	Yes	No
Phantom Read	T1 re-runs a range query; T2 inserted a new row in between	Range	No
Lost Update	T1 and T2 both write to the same row; one write is silently discarded	Yes	Indirectly
Write Skew	T1 and T2 each read a shared condition and write to different rows	No	Yes

Write skew is uniquely dangerous because:

No row is overwritten. Both writes succeed and survive. The data is not corrupted in any conventional sense — both the alice row and the bob row correctly reflect their owners' actions.
No dirty data was read. Both transactions read committed, consistent snapshots.
The invariant lives across rows. The constraint (COUNT(*) >= 1) spans multiple rows. Each transaction checks it correctly against its own snapshot. The invariant is violated only when you look at the combined result of both commits.
Standard locking on individual rows does not help. Alice's lock on her row does not prevent Bob from reading or writing his own row, and vice versa.

This is what the database theory literature calls a check-then-act race on a multi-row predicate. Each actor checks a condition they do not own, acts on a row they do own, and leaves the shared condition violated.

⚙️ Step-by-Step: How Two Transactions Jointly Break a Shared Invariant

The sequence diagram below traces the exact interleaving that produces write skew. Both transactions are running under Snapshot Isolation (the default in PostgreSQL's REPEATABLE READ and many "snapshot" modes) — each sees a perfectly consistent point-in-time view of the database, which is precisely why neither detects a problem.

sequenceDiagram
    participant T1 as T1 - Alice
    participant DB as on_call table
    participant T2 as T2 - Bob

    T1->>DB: BEGIN; SELECT COUNT(*) WHERE status=active
    DB-->>T1: 2 (alice=active, bob=active)
    T2->>DB: BEGIN; SELECT COUNT(*) WHERE status=active
    DB-->>T2: 2 (alice=active, bob=active)
    Note over T1: count=2 > 1, safe to go off call
    Note over T2: count=2 > 1, safe to go off call
    T1->>DB: UPDATE on_call SET status=off_call WHERE doctor=alice
    T2->>DB: UPDATE on_call SET status=off_call WHERE doctor=bob
    T1->>DB: COMMIT
    T2->>DB: COMMIT
    Note over DB: 0 active doctors. Invariant violated.

Read the diagram from top to bottom. Both transactions start by reading the full active count from the same snapshot — both see 2. Each applies the safety check independently and passes. Then each writes to a different row (alice vs. bob) and commits. There is no contention, no lock conflict, no rejected write. The database accepts both commits without complaint because, in isolation, each write is valid. The violation only materialises when you evaluate the invariant across both committed states.

The check-then-act structure is the root of the problem:

Check: Read an aggregate across rows you do not own.
Act: Write to a row you do own, contingent on the check result.
Race: The snapshot used for the check becomes stale between the check and the act, with no mechanism to detect this.

🧠 Deep Dive: How Serializable Snapshot Isolation Detects Write Skew

Serializable Snapshot Isolation (SSI), implemented in PostgreSQL since version 9.1, detects write skew by tracking read-write anti-dependencies between concurrent transactions. Understanding the internals of this mechanism clarifies why snapshot isolation misses the anomaly and why SSI catches it.

The Internals: rw-Anti-Dependency Tracking Inside SSI

An rw-anti-dependency is formed when one transaction reads a version of a row that another concurrent transaction subsequently writes. The reading transaction depended on the "old" version — a version that no longer exists once the writing transaction commits. If two transactions each hold an rw-anti-dependency pointing at the other, a dangerous cycle exists: the result of their concurrent execution cannot be reproduced by any sequential execution order. SSI aborts one transaction when it detects this cycle.

Internally, PostgreSQL's SSI layer maintains a SIREAD lock (a read predicate lock that does not block writes) on every row or page scanned during a transaction. When a concurrent transaction writes to a row covered by a SIREAD lock held by another in-progress transaction, the engine creates an rw-anti-dependency edge in its conflict graph. If this new edge completes a cycle involving two or more rw-anti-dependency edges, the engine marks the involved transactions as unsafe and aborts the later committer.

graph TD
    T1[T1 reads all on_call rows including bob row]
    T2[T2 reads all on_call rows including alice row]
    W1[T1 writes alice row to off_call]
    W2[T2 writes bob row to off_call]

    T1 -->|rw-anti-dep: T1 read bob row that T2 will overwrite| W2
    T2 -->|rw-anti-dep: T2 read alice row that T1 will overwrite| W1
    W1 --> CYCLE[Dangerous cycle - SSI aborts one transaction]
    W2 --> CYCLE

Trace the two anti-dependency edges in this graph. T1 reads bob's row as part of its COUNT(*) scan; T2 then writes a new version of bob's row. This creates an rw-anti-dependency from T1 to T2. Symmetrically, T2 reads alice's row; T1 then writes alice's row, creating an rw-anti-dependency from T2 to T1. The two edges form a directed cycle: T1 → T2 → T1. SSI's dependency tracker detects this cycle and aborts the later-committing transaction (typically T2 — Bob), which then receives a serialization failure error (40001) and must retry. Alice's update succeeds; the invariant is preserved.

This is the key insight of SSI: it does not prevent all concurrent execution — it detects and aborts only the specific interleaving patterns that would produce a non-serializable result, while allowing all safe concurrent transactions to proceed without extra locking overhead.

Performance Analysis: The Cost of Serializable Isolation on Write Skew Workloads

SSI is not free, but its cost is often overstated. The relevant performance dimensions are:

Abort rate and retry overhead. In low-contention workloads — where the same predicate rows are not simultaneously read and written by dozens of concurrent transactions — SSI abort rates are typically below 1%. The hospital on-call scenario is a perfect example: doctors go off-call a handful of times per day, so the overlap window is narrow and aborts are rare. High-contention scenarios (e.g., a rate limiter touching the same user's request count from 50 concurrent nodes) can see abort rates that exceed the retry overhead budget. For those cases, SELECT FOR UPDATE is more appropriate.

SIREAD lock memory overhead. PostgreSQL tracks SIREAD locks in shared memory. Long-running Serializable transactions that scan large tables can accumulate many SIREAD locks, consuming max_pred_locks_per_transaction slots. Queries that read millions of rows (e.g., full-table analytics scans) should not be run in Serializable transactions if they can be avoided — use a separate connection at a lower isolation level for read-only analytics.

Throughput at Serializable vs. Repeatable Read. For write-heavy OLTP workloads with no write skew risk, PostgreSQL's REPEATABLE READ delivers slightly higher throughput because it skips SIREAD lock tracking entirely. The gap narrows for read-heavy workloads since SI's snapshot reads are lockless in both modes. As a practical guideline: start with Serializable for any table that enforces a multi-row invariant, measure abort rates in staging under realistic load, and only fall back to targeted SELECT FOR UPDATE if abort rates materially impact latency SLOs.

📊 From HTTP Request to Empty On-Call Roster: The Full System Architecture View

Write skew does not only exist in toy examples. In a production on-call management system, the anomaly can propagate through multiple layers before causing visible harm. The architecture diagram below shows where write skew injects itself into the request flow.

graph TD
    HTTP1[Alice - HTTP go-off-call request]
    HTTP2[Bob - HTTP go-off-call request]
    SVC[On-Call Availability Service]
    DB[(on_call table - PostgreSQL)]
    SCHED[Emergency Scheduler - polls on-call roster]
    DISP[Emergency Dispatch System]
    ERR[Patient routed with zero on-call doctors]

    HTTP1 --> SVC
    HTTP2 --> SVC
    SVC -->|check + update under Snapshot Isolation| DB
    DB --> SCHED
    SCHED --> DISP
    DISP -->|empty roster query result| ERR

The write skew injection point is the SVC → DB edge. Both concurrent requests reach the On-Call Availability Service, which opens independent database transactions. Under Snapshot Isolation, both transactions read the same consistent snapshot and pass the safety check independently. The Emergency Scheduler downstream polls the database long after both transactions have committed, so it never sees anything anomalous in its own read — it simply finds an empty roster and forwards the request to dispatch with no active doctors attached.

This illustrates why write skew is an operational hazard in any service that enforces multi-row invariants without Serializable isolation: the anomaly is invisible to every component individually and only emerges in the aggregate system state.

⚖️ Why Snapshot Isolation and Repeatable Read Are Not Enough

Snapshot Isolation is widely praised — and for good reason. It eliminates dirty reads, non-repeatable reads, and phantom reads in a single stroke by giving every transaction a consistent point-in-time view of the database. It does this without locking read rows, which gives it excellent throughput characteristics. PostgreSQL's REPEATABLE READ is implemented as Snapshot Isolation. Oracle's default is also effectively Snapshot Isolation.

But Snapshot Isolation has a blind spot: it has no awareness of what other concurrent transactions are about to commit. When Alice's transaction reads the on-call count, it sees a correct, committed snapshot. When she commits her update, the database engine checks only whether anyone else has written to her row (alice) since her snapshot was taken — and nobody has. The engine has no mechanism to check whether Bob's simultaneous update to his row (bob) will, in combination with hers, violate a multi-row invariant.

The core problem is that Snapshot Isolation validates write-write conflicts per row, not read-write conflicts across predicates. Write skew requires a write-skew check: "does the combined effect of all concurrent writes that touch the rows I read violate the condition I checked?" SI was never designed to perform that check. The ANSI SQL standard does not even name write skew as an anomaly SI is supposed to prevent.

This is why the often-heard advice "use Repeatable Read for stronger consistency" is insufficient. REPEATABLE READ prevents a transaction from seeing changed values if it re-reads the same rows. It does nothing to prevent another transaction from writing to a different set of rows that affects the same logical invariant.

Only SERIALIZABLE isolation guarantees that write skew cannot occur.

🧭 Isolation Level Prevention Table: Choosing the Right Defense

The table below maps each standard isolation level to the anomalies it prevents. Read this as a capability matrix — a ✅ means the anomaly is prevented at this level; a ❌ means it can still occur.

Isolation Level	Dirty Read	Non-Repeatable Read	Phantom Read	Lost Update	Write Skew
Read Uncommitted	❌	❌	❌	❌	❌
Read Committed	✅	❌	❌	❌	❌
Repeatable Read	✅	✅	Varies*	✅	❌
Serializable	✅	✅	✅	✅	✅

* PostgreSQL's REPEATABLE READ (implemented as Snapshot Isolation) prevents phantom reads. Standard SQL's REPEATABLE READ (as in MySQL InnoDB) does not.

The critical call-out: Write skew is only prevented at SERIALIZABLE. Every level below it — including REPEATABLE READ — still allows write skew. If your application enforces any multi-row invariant in application logic (not via a database constraint), and your isolation level is below SERIALIZABLE, you have a potential write skew vulnerability.

When to use each level:

Situation	Recommended Level	Reasoning
High-throughput reads, no multi-row invariants	Read Committed	Default for most OLTP; safe for single-row operations
Single-row check-then-act (e.g., balance updates)	Repeatable Read	Prevents non-repeatable reads; lost updates handled via row locks
Multi-row invariants checked in app code	Serializable	Only level that prevents write skew
Rare conflicts; retry cost acceptable	Serializable + retry	SSI aborts infrequently; retry logic is usually straightforward
High-contention writes; retry cost too high	Repeatable Read + SELECT FOR UPDATE	Materialises the conflict explicitly; see next section

🌍 Write Skew in the Wild: Booking, Rate Limiting, and Inventory Scenarios

The hospital scenario is illustrative, but write skew appears in any system that enforces a shared capacity or quota across multiple rows or entities.

Double-Booking a Seat

An airline booking system has a reservations table. A business rule states that no flight can be overbooked beyond its seat capacity. Two booking agents simultaneously check the current reservation count for flight AA-101: both see 179 reservations against a 180-seat capacity. Both decide it is safe to add one more. Agent A creates a reservation for passenger Maria; Agent B creates a reservation for passenger James. Both commit. The flight now has 181 reservations for 180 seats, but neither agent's row overwrites the other — write skew.

Rate Limiter Overshoot

A distributed rate limiter enforces a cap of 100 requests per user per minute. Two API gateway nodes each check the request count for user u-9982: both see 99. Both decide the next request is within the limit. Both insert a request log row and increment the counter. The user has now made 101 requests in the minute. The shared invariant (count <= 100) was checked by both nodes, but each wrote to a different log row — write skew.

Inventory Reservation Race

A flash sale system allows reserving limited-edition items. Two checkout services simultaneously query the remaining stock for item SKU-4411: both see 1 unit available. Both create a reservation record for different users. Both commit. Zero units exist, but two users now have confirmed reservations — write skew.

All three scenarios share the same structure: a shared aggregate or count is the invariant, each transaction reads the aggregate, and each transaction writes to a row in the aggregate's domain without touching any other transaction's row. The database sees no conflict. The business sees a violated rule.

🧪 Preventing Write Skew Without Full Serializability: SELECT FOR UPDATE and Optimistic Locking

When you cannot or will not pay the performance cost of full Serializable isolation, several targeted techniques can prevent write skew for specific use cases.

SELECT FOR UPDATE — Materialising the Conflict

The most direct fix is to lock the rows that define the invariant when you read them, not just when you write. By adding FOR UPDATE (or FOR SHARE) to the query that checks the condition, you force the two transactions to serialise on the rows they both need. This is called materialising the conflict — you convert an invisible read-write conflict into a visible write-write conflict, which every isolation level handles correctly.

Instead of:

-- Unsafe under Snapshot Isolation: reads without locking
SELECT COUNT(*) FROM on_call WHERE status = 'active';

Use:

-- Locks the on_call rows matching the predicate — forces serialization
SELECT COUNT(*) FROM on_call WHERE status = 'active' FOR UPDATE;

Now when Alice's transaction runs this query, it acquires locks on all active on-call rows. Bob's transaction attempts to run the same query and blocks until Alice's transaction commits or rolls back. The check is no longer performed against a stale snapshot — it is performed against the locked, current state. The invariant is preserved.

The trade-off: SELECT FOR UPDATE on a large predicate acquires many locks and reduces concurrency. For the hospital scenario, the on-call table is small and this is entirely acceptable. For a rate limiter checking millions of request rows, it is not.

Optimistic Locking with Version Numbers

An alternative for lower-contention scenarios is to track a version column on the rows that constitute the invariant. Each transaction reads the version at check time and includes the version in its update's WHERE clause. If the version has changed by commit time (because another transaction modified those rows), the update affects zero rows, the application detects this, and retries. This avoids database-level locking while still catching concurrent modifications — at the cost of application-level retry logic.

Advisory Locks

PostgreSQL's pg_advisory_xact_lock(key) allows applications to acquire application-defined named locks within a transaction. Any transaction that needs to modify the on-call roster acquires the same advisory lock by a shared key before reading and writing. This serialises all on-call modifications without requiring Serializable isolation for the entire session.

🛠️ PostgreSQL, CockroachDB, and MySQL: What Real Databases Do About Write Skew

Different databases take fundamentally different approaches to write skew. Understanding the defaults is essential before deploying any application that relies on multi-row invariants.

PostgreSQL — SSI Available, Not Default

PostgreSQL's default isolation level is READ COMMITTED, which allows write skew. REPEATABLE READ in PostgreSQL uses Snapshot Isolation and still allows write skew. Only SERIALIZABLE — which uses SSI — prevents it.

-- Enable Serializable isolation for a single transaction
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT COUNT(*) FROM on_call WHERE status = 'active';
UPDATE on_call SET status = 'off_call' WHERE doctor_id = $1;
COMMIT;

-- Set Serializable as the default for an application role
ALTER ROLE on_call_service SET default_transaction_isolation TO 'serializable';

Under SSI, if PostgreSQL detects a dangerous rw-anti-dependency cycle, it aborts one of the conflicting transactions with error code 40001 (serialization failure). Applications must implement a retry loop for this error — typically a simple exponential backoff. For low-contention workloads like on-call management, the abort rate is negligible.

If full SSI is too expensive, use SELECT ... FOR UPDATE to materialise the conflict for the specific rows that define the invariant:

BEGIN;
-- Lock all active on-call rows before counting them
SELECT doctor_id FROM on_call WHERE status = 'active' FOR UPDATE;
-- Now check the count on the locked set
-- (use a second query or evaluate the result set size in application code)
UPDATE on_call SET status = 'off_call' WHERE doctor_id = $1;
COMMIT;

CockroachDB — Serializable by Default

CockroachDB runs at SERIALIZABLE isolation by default, using its own distributed SSI implementation. Write skew is prevented without any configuration. Transactions that form dangerous cycles are automatically aborted and must be retried. CockroachDB's architecture is designed around the expectation of retries and provides a client-side retry protocol. If your stack uses CockroachDB, write skew is not a concern — but your application must still implement retry logic for serialization failures.

-- CockroachDB: no configuration needed; SERIALIZABLE is the default
SHOW TRANSACTION ISOLATION LEVEL;
-- Returns: serializable

MySQL InnoDB — No SSI; Explicit Locking Required

MySQL's default isolation level is REPEATABLE READ, which is implemented using MVCC snapshots — not SSI. This means MySQL is vulnerable to write skew at its default level. Setting SERIALIZABLE in MySQL turns every consistent read into a locking read (effectively SELECT ... FOR SHARE), which prevents write skew but introduces significant contention.

-- MySQL: set SERIALIZABLE to prevent write skew (adds locking to all reads)
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

-- Alternative: use explicit SELECT FOR UPDATE on the invariant predicate
START TRANSACTION;
SELECT COUNT(*) FROM on_call WHERE status = 'active' FOR UPDATE;
UPDATE on_call SET status = 'off_call' WHERE doctor_id = ?;
COMMIT;

Oracle — Snapshot Isolation, Vulnerable to Write Skew

Oracle's default isolation level is READ COMMITTED. Its SERIALIZABLE mode is actually Snapshot Isolation, not true serializability — a naming choice that has caused confusion for decades. Under Oracle SERIALIZABLE, write skew is still possible. The only reliable prevention strategy on Oracle is SELECT ... FOR UPDATE on the predicate rows.

📚 Lessons Learned

1. Repeatable Read is not a safety guarantee for multi-row invariants. The name is misleading. REPEATABLE READ guarantees that individual rows you have read will not change if you read them again within your transaction. It says nothing about shared invariants across rows that other transactions are writing to. Every team that confidently assumes REPEATABLE READ = safe and uses check-then-act in application code has a latent write skew bug.

2. Write skew is invisible in logs and traces. Unlike a lost update, no value is overwritten. Unlike a dirty read, no uncommitted data is exposed. The database never records an error. The anomaly only exists as a logical state violation — the kind that only surfaces when a patient arrives at 2:47 AM or a seat is sold twice. Add explicit invariant checks (database constraints, post-commit verification queries, or monitoring assertions) for any multi-row business rule.

3. The fix is not always Serializable isolation. Serializable is the safest choice and should be the default for any system with multi-row invariants. But SELECT FOR UPDATE on the specific predicate rows is often a proportionate and lower-overhead fix for isolated use cases. Understand the trade-off: SSI costs retry logic; SELECT FOR UPDATE costs concurrency on the locked rows.

4. Database-level constraints are the most reliable defense. If the invariant can be expressed as a CHECK constraint, a deferred foreign key, or a partial unique index, let the database enforce it. A CHECK (COUNT(*) >= 1) is not directly expressible in SQL, but patterns like CHECK (is_active_count_valid()) using a stable function, or a trigger-based enforcement, push the check inside the transaction boundary where the database can guarantee atomicity.

5. Distributed systems multiply the risk. In a multi-database or multi-node setup (multiple shards, microservices with separate databases), write skew can occur across service boundaries where no single database's isolation level is even in scope. The hospital scenario played out across two booking services with separate database connections is a distributed write skew — and it requires distributed coordination (two-phase locking, saga compensation, or application-level idempotency) to prevent.

6. Retry logic is not optional when using SSI. PostgreSQL and CockroachDB abort one transaction when they detect a dangerous cycle. If your application does not handle 40001 (PostgreSQL) or RETRY_WRITE_TOO_OLD (CockroachDB) errors with a proper retry loop, the abort will surface as an application error and the invariant will remain intact — but the user's request will fail without retrying. Always implement exponential backoff with jitter for serialization failures.

📌 Summary & Key Takeaways

Write skew is the anomaly that exposes the gap between "individually correct" and "collectively consistent." The five things to carry away:

Write skew requires two conditions: a shared predicate read by multiple concurrent transactions, and distinct writes by each transaction that jointly violate the predicate. If either condition is absent, you have a different anomaly.
Snapshot Isolation (Repeatable Read) does not prevent write skew. Both PostgreSQL's REPEATABLE READ and Oracle's SERIALIZABLE use snapshot isolation and are vulnerable. Only PostgreSQL's SERIALIZABLE (SSI) and CockroachDB's default provide true protection.
The check-then-act pattern is the attack surface. Any time application code reads an aggregate across rows and then writes based on that result, a write skew window exists. Map every such pattern in your codebase to a mitigation: SSI, SELECT FOR UPDATE, or a database constraint.
SELECT FOR UPDATE is the pragmatic workaround. When you cannot change the isolation level globally, locking the predicate rows at read time materialises the conflict and forces serialization on exactly the rows that matter. It is heavier than SSI but more predictable under high contention.
Serializable with retry logic is the correct default for business-critical invariants. The abort rate under SSI for well-designed, low-contention workloads is typically below 1%. The cost of not using it, if a write skew anomaly reaches production, is measured in double-booked flights, overworked doctors, and oversold inventory.

One memorable rule: if your invariant spans more than one row and lives in application code, your isolation level must be Serializable.

Quiet AI help

Explain simpler Compare approaches What next?

Article metadata

Written by

Abstract Algorithms

@abstractalgorithms

Reader feedback

Was this article useful?

Rate it if it helped, then continue with the next deep dive when you are ready.

Related deep dives

Softmax Function Explained: From Raw Scores to Probabilities

23 min read

Clock Skew and Causality Violations: Why Distributed Clocks Lie

19 min read

Dot Product in Machine Learning: The Engine Behind Similarity, Attention, and Neural Networks

22 min read

ANN Index Types Explained: When to Choose Flat, HNSW, IVF, or IVF-PQ

14 min · Ann · best next step

Open Collection