
Isolation Levels in Databases: Read Committed, Repeatable Read, Snapshot, and Serializable Explained

What read committed commits to, why repeatable read still allows phantoms in some databases, and when you actually need serializable isolation

Abstract Algorithms · 27 min read

TLDR: Isolation levels control which concurrency anomalies a transaction can see. Read Committed (PostgreSQL and Oracle's default) prevents dirty reads but still silently allows non-repeatable reads, write skew, and lost updates. Repeatable Read adds snapshot consistency but does not always prevent write skew. Serializable eliminates every known anomaly at the cost of occasional transaction retries. Pick your isolation level by the specific anomaly you need to prevent, not by what sounds strongest.


Your application has a bug. Two users simultaneously book the last available seat on a flight. Both see the seat as available. Both complete the purchase. You've now sold the same seat twice.

The fix isn't more validation logic. You could add a check-then-act pattern in your application layer, but two threads can still pass the check simultaneously before either reaches the act. The fix is choosing the right isolation level. Most databases default to Read Committed, which silently allows exactly this kind of double-booking. Understanding why, and knowing which level actually stops it, is the difference between a system that is correct and one that only looks correct under low traffic.


πŸ“– Why Isolation Exists: The Concurrency Problem at the Heart of Every Database

Isolation is the I in ACID. Atomicity guarantees that a transaction is all-or-nothing. Durability guarantees that committed data survives crashes. Consistency guarantees that the database moves from one valid state to another. But none of those properties help you when two transactions run at the same time and interfere with each other's logic.

Without isolation, concurrent transactions can do devastating things. Transaction A reads a balance, Transaction B immediately updates and commits that balance, and Transaction A reads the balance again, only to get a different number. Nothing crashed. No data was lost. The database is technically in a consistent state. But Transaction A made a business decision (say, approving a loan) based on a value that no longer exists by the time the decision executes.

The ideal solution (running every transaction one at a time in perfect sequence) is called serializable execution. It's completely safe, but it means no two transactions ever overlap, which kills throughput on any database handling hundreds of requests per second. The real world demands a compromise: run transactions concurrently, but provide configurable guarantees about what each transaction is allowed to see.

That compromise is what isolation levels define. Each level draws a different line between safety and speed, and the ANSI SQL standard formalised four of them. Understanding the anomalies each level permits, not just what it prevents, is the practical skill.

The core tension:

| Stronger isolation | Weaker isolation |
| --- | --- |
| Fewer anomalies | More anomalies permitted |
| More locking or conflict detection | Less overhead |
| Lower throughput under contention | Higher throughput |
| Easier to reason about application logic | Requires careful manual compensation |

πŸ” The Four Concurrency Anomalies That Isolation Levels Are Designed to Prevent

Before picking an isolation level, you need a precise vocabulary for what can go wrong. Every isolation level is defined by which anomalies it eliminates. If you skip this section and jump straight to the level names, you'll misapply them.

Dirty Read: Transaction A reads a row that Transaction B has modified but not yet committed. If B later rolls back, A has acted on data that never existed. This is universally considered unacceptable in OLTP systems and is prevented by every mainstream isolation level except Read Uncommitted.

Non-Repeatable Read (Read Skew): Transaction A reads a row, then reads the same row again in the same transaction and gets a different value because Transaction B committed an update in between. The row itself is real and committed both times, but A's two reads disagree. This is the "flight booking" read skew: the seat count was available on the first read and gone by the second.

Phantom Read: Transaction A executes a range query (e.g., SELECT * FROM orders WHERE total > 1000) and gets back a result set. Transaction B then inserts a new row that would qualify for that range and commits. If A re-runs the same range query in the same transaction, it sees a new "phantom" row that was not there before. This matters for any logic that depends on a set of rows, not just individual records.
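The interleaving can be sketched in two sessions; the orders table and the values shown here are hypothetical:

```sql
-- Session A (Read Committed): the same range query, twice
BEGIN;
SELECT count(*) FROM orders WHERE total > 1000;   -- suppose this returns 3

-- Session B, meanwhile:
--   INSERT INTO orders (total) VALUES (1500);
--   COMMIT;

SELECT count(*) FROM orders WHERE total > 1000;   -- returns 4: a phantom row
COMMIT;
```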

Write Skew: The most subtle anomaly. Transaction A and Transaction B each read an overlapping set of rows to check a constraint, and each writes a non-overlapping row based on that check. Neither transaction conflicts with the other directly, so snapshot-based databases let both commit, but the combined result violates the constraint both transactions were trying to protect. The classic example is two doctors both going off call after checking that at least one doctor remains on call.

Lost Update: Transaction A reads a value, Transaction B reads the same value, both compute a new value, and both write back. One write overwrites the other. The result is correct for neither transaction. A bank balance incremented by two concurrent deposits can end up incremented only once.
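A minimal sketch of the vulnerable pattern and its simplest fix, assuming a hypothetical accounts table:

```sql
-- Vulnerable: read-modify-write computed in the application.
-- Two sessions both read 100, both write 150; one deposit is lost.
SELECT balance FROM accounts WHERE id = 1;        -- returns 100
UPDATE accounts SET balance = 150 WHERE id = 1;   -- 100 + 50, computed client-side

-- Safe: a single atomic statement. Row-level locking serialises
-- concurrent updates, so both deposits are applied.
UPDATE accounts SET balance = balance + 50 WHERE id = 1;
```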

| Anomaly | Plain-language description | Business impact |
| --- | --- | --- |
| Dirty read | Reading uncommitted, potentially rolled-back data | Acting on phantom state |
| Non-repeatable read | Same row returns different values in one transaction | Decisions based on stale signals |
| Phantom read | A range query returns different row counts in one transaction | Constraint checks that silently pass |
| Write skew | Both transactions pass a shared check; combined writes violate it | Double-booking, overcounting |
| Lost update | Concurrent read-then-write patterns overwrite each other | Balance miscalculation, inventory errors |

βš™οΈ How the Four ANSI SQL Isolation Levels Close the Anomaly Gap

The ANSI SQL standard defines four isolation levels, each more restrictive than the last. They are not just theoretical labels: every major database ships with configurable support for at least three of them, and each has a specific implementation mechanism behind it.

Read Uncommitted: The Weakest Floor

Read Uncommitted permits dirty reads. A transaction can see rows written by other transactions that have not yet committed. This is almost never appropriate for transactional work.

It has one legitimate use: approximate analytical queries where a slightly stale or inconsistent snapshot is acceptable and the speed of bypassing any locking or MVCC overhead matters more than precision. A dashboard showing "roughly how many active sessions" can tolerate a small error. A payment processor cannot.

Notably, PostgreSQL does not actually implement Read Uncommitted. When you set it, PostgreSQL silently promotes the transaction to Read Committed. MySQL InnoDB does allow genuine Read Uncommitted behaviour.

-- Set for the current transaction (MySQL and SQL Server support this;
-- PostgreSQL silently promotes it to Read Committed, Oracle does not support it)
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
BEGIN;
SELECT count(*) FROM active_sessions;
COMMIT;

Read Committed: The Deceptively Permissive Default

Read Committed is the default on PostgreSQL and Oracle, and it is the level most production applications run at without deliberately choosing it. It prevents dirty reads: every SELECT in a transaction reads only committed data.

The critical detail that surprises most developers is the word "every SELECT." Under Read Committed, each statement gets a fresh snapshot of the database at the moment it executes. If Transaction B commits an update between two SELECT statements inside Transaction A's same transaction, Transaction A's second SELECT will see B's committed change. The two reads within one transaction can return different values for the same row.

-- Transaction A (Read Committed, PostgreSQL default)
BEGIN;
SELECT balance FROM accounts WHERE id = 1;  -- returns 100

-- Transaction B runs and commits: UPDATE accounts SET balance = 50 WHERE id = 1;

SELECT balance FROM accounts WHERE id = 1;  -- returns 50 now
-- Same transaction, two different values for the same row.
COMMIT;

This is non-repeatable read in practice. Any application logic that reads a value, makes a conditional decision, and then reads the same value again to act on it is vulnerable. The flight booking scenario from the opening is exactly this pattern: check-then-book without a lock or a higher isolation level.

Read Committed also allows phantom reads, write skew, and lost updates.

When to use it: Most workloads where individual statements are idempotent and consistent reads within one transaction are not required. Pair it with SELECT FOR UPDATE on the rows you intend to modify when you need per-row correctness without the overhead of a higher level.
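A sketch of that pairing for the booking scenario, with hypothetical table and column names:

```sql
BEGIN;
-- Lock the seat row before checking it; a concurrent booking
-- blocks on this statement until we commit or roll back.
SELECT status FROM seats
WHERE flight_id = 123 AND seat_no = '12A'
FOR UPDATE;

-- Only if the locked row shows 'available':
UPDATE seats SET status = 'booked'
WHERE flight_id = 123 AND seat_no = '12A';
COMMIT;
```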

Repeatable Read: The Snapshot at Transaction Start

Repeatable Read fixes the non-repeatable read problem by taking a snapshot at the start of the transaction. Every SELECT within that transaction reads from the same point-in-time view of the database, regardless of what other transactions commit during its lifetime. The same row will return the same value no matter how many times you read it within a transaction.

-- Set isolation level before beginning the transaction
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN;
SELECT balance FROM accounts WHERE id = 1;  -- returns 100

-- Transaction B commits an update to this row.

SELECT balance FROM accounts WHERE id = 1;  -- still returns 100 (snapshot)
COMMIT;

The implementation diverges significantly between databases here:

PostgreSQL implements Repeatable Read using MVCC. The snapshot is taken at the start of the transaction (technically, at its first query), and it naturally excludes new rows committed by other transactions, so PostgreSQL's Repeatable Read also prevents phantom reads. This makes PostgreSQL's Repeatable Read effectively equivalent to Snapshot Isolation.

MySQL InnoDB uses gap locks (also called next-key locks) to prevent phantoms. Gap locks prevent other transactions from inserting new rows into the range a query scanned. This is a locking mechanism rather than a version-based one, and it can cause more blocking than PostgreSQL's approach.
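A sketch of the effect, using a hypothetical orders table; note that in InnoDB it is locking reads (FOR UPDATE / FOR SHARE) and writes that take next-key locks, while plain SELECTs at Repeatable Read are served from the snapshot:

```sql
-- MySQL InnoDB, Repeatable Read
BEGIN;
SELECT * FROM orders WHERE total BETWEEN 1000 AND 2000 FOR UPDATE;
-- Next-key locks now cover the scanned index range: other sessions
-- block if they try to INSERT a row with total in that range,
-- so no phantom can appear before this transaction ends.
COMMIT;
```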

What Repeatable Read does not prevent in most databases is write skew. Two transactions can each snapshot the same data, pass the same constraint check, and write to non-overlapping rows that together violate the constraint. The snapshot gives each transaction a consistent view of the world as it was when the transaction started, but it cannot see the other transaction's uncommitted writes, and by the time both commit, the damage is done.

Serializable: The Strongest Guarantee

Serializable is the only level that prevents all anomalies, including write skew. A serializable execution behaves as if every transaction ran one at a time, even though they are actually running concurrently. The result is guaranteed to be equivalent to some sequential ordering of those transactions.

There are two main mechanisms for implementing Serializable in practice:

Two-Phase Locking (2PL): The classical approach, used by older databases and MySQL. A transaction acquires S-locks (shared) on every row it reads and X-locks (exclusive) on every row it writes. All locks are held until commit or rollback. This prevents any concurrent transaction from writing data that has been read (rw-conflict) and from reading data that has been written (wr-conflict). It is safe but creates significant blocking.

Serializable Snapshot Isolation (SSI): Used by PostgreSQL since version 9.1. SSI is optimistic: transactions run without acquiring read locks. Instead, the database tracks read-write dependencies between concurrent transactions. If a dangerous cycle is detected (meaning two transactions have read data that the other has written, in a pattern that cannot correspond to any serial execution), one transaction is aborted with a serialization failure (SQLSTATE 40001). The surviving transaction can be retried.

SSI has lower blocking overhead than 2PL in practice, but it requires applications to handle serialization failures gracefully with retry logic.

-- PostgreSQL: set serializable per transaction
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- ... your transaction logic ...
COMMIT;

When you need Serializable: Financial double-booking prevention, inventory reservation (the flight seat scenario), any logic that depends on a set of rows all satisfying a constraint before writing, and scheduling systems where overlapping time slots must be prohibited.


🧠 Deep Dive: MVCC, Snapshot Isolation, and Why PostgreSQL Prevents Phantoms Without Gap Locks

The Internals of MVCC: Row Versions, Snapshots, and xmin/xmax

Multi-Version Concurrency Control (MVCC) is the mechanism behind Repeatable Read and Snapshot Isolation in PostgreSQL, Oracle, and SQL Server with RCSI enabled. Instead of locking rows when reading, MVCC keeps multiple versions of each row simultaneously. Every transaction sees the version that was current when its snapshot was taken.

In PostgreSQL, every row in a heap file carries two hidden system columns:

  • xmin: the transaction ID (XID) of the transaction that inserted this row version
  • xmax: the transaction ID of the transaction that deleted (or updated, which creates a new version) this row
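These columns are queryable, which makes the mechanism easy to observe; the accounts table here is hypothetical:

```sql
-- Hidden system columns must be selected explicitly
SELECT xmin, xmax, id, balance FROM accounts;

-- Compare against the current transaction's ID
SELECT txid_current();
```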

When a transaction with ID txid=500 starts, it gets a snapshot: a record of which transaction IDs were active (not yet committed) at that moment. A row is visible to transaction 500 if:

  1. xmin is committed and xmin < 500 (or xmin = 500 for rows the transaction itself inserted), and
  2. Either xmax is null (row has not been deleted) or xmax is from a transaction that was still in-progress at snapshot time (meaning the deletion hasn't committed from this transaction's perspective).

This visibility check is evaluated at read time. No lock is acquired on the row. Readers never block writers and writers never block readers; they just see different versions of the same row.

When you set isolation level to Repeatable Read in PostgreSQL, the snapshot is taken once at the start of the transaction and reused for all subsequent reads. New rows inserted by other transactions have xmin values greater than the snapshot, so they are invisible. This is why phantom reads do not occur at PostgreSQL's Repeatable Read: new rows simply fail the visibility test.

A simplified visibility pseudocode captures the logic:

function is_visible(row, snapshot):
    if row.xmin not in snapshot.committed:
        return false   # inserting transaction not yet committed
    if row.xmin > snapshot.xmax_txid:
        return false   # inserted after snapshot was taken
    if row.xmax is null:
        return true    # row has not been deleted
    if row.xmax not in snapshot.committed:
        return true    # deleting transaction not yet committed
    if row.xmax > snapshot.xmax_txid:
        return true    # deleted after snapshot was taken
    return false       # row was deleted before our snapshot

Performance Analysis: Concurrency Costs at Each Isolation Level

The performance story of isolation levels is about two different bottlenecks: lock contention and serialization failures.

Read Committed achieves high throughput because each statement gets a fresh snapshot. No read locks are held between statements. Writes acquire row-level locks, but only for the duration of the write itself. The cost is paid in application complexity: you must reason about every read-modify-write pattern and protect it manually (typically via SELECT FOR UPDATE).

Repeatable Read / Snapshot Isolation has slightly more overhead because the snapshot must be maintained for the entire duration of the transaction. In PostgreSQL, this means MVCC version tracking lives for longer, and old row versions cannot be vacuumed until all transactions holding a snapshot that needs them have committed. Long-running Repeatable Read transactions can cause table bloat if not managed carefully. Still, there are no read locks, so readers never block writers and writers never block readers.

Serializable with 2PL (MySQL) holds shared locks on every read, which means readers can block writers. Under high contention workloads, this leads to lock queue buildup and significantly lower throughput. Deadlocks become more frequent.

Serializable with SSI (PostgreSQL) avoids most lock contention but introduces serialization failures. When SSI detects a dangerous rw-dependency cycle, it aborts one of the conflicting transactions. An application running at Serializable must implement a retry loop. Under low conflict rates (most real OLTP workloads), the failure rate is low and SSI throughput is close to Snapshot Isolation. Under adversarial contention, failure rates can spike and retry logic must implement exponential backoff.

| Level | Read lock held? | Writes block readers? | Serialization failures? | Throughput (relative) |
| --- | --- | --- | --- | --- |
| Read Committed | No | No | No | Highest |
| Repeatable Read (PG) | No | No | No | High |
| Repeatable Read (MySQL) | Gap locks | Occasionally | No | Moderate |
| Serializable (SSI, PG) | No (SIRead lock tracking) | No | Yes | Moderate |
| Serializable (2PL, MySQL) | Yes (S-lock) | Yes | No (deadlocks instead) | Lower |

πŸ“Š Visualizing the Anomalies: Non-Repeatable Reads and Write Skew in Sequence

Two scenarios cause the most confusion in practice because they can't be seen by inspecting the data after the fact, only by tracing the interleaved execution of concurrent transactions.

Scenario 1: Non-Repeatable Read under Read Committed

Under Read Committed, each SELECT gets a new snapshot. Transaction 1 reads a balance, Transaction 2 commits an update, and Transaction 1's second read of the same row returns the new value. The diagram below shows the exact interleaving that produces this anomaly.

sequenceDiagram
    participant T1 as Transaction 1
    participant DB as Database
    participant T2 as Transaction 2

    T1->>DB: BEGIN (Read Committed)
    T1->>DB: SELECT balance WHERE id=1
    DB-->>T1: 100
    T2->>DB: BEGIN
    T2->>DB: UPDATE accounts SET balance=50 WHERE id=1
    T2->>DB: COMMIT
    T1->>DB: SELECT balance WHERE id=1
    DB-->>T1: 50
    Note over T1,DB: Same transaction sees two values for the same row
    T1->>DB: COMMIT

Transaction 1 sees balance = 100 on the first read and balance = 50 on the second, even though it never modified the row. If T1's logic branches on the first value, it may produce a result that is inconsistent with the committed state by the time it commits.

Scenario 2: Write Skew under Snapshot Isolation

Write skew is invisible to both transactions because each reads from a consistent snapshot that shows both doctors on call, and each writes to a different row (their own on-call status). Neither write individually violates any single-row constraint. The combined result (zero doctors on call) is only visible after both transactions commit.

sequenceDiagram
    participant DA as Doctor A Tx
    participant DB as Database
    participant DrB as Doctor B Tx

    DA->>DB: BEGIN (Repeatable Read)
    DrB->>DB: BEGIN (Repeatable Read)
    DA->>DB: SELECT count(*) FROM on_call WHERE shift=today
    DB-->>DA: count = 2
    DrB->>DB: SELECT count(*) FROM on_call WHERE shift=today
    DB-->>DrB: count = 2
    DA->>DB: UPDATE on_call SET status=off WHERE doctor=Alice
    DrB->>DB: UPDATE on_call SET status=off WHERE doctor=Bob
    DA->>DB: COMMIT
    DrB->>DB: COMMIT
    Note over DA,DrB: Zero doctors on call. Constraint is violated.

Both doctors checked and saw two on call. Each individually saw a safe state. Their combined writes violate the invariant that at least one doctor must remain on call. Only Serializable isolation, which detects the rw-dependency cycle between these two transactions, would have aborted one of them.


🌍 How Isolation Levels Differ Across MySQL, PostgreSQL, Oracle, and Modern Distributed Databases

Not all databases implement the same levels with the same mechanisms, and some labels are misleading. Oracle calls its highest level "Serializable" but actually implements Snapshot Isolation, meaning write skew is still possible. Understanding the real behaviour of your specific database matters more than knowing the label.

| Database | Default level | Read Uncommitted | Snapshot Isolation | SSI | Notes |
| --- | --- | --- | --- | --- | --- |
| PostgreSQL | Read Committed | Mapped to RC | Yes (as Repeatable Read) | Yes (Serializable) | Repeatable Read = true SI |
| MySQL InnoDB | Repeatable Read | Yes | Partial (via gap locks) | No | Gap locks prevent phantoms |
| Oracle | Read Committed | No | Yes | No | "Serializable" is actually SI; write skew possible |
| SQL Server | Read Committed | Yes | Yes (RCSI / SI modes) | No | RCSI enabled per database; SI opt-in |
| SQLite | Serializable | No | Limited (WAL mode) | No | Single-writer model; default is effectively serializable |
| CockroachDB | Serializable | No | Yes | Yes (default) | SSI by default; designed for distributed correctness |
| MongoDB (multi-doc tx) | Snapshot | No | Yes | No | Multi-doc transactions require replica set |
| CosmosDB (Strong) | Linearizable | No | Yes | No | Strong consistency = serializable reads globally |

The most important takeaway from this table: Oracle's "Serializable" is not ANSI Serializable. If your Oracle application relies on "Serializable" to prevent write skew, it will fail. PostgreSQL's "Serializable" is genuine SSI and does prevent write skew. CockroachDB makes SSI the default and is the only widely-used database where you cannot accidentally run at a weaker level.


βš–οΈ The Consistency vs. Throughput Trade-off Every Isolation Level Forces You to Accept

Every isolation level is a trade-off contract, not just a safety level you can ratchet up for free. Understanding what you are actually trading away at each step is essential for making the right choice.

Choosing Read Committed means accepting that your application logic must manually protect every read-modify-write cycle. You get the best throughput and the widest driver support, but you are responsible for adding SELECT FOR UPDATE wherever a concurrent modification would corrupt your logic. Miss one place and you have a silent bug that only manifests under load.

Choosing Repeatable Read means accepting that long-running transactions can hold snapshots that prevent PostgreSQL's vacuum process from reclaiming old row versions. Table bloat is a real operational concern on high-write tables. The benefit is that read consistency within a transaction is automatic; you don't need to reason about each SELECT.
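A query like the following (against the real pg_stat_activity view; the five-minute threshold is an arbitrary example) helps spot the snapshot-holders responsible:

```sql
-- Transactions that have been open for more than five minutes
SELECT pid, state, xact_start, now() - xact_start AS duration, query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
  AND now() - xact_start > interval '5 minutes'
ORDER BY xact_start;
```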

Choosing Serializable means accepting serialization failures. PostgreSQL's SSI is optimistic: it aborts the transaction that would cause a non-serializable execution. Your application must implement retry logic, and that logic must be correct (idempotent, with exponential backoff). Forgetting to retry means your application silently fails operations under contention instead of producing wrong results, which is safer, but still an outage.

The false security of Snapshot Isolation: Many developers know that Repeatable Read prevents non-repeatable reads and phantoms, so they assume it is "safe enough." It is safe enough for many workloads. But any logic of the form "read a set of rows, verify a constraint holds across all of them, then write a new row based on that set" is vulnerable to write skew at Snapshot Isolation. If the constraint involves multiple rows and multiple concurrent writers, only Serializable is correct.


🧭 How to Choose the Right Isolation Level for Your Workload

The decision is not "use the strongest level everywhere." That would make Serializable the universal default, but Serializable with SSI still introduces serialization failures, and Serializable with 2PL causes more blocking. The right approach is to identify the specific anomaly you are protecting against and choose the minimum level that prevents it.

flowchart TD
    A[What anomaly are you protecting against?] --> B{Write skew risk - logic reads a set and writes based on it}
    B -->|Yes| C[Serializable - only level that prevents write skew]
    B -->|No| D{Need consistent reads within one transaction?}
    D -->|Yes| E[Repeatable Read - snapshot taken at transaction start]
    D -->|No| F{High throughput, per-row locking acceptable?}
    F -->|Yes| G[Read Committed plus SELECT FOR UPDATE on hot rows]
    F -->|No| H{Approximate reads acceptable - analytics only?}
    H -->|Yes| I[Read Uncommitted or Read Committed with no FOR UPDATE]
    H -->|No| G
    C --> J[Add retry loop for serialization failures SQLSTATE 40001]
    E --> K[Monitor for table bloat on long-running transactions]

This flowchart guides you from the anomaly down to the mechanism. The branching logic is:

  • Write skew? If your transaction reads a shared aggregate or set, verifies a constraint, and then writes a new row based on that check, write skew is your risk. Only Serializable stops it.
  • Consistent reads within a transaction? If your transaction makes business decisions based on two reads of the same row being consistent, you need at least Repeatable Read.
  • Per-row correctness at Read Committed? Use SELECT ... FOR UPDATE to lock the specific rows you intend to modify. This is the right tool for the flight booking scenario at Read Committed: lock the seat row before checking availability and booking.
  • Analytics approximation? Read Uncommitted or Read Committed without locks is fine for dashboards where a 0.01% error rate is acceptable.

Common real-world mappings:

| Use case | Recommended level | Rationale |
| --- | --- | --- |
| Flight seat booking | Read Committed + SELECT FOR UPDATE | Lock the specific seat row; prevents lost update |
| Bank transfer (debit+credit) | Serializable | Write skew can corrupt balances when accounts are disjoint |
| Report generation | Read Committed or Repeatable Read | Consistent snapshot; no writes |
| Inventory reservation | Serializable or SELECT FOR UPDATE | Must prevent oversell under concurrent demand |
| User preference update | Read Committed | Single-row update; no cross-row constraint |
| Doctor on-call scheduling | Serializable | Classic write skew scenario |

πŸ§ͺ The Doctors Problem: Why Snapshot Isolation Lets Two Correct Transactions Produce a Wrong Result

The doctors on-call scenario is the canonical write skew demonstration. It is worth tracing precisely because it reveals the exact gap between Snapshot Isolation and Serializable, and because the fix is not obvious.

Here is the setup: two doctors, Alice and Bob, are both on call for tonight's shift. Hospital policy requires at least one doctor to be on call at all times. Alice feels ill and wants to go off call. She checks that Bob is available and takes herself off. At the same moment, Bob checks that Alice is available and takes himself off. Both checks were correct at the time they ran. Both writes are valid in isolation. But the combined result (zero doctors on call) violates the hospital's invariant.

Under Snapshot Isolation (Repeatable Read in PostgreSQL), both transactions run against the same consistent snapshot. Each sees two doctors on call. Neither transaction writes to the row the other transaction reads. There is no row-level conflict. Both commits succeed.

Under Serializable SSI, the PostgreSQL engine tracks that Transaction A read data that Transaction B will write (on_call for Bob's row) and that Transaction B read data that Transaction A will write (on_call for Alice's row). This mutual rw-dependency is exactly the pattern SSI is designed to detect. When Transaction B tries to commit, PostgreSQL identifies the cycle and aborts it with serialization error 40001. Alice's transaction succeeds; Bob's is retried. On retry, Bob's transaction now sees that Alice is already off call, finds only one doctor remaining, and properly declines to go off call itself.

The fix at the application level (without upgrading to Serializable) is a materialised lock: explicitly insert a lock row or increment a shared counter before the check-and-write. For example, updating a shared shift row to increment a version column in both transactions forces a write conflict that the database's row-level locking can detect and serialize. This is a pattern used in booking and scheduling systems that cannot run at Serializable due to throughput constraints.
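One way to sketch that pattern for the doctors example, with hypothetical shifts and on_call tables: both transactions first update the same shared row, so the database's row-level locking forces the second one to block and then fail with a serialization error under PostgreSQL's Repeatable Read:

```sql
BEGIN;  -- Repeatable Read is sufficient once the conflict is materialised
-- Both transactions write this shared row, creating a direct write conflict
UPDATE shifts SET version = version + 1 WHERE shift_date = current_date;

SELECT count(*) FROM on_call WHERE shift = current_date AND status = 'active';
-- Proceed only if count > 1
UPDATE on_call SET status = 'off'
WHERE doctor_id = 42 AND shift = current_date;
COMMIT;
```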


πŸ› οΈ PostgreSQL Serializable Snapshot Isolation in Practice

PostgreSQL implemented SSI in version 9.1 and it is the safest way to run complex transactional logic without introducing application-level locking patterns. Here is how to use it effectively.

Enabling SSI per transaction:

-- Set at the connection level for all transactions
SET default_transaction_isolation = 'serializable';

-- Or set per transaction (preferred: isolate only high-risk transactions)
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT count(*) FROM on_call WHERE shift = current_date AND status = 'active';
-- If count > 1, go off call
UPDATE on_call SET status = 'off' WHERE doctor_id = $1 AND shift = current_date;
COMMIT;

Handling serialization failures in Java with JDBC:

When SSI aborts a transaction due to a detected cycle, it raises SQLState 40001. Any correct application using Serializable must catch this and retry:

int maxRetries = 5;
for (int attempt = 0; attempt < maxRetries; attempt++) {
    try (Connection conn = dataSource.getConnection()) {
        conn.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);
        conn.setAutoCommit(false);
        try {
            // application logic
            performOnCallUpdate(conn);
            conn.commit();
            return; // success
        } catch (SQLException e) {
            conn.rollback();
            if ("40001".equals(e.getSQLState())) {
                // Serialization failure: safe to retry
                long backoff = (long) Math.pow(2, attempt) * 100;
                Thread.sleep(backoff);
            } else {
                throw e; // non-retryable error
            }
        }
    }
}
throw new RuntimeException("Transaction failed after " + maxRetries + " retries");

Monitoring serialization conflicts:

PostgreSQL has no dedicated serialization-failure counter, but pg_stat_database tracks related activity:

-- Deadlock and conflict counters for the current database
SELECT datname,
       conflicts,
       deadlocks
FROM pg_stat_database
WHERE datname = current_database();

Note that conflicts counts queries cancelled by recovery conflicts on standby servers, not serialization failures; the most direct signal of SSI pressure is the rate of SQLSTATE 40001 errors in your application metrics or the PostgreSQL log. A spike in serialization failures combined with slow application response times signals that your retry logic is being exercised heavily and you should consider whether the conflicting transactions can be rewritten to reduce their rw-dependency footprint, for example by narrowing the set of rows each transaction reads.

What SSI does NOT protect against:

SSI protects against concurrent transactions. It does not protect against application-level bugs that perform check-then-act logic across separate transactions. If your doctor on-call check and the update happen in two separate database transactions, even if both are at Serializable, there is a gap between commit and the start of the next transaction where another transaction can interleave. All correctness guarantees from SSI apply within a single transaction's boundaries.

For a deep-dive on PostgreSQL's transaction internals including WAL, MVCC vacuum, and snapshot management, see the companion post on How PostgreSQL's Write-Ahead Log Works when available.


πŸ“š Lessons Learned from Isolation Level Bugs in Production

The Read Committed + check-then-act trap: The most common production isolation bug is application code that reads a value, applies business logic, and writes back, all under Read Committed, with no SELECT FOR UPDATE. Under low load, transactions don't overlap. Under peak traffic, they do, and the lost update happens. The fix is always to add FOR UPDATE to the read that initiates the write, not to add more validation at the application layer.

"Serializable" in Oracle is not ANSI Serializable: Engineers who switch from PostgreSQL to Oracle and keep their isolation level at "Serializable" discover write skew bugs later. Oracle's Serializable is actually Snapshot Isolation. If your application depends on write skew prevention, it will fail on Oracle. Verify your database's actual implementation against the anomaly table β€” never trust the label alone.

Serializable everywhere breaks applications that don't handle retries: Turning on Serializable at the connection pool level without adding retry logic to every transaction turns serialization failures into application errors. Under any meaningful concurrency, SQLSTATE 40001 will appear. The fix is not to turn off Serializable β€” the fix is to make every transaction in your application retry-safe.

Long-running Repeatable Read transactions on PostgreSQL cause table bloat: A transaction running at Repeatable Read (or Serializable) holds a snapshot. PostgreSQL's vacuum process cannot reclaim row versions that are still visible to that snapshot. A background job that runs a 20-minute report under Repeatable Read while your tables are taking heavy writes will keep vacuum from reclaiming those versions and cause table bloat. Either break the report into shorter transactions, use SET TRANSACTION (or SET LOCAL) so only the transactions that need the stronger level pay for it, or run reports against a read replica.
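A sketch of per-transaction scoping in PostgreSQL: the isolation setting applies only to the current transaction and must be issued before its first query, so the session default can stay at Read Committed (the orders query is a hypothetical report):

```
BEGIN;
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;  -- before the first query
SELECT region, sum(amount) FROM orders GROUP BY region;
COMMIT;  -- snapshot released; vacuum can reclaim old row versions again
```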

Snapshot Isolation prevents most common anomalies but write skew needs explicit testing: Most OLTP workloads at Repeatable Read are fine in practice. Write skew only manifests when multiple concurrent transactions share a constraint across non-overlapping rows. Test for write skew explicitly with concurrent load tests before concluding that Repeatable Read is sufficient for your workload.
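One way to run that test is as a two-session transcript (the doctors schema is assumed for illustration, with two doctors initially on call):

```
-- Session A
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT count(*) FROM doctors WHERE on_call = true;   -- A sees 2

-- Session B, interleaved
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT count(*) FROM doctors WHERE on_call = true;   -- B also sees 2

-- Session A
UPDATE doctors SET on_call = false WHERE name = 'alice';
COMMIT;                                              -- succeeds

-- Session B
UPDATE doctors SET on_call = false WHERE name = 'bob';
COMMIT;  -- under Repeatable Read this also succeeds: write skew,
         -- zero doctors on call. Under SERIALIZABLE, one of the two
         -- transactions fails with SQLSTATE 40001 instead.
```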


📌 TLDR & Key Takeaways

  • Isolation is the "I" in ACID. It controls which concurrency anomalies a transaction is allowed to see.
  • Read Committed (PostgreSQL and Oracle default) prevents dirty reads. It silently allows non-repeatable reads, phantom reads, write skew, and lost updates. Most OLTP bugs at scale trace back to this level.
  • Repeatable Read takes a snapshot at transaction start. PostgreSQL prevents phantoms via MVCC. MySQL uses gap locks. Neither prevents write skew in most implementations.
  • Snapshot Isolation (PostgreSQL's Repeatable Read, Oracle's Serializable, SQL Server RCSI) prevents most anomalies but is not equivalent to Serializable β€” write skew remains possible.
  • Serializable is the only level that prevents all anomalies, including write skew. PostgreSQL uses SSI (optimistic, cycle detection). MySQL uses 2PL (lock-based, more blocking).
  • SSI requires retry logic. Serialization failures (SQLSTATE 40001) are a normal operating condition, not errors. Applications that use Serializable must catch 40001 and retry.
  • Oracle's "Serializable" label is misleading. It implements Snapshot Isolation, not true Serializable. Write skew is still possible.
  • For the flight booking scenario: Read Committed + SELECT FOR UPDATE on the seat row is the practical fix. Serializable is correct but may be overkill for this single-row case.
  • For the doctors on-call scenario: Only Serializable prevents this. Snapshot Isolation (Repeatable Read) will allow both commits to succeed and leave zero doctors on call.

πŸ“ Practice Quiz

  1. A PostgreSQL application runs at the default isolation level. Transaction A reads an account balance of $500. Transaction B then commits a debit that reduces the balance to $300. Transaction A reads the same account again and sees $300. Which anomaly has occurred?

    • A) Dirty read
    • B) Non-repeatable read
    • C) Phantom read
    • D) Deadlock

    Correct Answer: B. Read Committed gives each statement a fresh snapshot, so Transaction A's second read sees Transaction B's committed update.

  2. An engineer argues that upgrading to Repeatable Read on PostgreSQL will prevent all anomalies except lost updates. What is the most critical error in this claim?

    • A) Repeatable Read on PostgreSQL also prevents lost updates
    • B) Repeatable Read does not prevent phantom reads on PostgreSQL
    • C) Repeatable Read on PostgreSQL still allows write skew
    • D) Repeatable Read is not supported on PostgreSQL

    Correct Answer: C. Write skew remains possible under Snapshot Isolation (PostgreSQL's Repeatable Read) because two transactions can each read overlapping data and write non-overlapping rows without creating a row-level conflict.

  3. What does SQLSTATE 40001 indicate in a PostgreSQL application running at Serializable isolation?

    • A) A syntax error in the SQL statement
    • B) A foreign key constraint violation
    • C) A serialization failure β€” the transaction was aborted due to a detected rw-dependency cycle
    • D) A deadlock between two transactions

    Correct Answer: C. SSI aborts one transaction in a dangerous rw-dependency cycle. The correct response is to retry the transaction, typically with exponential backoff.

  4. Oracle's documentation labels its highest isolation level "Serializable." Which anomaly does Oracle's Serializable still permit, which would be prevented by PostgreSQL's Serializable?

    • A) Dirty read
    • B) Non-repeatable read
    • C) Phantom read
    • D) Write skew

    Correct Answer: D. Oracle's "Serializable" is actually Snapshot Isolation. It prevents dirty reads, non-repeatable reads, and phantom reads, but write skew is still possible because there is no SSI cycle detection.

  5. Open-ended challenge: Your team runs a ticket reservation system where users reserve seats at a music venue. The system uses PostgreSQL at Read Committed. Under a surge of 500 concurrent users booking the last 10 seats, you observe that 15 users successfully complete purchases even though only 10 seats exist. Describe two different strategies β€” one that stays at Read Committed and one that uses Serializable β€” to prevent overselling. What are the operational trade-offs of each approach?


Written by Abstract Algorithms (@abstractalgorithms)