
Isolation Levels in Databases: Read Committed, Repeatable Read, Snapshot, and Serializable Explained

What read committed commits to, why repeatable read still allows phantoms in some databases, and when you actually need serializable isolation

Abstract Algorithms · 27 min read

TLDR: Isolation levels control which concurrency anomalies a transaction can see. Read Committed (PostgreSQL and Oracle's default) prevents dirty reads but still silently allows non-repeatable reads, write skew, and lost updates. Repeatable Read adds snapshot consistency but does not always prevent write skew. Serializable eliminates every known anomaly at the cost of occasional transaction retries. Pick your isolation level by the specific anomaly you need to prevent, not by what sounds strongest.


Your application has a bug. Two users simultaneously book the last available seat on a flight. Both see the seat as available. Both complete the purchase. You've now sold the same seat twice.

The fix isn't more validation logic. You could add a check-then-act pattern in your application layer, but two threads can still pass the check simultaneously before either reaches the act. The fix is choosing the right isolation level. Most databases default to Read Committed, which silently allows exactly this kind of double-booking. Understanding why, and knowing which level actually stops it, is the difference between a system that is correct and one that only looks correct under low traffic.


πŸ“– Why Isolation Exists: The Concurrency Problem at the Heart of Every Database

Isolation is the I in ACID. Atomicity guarantees that a transaction is all-or-nothing. Durability guarantees that committed data survives crashes. Consistency guarantees that the database moves from one valid state to another. But none of those properties help you when two transactions run at the same time and interfere with each other's logic.

Without isolation, concurrent transactions can do devastating things. Transaction A reads a balance, Transaction B immediately updates and commits that balance, and Transaction A reads the balance again, only to get a different number. Nothing crashed. No data was lost. The database is technically in a consistent state. But Transaction A made a business decision (say, approving a loan) based on a value that no longer exists by the time the decision executes.

The ideal solution (running every transaction one at a time in perfect sequence) is called serializable execution. It's completely safe, but it means no two transactions ever overlap, which kills throughput on any database handling hundreds of requests per second. The real world demands a compromise: run transactions concurrently, but provide configurable guarantees about what each transaction is allowed to see.

That compromise is what isolation levels define. Each level draws a different line between safety and speed, and the ANSI SQL standard formalised four of them. Understanding the anomalies each level permits, not just what it prevents, is the practical skill.

The core tension:

| Stronger isolation | Weaker isolation |
| --- | --- |
| Fewer anomalies | More anomalies permitted |
| More locking or conflict detection | Less overhead |
| Lower throughput under contention | Higher throughput |
| Easier to reason about application logic | Requires careful manual compensation |

πŸ” The Four Concurrency Anomalies That Isolation Levels Are Designed to Prevent

Before picking an isolation level, you need a precise vocabulary for what can go wrong. Every isolation level is defined by which anomalies it eliminates. If you skip this section and jump straight to the level names, you'll misapply them.

Dirty Read: Transaction A reads a row that Transaction B has modified but not yet committed. If B later rolls back, A has acted on data that never existed. This is universally considered unacceptable in OLTP systems and is prevented by every mainstream isolation level except Read Uncommitted.

Non-Repeatable Read (Read Skew): Transaction A reads a row, then reads the same row again in the same transaction and gets a different value because Transaction B committed an update in between. The row itself is real and committed both times, but A's two reads disagree. This is the "flight booking" read skew: the seat count was available on the first read and gone by the second.

Phantom Read: Transaction A executes a range query (e.g., SELECT * FROM orders WHERE total > 1000) and gets back a result set. Transaction B then inserts a new row that would qualify for that range and commits. If A re-runs the same range query in the same transaction, it sees a new "phantom" row that was not there before. This matters for any logic that depends on a set of rows, not just individual records.
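The interleaving can be sketched in two sessions; the orders table and the values shown here are hypothetical:

```sql
-- Session A (Read Committed): the same range query, twice
BEGIN;
SELECT count(*) FROM orders WHERE total > 1000;   -- suppose this returns 3

-- Session B, meanwhile:
--   INSERT INTO orders (total) VALUES (1500);
--   COMMIT;

SELECT count(*) FROM orders WHERE total > 1000;   -- returns 4: a phantom row
COMMIT;
```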

Write Skew: The most subtle anomaly. Transaction A and Transaction B each read an overlapping set of rows to check a constraint, and each writes a non-overlapping row based on that check. Neither transaction conflicts with the other directly, so snapshot-based databases let both commit, but the combined result violates the constraint both transactions were trying to protect. The classic example is two doctors both going off call after checking that at least one doctor remains on call.

Lost Update: Transaction A reads a value, Transaction B reads the same value, both compute a new value, and both write back. One write overwrites the other. The result is correct for neither transaction. A bank balance incremented by two concurrent deposits can end up incremented only once.
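A minimal sketch of the vulnerable pattern and its simplest fix, assuming a hypothetical accounts table:

```sql
-- Vulnerable: read-modify-write computed in the application.
-- Two sessions both read 100, both write 150; one deposit is lost.
SELECT balance FROM accounts WHERE id = 1;        -- returns 100
UPDATE accounts SET balance = 150 WHERE id = 1;   -- 100 + 50, computed client-side

-- Safe: a single atomic statement. Row-level locking serialises
-- concurrent updates, so both deposits are applied.
UPDATE accounts SET balance = balance + 50 WHERE id = 1;
```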

| Anomaly | Plain-language description | Business impact |
| --- | --- | --- |
| Dirty read | Reading uncommitted, potentially rolled-back data | Acting on phantom state |
| Non-repeatable read | Same row returns different values in one transaction | Decisions based on stale signals |
| Phantom read | A range query returns different row counts in one transaction | Constraint checks that silently pass |
| Write skew | Both transactions pass a shared check; combined writes violate it | Double-booking, overcounting |
| Lost update | Concurrent read-then-write patterns overwrite each other | Balance miscalculation, inventory errors |

βš™οΈ How the Four ANSI SQL Isolation Levels Close the Anomaly Gap

The ANSI SQL standard defines four isolation levels, each more restrictive than the last. They are not just theoretical labels: every major database ships with configurable support for at least three of them, and each has a specific implementation mechanism behind it.

Read Uncommitted: The Weakest Floor

Read Uncommitted permits dirty reads. A transaction can see rows written by other transactions that have not yet committed. This is almost never appropriate for transactional work.

It has one legitimate use: approximate analytical queries where a slightly stale or inconsistent snapshot is acceptable and the speed of bypassing any locking or MVCC overhead matters more than precision. A dashboard showing "roughly how many active sessions" can tolerate a small error. A payment processor cannot.

Notably, PostgreSQL does not actually implement Read Uncommitted. When you set it, PostgreSQL silently promotes the transaction to Read Committed. MySQL InnoDB does allow genuine Read Uncommitted behaviour.

-- Set for the current transaction (MySQL and SQL Server support this;
-- PostgreSQL silently promotes it to Read Committed, Oracle does not support it)
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
BEGIN;
SELECT count(*) FROM active_sessions;
COMMIT;

Read Committed: The Deceptively Permissive Default

Read Committed is the default on PostgreSQL and Oracle, and it is the level most production applications run at without deliberately choosing it. It prevents dirty reads: every SELECT in a transaction reads only committed data.

The critical detail that surprises most developers is the word "every SELECT." Under Read Committed, each statement gets a fresh snapshot of the database at the moment it executes. If Transaction B commits an update between two SELECT statements inside Transaction A's same transaction, Transaction A's second SELECT will see B's committed change. The two reads within one transaction can return different values for the same row.

-- Transaction A (Read Committed, PostgreSQL default)
BEGIN;
SELECT balance FROM accounts WHERE id = 1;  -- returns 100

-- Transaction B runs and commits: UPDATE accounts SET balance = 50 WHERE id = 1;

SELECT balance FROM accounts WHERE id = 1;  -- returns 50 now
-- Same transaction, two different values for the same row.
COMMIT;

This is non-repeatable read in practice. Any application logic that reads a value, makes a conditional decision, and then reads the same value again to act on it is vulnerable. The flight booking scenario from the opening is exactly this pattern: check-then-book without a lock or a higher isolation level.

Read Committed also allows phantom reads, write skew, and lost updates.

When to use it: Most workloads where individual statements are idempotent and consistent reads within one transaction are not required. Pair it with SELECT FOR UPDATE on the rows you intend to modify when you need per-row correctness without the overhead of a higher level.
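A sketch of that pairing for the booking scenario, with hypothetical table and column names:

```sql
BEGIN;
-- Lock the seat row before checking it; a concurrent booking
-- blocks on this statement until we commit or roll back.
SELECT status FROM seats
WHERE flight_id = 123 AND seat_no = '12A'
FOR UPDATE;

-- Only if the locked row shows 'available':
UPDATE seats SET status = 'booked'
WHERE flight_id = 123 AND seat_no = '12A';
COMMIT;
```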

Repeatable Read: The Snapshot at Transaction Start

Repeatable Read fixes the non-repeatable read problem by taking a snapshot at the start of the transaction. Every SELECT within that transaction reads from the same point-in-time view of the database, regardless of what other transactions commit during its lifetime. The same row will return the same value no matter how many times you read it within a transaction.

-- Set isolation level before beginning the transaction
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN;
SELECT balance FROM accounts WHERE id = 1;  -- returns 100

-- Transaction B commits an update to this row.

SELECT balance FROM accounts WHERE id = 1;  -- still returns 100 (snapshot)
COMMIT;

The implementation diverges significantly between databases here:

PostgreSQL implements Repeatable Read using MVCC. The snapshot is taken at the start of the transaction (technically, at its first query), and it naturally excludes new rows committed by other transactions, so PostgreSQL's Repeatable Read also prevents phantom reads. This makes PostgreSQL's Repeatable Read effectively equivalent to Snapshot Isolation.

MySQL InnoDB uses gap locks (also called next-key locks) to prevent phantoms. Gap locks prevent other transactions from inserting new rows into the range a query scanned. This is a locking mechanism rather than a version-based one, and it can cause more blocking than PostgreSQL's approach.
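A sketch of the effect, using a hypothetical orders table; note that in InnoDB it is locking reads (FOR UPDATE / FOR SHARE) and writes that take next-key locks, while plain SELECTs at Repeatable Read are served from the snapshot:

```sql
-- MySQL InnoDB, Repeatable Read
BEGIN;
SELECT * FROM orders WHERE total BETWEEN 1000 AND 2000 FOR UPDATE;
-- Next-key locks now cover the scanned index range: other sessions
-- block if they try to INSERT a row with total in that range,
-- so no phantom can appear before this transaction ends.
COMMIT;
```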

What Repeatable Read does not prevent in most databases is write skew. Two transactions can each snapshot the same data, pass the same constraint check, and write to non-overlapping rows that together violate the constraint. The snapshot gives each transaction a consistent view of the world as it was when the transaction started, but it cannot see the other transaction's uncommitted writes, and by the time both commit, the damage is done.

Serializable: The Strongest Guarantee

Serializable is the only level that prevents all anomalies, including write skew. A serializable execution behaves as if every transaction ran one at a time, even though they are actually running concurrently. The result is guaranteed to be equivalent to some sequential ordering of those transactions.

There are two main mechanisms for implementing Serializable in practice:

Two-Phase Locking (2PL): The classical approach, used by older databases and MySQL. A transaction acquires S-locks (shared) on every row it reads and X-locks (exclusive) on every row it writes. All locks are held until commit or rollback. This prevents any concurrent transaction from writing data that has been read (rw-conflict) and from reading data that has been written (wr-conflict). It is safe but creates significant blocking.

Serializable Snapshot Isolation (SSI): Used by PostgreSQL since version 9.1. SSI is optimistic: transactions run without acquiring read locks. Instead, the database tracks read-write dependencies between concurrent transactions. If a dangerous cycle is detected (meaning two transactions have read data that the other has written, in a pattern that cannot correspond to any serial execution), one transaction is aborted with a serialization failure (SQLSTATE 40001). The surviving transaction can be retried.

SSI has lower blocking overhead than 2PL in practice, but it requires applications to handle serialization failures gracefully with retry logic.

-- PostgreSQL: set serializable per transaction
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- ... your transaction logic ...
COMMIT;

When you need Serializable: Financial double-booking prevention, inventory reservation (the flight seat scenario), any logic that depends on a set of rows all satisfying a constraint before writing, and scheduling systems where overlapping time slots must be prohibited.


🧠 Deep Dive: MVCC, Snapshot Isolation, and Why PostgreSQL Prevents Phantoms Without Gap Locks

The Internals of MVCC: Row Versions, Snapshots, and xmin/xmax

Multi-Version Concurrency Control (MVCC) is the mechanism behind Repeatable Read and Snapshot Isolation in PostgreSQL, Oracle, and SQL Server with RCSI enabled. Instead of locking rows when reading, MVCC keeps multiple versions of each row simultaneously. Every transaction sees the version that was current when its snapshot was taken.

In PostgreSQL, every row in a heap file carries two hidden system columns:

  • xmin: the transaction ID (XID) of the transaction that inserted this row version
  • xmax: the transaction ID of the transaction that deleted (or updated, which creates a new version) this row
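These columns are queryable, which makes the mechanism easy to observe; the accounts table here is hypothetical:

```sql
-- Hidden system columns must be selected explicitly
SELECT xmin, xmax, id, balance FROM accounts;

-- Compare against the current transaction's ID
SELECT txid_current();
```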

When a transaction with ID txid=500 starts, it gets a snapshot: a record of which transaction IDs were active (not yet committed) at that moment. A row is visible to transaction 500 if:

  1. xmin is committed and xmin < 500 (or xmin = 500 for rows the transaction itself inserted), and
  2. Either xmax is null (row has not been deleted) or xmax is from a transaction that was still in-progress at snapshot time (meaning the deletion hasn't committed from this transaction's perspective).

This visibility check is evaluated at read time. No lock is acquired on the row. Readers never block writers and writers never block readers; they just see different versions of the same row.

When you set isolation level to Repeatable Read in PostgreSQL, the snapshot is taken once at the start of the transaction and reused for all subsequent reads. New rows inserted by other transactions have xmin values greater than the snapshot, so they are invisible. This is why phantom reads do not occur at PostgreSQL's Repeatable Read: new rows simply fail the visibility test.

A simplified visibility pseudocode captures the logic:

function is_visible(row, snapshot):
    if row.xmin not in snapshot.committed:
        return false   # inserting transaction not yet committed
    if row.xmin > snapshot.xmax_txid:
        return false   # inserted after snapshot was taken
    if row.xmax is null:
        return true    # row has not been deleted
    if row.xmax not in snapshot.committed:
        return true    # deleting transaction not yet committed
    if row.xmax > snapshot.xmax_txid:
        return true    # deleted after snapshot was taken
    return false       # row was deleted before our snapshot

Performance Analysis: Concurrency Costs at Each Isolation Level

The performance story of isolation levels is about two different bottlenecks: lock contention and serialization failures.

Read Committed achieves high throughput because each statement gets a fresh snapshot. No read locks are held between statements. Writes acquire row-level locks, but only for the duration of the write itself. The cost is paid in application complexity: you must reason about every read-modify-write pattern and protect it manually (typically via SELECT FOR UPDATE).

Repeatable Read / Snapshot Isolation has slightly more overhead because the snapshot must be maintained for the entire duration of the transaction. In PostgreSQL, this means MVCC version tracking lives for longer, and old row versions cannot be vacuumed until all transactions holding a snapshot that needs them have committed. Long-running Repeatable Read transactions can cause table bloat if not managed carefully. Still, there are no read locks, so readers never block writers and writers never block readers.

Serializable with 2PL (MySQL) holds shared locks on every read, which means readers can block writers. Under high contention workloads, this leads to lock queue buildup and significantly lower throughput. Deadlocks become more frequent.

Serializable with SSI (PostgreSQL) avoids most lock contention but introduces serialization failures. When SSI detects a dangerous rw-dependency cycle, it aborts one of the conflicting transactions. An application running at Serializable must implement a retry loop. Under low conflict rates (most real OLTP workloads), the failure rate is low and SSI throughput is close to Snapshot Isolation. Under adversarial contention, failure rates can spike and retry logic must implement exponential backoff.

| Level | Read lock held? | Writes block readers? | Serialization failures? | Throughput (relative) |
| --- | --- | --- | --- | --- |
| Read Committed | No | No | No | Highest |
| Repeatable Read (PG) | No | No | No | High |
| Repeatable Read (MySQL) | Gap locks | Occasionally | No | Moderate |
| Serializable (SSI, PG) | No (SIRead lock tracking) | No | Yes | Moderate |
| Serializable (2PL, MySQL) | Yes (S-lock) | Yes | No (deadlocks instead) | Lower |

πŸ“Š Visualizing the Anomalies: Non-Repeatable Reads and Write Skew in Sequence

Two scenarios cause the most confusion in practice because they can't be seen by inspecting the data after the fact, only by tracing the interleaved execution of concurrent transactions.

Scenario 1: Non-Repeatable Read under Read Committed

Under Read Committed, each SELECT gets a new snapshot. Transaction 1 reads a balance, Transaction 2 commits an update, and Transaction 1's second read of the same row returns the new value. The diagram below shows the exact interleaving that produces this anomaly.

sequenceDiagram
    participant T1 as Transaction 1
    participant DB as Database
    participant T2 as Transaction 2

    T1->>DB: BEGIN (Read Committed)
    T1->>DB: SELECT balance WHERE id=1
    DB-->>T1: 100
    T2->>DB: BEGIN
    T2->>DB: UPDATE accounts SET balance=50 WHERE id=1
    T2->>DB: COMMIT
    T1->>DB: SELECT balance WHERE id=1
    DB-->>T1: 50
    Note over T1,DB: Same transaction sees two values for the same row
    T1->>DB: COMMIT

Transaction 1 sees balance = 100 on the first read and balance = 50 on the second, even though it never modified the row. If T1's logic branches on the first value, it may produce a result that is inconsistent with the committed state by the time it commits.

Scenario 2: Write Skew under Snapshot Isolation

Write skew is invisible to both transactions because each reads from a consistent snapshot that shows both doctors on call, and each writes to a different row (their own on-call status). Neither write individually violates any single-row constraint. The combined result (zero doctors on call) is only visible after both transactions commit.

sequenceDiagram
    participant DA as Doctor A Tx
    participant DB as Database
    participant DrB as Doctor B Tx

    DA->>DB: BEGIN (Repeatable Read)
    DrB->>DB: BEGIN (Repeatable Read)
    DA->>DB: SELECT count(*) FROM on_call WHERE shift=today
    DB-->>DA: count = 2
    DrB->>DB: SELECT count(*) FROM on_call WHERE shift=today
    DB-->>DrB: count = 2
    DA->>DB: UPDATE on_call SET status=off WHERE doctor=Alice
    DrB->>DB: UPDATE on_call SET status=off WHERE doctor=Bob
    DA->>DB: COMMIT
    DrB->>DB: COMMIT
    Note over DA,DrB: Zero doctors on call. Constraint is violated.

Both doctors checked and saw two on call. Each individually saw a safe state. Their combined writes violate the invariant that at least one doctor must remain on call. Only Serializable isolation, which detects the rw-dependency cycle between these two transactions, would have aborted one of them.


🌍 How Isolation Levels Differ Across MySQL, PostgreSQL, Oracle, and Modern Distributed Databases

Not all databases implement the same levels with the same mechanisms, and some labels are misleading. Oracle calls its highest level "Serializable" but actually implements Snapshot Isolation, meaning write skew is still possible. Understanding the real behaviour of your specific database matters more than knowing the label.

| Database | Default level | Read Uncommitted | Snapshot Isolation | SSI | Notes |
| --- | --- | --- | --- | --- | --- |
| PostgreSQL | Read Committed | Mapped to RC | Yes (as Repeatable Read) | Yes (Serializable) | Repeatable Read = true SI |
| MySQL InnoDB | Repeatable Read | Yes | Partial (via gap locks) | No | Gap locks prevent phantoms |
| Oracle | Read Committed | No | Yes | No | "Serializable" is actually SI; write skew possible |
| SQL Server | Read Committed | Yes | Yes (RCSI / SI modes) | No | RCSI enabled per database; SI opt-in |
| SQLite | Serializable | No | Limited (WAL mode) | No | Single-writer model; default is effectively serializable |
| CockroachDB | Serializable | No | Yes | Yes (default) | SSI by default; designed for distributed correctness |
| MongoDB (multi-doc tx) | Snapshot | No | Yes | No | Multi-doc transactions require replica set |
| CosmosDB (Strong) | Linearizable | No | Yes | No | Strong consistency = serializable reads globally |

The most important takeaway from this table: Oracle's "Serializable" is not ANSI Serializable. If your Oracle application relies on "Serializable" to prevent write skew, it will fail. PostgreSQL's "Serializable" is genuine SSI and does prevent write skew. CockroachDB makes SSI the default and is the only widely-used database where you cannot accidentally run at a weaker level.


βš–οΈ The Consistency vs. Throughput Trade-off Every Isolation Level Forces You to Accept

Every isolation level is a trade-off contract, not just a safety level you can ratchet up for free. Understanding what you are actually trading away at each step is essential for making the right choice.

Choosing Read Committed means accepting that your application logic must manually protect every read-modify-write cycle. You get the best throughput and the widest driver support, but you are responsible for adding SELECT FOR UPDATE wherever a concurrent modification would corrupt your logic. Miss one place and you have a silent bug that only manifests under load.

Choosing Repeatable Read means accepting that long-running transactions can hold snapshots that prevent PostgreSQL's vacuum process from reclaiming old row versions. Table bloat is a real operational concern on high-write tables. The benefit is that read consistency within a transaction is automatic; you don't need to reason about each SELECT.
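A query like the following (against the real pg_stat_activity view; the five-minute threshold is an arbitrary example) helps spot the snapshot-holders responsible:

```sql
-- Transactions that have been open for more than five minutes
SELECT pid, state, xact_start, now() - xact_start AS duration, query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
  AND now() - xact_start > interval '5 minutes'
ORDER BY xact_start;
```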

Choosing Serializable means accepting serialization failures. PostgreSQL's SSI is optimistic: it aborts the transaction that would cause a non-serializable execution. Your application must implement retry logic, and that logic must be correct (idempotent, with exponential backoff). Forgetting to retry means your application silently fails operations under contention instead of producing wrong results, which is safer, but still an outage.

The false security of Snapshot Isolation: Many developers know that Repeatable Read prevents non-repeatable reads and phantoms, so they assume it is "safe enough." It is safe enough for many workloads. But any logic of the form "read a set of rows, verify a constraint holds across all of them, then write a new row based on that set" is vulnerable to write skew at Snapshot Isolation. If the constraint involves multiple rows and multiple concurrent writers, only Serializable is correct.


🧭 How to Choose the Right Isolation Level for Your Workload

The decision is not "use the strongest level everywhere." That would make Serializable the universal default, but Serializable with SSI still introduces serialization failures, and Serializable with 2PL causes more blocking. The right approach is to identify the specific anomaly you are protecting against and choose the minimum level that prevents it.

flowchart TD
    A[What anomaly are you protecting against?] --> B{Write skew risk - logic reads a set and writes based on it}
    B -->|Yes| C[Serializable - only level that prevents write skew]
    B -->|No| D{Need consistent reads within one transaction?}
    D -->|Yes| E[Repeatable Read - snapshot taken at transaction start]
    D -->|No| F{High throughput, per-row locking acceptable?}
    F -->|Yes| G[Read Committed plus SELECT FOR UPDATE on hot rows]
    F -->|No| H{Approximate reads acceptable - analytics only?}
    H -->|Yes| I[Read Uncommitted or Read Committed with no FOR UPDATE]
    H -->|No| G
    C --> J[Add retry loop for serialization failures SQLSTATE 40001]
    E --> K[Monitor for table bloat on long-running transactions]

This flowchart guides you from the anomaly down to the mechanism. The branching logic is:

  • Write skew? If your transaction reads a shared aggregate or set, verifies a constraint, and then writes a new row based on that check, write skew is your risk. Only Serializable stops it.
  • Consistent reads within a transaction? If your transaction makes business decisions based on two reads of the same row being consistent, you need at least Repeatable Read.
  • Per-row correctness at Read Committed? Use SELECT ... FOR UPDATE to lock the specific rows you intend to modify. This is the right tool for the flight booking scenario at Read Committed: lock the seat row before checking availability and booking.
  • Analytics approximation? Read Uncommitted or Read Committed without locks is fine for dashboards where a 0.01% error rate is acceptable.

Common real-world mappings:

| Use case | Recommended level | Rationale |
| --- | --- | --- |
| Flight seat booking | Read Committed + SELECT FOR UPDATE | Lock the specific seat row; prevents lost update |
| Bank transfer (debit+credit) | Serializable | Write skew can corrupt balances when accounts are disjoint |
| Report generation | Read Committed or Repeatable Read | Consistent snapshot; no writes |
| Inventory reservation | Serializable or SELECT FOR UPDATE | Must prevent oversell under concurrent demand |
| User preference update | Read Committed | Single-row update; no cross-row constraint |
| Doctor on-call scheduling | Serializable | Classic write skew scenario |

πŸ§ͺ The Doctors Problem: Why Snapshot Isolation Lets Two Correct Transactions Produce a Wrong Result

The doctors on-call scenario is the canonical write skew demonstration. It is worth tracing precisely because it reveals the exact gap between Snapshot Isolation and Serializable, and because the fix is not obvious.

Here is the setup: two doctors, Alice and Bob, are both on call for tonight's shift. Hospital policy requires at least one doctor to be on call at all times. Alice feels ill and wants to go off call. She checks that Bob is available and takes herself off. At the same moment, Bob checks that Alice is available and takes himself off. Both checks were correct at the time they ran. Both writes are valid in isolation. But the combined result (zero doctors on call) violates the hospital's invariant.

Under Snapshot Isolation (Repeatable Read in PostgreSQL), both transactions run against the same consistent snapshot. Each sees two doctors on call. Neither transaction writes to the row the other transaction reads. There is no row-level conflict. Both commits succeed.

Under Serializable SSI, the PostgreSQL engine tracks that Transaction A read data that Transaction B will write (on_call for Bob's row) and that Transaction B read data that Transaction A will write (on_call for Alice's row). This mutual rw-dependency is exactly the pattern SSI is designed to detect. When Transaction B tries to commit, PostgreSQL identifies the cycle and aborts it with serialization error 40001. Alice's transaction succeeds; Bob's is retried. On retry, Bob's transaction now sees that Alice is already off call, finds only one doctor remaining, and properly declines to go off call itself.

The fix at the application level (without upgrading to Serializable) is a materialised lock: explicitly insert a lock row or increment a shared counter before the check-and-write. For example, updating a shared shift row to increment a version column in both transactions forces a write conflict that the database's row-level locking can detect and serialize. This is a pattern used in booking and scheduling systems that cannot run at Serializable due to throughput constraints.
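One way to sketch that pattern for the doctors example, with hypothetical shifts and on_call tables: both transactions first update the same shared row, so the database's row-level locking forces the second one to block and then fail with a serialization error under PostgreSQL's Repeatable Read:

```sql
BEGIN;  -- Repeatable Read is sufficient once the conflict is materialised
-- Both transactions write this shared row, creating a direct write conflict
UPDATE shifts SET version = version + 1 WHERE shift_date = current_date;

SELECT count(*) FROM on_call WHERE shift = current_date AND status = 'active';
-- Proceed only if count > 1
UPDATE on_call SET status = 'off'
WHERE doctor_id = 42 AND shift = current_date;
COMMIT;
```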


πŸ› οΈ PostgreSQL Serializable Snapshot Isolation in Practice

PostgreSQL implemented SSI in version 9.1 and it is the safest way to run complex transactional logic without introducing application-level locking patterns. Here is how to use it effectively.

Enabling SSI per transaction:

-- Set at the connection level for all transactions
SET default_transaction_isolation = 'serializable';

-- Or set per transaction (preferred: isolate only high-risk transactions)
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT count(*) FROM on_call WHERE shift = current_date AND status = 'active';
-- If count > 1, go off call
UPDATE on_call SET status = 'off' WHERE doctor_id = $1 AND shift = current_date;
COMMIT;

Handling serialization failures in Java with JDBC:

When SSI aborts a transaction due to a detected cycle, it raises SQLState 40001. Any correct application using Serializable must catch this and retry:

int maxRetries = 5;
for (int attempt = 0; attempt < maxRetries; attempt++) {
    try (Connection conn = dataSource.getConnection()) {
        conn.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);
        conn.setAutoCommit(false);
        try {
            // application logic
            performOnCallUpdate(conn);
            conn.commit();
            return; // success
        } catch (SQLException e) {
            conn.rollback();
            if ("40001".equals(e.getSQLState())) {
                // Serialization failure: safe to retry
                long backoff = (long) Math.pow(2, attempt) * 100;
                Thread.sleep(backoff);
            } else {
                throw e; // non-retryable error
            }
        }
    }
}
throw new RuntimeException("Transaction failed after " + maxRetries + " retries");

Monitoring serialization conflicts:

PostgreSQL has no dedicated serialization-failure counter, but pg_stat_database tracks related activity:

-- Deadlock and conflict counters for the current database
SELECT datname,
       conflicts,
       deadlocks
FROM pg_stat_database
WHERE datname = current_database();

Note that conflicts counts queries cancelled by recovery conflicts on standby servers, not serialization failures; the most direct signal of SSI pressure is the rate of SQLSTATE 40001 errors in your application metrics or the PostgreSQL log. A spike in serialization failures combined with slow application response times signals that your retry logic is being exercised heavily and you should consider whether the conflicting transactions can be rewritten to reduce their rw-dependency footprint, for example by narrowing the set of rows each transaction reads.

What SSI does NOT protect against:

SSI protects against concurrent transactions. It does not protect against application-level bugs that perform check-then-act logic across separate transactions. If your doctor on-call check and the update happen in two separate database transactions, even if both are at Serializable, there is a gap between commit and the start of the next transaction where another transaction can interleave. All correctness guarantees from SSI apply within a single transaction's boundaries.

For a deep-dive on PostgreSQL's transaction internals including WAL, MVCC vacuum, and snapshot management, see the companion post on How PostgreSQL's Write-Ahead Log Works when available.


πŸ“š Lessons Learned from Isolation Level Bugs in Production

The Read Committed + check-then-act trap: The most common production isolation bug is application code that reads a value, applies business logic, and writes back, all under Read Committed, with no SELECT FOR UPDATE. Under low load, transactions don't overlap. Under peak traffic, they do, and the lost update happens. The fix is always to add FOR UPDATE to the read that initiates the write, not to add more validation at the application layer.

"Serializable" in Oracle is not ANSI Serializable: Engineers who switch from PostgreSQL to Oracle and keep their isolation level at "Serializable" discover write skew bugs later. Oracle's Serializable is actually Snapshot Isolation. If your application depends on write skew prevention, it will fail on Oracle. Verify your database's actual implementation against the anomaly table β€” never trust the label alone.

Serializable everywhere breaks applications that don't handle retries: Turning on Serializable at the connection pool level without adding retry logic to every transaction turns serialization failures into application errors. Under any meaningful concurrency, SQLSTATE 40001 will appear. The fix is not to turn off Serializable β€” the fix is to make every transaction in your application retry-safe.

Long-running Repeatable Read transactions on PostgreSQL cause table bloat: A transaction running at Repeatable Read (or Serializable) holds a snapshot. PostgreSQL's vacuum process cannot reclaim row versions that are still visible to that snapshot. A background job that runs a 20-minute report under Repeatable Read while your tables are taking heavy writes will keep vacuum from reclaiming those versions and cause table bloat. Either break the report into shorter transactions, use SET TRANSACTION (or SET LOCAL) so only the transactions that need the stronger level pay for it, or run reports against a read replica.
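A sketch of per-transaction scoping in PostgreSQL: the isolation setting applies only to the current transaction and must be issued before its first query, so the session default can stay at Read Committed (the orders query is a hypothetical report):

```
BEGIN;
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;  -- before the first query
SELECT region, sum(amount) FROM orders GROUP BY region;
COMMIT;  -- snapshot released; vacuum can reclaim old row versions again
```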

Snapshot Isolation prevents most common anomalies but write skew needs explicit testing: Most OLTP workloads at Repeatable Read are fine in practice. Write skew only manifests when multiple concurrent transactions share a constraint across non-overlapping rows. Test for write skew explicitly with concurrent load tests before concluding that Repeatable Read is sufficient for your workload.
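One way to run that test is as a two-session transcript (the doctors schema is assumed for illustration, with two doctors initially on call):

```
-- Session A
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT count(*) FROM doctors WHERE on_call = true;   -- A sees 2

-- Session B, interleaved
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT count(*) FROM doctors WHERE on_call = true;   -- B also sees 2

-- Session A
UPDATE doctors SET on_call = false WHERE name = 'alice';
COMMIT;                                              -- succeeds

-- Session B
UPDATE doctors SET on_call = false WHERE name = 'bob';
COMMIT;  -- under Repeatable Read this also succeeds: write skew,
         -- zero doctors on call. Under SERIALIZABLE, one of the two
         -- transactions fails with SQLSTATE 40001 instead.
```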


📌 TLDR & Key Takeaways

  • Isolation is the "I" in ACID. It controls which concurrency anomalies a transaction is allowed to see.
  • Read Committed (PostgreSQL and Oracle default) prevents dirty reads. It silently allows non-repeatable reads, phantom reads, write skew, and lost updates. Most OLTP bugs at scale trace back to this level.
  • Repeatable Read takes a snapshot at transaction start. PostgreSQL prevents phantoms via MVCC. MySQL uses gap locks. Neither prevents write skew in most implementations.
  • Snapshot Isolation (PostgreSQL's Repeatable Read, Oracle's Serializable, SQL Server RCSI) prevents most anomalies but is not equivalent to Serializable β€” write skew remains possible.
  • Serializable is the only level that prevents all anomalies, including write skew. PostgreSQL uses SSI (optimistic, cycle detection). MySQL uses 2PL (lock-based, more blocking).
  • SSI requires retry logic. Serialization failures (SQLSTATE 40001) are a normal operating condition, not errors. Applications that use Serializable must catch 40001 and retry.
  • Oracle's "Serializable" label is misleading. It implements Snapshot Isolation, not true Serializable. Write skew is still possible.
  • For the flight booking scenario: Read Committed + SELECT FOR UPDATE on the seat row is the practical fix. Serializable is correct but may be overkill for this single-row case.
  • For the doctors on-call scenario: Only Serializable prevents this. Snapshot Isolation (Repeatable Read) will allow both commits to succeed and leave zero doctors on call.

πŸ“ Practice Quiz

  1. A PostgreSQL application runs at the default isolation level. Transaction A reads an account balance of $500. Transaction B then commits a debit that reduces the balance to $300. Transaction A reads the same account again and sees $300. Which anomaly has occurred?

    • A) Dirty read
    • B) Non-repeatable read
    • C) Phantom read
    • D) Deadlock

    Correct Answer: B. Read Committed gives each statement a fresh snapshot, so Transaction A's second read sees Transaction B's committed update.

  2. An engineer argues that upgrading to Repeatable Read on PostgreSQL will prevent all anomalies except lost updates. What is the most critical error in this claim?

    • A) Repeatable Read on PostgreSQL also prevents lost updates
    • B) Repeatable Read does not prevent phantom reads on PostgreSQL
    • C) Repeatable Read on PostgreSQL still allows write skew
    • D) Repeatable Read is not supported on PostgreSQL

    Correct Answer: C. Write skew remains possible under Snapshot Isolation (PostgreSQL's Repeatable Read) because two transactions can each read overlapping data and write non-overlapping rows without creating a row-level conflict.

  3. What does SQLSTATE 40001 indicate in a PostgreSQL application running at Serializable isolation?

    • A) A syntax error in the SQL statement
    • B) A foreign key constraint violation
    • C) A serialization failure β€” the transaction was aborted due to a detected rw-dependency cycle
    • D) A deadlock between two transactions

    Correct Answer: C. SSI aborts one transaction in a dangerous rw-dependency cycle. The correct response is to retry the transaction, typically with exponential backoff.

  4. Oracle's documentation labels its highest isolation level "Serializable." Which anomaly does Oracle's Serializable still permit, which would be prevented by PostgreSQL's Serializable?

    • A) Dirty read
    • B) Non-repeatable read
    • C) Phantom read
    • D) Write skew

    Correct Answer: D. Oracle's "Serializable" is actually Snapshot Isolation. It prevents dirty reads, non-repeatable reads, and phantom reads, but write skew is still possible because there is no SSI cycle detection.

  5. Open-ended challenge: Your team runs a ticket reservation system where users reserve seats at a music venue. The system uses PostgreSQL at Read Committed. Under a surge of 500 concurrent users booking the last 10 seats, you observe that 15 users successfully complete purchases even though only 10 seats exist. Describe two different strategies β€” one that stays at Read Committed and one that uses Serializable β€” to prevent overselling. What are the operational trade-offs of each approach?


Written by Abstract Algorithms (@abstractalgorithms)