Lost Update Explained: When Two Writes Become One

How concurrent read-modify-write operations silently discard one transaction's changes — and four ways to prevent it from overselling your inventory.

Abstract Algorithms · 44 min read

TLDR: A lost update occurs when two concurrent read-modify-write transactions both read the same committed value, both compute a new value from it, and both write back — with the second write silently discarding the first. No error is raised. Both transactions report success. The corruption is visible only in the final state. Atomic SQL operations (UPDATE SET val = val + 1) eliminate the vulnerability at any isolation level by collapsing read and write into one indivisible step. SELECT FOR UPDATE serializes competing reads. Optimistic locking with a version column detects conflicts at write time. Serializable isolation catches all remaining cases. Critical database caveat: PostgreSQL auto-detects lost updates at Repeatable Read via first-committer-wins. MySQL at Repeatable Read does not — you must use explicit locking or atomic SQL regardless of isolation level.


📖 The Flash Sale That Sold an Item It No Longer Had

At 09:00:00 UTC on a high-traffic Friday, FlashMart's engineering team triggered a flash sale on the last remaining unit of the Gaming Headset Pro (product_id = 7842). The item had been promoted to 400,000 newsletter subscribers for 48 hours. By 09:00:01, two purchase requests arrived simultaneously from different data centers.

Server A (us-east-1) received Customer Alice's request at 09:00:01.003. It opened a database transaction, queried the products table, and read inventory_count = 1. It prepared to set inventory to 0 and create a new order.

Server B (us-west-2) received Customer Bob's request at 09:00:01.007 — just 4 milliseconds later. It opened its own independent transaction, ran the same query, and also read inventory_count = 1. It prepared the identical update.

Server A wrote inventory_count = 0 and committed at 09:00:01.041. Server B wrote inventory_count = 0 and committed at 09:00:01.049. Both commits succeeded without any error.

The warehouse received two dispatch orders for product_id = 7842. Zero units were on the shelf. Customer Alice received an "Order Confirmed — Shipment in 2 days" email at 09:00:01.200. Customer Bob received the exact same email at 09:00:01.208. FlashMart's support queue received 37 escalation tickets by 09:05:00. Both customers had paid $149.99. Only one would receive a headset.

No error appeared in the database log. Both transactions were syntactically correct, ran at the default Read Committed isolation level, and completed successfully. The inventory_count was decremented — once. But two purchases were processed. One decrement was silently lost.

This is the lost update anomaly.


🔍 The Offline Spreadsheet Problem: What a Lost Update Actually Is

Before defining the anomaly precisely, consider an analogy that makes the mechanism obvious.

Two colleagues — Alice and Bob — each receive a printed copy of the same spreadsheet at 9 AM. The spreadsheet shows the marketing budget remaining is $5,000. Alice is told to deduct $2,000 for a conference booking. Bob is told to deduct $1,500 for a trade show deposit. Both work on their own printouts. Alice's version shows $3,000 remaining. Bob's version shows $3,500. The finance team receives both emails and saves whichever arrived last. Alice's $2,000 deduction is gone. The correct balance of $1,500 is never recorded.

A lost update is exactly this. Each person worked from an identical starting snapshot, computed an independent modification, and submitted their result — with the later submission overwriting the earlier one as if it never existed.

Formal definition: A lost update occurs when Transaction T1 reads value V from a row, Transaction T2 reads the same value V from the same row, T1 writes a new value V₁ derived from V and commits, and T2 writes a new value V₂ also derived from V and commits — overwriting V₁. T1's update is permanently gone. The database holds V₂, which is only correct under the assumption that T1 never happened.

The most dangerous property of a lost update is that both transactions succeed from their own perspective. Neither T1 nor T2 sees a conflict. Neither rolls back. Neither raises an error. The corruption is invisible at the transaction boundary — it becomes apparent only when a downstream system (the warehouse) or a human observer notices that the final state is inconsistent with the operations that were reported as successful.


⚙️ The Read-Modify-Write Gap: Where the Anomaly Is Born

Lost updates emerge from a universal application pattern: the read-modify-write cycle. A transaction reads the current state of a row, computes a new value in application memory, and writes the result back. This three-step sequence contains a structural vulnerability: the gap between the read and the write.

During that gap, another transaction can perform its own read of the same row. Both transactions now hold an identical snapshot of the row's value in their respective application memories. When they each proceed to write their independently-computed results, the second write does not "add to" or "reconcile with" the first — it blindly overwrites. The database has no way to know that the new value was derived from the same starting point as the previous write.

Non-atomic cycle — vulnerable to lost update:

  1. T1: READ  inventory_count = 1   → computes: 1 - 1 = 0
  2. T2: READ  inventory_count = 1   → computes: 1 - 1 = 0
  3. T1: WRITE inventory_count = 0   → committed
  4. T2: WRITE inventory_count = 0   → overwrites T1, committed
  Final: inventory_count = 0  (two sales processed, one decrement lost)
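The vulnerable interleaving can be reproduced end to end in a few lines. The sketch below uses an in-memory SQLite database purely as a stand-in; the two "transactions" are simulated by interleaving their reads and writes in exactly the order shown in steps 1–4 (the schema mirrors the FlashMart example but is hypothetical):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, inventory_count INTEGER)")
db.execute("INSERT INTO products VALUES (7842, 1)")

# Step 1: T1 reads the committed value into application memory.
t1_read = db.execute("SELECT inventory_count FROM products WHERE id = 7842").fetchone()[0]
# Step 2: T2 reads the same committed value: an identical snapshot.
t2_read = db.execute("SELECT inventory_count FROM products WHERE id = 7842").fetchone()[0]

# Steps 3-4: each writes the value it computed in memory; T2 blindly overwrites T1.
db.execute("UPDATE products SET inventory_count = ? WHERE id = 7842", (t1_read - 1,))
db.execute("UPDATE products SET inventory_count = ? WHERE id = 7842", (t2_read - 1,))

final = db.execute("SELECT inventory_count FROM products WHERE id = 7842").fetchone()[0]
print(final)  # 0: two sales processed, but inventory decremented only once
```

Both simulated transactions computed the same target value from the same snapshot, so one decrement vanishes without any error being raised.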

An atomic operation eliminates the gap entirely. The modification is expressed as a single SQL statement that the database engine evaluates and applies in one indivisible step. No other transaction can interleave a read between the evaluation and the write:

Atomic operation — no lost update gap:

  T1: UPDATE products SET inventory_count = inventory_count - 1 WHERE id = 7842 AND inventory_count > 0
  T2: UPDATE products SET inventory_count = inventory_count - 1 WHERE id = 7842 AND inventory_count > 0
  The database evaluates each expression against the committed row value at write time.
  Final: inventory_count = 0. The second UPDATE's WHERE guard matches zero rows, so exactly one sale succeeds and no oversell occurs.

With atomic operations, the expression inventory_count - 1 is evaluated inside the database engine using the row's current committed value at the moment of the write — not the value the application read earlier. The read and write are collapsed into a single database operation, and the vulnerability window ceases to exist.
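The same scenario rewritten in the atomic form can be exercised directly. Again a minimal SQLite sketch with a hypothetical schema: each caller issues one guarded UPDATE, and the statement's row count tells it whether its sale succeeded.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, inventory_count INTEGER)")
db.execute("INSERT INTO products VALUES (7842, 1)")

def sell_one(conn):
    # Atomic read-modify-write: the engine evaluates inventory_count - 1
    # against the current committed value, and the WHERE guard refuses to
    # decrement past zero. rowcount == 1 means the sale succeeded.
    cur = conn.execute(
        "UPDATE products SET inventory_count = inventory_count - 1 "
        "WHERE id = 7842 AND inventory_count > 0"
    )
    return cur.rowcount == 1

first = sell_one(db)    # True:  the last unit is sold
second = sell_one(db)   # False: the guard matches zero rows, no oversell
final = db.execute("SELECT inventory_count FROM products WHERE id = 7842").fetchone()[0]
print(first, second, final)  # True False 0
```

Checking the row count is essential: a zero-row update is the "sold out" signal the application must surface to the customer.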

Not all updates can be expressed atomically. Complex business logic — applying a tiered loyalty discount, computing a shipping cost from multiple table lookups, or running a fraud score — cannot always be collapsed into a single SQL expression. For those cases, the other three prevention strategies apply.


📊 Lost Update in Slow Motion: A Transaction-Level Sequence

The following sequence diagram traces the exact database-level events during the FlashMart flash sale. Pay attention to the overlap window between T1's SELECT and T2's SELECT — that is where the anomaly is born.

sequenceDiagram
    participant T1 as Transaction T1 - Server A
    participant DB as Stock Database
    participant T2 as Transaction T2 - Server B

    T1->>DB: BEGIN
    T2->>DB: BEGIN
    T1->>DB: SELECT inventory_count FROM products WHERE id=7842
    DB-->>T1: inventory_count = 1
    T2->>DB: SELECT inventory_count FROM products WHERE id=7842
    DB-->>T2: inventory_count = 1
    Note over T1: Computes 1 - 1 = 0, ready to write
    Note over T2: Computes 1 - 1 = 0, ready to write
    T1->>DB: UPDATE products SET inventory_count = 0 WHERE id=7842
    DB-->>T1: 1 row updated
    T1->>DB: COMMIT
    DB-->>T1: T1 committed successfully
    T2->>DB: UPDATE products SET inventory_count = 0 WHERE id=7842
    DB-->>T2: 1 row updated
    T2->>DB: COMMIT
    DB-->>T2: T2 committed successfully
    Note over DB: Final state: inventory_count = 0. T1 update overwritten.

The diagram shows the four critical moments that produce the lost update: (1) both T1 and T2 read inventory_count = 1 before either has written — they share an identical snapshot; (2) each independently computes the same target value of 0; (3) T1 commits cleanly, setting inventory to 0; (4) T2 also commits cleanly, writing 0 again — which overwrites T1's write even though T1 already committed. The database ends at inventory_count = 0, which is correct for one sale but wrong for two. T1's decrement is gone with no trace.


🔍 Lost Update, Dirty Write, and Write Skew: Knowing the Difference Under Pressure

Lost updates are frequently confused with two related anomalies — dirty writes and write skew — in interviews and code reviews. All three are write conflicts involving concurrent transactions, but they differ in the data each transaction reads and the rows each transaction writes.

| Anomaly | Both transactions write the same row? | Reads dirty (uncommitted) data? | Involves different rows? | Root cause |
| --- | --- | --- | --- | --- |
| Lost Update | ✅ Yes | ❌ No — reads committed data | ❌ No | Second committed write overwrites the first; gap between read and write |
| Dirty Write | ✅ Yes | ✅ Yes — T2 reads T1's uncommitted write | ❌ No | T2 overwrites T1's uncommitted value; if T1 rolls back, state is incoherent |
| Write Skew | ❌ No — each writes a different row | ❌ No — reads committed data | ✅ Yes | Both check a shared constraint across multiple rows; both pass; combined result violates the constraint |

Lost update is the only one of the three that involves exclusively committed reads. T1 and T2 both see valid, committed data — the anomaly arises purely from the temporal gap between their respective reads and writes. This is precisely why it survives at Read Committed isolation: there are no dirty (uncommitted) values in play, so the isolation level's dirty-read protection provides no benefit.

Dirty write is prevented by virtually all mainstream databases. Most engines refuse to let T2 overwrite T1's uncommitted row because T1 holds a write lock that is not released until T1 commits or rolls back. You are unlikely to encounter a dirty write in PostgreSQL, MySQL InnoDB, or CockroachDB under normal operation.

Write skew requires a constraint that spans multiple rows — the canonical example is two doctors both going off-call after independently checking that at least one remains on-call. Neither transaction writes the row that the other reads, so snapshot isolation does not detect the conflict. Write skew requires either Serializable isolation or application-level SELECT FOR UPDATE that covers every row in the constraint check.

The practical interview rule: if two transactions overwrite the same row using committed reads, it is a lost update. If they write different rows and together violate a shared constraint, it is write skew.


🧠 How Databases Actually Detect (and Miss) Lost Updates

The most consequential practical distinction in production systems is not which isolation level is configured in theory — it is whether your specific database implementation detects lost updates at that level.

Internals: How MVCC and Write Tracking Create (and Close) the Lost-Update Window

Every major database that supports snapshot-based isolation uses Multi-Version Concurrency Control (MVCC). When T1 reads a row, it does not acquire a traditional read lock — it receives a pointer to the most recently committed version of that row as of the transaction's start time. The row itself can continue to receive new committed versions from other transactions. This is what makes snapshot isolation highly concurrent: readers never block writers, and writers never block readers.

The lost-update window exists because MVCC, in its base form, only tracks which version was read — not whether the row was modified since the read. When T2 attempts to write, a basic MVCC engine checks that T2 is writing a row that exists (not a phantom) but does not automatically verify that the row's value has not changed since T2's snapshot was taken. The second write proceeds unaware of T1's committed change.

Databases that prevent lost updates extend MVCC with write-conflict tracking: when a transaction commits a write to a row, the engine records a "last-writer" marker on that row's latest version. When a subsequent transaction attempts to write the same row, the engine compares the current committed version with the version the transaction read. If they differ, the engine knows a write-write conflict has occurred — and can choose to abort the later transaction (first-committer-wins) rather than silently overwrite.
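The detection rule fits in a few lines of Python. This is an illustrative toy model, not any engine's real implementation: each row carries a version counter, a transaction remembers the version it read, and commit aborts if the committed version has moved on since then.

```python
class SerializationError(Exception):
    """Raised when first-committer-wins detects a write-write conflict."""

class Row:
    def __init__(self, value):
        self.value = value
        self.version = 0  # bumped on every committed write

class Txn:
    def __init__(self, row):
        self.row = row
        self.read_version = None

    def read(self):
        self.read_version = self.row.version
        return self.row.value

    def commit_write(self, new_value):
        # First-committer-wins: if someone committed since our read, abort
        # instead of silently overwriting their change.
        if self.row.version != self.read_version:
            raise SerializationError("could not serialize access due to concurrent update")
        self.row.value = new_value
        self.row.version += 1

row = Row(1)
t1, t2 = Txn(row), Txn(row)
v1, v2 = t1.read(), t2.read()   # both read inventory = 1
t1.commit_write(v1 - 1)         # T1 commits first and wins
try:
    t2.commit_write(v2 - 1)     # T2 loses: conflict detected, not lost
except SerializationError as e:
    print("T2 aborted:", e)
```

The only extra state needed beyond base MVCC is the per-row version comparison at write time, which is exactly the "last-writer" marker described above.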

PostgreSQL implements this write-conflict check at Repeatable Read as part of its snapshot isolation machinery; its Serializable level layers SSI's predicate-dependency tracking on top of that. MySQL InnoDB does not implement the check at Repeatable Read — its MVCC records only the read snapshot, not write-conflict markers, leaving the gap open.

Performance Analysis: The Throughput Cost of Conflict Detection

Write-conflict tracking adds overhead in two dimensions. First, memory overhead: the engine must maintain additional metadata on modified rows — essentially a lightweight "who last wrote this, and when" record — across the transaction's lifetime. Under high concurrency, this metadata grows proportionally with the number of concurrent write transactions.

Second, abort overhead: when the engine detects a write-write conflict and aborts T2, T2's work is discarded. The application must retry from scratch — re-read, re-compute, re-write. Under high contention (many transactions competing for the same row), the effective commit rate can drop below what a pessimistic lock would achieve, because T2 gets further through its work before failing than it would have if it had simply blocked at the SELECT FOR UPDATE step.

The throughput ordering under high write contention for a single hot row is approximately: atomic SQL (no conflict possible) > SELECT FOR UPDATE (T2 blocks early, no wasted work) > optimistic locking / SSI (T2 reads and computes before being aborted, wasted work at high conflict rates). For read-heavy, low-conflict workloads the ordering inverts: optimistic locking and SSI impose near-zero overhead since conflicts rarely occur.

PostgreSQL: First-Committer-Wins at Repeatable Read

PostgreSQL's Repeatable Read isolation uses Snapshot Isolation (SI): each transaction operates on a point-in-time snapshot of the database taken at transaction start. All reads within the transaction see data as it existed at that moment, regardless of concurrent commits. This prevents non-repeatable reads.

PostgreSQL adds a critical extension: the first-committer-wins rule. When T2 attempts to write a row that T1 has already committed a write to since T2's snapshot was taken, PostgreSQL detects the write-write conflict and aborts T2 with:

ERROR:  could not serialize access due to concurrent update
DETAIL:  Process 12345 updated this row while this transaction was in progress.

The application must catch this error and retry the transaction from the beginning. This behavior means that PostgreSQL at Repeatable Read automatically prevents lost updates — no explicit locking is required for simple read-modify-write cycles.
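A retry wrapper for this pattern might look like the following sketch. The exception class and backoff policy here are placeholders: with a real driver such as psycopg you would catch its serialization-failure error (SQLSTATE 40001) instead of the stand-in class used below.

```python
import time

class SerializationFailure(Exception):
    """Stand-in for the driver's 40001 serialization error."""

def run_with_retry(txn_fn, max_attempts=3, backoff_s=0.0):
    # Re-runs the whole transaction from the top on a serialization failure.
    # That is the only correct response: the aborted attempt's reads are stale.
    for attempt in range(1, max_attempts + 1):
        try:
            return txn_fn()
        except SerializationFailure:
            if attempt == max_attempts:
                raise
            time.sleep(backoff_s * attempt)  # linear backoff between attempts

# Demo: a transaction that hits a conflict once, then succeeds on retry.
attempts = {"n": 0}
def flaky_decrement():
    attempts["n"] += 1
    if attempts["n"] == 1:
        raise SerializationFailure("could not serialize access due to concurrent update")
    return "committed"

result = run_with_retry(flaky_decrement)
print(result)  # committed (on the second attempt)
```

Note that the retried function must re-read and re-compute everything; resuming from cached values would reintroduce the stale-snapshot problem the abort was protecting against.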

At the default Read Committed level, this protection does not apply. Each statement gets a fresh snapshot, and write-write conflicts are not detected. Lost updates are possible at Read Committed on PostgreSQL.

MySQL InnoDB: Repeatable Read Does Not Auto-Detect

MySQL InnoDB also implements MVCC with a transaction-start snapshot at Repeatable Read; the level shares PostgreSQL's name, but not its guarantees. The critical difference: MySQL does not apply the first-committer-wins rule. When T2 attempts to write a row that T1 already committed, MySQL allows T2's write to proceed without error. The lost update occurs silently.

This is one of the most consequential behavioral differences between the two most widely deployed open-source databases. Engineers migrating between them — or operating both — have discovered production data corruption bugs that were silently suppressed by one engine and silently allowed by the other.

The MySQL rule: Never rely on Repeatable Read alone to prevent lost updates in MySQL InnoDB. Always use explicit SELECT ... FOR UPDATE locking, atomic SQL operations, or application-level optimistic locking with a version column.

CockroachDB: Serializable by Default

CockroachDB enforces Serializable isolation (SSI) by default. Every transaction automatically detects write-write conflicts and aborts the losing transaction with a 40001 serialization_failure error. Lost updates are impossible without explicitly downgrading isolation — which CockroachDB strongly discourages. Applications must be written to retry on 40001, but they gain the guarantee that no lost update, write skew, or phantom read can occur.

Redis: Atomic at the Instance Level, Vulnerable in Cluster Mode

A single Redis instance processes commands serially — only one command executes at a time. Built-in commands like INCR and DECR are inherently atomic at the instance level, making them immune to lost updates. The read-modify-write gap does not exist when a single command encapsulates both operations.

In Redis Cluster, keys are distributed across shards. If a client reads from a replica and writes to a primary, or if related keys live on different shards, the same lost-update window re-emerges. Redis provides WATCH, MULTI, and EXEC as an optimistic concurrency mechanism at the client level: WATCH marks a key for monitoring, and the subsequent EXEC aborts the entire queued transaction block if the watched key changed since WATCH was issued — effectively implementing client-side optimistic locking.
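The WATCH/MULTI/EXEC cycle amounts to client-side compare-and-set. The toy store below (deliberately not the redis-py API) mimics the semantics: EXEC applies the queued write only if no other client has touched the watched key since WATCH.

```python
class MiniRedis:
    """Toy store mimicking Redis WATCH/EXEC optimistic-locking semantics."""
    def __init__(self):
        self.data = {}
        self.touch = {}  # per-key write counter, bumped on every set

    def set(self, key, value):
        self.data[key] = value
        self.touch[key] = self.touch.get(key, 0) + 1

    def get(self, key):
        return self.data.get(key)

    def watch(self, key):
        # Remember the key's write counter at WATCH time.
        return (key, self.touch.get(key, 0))

    def exec_txn(self, watched, key, value):
        # EXEC aborts (returns None) if the watched key changed since WATCH.
        wkey, wcount = watched
        if self.touch.get(wkey, 0) != wcount:
            return None
        self.set(key, value)
        return "OK"

r = MiniRedis()
r.set("stock:7842", 1)
w = r.watch("stock:7842")           # client A: WATCH, then read
new_val = r.get("stock:7842") - 1   # computes 0 in client memory
r.set("stock:7842", 0)              # another client writes first
result = r.exec_txn(w, "stock:7842", new_val)
print(result)  # None: EXEC aborted, client A must re-WATCH and retry
```

The aborted EXEC is the Redis analogue of a zero-row conditional UPDATE: the conflict is detected at write time and the client loops back to the read.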


📊 Isolation Level Prevention Matrix: Which Levels Actually Catch Lost Updates

Not all isolation levels prevent lost updates, and — as MySQL's behavior demonstrates — even the same named level can behave differently across databases. This table reflects both ANSI SQL standard semantics and the documented behavior of the three most common production databases.

| Isolation Level | ANSI SQL prevents lost updates? | PostgreSQL | MySQL InnoDB | CockroachDB |
| --- | --- | --- | --- | --- |
| Read Uncommitted | ❌ No | Promoted to Read Committed | ❌ No | Not supported |
| Read Committed | ❌ No | ❌ No | ❌ No | Not supported |
| Repeatable Read | ❌ No (ANSI does not require it) | ✅ Yes (first-committer-wins) | ❌ No (silent overwrite) | Not supported |
| Snapshot Isolation | ✅ Yes (where implemented) | ✅ Yes | ❌ MySQL RR ≠ true SI for writes | ✅ Yes |
| Serializable | ✅ Yes | ✅ Yes (SSI) | ✅ Yes (locking reads) | ✅ Yes (default) |

The practical takeaway from this table: never assume your isolation level prevents lost updates without verifying your database's specific implementation. In MySQL, treat Repeatable Read as equivalent to Read Committed for lost-update purposes. In PostgreSQL, Repeatable Read provides first-committer-wins protection — but your application still needs retry logic on could not serialize access errors.


📊 The Full Purchase Flow When a Lost Update Strikes the Architecture

The following architecture diagram shows how two simultaneous purchase requests travel through a real-world e-commerce system and produce an oversell event. The lost update is not a single-database problem — its consequences cascade from the Inventory Service through the Order Service to Warehouse Dispatch.

graph TD
    A[Customer A - Purchase Request] --> LB[Load Balancer]
    B[Customer B - Purchase Request] --> LB
    LB --> S1[Server A - Inventory Service]
    LB --> S2[Server B - Inventory Service]
    S1 -->|T1: SELECT inventory=1| DB[(Stock Database)]
    S2 -->|T2: SELECT inventory=1| DB
    S1 -->|T1: UPDATE SET inventory=0| DB
    S2 -->|T2: UPDATE SET inventory=0| DB
    DB -->|Both commits succeed| ORD[Order Service]
    ORD -->|Order-1001 confirmed| WH[Warehouse Dispatch]
    ORD -->|Order-1002 confirmed| WH
    WH --> OVER[Zero items, two dispatch orders created]

The diagram traces the complete failure path from customer request to warehouse error. Two requests enter through the load balancer, each is handled by a separate Inventory Service process with its own database connection. Both processes read inventory = 1 before either has written. Both write inventory = 0. The Order Service receives two successful signals from the Inventory Service — it has no visibility into the collision — and creates two confirmed orders. The Warehouse Dispatch system receives two dispatch instructions for a product with zero remaining stock. The anomaly originates in the read overlap window at the Stock Database, but its effects propagate without any error signal through every downstream component.


⚙️ Four Ways to Prevent Lost Updates Before They Reach Production

Strategy 1: Atomic SQL Operations — The Default for All Counter and Quantity Updates

Atomic operations eliminate the read-modify-write gap at the database level. The modification is expressed as a single self-referential SQL statement. The database evaluates and applies the update in one indivisible operation — no other transaction can interleave a read between the evaluation and the write.

This strategy works at every isolation level, including Read Committed. It requires no version columns, no retry logic, and no lock management overhead. It is the fastest, simplest, and most reliable approach for any modification that can be expressed as a function of the current value.

The limitation is expressiveness: complex multi-step conditional business logic (tier-based pricing, multi-table constraint checks) cannot always be reduced to a single SQL expression. For those cases, move to strategies 2–4.

Applicable to: Counter increments, quantity decrements, running totals, score updates, balance adjustments — any scalar modification described as a pure function of the current value.

Strategy 2: SELECT FOR UPDATE — Serialize the Read Under High Contention

SELECT ... FOR UPDATE acquires a row-level exclusive lock on the selected rows at the moment of the read. Any other transaction attempting to lock (FOR UPDATE) or modify those rows blocks until the locking transaction commits or rolls back. The read-modify-write cycle is serialized: T2 cannot even read the row's current value under its own lock until T1 is completely finished.

The lock is held for the duration of the transaction and released only at COMMIT or ROLLBACK. This eliminates the overlap window by making T2 wait for the gap to close before it can open one of its own.

Trade-off: Locks held for the full transaction duration can become throughput bottlenecks under high concurrency. A long-running transaction holding a FOR UPDATE lock on a hot row can cause other transactions to queue behind it. For low-to-medium contention workloads, optimistic locking (Strategy 3) is usually more throughput-efficient because reads proceed without blocking and conflicts are resolved only at write time.

Applicable to: High-contention rows with complex update logic — seat reservations, booking systems, financial transfers, any row touched by hundreds of concurrent transactions.

Strategy 3: Optimistic Locking with Version Columns — Low Overhead, Retry-Based Safety

Optimistic locking assumes conflicts are rare and checks for them at write time rather than preventing concurrent reads. Every row carries a version column (integer). The application reads the row including its current version, computes the new value, and writes back using a conditional UPDATE that includes WHERE version = :old_version in its predicate.

If another transaction committed a write between the read and the write, the version will have advanced. The conditional UPDATE matches zero rows. The application detects this (rows_affected = 0), treats it as a lost-update conflict, and retries the full read-compute-write cycle from the beginning.
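A minimal sketch of the version-column pattern, using SQLite and a hypothetical schema and helper names. The conditional UPDATE bumps the version and checks it in the same statement; a row count of zero is the conflict signal.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, inventory_count INTEGER, version INTEGER)"
)
db.execute("INSERT INTO products VALUES (7842, 1, 0)")

def read(conn):
    return conn.execute(
        "SELECT inventory_count, version FROM products WHERE id = 7842"
    ).fetchone()

def write(conn, new_count, expected_version):
    # Conditional UPDATE: matches zero rows if another writer committed first.
    cur = conn.execute(
        "UPDATE products SET inventory_count = ?, version = version + 1 "
        "WHERE id = 7842 AND version = ?",
        (new_count, expected_version),
    )
    return cur.rowcount == 1  # False => conflict; caller re-reads and retries

# T1 and T2 both read (count = 1, version = 0), then race to write.
c1, v1 = read(db)
c2, v2 = read(db)
ok1 = write(db, c1 - 1, v1)  # True:  T1 wins, version is now 1
ok2 = write(db, c2 - 1, v2)  # False: T2's version check fails, conflict detected
print(ok1, ok2)
```

The losing writer restarts the full read-compute-write cycle; simply re-issuing the same UPDATE with the old version would fail forever.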

Under low contention — where most transactions succeed on first attempt — optimistic locking delivers near-zero overhead. Reads are non-blocking. No lock infrastructure is required. The retry path is rarely taken.

Under high contention, the retry amplification effect can degrade throughput significantly. If 50 transactions simultaneously compete for the same version, 49 of them will fail on first attempt and retry — potentially failing again if the winning transaction is fast enough to repeat the cycle. At this contention level, SELECT FOR UPDATE (Strategy 2) serializes writes and avoids the retry storm.

Applicable to: Low-conflict record edits, user-facing forms that save data after a "think time" gap, API resources where clients infrequently edit the same record.

Strategy 4: Serializable Isolation — The Safety Net for Complex Workloads

Serializable isolation guarantees that the result of all concurrent transactions is equivalent to some valid sequential execution. Any transaction pattern that could produce a lost update — or write skew, or phantom read — causes one of the conflicting transactions to be aborted and retried automatically.

This is the highest-coverage approach. You do not need to identify specific hot rows, add version columns, or audit every code path for read-modify-write cycles. The database detects all anomaly patterns automatically.

The cost is throughput. PostgreSQL's Serializable Snapshot Isolation (SSI) is relatively efficient, but it requires the engine to track read and write dependencies across transactions. Under high conflict rates, more transactions abort and retry. For most OLTP systems, targeted application of Strategies 1–3 on the specific hot paths is more efficient than blanket Serializable isolation on every connection.

Applicable to: Systems where write skew is also a concern, compliance workloads requiring guaranteed correctness, or codebases where auditing every read-modify-write path is impractical.


🏗️ Lost Updates Across Microservices and Distributed Databases

The lost update problem becomes structurally harder in distributed systems because there is no single transaction boundary that spans multiple services or data stores.

Cross-service read-modify-write: In a microservices architecture, an Inventory Service and a Pricing Service may both read from the same Order record in a shared database and write back independently. If they use separate database connections, no shared transaction detects the write-write conflict. The solutions are: (a) designate a single service as the authoritative writer and route all Order record updates through it; (b) use an event-driven append-only pattern (the Outbox Pattern or Change Data Capture) where each service emits an event and a single downstream consumer applies state changes sequentially; or (c) embed version fields in the API contract — the PUT request body carries a version field, and the service refuses updates where version does not match the current database value.

DynamoDB Optimistic Concurrency Control: DynamoDB offers only bounded multi-item transactions (TransactWriteItems operates on a limited batch of items), but it provides conditional writes via ConditionExpression. Setting ConditionExpression: "version = :v" on a PutItem or UpdateItem call implements the exact same version-check pattern as SQL optimistic locking — the operation is rejected if the stored version does not match the expected value. The application retries on ConditionalCheckFailedException.

CRDTs for commutative operations: Conflict-Free Replicated Data Types (CRDTs) provide a mathematical framework for merging concurrent writes without coordination. A CRDT counter can accept increments from multiple nodes across a cluster and converge to the correct total without a coordinator — because addition is commutative. CRDTs are used in distributed caches and multi-leader replication systems (Riak, distributed Redis counters) where exact consistency is not required but monotonic progress is guaranteed. They do not solve lost updates for non-commutative writes (setting a value to a specific result of complex logic), which is the common application case.

Eventual consistency systems: Apache Cassandra and similar AP databases accept concurrent writes to the same cell by design. Last-Write-Wins (LWW) is the default conflict resolution — the write with the highest timestamp survives. This is a structural lost update by design. Applications built on Cassandra that need update safety must either use lightweight transactions (IF version = x), use a single-partition serial consistency level, or architect their writes as append-only events rather than read-modify-write updates.
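Last-Write-Wins can be modeled in two lines: store (value, timestamp) pairs and merge by taking the higher timestamp. The sketch below (illustrative, not Cassandra's internals) shows why LWW is a structural lost update — one of two concurrent decrements simply vanishes:

```python
def lww_merge(a, b):
    """Cassandra-style cell merge: the write with the higher timestamp wins."""
    return a if a[1] >= b[1] else b

# Both replicas start from inventory = 1 and each applies one local decrement.
write_a = (0, 1001)  # (value, timestamp): replica A's decrement at t=1001
write_b = (0, 1002)  # replica B's decrement at t=1002

merged = lww_merge(write_a, write_b)
print(merged)  # (0, 1002): B's write survives; A's decrement leaves no trace
```

No merge function over opaque final values can recover the two intents; that is why the safe alternatives on Cassandra are lightweight transactions or append-only event writes.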


🌍 Where Lost Updates Have Cost Real Companies Real Money

Inventory overselling in flash sales: The FlashMart scenario in the opening is representative of real incidents at Amazon, Shopify, and Ticketmaster during high-demand product launches and concert ticket drops. The common root cause is an inventory service operating at Read Committed with a read-modify-write cycle and no FOR UPDATE lock or atomic decrement. The business cost extends beyond the single oversold unit: customer service escalation, refund processing costs, reputational damage, and loss of repeat purchase intent from affected customers.

Loyalty points double-spending: A loyalty program that awards points on purchase and redeems them at checkout faces a double-spend risk when point balance updates are non-atomic. A customer redeems 500 points at a checkout terminal while a background job simultaneously awards 100 bonus points for a qualifying purchase. Both processes read the current balance of 600, both compute their modification, and the final balance reflects only the last write. The customer either receives phantom points or loses a legitimate redemption — neither outcome is acceptable, and neither raises a database error.

Comment counter and engagement metric corruption: High-traffic content platforms that track likes, share counts, or view counts via application-level read-modify-write operations experience persistent counter drift. A post receiving 10,000 simultaneous view events — each represented as a read-increment-write cycle — can drift its counter by thousands within minutes. The solution is well-known (UPDATE posts SET view_count = view_count + 1) but is frequently skipped in early implementations that scale unexpectedly beyond their original design assumptions.

Financial ledger discrepancies: In banking and fintech, a balance adjustment that reads the source account balance, subtracts a transfer amount, and writes back is a classic lost-update surface. Two simultaneous transfers from the same account can each read the full balance, each compute a valid deduction, and each commit — leaving the account with a balance that reflects only one of the two deductions. Banks mitigate this with row-level locking, debit/credit ledger entries (append-only writes that avoid read-modify-write entirely), and idempotency keys that prevent duplicate processing.
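The append-only ledger mitigation mentioned above sidesteps read-modify-write entirely: each transfer inserts a new debit row, inserts commute, and the balance is always derived by summation. A minimal SQLite sketch with a hypothetical schema (a real ledger would still need an overdraft guard, e.g. a conditional insert):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE ledger (account TEXT, amount INTEGER)")  # +credit, -debit
db.execute("INSERT INTO ledger VALUES ('alice', 500)")  # opening credit

def balance(conn, account):
    # The balance is derived, never stored: there is nothing to overwrite.
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM ledger WHERE account = ?",
        (account,),
    ).fetchone()
    return row[0]

# Two "simultaneous" transfers each append a debit; neither overwrites the other.
db.execute("INSERT INTO ledger VALUES ('alice', -200)")
db.execute("INSERT INTO ledger VALUES ('alice', -150)")

print(balance(db, "alice"))  # 150: both deductions survive
```

Compare this with the spreadsheet analogy earlier: because each colleague appends a deduction instead of submitting a replacement total, the finance team can never lose one of the two edits.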


⚖️ Trade-offs: What Each Prevention Strategy Costs in Throughput and Complexity

| Prevention Strategy | Isolation level required | Throughput impact under contention | Application complexity | Detects conflict automatically? |
| --- | --- | --- | --- | --- |
| Atomic SQL | Any (works at Read Committed) | Lowest — no locks, no retries needed | Lowest — change the SQL expression only | N/A — gap is eliminated |
| SELECT FOR UPDATE | Read Committed or higher | Medium — row locks cause queuing behind hot rows | Low — wrap reads in FOR UPDATE | Blocks T2 at read; no app retry needed |
| Optimistic locking | Read Committed or higher | Low under low contention; degrades under high contention | Medium — version column schema change, retry loop required | App retries on 0 rows-affected |
| Serializable isolation | Serializable | Highest — conflict tracking overhead, more aborts at high concurrency | Medium — catch serialization errors, implement retry | Database aborts and signals automatically |

The two most common failure patterns in production: (a) under-protection — using Read Committed with a read-modify-write cycle and no locking, assuming "the database handles it" — it does not; and (b) over-protection — running every transaction at Serializable when only a handful of critical hot paths actually need it, degrading overall throughput unnecessarily for uncontested reads.


🧭 Choosing the Right Prevention Strategy: A Decision Guide

The following flowchart captures the practical decision logic for selecting a lost-update prevention strategy. Start from the cheapest option and move toward heavier mechanisms only when the update pattern genuinely requires it.

flowchart TD
    A[Lost update risk identified on a row] --> B{Can the update be expressed as a single SQL expression?}
    B -->|Yes - e.g. val = val plus delta| C[Atomic SQL - works at any isolation level]
    B -->|No - complex multi-step logic required| D{Expected contention level on this row?}
    D -->|High - many concurrent writes| E[SELECT FOR UPDATE - pessimistic lock]
    D -->|Low to medium - conflicts are infrequent| F[Optimistic locking with version column]
    E --> G{Does the update span multiple rows or tables?}
    G -->|Yes - shared constraint across row set| H[Serializable isolation or SSI]
    G -->|No - single hot row| I[FOR UPDATE on that row is sufficient]
    F --> J{Retry rate acceptable in production?}
    J -->|Yes - few retries observed| K[Optimistic locking confirmed]
    J -->|No - too many conflicts causing retry storms| E

The flowchart guides the selection: atomic SQL eliminates the problem with no infrastructure cost and is the right default for any increment or decrement. Optimistic locking is appropriate for complex logic on low-contention rows. SELECT FOR UPDATE belongs on high-contention hot rows. Serializable isolation serves as the safety net when write skew is also present, or when auditing every write path is impractical.

| Situation | Recommended strategy |
| --- | --- |
| Counter increment, quantity decrement, running total | Atomic SQL: UPDATE SET col = col + delta WHERE col > 0 |
| Complex business rule on a single hot row (booking, seat reservation) | SELECT ... FOR UPDATE |
| Low-conflict record edits (user profile, document save after think time) | Optimistic locking with version column |
| Write-skew risk across multiple rows (on-call scheduling, overlapping bookings) | Serializable isolation |
| MySQL InnoDB at any isolation level with complex update logic | Always use FOR UPDATE or atomic SQL — Repeatable Read does not auto-protect |
| Redis distributed cluster with shared counters | WATCH / MULTI / EXEC optimistic concurrency |
| Microservices with separate database connections updating shared state | Conditional writes in API contract or single authoritative writer |
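The selection logic above can be sketched as a small helper function. This is purely illustrative: the branch order is a simplification of the flowchart (it routes any multi-row constraint straight to Serializable), and all function and parameter names are invented:

```python
def choose_strategy(single_expression: bool,
                    high_contention: bool,
                    multi_row_constraint: bool) -> str:
    """Pick the cheapest lost-update prevention that fits the update pattern."""
    if single_expression:
        return "atomic SQL"            # cheapest: eliminates the gap outright
    if multi_row_constraint:
        return "serializable isolation"  # write skew risk spans rows/tables
    if high_contention:
        return "SELECT FOR UPDATE"     # pessimistic lock on the hot row
    return "optimistic locking"        # version column + retry loop

# Counter increment -> the cheapest option wins
assert choose_strategy(True, True, False) == "atomic SQL"
# Complex logic on a single hot row -> pessimistic lock
assert choose_strategy(False, True, False) == "SELECT FOR UPDATE"
# Low-contention profile edit -> version column
assert choose_strategy(False, False, False) == "optimistic locking"
```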

🧪 Practical Walkthrough: Tracing a Lost Update Through Concrete Numbers

This section walks through two concrete scenarios — one showing the failure mode clearly, one showing each prevention strategy applied to the same scenario — so the mechanism and the fix are directly comparable without abstraction.

Scenario A — The Vulnerable Read-Modify-Write Cycle

A sessions table tracks active login count per user. User 42 has active_sessions = 2. Two requests arrive simultaneously: the web service opens a new session for a new browser tab, and the mobile app also opens a new session.

| Step | Web Service (T1) | Mobile Service (T2) | sessions.active_sessions |
| --- | --- | --- | --- |
| 1 | SELECT active_sessions WHERE user_id=42 → 2 | | 2 (committed) |
| 2 | | SELECT active_sessions WHERE user_id=42 → 2 | 2 (committed) |
| 3 | Computes 2 + 1 = 3, issues UPDATE | | 2 |
| 4 | UPDATE SET active_sessions = 3 → committed | | 3 |
| 5 | | Computes 2 + 1 = 3, issues UPDATE | 3 |
| 6 | | UPDATE SET active_sessions = 3 → committed | 3 |

Final value: active_sessions = 3. Correct value: active_sessions = 4. One login was lost: the user has four active clients but only three tracked sessions, and the counter is now permanently off.
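The interleaving in the table above can be replayed as a tiny in-memory simulation (plain Python variables standing in for the committed row and each transaction's private read):

```python
# Step-by-step replay of Scenario A: both services read the same committed
# value before either writes, so one increment is silently lost.
active_sessions = 2                # committed value for user 42

t1_read = active_sessions          # steps 1: T1 reads 2
t2_read = active_sessions          # step 2: T2 reads 2 (same stale value)
active_sessions = t1_read + 1      # steps 3-4: T1 computes 3 and commits
active_sessions = t2_read + 1      # steps 5-6: T2 computes 3 and overwrites

print(active_sessions)  # 3 -- the correct value is 4; T1's increment is gone
```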

Scenario B — Fixed with an Atomic Operation

Replace the read-modify-write cycle with a single atomic increment. No reads are fetched into application memory. No intermediate value is computed outside the database.

| Step | Web Service (T1) | Mobile Service (T2) | sessions.active_sessions |
| --- | --- | --- | --- |
| 1 | UPDATE SET active_sessions = active_sessions + 1 WHERE user_id=42 | | 2 → 3 committed |
| 2 | | UPDATE SET active_sessions = active_sessions + 1 WHERE user_id=42 | 3 → 4 committed |

Final value: active_sessions = 4. Correct. The active_sessions + 1 expression is evaluated by the database using the row's current committed value at each write's execution moment. There is no gap for T2 to read a stale value — T2 reads and increments in a single operation after T1's write is committed.
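The same two writes can be replayed against a real database engine. The SQLite sketch below (table layout mirrors the scenario; SQLite serializes writers, as any single-node engine does for a single row) shows both atomic statements landing:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions "
             "(user_id INTEGER PRIMARY KEY, active_sessions INTEGER)")
conn.execute("INSERT INTO sessions VALUES (42, 2)")

# Both services issue the same atomic statement; the database evaluates
# active_sessions + 1 against the row's current committed value each time.
conn.execute("UPDATE sessions SET active_sessions = active_sessions + 1 "
             "WHERE user_id = 42")
conn.execute("UPDATE sessions SET active_sessions = active_sessions + 1 "
             "WHERE user_id = 42")
conn.commit()

final = conn.execute("SELECT active_sessions FROM sessions "
                     "WHERE user_id = 42").fetchone()[0]
print(final)  # 4 -- both increments survive
```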

Scenario C — Fixed with Optimistic Locking

When the update logic is too complex for a single expression (e.g., a tiered cap that prevents active_sessions exceeding 5 for a free-tier user), the application reads the current value and version, applies the tier-check logic, and writes back conditionally. If another transaction committed a change in the meantime, the version will have advanced and the UPDATE will affect 0 rows — signalling the application to retry.

| Step | Web Service (T1) | Mobile Service (T2) | sessions row |
| --- | --- | --- | --- |
| 1 | SELECT active_sessions=2, version=19 WHERE user_id=42 | | version=19 |
| 2 | | SELECT active_sessions=2, version=19 WHERE user_id=42 | version=19 |
| 3 | Applies tier logic → 3; UPDATE SET active_sessions=3, version=20 WHERE user_id=42 AND version=19 → 1 row | | version=20 |
| 4 | | Applies tier logic → 3; UPDATE SET active_sessions=3, version=20 WHERE user_id=42 AND version=19 → 0 rows | version=20 |
| 5 | | 0 rows → conflict detected, retry: re-reads version=20, re-applies logic → 4; UPDATE with version=20 → 1 row | version=21 |

Final value: active_sessions = 4, version = 21. Correct. The version column acts as a conflict sentinel: it proves whether the row was modified between the application's read and its write, enabling safe retry without ever requiring a lock.
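The full sequence, including the detected conflict and the retry, can be traced in SQLite (the schema mirrors the scenario; the tier logic is elided and only the version mechanics are shown):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions "
             "(user_id INTEGER PRIMARY KEY, active_sessions INTEGER, version INTEGER)")
conn.execute("INSERT INTO sessions VALUES (42, 2, 19)")

# Both services read the row at version 19 and compute 2 + 1 = 3.

# T1 writes first: the version predicate matches, so 1 row is updated.
t1 = conn.execute("UPDATE sessions SET active_sessions = 3, version = 20 "
                  "WHERE user_id = 42 AND version = 19")
assert t1.rowcount == 1

# T2's write still carries the stale version 19: 0 rows match.
t2 = conn.execute("UPDATE sessions SET active_sessions = 3, version = 20 "
                  "WHERE user_id = 42 AND version = 19")
assert t2.rowcount == 0   # conflict detected -- T2 must re-read and retry

# Retry: T2 re-reads (3, version 20), re-applies its logic, and writes 4.
count, version = conn.execute(
    "SELECT active_sessions, version FROM sessions WHERE user_id = 42").fetchone()
retry = conn.execute("UPDATE sessions SET active_sessions = ?, version = ? "
                     "WHERE user_id = 42 AND version = ?",
                     (count + 1, version + 1, version))
assert retry.rowcount == 1
conn.commit()

final_row = conn.execute("SELECT active_sessions, version FROM sessions "
                         "WHERE user_id = 42").fetchone()
print(final_row)  # (4, 21)
```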


🛠️ PostgreSQL, MySQL, and Redis: Enforcing Lost-Update Prevention in Practice

This section shows short DDL and config-level SQL patterns — not application code — that implement each prevention strategy in the three most commonly deployed data stores. Every snippet represents the minimum database-layer change needed to enforce safety.

PostgreSQL: SELECT FOR UPDATE Serializes the Read

Wrapping the SELECT in FOR UPDATE acquires a row-level exclusive lock that persists until the transaction ends. T2's SELECT FOR UPDATE on the same row blocks at the read step until T1 commits or rolls back — the overlap window is eliminated entirely.

-- PostgreSQL: pessimistic locking pattern for inventory decrement
BEGIN;

SELECT inventory_count
  FROM products
  WHERE id = 7842
  FOR UPDATE;
-- T2 blocks here until this transaction commits

UPDATE products
  SET inventory_count = inventory_count - 1
  WHERE id = 7842
    AND inventory_count > 0;

COMMIT;

The FOR UPDATE clause is the only required change relative to the vulnerable pattern. Without it, the SELECT reads from a shared MVCC snapshot and takes no lock at all. With it, T2's competing SELECT ... FOR UPDATE on the same row blocks until T1 commits — serializing the read-modify-write cycle.

PostgreSQL: Optimistic Locking with a Version Column

The products table is extended with a version integer column. The UPDATE includes a WHERE version = :old_version predicate. If the version changed between the application's read and this write, the update matches zero rows — the application detects the conflict and retries.

-- Schema: add version column (one-time migration)
ALTER TABLE products
  ADD COLUMN version INTEGER NOT NULL DEFAULT 1;

-- Read step: capture current version
SELECT inventory_count, version
  FROM products
  WHERE id = 7842;
-- Returns: inventory_count = 1, version = 7

-- Write step: conditional on version matching
UPDATE products
  SET inventory_count = 0,
      version         = version + 1
  WHERE id    = 7842
    AND version = 7;
-- rows_affected = 0  →  version changed, conflict detected, application retries
-- rows_affected = 1  →  committed successfully, version is now 8

The version column pattern is database-agnostic — the same WHERE version = :v condition works identically in MySQL InnoDB, SQL Server, Oracle, and CockroachDB.

MySQL InnoDB: Explicit FOR UPDATE Is Mandatory

MySQL at Repeatable Read does not auto-detect lost updates. SELECT ... FOR UPDATE is required for every read-modify-write cycle that must be safe. The AND inventory_count > 0 guard in the UPDATE adds a defense-in-depth check against negative inventory even if locking is momentarily bypassed.

-- MySQL: explicit locking required regardless of isolation level
START TRANSACTION;

SELECT inventory_count
  FROM products
  WHERE id = 7842
  FOR UPDATE;

UPDATE products
  SET inventory_count = inventory_count - 1
  WHERE id              = 7842
    AND inventory_count > 0;

COMMIT;

For the atomic SQL equivalent in MySQL — which needs no lock at all — the single statement UPDATE products SET inventory_count = inventory_count - 1 WHERE id = 7842 AND inventory_count > 0 provides complete protection at any isolation level.

Redis: WATCH / MULTI / EXEC for Optimistic Concurrency

The standard Redis MULTI / EXEC block batches commands but does not prevent another client from modifying a key between a GET and the queued SET. Adding WATCH turns the block into an optimistic transaction: if the watched key changes between WATCH and EXEC, the EXEC returns null (the block is discarded) and the client retries from the WATCH.

# Redis command sequence (as entered at redis-cli level, not application code)
# Optimistic concurrency for inventory key

WATCH inventory:7842
  # Read current value outside MULTI block
  GET inventory:7842            # returns "1"
MULTI
  SET inventory:7842 0
EXEC
# Returns null if inventory:7842 changed since WATCH → client retries
# Returns array of results if committed successfully

The watch is released automatically on EXEC (whether it aborted or succeeded), on DISCARD or UNWATCH, or when the connection closes. For high-throughput counters where retries are unacceptable, prefer the atomic DECR command on a single Redis instance — it has no gap.

For a deep-dive on isolation level semantics and how each level maps to specific anomaly prevention, see Isolation Levels in Databases. For the full taxonomy of distributed write anomalies beyond lost updates, see Data Anomalies in Distributed Systems.


📚 Lessons Learned: What Production Engineers Know About Lost Updates

  1. Silence is the most dangerous symptom. Lost updates produce no errors, no warnings, and no log entries at the database or application layer. Both transactions report success. The only discovery path is noticing that the final state is inconsistent with the operations that were reported as successful — often via a customer complaint, a manual audit, or a downstream system (the warehouse) rejecting an impossible request.

  2. Read Committed is not a safety net for read-modify-write. Read Committed — the default isolation level on PostgreSQL and Oracle — allows lost updates, and MySQL's default of Repeatable Read behaves no better for this anomaly. If your application reads a value, modifies it in application memory, and writes it back, you are vulnerable regardless of database vendor.

  3. MySQL Repeatable Read ≠ PostgreSQL Repeatable Read for lost updates. PostgreSQL's first-committer-wins rule auto-detects write-write conflicts at Repeatable Read. MySQL's does not. Never assume one database's isolation semantics map to another's just because they share a level name. Test lost-update scenarios explicitly when migrating between vendors.

  4. Prefer atomic SQL for all simple updates. UPDATE SET val = val + delta is always safer than a read-modify-write cycle for counters and quantities. It requires zero additional infrastructure, works at any isolation level, and has lower overhead than any locking or retry mechanism. Default to it unless the update genuinely cannot be expressed as a single SQL expression.

  5. Optimistic locking is a contract that every write path must honor. Adding a version column means nothing unless every UPDATE on that table checks it. A single code path that performs an unconditional UPDATE without WHERE version = :v silently breaks the entire optimistic locking invariant for that table. Audit all write paths when introducing optimistic locking.

  6. High contention surfaces deserve pessimistic locks. Optimistic locking's retry amplification effect — many transactions failing and retrying under high conflict — can make SELECT FOR UPDATE the more efficient choice for rows that are written frequently. Profile retry rates in staging with realistic concurrency before choosing between the two.

  7. Distributed systems require an explicit strategy at every boundary. Across microservices or NoSQL stores, there is no shared transaction coordinator. Conditional writes, append-only event logs, single-authoritative-writer patterns, and CRDT-compatible operations are the idiomatic solutions. Assume no lost-update protection exists until you have explicitly implemented one.
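Lesson 5 can also be enforced mechanically at the database layer. The sketch below uses a SQLite trigger (trigger name and error message are invented; PostgreSQL and MySQL offer equivalent trigger mechanisms) to abort any UPDATE that fails to advance the version, so a rogue unconditional write fails loudly instead of silently breaking the invariant:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products "
             "(id INTEGER PRIMARY KEY, price REAL, version INTEGER NOT NULL)")
conn.execute("INSERT INTO products VALUES (200, 59.99, 15)")

# Guard trigger: reject any UPDATE that does not increment the version.
# It catches the one code path that forgets the optimistic-lock contract.
conn.execute("""
    CREATE TRIGGER products_version_guard
    BEFORE UPDATE ON products
    WHEN NEW.version <= OLD.version
    BEGIN
        SELECT RAISE(ABORT, 'update must increment version');
    END
""")

# A disciplined write path (version bump + version predicate) succeeds:
conn.execute("UPDATE products SET price = 49.99, version = version + 1 "
             "WHERE id = 200 AND version = 15")

# An unconditional UPDATE that skips the version bump is rejected:
try:
    conn.execute("UPDATE products SET price = 10.00 WHERE id = 200")
    blocked = False
except sqlite3.IntegrityError:
    blocked = True

print(blocked, conn.execute("SELECT price, version FROM products").fetchone())
# True (49.99, 16)
```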


📌 Summary & Key Takeaways: Lost Updates in Seven Bullets

  • A lost update occurs when two read-modify-write transactions overwrite each other — one write silently discards the other, with no error raised by either the database or the application.
  • The root cause is the read-modify-write gap: T2 reads the same committed value T1 already read, and T2's write lands after T1's — overwriting T1's result using a stale starting point.
  • Atomic SQL (UPDATE SET col = col + delta) eliminates the gap entirely and is the correct default for all counter, quantity, and balance operations at any isolation level.
  • SELECT FOR UPDATE serializes competing reads at the row level, blocking T2 at the read step until T1 commits. Best for high-contention rows with complex update logic.
  • Optimistic locking (version column + WHERE version = :v) detects conflicts at write time and retries. Best for low-to-medium contention with complex multi-step logic.
  • Serializable isolation prevents all lost updates (and write skew) database-wide, with automatic abort on conflict. Best as a safety net or when auditing every write path is impractical.
  • PostgreSQL auto-detects lost updates at Repeatable Read. MySQL does not — treat MySQL Repeatable Read as equivalent to Read Committed for lost-update safety purposes.

📝 Practice Quiz

  1. A bank balance starts at $1,000. T1 reads $1,000 and plans to write $700 (subtracting a $300 transfer). T2 reads $1,000 and plans to write $900 (subtracting a $100 fee). T1 commits, then T2 commits. What is the final balance, and what should it be?

    • A) Final: $700. Should be: $600. T1's write was lost.
    • B) Final: $900. Should be: $600. T2's write overwrote T1's; T1's deduction was lost.
    • C) Final: $600. Both deductions applied correctly.
    • D) Final: $1,000. Both transactions were rolled back due to conflict.

    Correct Answer: B
  2. Which of the following SQL patterns completely prevents a lost update at Read Committed isolation in both PostgreSQL and MySQL?

    • A) BEGIN; SELECT balance FROM accounts WHERE id=1; UPDATE accounts SET balance = 900 WHERE id=1; COMMIT;
    • B) UPDATE accounts SET balance = balance - 100 WHERE id=1 AND balance >= 100;
    • C) BEGIN; SET TRANSACTION ISOLATION LEVEL REPEATABLE READ; SELECT balance FROM accounts WHERE id=1; COMMIT;
    • D) SELECT balance FROM accounts WHERE id=1 FOR SHARE;

    Correct Answer: B
  3. PostgreSQL is running at Repeatable Read isolation. T1 reads a row and then T2 reads the same row. T1 commits a write. T2 then tries to commit its own write to the same row. What does PostgreSQL do?

    • A) PostgreSQL allows T2's write to proceed silently, producing a lost update.
    • B) PostgreSQL aborts T2 with a serialization failure and requires the application to retry.
    • C) PostgreSQL merges T1's and T2's writes using a last-write-wins policy.
    • D) PostgreSQL blocks T2's commit until T1's transaction ages out of the MVCC snapshot.

    Correct Answer: B
  4. A products table has a version INTEGER column. The application reads a row (version = 15) and issues UPDATE products SET price = 49.99, version = version + 1 WHERE id = 200 AND version = 15. The update returns 0 rows affected. What is the correct interpretation?

    • A) The row with id=200 does not exist. The application should insert it.
    • B) Another transaction updated the row between the read and this write, advancing the version past 15. The application should re-read the row and retry.
    • C) The version column does not allow arithmetic expressions in a WHERE clause.
    • D) The update was silently committed with no change to the version column.

    Correct Answer: B
  5. MySQL InnoDB is configured at Repeatable Read isolation. Two transactions execute a read-modify-write cycle on the same inventory row simultaneously. Neither uses SELECT FOR UPDATE or atomic SQL. What happens?

    • A) MySQL aborts the second transaction with ERROR 1213 (Deadlock found).
    • B) MySQL applies a first-committer-wins rule and aborts the slower transaction.
    • C) MySQL allows the second write to overwrite the first without any error — the lost update occurs silently.
    • D) MySQL queues the second transaction's write until the first transaction's write lock is released.

    Correct Answer: C
  6. A booking system allows users 30 seconds to review seat details before confirming. The application reads seat availability at page load, then writes the reservation when the user clicks Confirm. What is the best prevention strategy for this 30-second gap?

    • A) Hold a SELECT FOR UPDATE lock for the entire 30-second review window.
    • B) Use optimistic locking: detect the conflict at write time with a version check and return a "seat no longer available" response to the user.
    • C) Run all booking transactions at Serializable isolation.
    • D) Use an atomic decrement — UPDATE SET available = available - 1 WHERE id = ? AND available > 0.

    Correct Answer: B
  7. How does a lost update differ structurally from write skew?

    • A) A lost update requires uncommitted reads; write skew requires only committed reads.
    • B) A lost update involves two transactions writing the same row; write skew involves transactions writing different rows that together violate a shared constraint.
    • C) Write skew always requires three or more concurrent transactions; a lost update requires exactly two.
    • D) A lost update is prevented at Repeatable Read in all databases; write skew is not.

    Correct Answer: B
  8. Redis WATCH / MULTI / EXEC implements which class of lost-update prevention?

    • A) Pessimistic locking — it blocks other clients from reading the watched key until EXEC completes.
    • B) Atomic SQL — it collapses the GET and SET into a single Redis command.
    • C) Optimistic locking — it aborts the MULTI block if the watched key changed since WATCH was issued, requiring a client-side retry.
    • D) Serializable isolation — all Redis clients see a total order of operations.

    Correct Answer: C
  9. A microservices system has an Inventory Service and an Order Service that both read and update an Order record using separate database connections. There is no distributed transaction coordinator. Which approach most reliably prevents a lost update between them?

    • A) Set both services to use Serializable isolation on their own connections.
    • B) Use SELECT FOR UPDATE in both services independently.
    • C) Route all Order record writes through a single authoritative service, or use conditional writes with a version field in the API contract.
    • D) Use Read Uncommitted to reduce lock contention between services.

    Correct Answer: C
  10. (Open-ended) You are reviewing a fintech application that manages loyalty point balances. The code reads the current points balance for a user (which involves a multi-table calculation including tier status and recent transaction history), applies a redemption deduction using business rules that span 200 milliseconds of computation, and writes the new balance back. Describe: (a) which specific anomaly this pattern is vulnerable to and why the 200-millisecond computation window matters; (b) why a simple atomic SQL statement would not fully solve the problem here; and (c) which two prevention strategies you would evaluate first, and what observable production metric would determine your final choice between them.


Written by Abstract Algorithms (@abstractalgorithms)