Compare-and-Swap and Optimistic Locking: How Every Database Implements It
From the CMPXCHG CPU instruction to DynamoDB ConditionExpressions and Cassandra LWT — how CAS and optimistic locking work across every major database.
TL;DR: Compare-and-Swap (CAS) is the CPU-level atomic instruction that makes lock-free concurrency possible. Optimistic locking builds on it at the database layer: read freely, compute locally, write only if the record has not changed. Every major database — PostgreSQL, MySQL, DynamoDB, Redis, MongoDB, CockroachDB, and Cassandra — implements CAS semantics differently. Choosing the wrong implementation, or retrying naïvely, turns a 10ms write into a retry storm. This post explains the full stack from CMPXCHG to ConditionExpression.
🚨 The 47-Second Window That Broke a Ride-Share Platform
At 08:14:32 UTC on a weekday morning, a ride-sharing platform's dispatch service and billing service both received an event for driver ID D-7741. The dispatch service read the driver record: status = AVAILABLE, version = 42. Simultaneously — 23 milliseconds later — the billing service read the same row: status = AVAILABLE, version = 42.
The dispatch service computed a new state: status = ASSIGNED, ride_id = R-9901. The billing service computed a parallel update: billing_cycle = OPEN, last_event_ts = 2026-03-14T08:14:32Z. Both services executed their UPDATE statements 47 seconds after the initial read — within 3 milliseconds of each other.
The dispatch write landed first. The billing write landed second — and because it used a plain UPDATE drivers SET billing_cycle='OPEN' WHERE id='D-7741', it overwrote the entire row with stale data. The driver's status reverted silently to AVAILABLE. No error was raised. No alert fired. The driver was now simultaneously "assigned" in the ride table and "available" in the dispatch queue.
For the next 4 minutes, the system tried to assign the driver to a second ride. The original passenger was stranded. Surge pricing calculations used stale availability data. Three downstream metrics pipelines ingested corrupt status events. The root cause was a single missing WHERE version = 42 clause.
This is the lost update problem — and Compare-and-Swap is the primitive that prevents it.
📖 The CPU Instruction Behind Every Database Concurrent Write
Compare-and-Swap (CAS) is an atomic CPU instruction. "Atomic" means it executes as a single, indivisible operation — no other CPU core can observe an intermediate state. The semantics are simple:
```
CAS(memory_location, expected_value, new_value) → boolean
    if *memory_location == expected_value:
        *memory_location = new_value
        return true
    else:
        return false  // nothing changed
```
On x86-64 CPUs, this maps to the CMPXCHG instruction. On ARM processors, the equivalent is a LDREX/STREX pair (Load-Reserved / Store-Conditional). Both guarantee that the comparison and the swap happen without any other core touching the memory location in between.
The atomicity guarantee comes from the CPU's cache coherence protocol. When a core executes CMPXCHG, it asserts exclusive ownership of the cache line containing the memory location. Other cores must wait or invalidate their cache lines. This is not a software lock — it is a hardware guarantee enforced by the memory subsystem itself.
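Python has no user-level CMPXCHG, but the semantics are easy to model. The sketch below uses a lock to stand in for the hardware's exclusive cache-line ownership; the retry loop in increment has the same shape as a real lock-free counter. All names are illustrative:

```python
import threading

class CasCell:
    """Models a single memory word with CAS semantics.

    The internal lock stands in for the exclusive cache-line ownership
    that CMPXCHG gets from the cache coherence protocol.
    """
    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        return self._value

    def compare_and_swap(self, expected, new):
        # Compare and swap happen under the lock, so no other thread can
        # interleave between the check and the write.
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False

def increment(cell, times):
    # Classic lock-free increment: re-read and retry the CAS until it lands.
    for _ in range(times):
        while True:
            current = cell.load()
            if cell.compare_and_swap(current, current + 1):
                break

counter = CasCell(0)
threads = [threading.Thread(target=increment, args=(counter, 1000)) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(counter.load())  # 4000 - no increments lost despite 4 racing threads
```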
CAS has exactly three outcomes:
| Outcome | Condition | Result |
| --- | --- | --- |
| Success | Memory held expected value | Swap performed, returns true |
| Failure | Memory held a different value | No change, returns false |
| ABA | Memory changed A→B→A | Appears as success, may be logically incorrect |
The ABA problem (covered in depth shortly) is the critical pitfall that every production system using CAS must address.
The diagram below shows the CAS decision flow from the moment a thread issues the instruction to the moment it acts on the result.
```mermaid
graph TD
    A[Thread reads memory_location into register] --> B{Is register == expected_value?}
    B -->|Yes - atomically| C[Write new_value to memory_location]
    B -->|No| D[Return false - no change]
    C --> E[Return true - swap succeeded]
    D --> F{Should retry?}
    F -->|Yes - backoff| A
    F -->|No - abort| G[Propagate conflict to caller]
```
The key insight here is that the B diamond and C box execute as a single hardware instruction — nothing can interleave between the comparison and the write. This is what makes CAS the foundation of all lock-free algorithms. The retry loop at F is an application-level decision; the hardware only guarantees the atomic compare-and-swap itself.
🔍 How CAS Propagates from Hardware to Distributed Storage — The Abstraction Stack
CAS does not stay at the hardware level. Every layer of software above the CPU builds on it, abstracting the raw memory instruction into higher-level constructs. Understanding this stack tells you why database version columns and DynamoDB ConditionExpressions are not arbitrary design choices — they are CAS, two layers up.
```mermaid
graph TD
    A[x86 CMPXCHG / ARM LDREX+STREX] --> B[OS mutex and spinlock primitives]
    B --> C[Language atomics - Java AtomicInteger - Go sync/atomic]
    C --> D[Database version column + conditional UPDATE]
    D --> E[Distributed conditional write - DynamoDB ConditionExpression - Cassandra LWT]
```
Each layer in this diagram adds latency and abstraction, but preserves the same semantic: compare the current state to what you expect, and only write if they match. The OS mutex wraps CMPXCHG in a fair scheduling policy. Java's AtomicInteger.compareAndSet() calls through to the JVM intrinsic which compiles directly to CMPXCHG. The database version column translates the hardware guarantee into a SQL predicate. DynamoDB's ConditionExpression translates it into a distributed, network-round-trip conditional write across replicated storage nodes.
The practical implication: every optimistic locking pattern you write in SQL or NoSQL is a CAS at heart. The database engine is doing the compare-and-swap for you, with durability and replication layered on top.
The ABA Problem — When CAS Gives a False Positive
The ABA problem is the most dangerous pitfall in lock-free programming. Here is the exact scenario:
- Thread T1 reads memory location M and observes value A.
- Thread T2 changes M from A to B.
- Thread T2 changes M back from B to A.
- Thread T1 executes CAS(M, A, new_value). The comparison succeeds — M holds A. T1 believes nothing has changed.
The value looks unchanged but the state it represents may have changed entirely. In a memory allocator, A might be a pointer to a freed-and-reallocated node. In a database, it might be a version number that was incremented and then decremented (by a rollback), leaving the same number but completely different committed data.
Solutions in production systems:
| Solution | How it works | Where used |
| --- | --- | --- |
| Version counter (monotonic) | Never reuse a version number — increment only, never decrement. Even after rollback, the sequence advances. | PostgreSQL xmin, most version-column patterns |
| Stamped reference | Combine value + counter in a single atomic word. Java's AtomicStampedReference<V> atomically swaps (value, stamp) pairs — the stamp is typically an application-managed counter that only increases. | Java concurrent data structures |
| Hazard pointers | Each thread publishes pointers it is currently reading; garbage collection defers freeing until no hazard pointers reference the node. | Lock-free queues, memory-reclaim systems |
| MVCC epoch fencing | CockroachDB and PostgreSQL use transaction epoch/timestamp monotonicity — a version can never legally "go backward" in committed state. | CockroachDB SSI, PostgreSQL MVCC |
For database optimistic locking, the version counter solution is almost universally correct: use an INT or BIGINT version column that is only ever incremented, never set to a lower value, and never reset on rollback (rollbacks discard the entire in-flight row change, not the committed version number).
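The difference between a value-only compare and a monotonic version counter can be shown in a few lines (a toy single-threaded model, not a real concurrent structure; all names are illustrative):

```python
class VersionedCell:
    """Pairs a value with an increment-only version counter."""
    def __init__(self, value):
        self.value = value
        self.version = 0  # monotonic: incremented on every write, never reused

    def cas_value_only(self, expected_value, new_value):
        # Naive CAS: compares the value alone - vulnerable to ABA.
        if self.value == expected_value:
            self.value = new_value
            self.version += 1
            return True
        return False

    def cas_versioned(self, expected_version, new_value):
        # Versioned CAS: compares the counter, which never repeats.
        if self.version == expected_version:
            self.value = new_value
            self.version += 1
            return True
        return False

cell = VersionedCell("A")
seen_value, seen_version = cell.value, cell.version  # T1 reads: value "A", version 0

# T2 sneaks in: A -> B -> A. The value is back, but the version advanced to 2.
cell.cas_value_only("A", "B")
cell.cas_value_only("B", "A")

stale_versioned = cell.cas_versioned(seen_version, "X")   # False: 0 != 2, correctly rejected
aba_false_positive = cell.cas_value_only(seen_value, "X") # True: silent ABA false positive
```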
⚖️ Optimistic vs. Pessimistic Locking — Choosing the Right Concurrency Strategy
Before going database-by-database, it is worth internalising the trade-off between the two dominant concurrency strategies, because choosing the wrong one is a performance cliff.
Optimistic Locking assumes conflicts are rare. It reads without any lock, computes locally, and attempts to write with a CAS-style check. If the check fails (conflict detected), it retries. The cost of a conflict is a retry; the cost of no conflict is near-zero overhead.
Pessimistic Locking assumes conflicts are likely. It acquires a lock before reading, holds it through computation, and releases it after writing. The cost of every read is a lock acquisition; there are no retries, but there is potential deadlock and reduced throughput under low-contention workloads.
| | Optimistic Locking | Pessimistic Locking |
| --- | --- | --- |
| Lock acquisition | None (read freely) | Before read (SELECT FOR UPDATE) |
| Conflict detection | At write time (CAS check) | At read time (lock blocks competing readers) |
| Throughput — low contention | High (no lock overhead) | Lower (lock round-trip on every read) |
| Throughput — high contention | Low (many retries, wasted work) | Higher (serialized, no wasted compute) |
| Deadlock risk | None | Yes (requires lock ordering or timeout) |
| Latency profile | Unpredictable under high contention | Predictable (wait time = lock hold time) |
| Best for | Read-heavy, rare conflicts: user profiles, product catalogs, config records | Write-heavy, frequent conflicts: inventory, financial ledgers, seat reservations |
The rule of thumb used at companies like Stripe and GitHub: if your conflict rate under production load is below ~5%, optimistic locking wins. Above ~20%, pessimistic locking or queue-based serialization often wins on throughput even though it looks "heavier" on paper.
⚙️ How Optimistic Locking Builds on CAS — The Version Column Pattern
The version column pattern is the canonical database implementation of CAS. The mechanics are straightforward once you see them as a CAS in disguise:
- Read: SELECT id, data, version FROM records WHERE id = ? — record the version number alongside the data.
- Compute: Apply your business logic locally. The row is not locked; other transactions can read and modify it freely.
- Write (CAS attempt): UPDATE records SET data = ?, version = version + 1 WHERE id = ? AND version = ? — the AND version = ? clause is the compare; the write is the swap.
- Check result: If rows_affected = 1, the CAS succeeded. If rows_affected = 0, a conflict occurred — another transaction wrote between your read and your write. Retry from step 1.
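The steps above can be sketched end-to-end against SQLite, which supports the same rowcount check (the drivers table and values mirror the running example; this is an illustration, not production code):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE drivers (id TEXT PRIMARY KEY, status TEXT, version INTEGER)")
conn.execute("INSERT INTO drivers VALUES ('D-7741', 'AVAILABLE', 5)")

def cas_update(conn, driver_id, new_status, expected_version):
    # The AND version = ? predicate is the compare; rowcount tells us
    # whether the swap actually happened.
    cur = conn.execute(
        "UPDATE drivers SET status = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_status, driver_id, expected_version),
    )
    return cur.rowcount == 1

first = cas_update(conn, "D-7741", "ASSIGNED", 5)   # True  - version was 5, now 6
second = cas_update(conn, "D-7741", "OFFLINE", 5)   # False - stale version, retry needed
print(first, second)
```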
The sequence diagram below shows two concurrent transactions (T1 and T2) attempting to update the same driver record. T1 wins; T2 detects the conflict, re-reads the latest state, recomputes, and succeeds on its second attempt.
```mermaid
sequenceDiagram
    participant T1 as Transaction T1
    participant DB as Database
    participant T2 as Transaction T2
    T1->>DB: SELECT id, status, version WHERE id=D-7741
    DB-->>T1: status=AVAILABLE, version=5
    T2->>DB: SELECT id, status, version WHERE id=D-7741
    DB-->>T2: status=AVAILABLE, version=5
    T1->>DB: UPDATE drivers SET status=ASSIGNED, version=6 WHERE id=D-7741 AND version=5
    DB-->>T1: rows_affected=1 - success
    T2->>DB: UPDATE drivers SET billing_cycle=OPEN, version=6 WHERE id=D-7741 AND version=5
    DB-->>T2: rows_affected=0 - conflict detected
    Note over T2,DB: T2 retries - re-reads fresh state
    T2->>DB: SELECT id, status, version WHERE id=D-7741
    DB-->>T2: status=ASSIGNED, version=6
    T2->>DB: UPDATE drivers SET billing_cycle=OPEN, version=7 WHERE id=D-7741 AND version=6
    DB-->>T2: rows_affected=1 - success
```
Notice that T2 does not fail permanently — it re-reads the committed state (version=6, status=ASSIGNED) before recomputing its update. This re-read step is critical: T2 must incorporate T1's committed changes before writing, not blindly reapply its stale computation. A naive retry that reuses the stale read is a bug, not a fix.
🧠 How Each Database Implements CAS — Internals and Performance
Each database engine exposes CAS semantics through a different internal mechanism — some via SQL predicates, some via distributed consensus protocols, and some via atomic scripting. This section covers the internal machinery of each and closes with a comparative latency analysis.
CAS Internals: How Each Database Engine Handles the Compare-and-Swap
PostgreSQL — Explicit Version Columns and the Hidden xmin
PostgreSQL gives you two distinct approaches to optimistic locking, and choosing between them matters.
Approach 1: Explicit version column. Add a version INTEGER NOT NULL DEFAULT 1 column to your table. Every UPDATE increments it, and every write predicate checks it. The RETURNING clause confirms the new version in a single round-trip:
```sql
-- Schema
ALTER TABLE drivers ADD COLUMN version INTEGER NOT NULL DEFAULT 1;

-- Read
SELECT id, status, billing_cycle, version
FROM drivers
WHERE id = 'D-7741';

-- Write (CAS attempt)
UPDATE drivers
SET status = 'ASSIGNED',
    ride_id = 'R-9901',
    version = version + 1
WHERE id = 'D-7741'
  AND version = 5
RETURNING version;

-- If RETURNING returns a row: success (new version = 6)
-- If RETURNING returns no rows: conflict (version was not 5)
```
Approach 2: xmin system column. Every PostgreSQL row carries an xmin column — the transaction ID of the last transaction that wrote the row. You can use xmin as an implicit version number without adding any application-managed column:
```sql
-- Read xmin alongside your data
SELECT id, status, xmin
FROM drivers
WHERE id = 'D-7741';

-- CAS write using xmin as the version check
UPDATE drivers
SET status = 'ASSIGNED'
WHERE id = 'D-7741'
  AND xmin = '3847291'::xid;
```
When to use xmin vs. explicit version: xmin is convenient for ad-hoc scripts and avoids schema changes, but it has two gotchas. First, xmin values wrap around after ~2 billion transactions (the XID wraparound problem). Second, xmin changes even on rows you did not logically modify — a VACUUM that freezes a row updates xmin. Use an explicit version INTEGER column for application-level optimistic locking. Reserve xmin for diagnostic queries and tooling.
MySQL InnoDB — Explicit Versions and the ROW_COUNT() Check
MySQL InnoDB does not expose an xmin-equivalent. Every optimistic locking implementation requires an explicit version column. The CAS check works identically to PostgreSQL in SQL syntax, but the conflict detection differs slightly — you check ROW_COUNT() after the update rather than RETURNING:
```sql
-- Schema
ALTER TABLE drivers ADD COLUMN version INT NOT NULL DEFAULT 1;

-- CAS write
UPDATE drivers
SET status = 'ASSIGNED',
    ride_id = 'R-9901',
    version = version + 1
WHERE id = 'D-7741'
  AND version = 5;

-- Conflict detection
SELECT ROW_COUNT();
-- 1 = success (version was 5, now 6)
-- 0 = conflict (someone else changed the row first)
```
One nuance with InnoDB: MySQL acquires a row-level lock during UPDATE execution even in optimistic locking scenarios. The lock is held only for the duration of the write statement itself (typically milliseconds), not for the entire read-modify-write cycle. This means InnoDB's optimistic locking does not eliminate lock overhead completely — it eliminates the long-held lock between your read and your write.
Table-level locks (LOCK TABLES) are incompatible with the optimistic pattern. Never mix LOCK TABLES with version-column optimistic locking on the same table — the table lock will block all concurrent reads, negating the throughput advantage.
DynamoDB — ConditionExpression as Distributed CAS
DynamoDB implements CAS through ConditionExpression — a predicate evaluated atomically on the storage node alongside the write. If the condition is false, DynamoDB rejects the write with a ConditionalCheckFailedException and makes no change.
The canonical version-based pattern:
```json
{
  "TableName": "Drivers",
  "Key": { "id": { "S": "D-7741" } },
  "UpdateExpression": "SET #st = :new_status, #v = :new_version",
  "ConditionExpression": "attribute_exists(#v) AND #v = :expected_version",
  "ExpressionAttributeNames": {
    "#st": "status",
    "#v": "version"
  },
  "ExpressionAttributeValues": {
    ":new_status": { "S": "ASSIGNED" },
    ":new_version": { "N": "6" },
    ":expected_version": { "N": "5" }
  }
}
```
The attribute_exists(#v) AND #v = :expected_version condition guards against two failure modes: (1) a concurrent writer changed the version, and (2) the item was deleted between your read and write (the attribute_exists guard). If either condition fails, you receive ConditionalCheckFailedException and retry.
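A minimal in-memory model of this conditional-write semantic (not the DynamoDB API itself; the exception name merely mirrors ConditionalCheckFailedException, and all other names are illustrative):

```python
class ConditionalCheckFailed(Exception):
    """Stands in for DynamoDB's ConditionalCheckFailedException."""

table = {"D-7741": {"status": "AVAILABLE", "version": 5}}

def conditional_update(table, key, updates, expected_version):
    # Mirrors attribute_exists(version) AND version = :expected_version:
    # the item must exist AND carry the expected version, atomically.
    item = table.get(key)
    if item is None or item.get("version") != expected_version:
        raise ConditionalCheckFailed(key)
    item.update(updates)
    item["version"] = expected_version + 1

conditional_update(table, "D-7741", {"status": "ASSIGNED"}, 5)  # succeeds, version -> 6
try:
    conditional_update(table, "D-7741", {"status": "OFFLINE"}, 5)  # stale version
except ConditionalCheckFailed:
    print("conflict: re-read and retry")
```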
DynamoDB's Java SDK v2 DynamoDbEnhancedClient supports version management via @DynamoDbVersionAttribute on your data class. This is a configuration-level annotation — it instructs the SDK to automatically add the ConditionExpression on every putItem and updateItem call. No application write path needs to manually construct the condition.
Important performance note: Standard DynamoDB writes use last-writer-wins with eventual consistency. ConditionExpression writes are still single-shard operations — they do not incur the Paxos round-trip cost that Cassandra LWTs do (covered below). A DynamoDB conditional write is approximately the same latency as an unconditional write on the same shard: typically 1–5ms at median.
Redis — WATCH/MULTI/EXEC and Lua Script CAS
Redis has no native version columns, but it provides two mechanisms to achieve CAS semantics: the WATCH/MULTI/EXEC transaction block and Lua scripts.
WATCH/MULTI/EXEC — Optimistic transaction block:
```
WATCH driver:D-7741:version

-- Read current state
GET driver:D-7741:status
GET driver:D-7741:version

-- Begin transaction
MULTI
SET driver:D-7741:status "ASSIGNED"
SET driver:D-7741:version 6
EXEC

-- If EXEC returns nil: the WATCHed key changed before EXEC; transaction aborted
-- If EXEC returns array of results: success
```
WATCH registers a key for optimistic monitoring. When EXEC is called, Redis checks whether any WATCHed key has been modified since the WATCH command. If so, the entire MULTI/EXEC block is aborted — EXEC returns nil. No partial writes occur. This is exactly the CAS semantic: compare (did the key change?) then swap (execute the block) atomically.
Lua scripts — Guaranteed atomic CAS:
```lua
-- Atomic CAS via Lua (executed as a single Redis command)
local current = redis.call('GET', KEYS[1])
if current == ARGV[1] then
    redis.call('SET', KEYS[1], ARGV[2])
    return 1
else
    return 0
end
```
Called as: EVAL <script> 1 driver:D-7741:version "5" "6"
WATCH/MULTI/EXEC vs. Lua script — key difference: WATCH/MULTI/EXEC is optimistic at the Redis client level — the abort check happens at EXEC time, but between WATCH and EXEC, other clients can read and modify the key. Lua scripts are stronger: Redis guarantees that no other command executes while a Lua script is running. The entire Lua script is atomic at the Redis server level. For a pure CAS operation, Lua is simpler and more reliable; use WATCH/MULTI/EXEC when you need to build a richer transaction across multiple keys.
MongoDB — findOneAndUpdate with Version Predicate
MongoDB's optimistic locking relies on findOneAndUpdate (or updateOne) with a version field in the filter document. The atomicity guarantee is at the document level — MongoDB acquires a document-level lock for the duration of the update:
```javascript
// CAS update: only updates if version matches
db.drivers.findOneAndUpdate(
  { _id: "D-7741", version: 5 },   // filter = CAS check
  {
    $set: { status: "ASSIGNED", ride_id: "R-9901" },
    $inc: { version: 1 }           // atomically increment version
  },
  { returnDocument: "after" }
)
// Returns the updated document on success
// Returns null if filter matched nothing (conflict or missing document)
```
Conflict detection is straightforward: if findOneAndUpdate returns null, either the document does not exist or the version did not match. Use returnDocument: "after" to get the new state in a single round-trip.
For updateOne, check result.matchedCount vs. result.modifiedCount. A matchedCount = 0 means the version predicate failed (conflict). A matchedCount = 1, modifiedCount = 0 should not happen with a $inc update but would indicate no-op semantics if it did.
Mongoose __v version key: Mongoose automatically adds a __v field (version key) to every schema and increments it on save(). The Mongoose middleware performs the version check transparently. However, Mongoose's version key only protects array field modifications by default — it does not protect arbitrary field updates unless you explicitly add the version predicate to your queries. For production-grade optimistic locking in MongoDB, use an explicit version field and the findOneAndUpdate pattern above rather than relying on __v.
CockroachDB — Serializable Isolation as Built-in CAS
CockroachDB runs Serializable Snapshot Isolation (SSI) by default — the strongest isolation level. Every transaction observes a consistent snapshot and is validated at commit time. If two transactions would produce a non-serializable execution (i.e., one transaction's read set overlaps another's write set in a way that changes the outcome), CockroachDB aborts one of them with a retryable error.
This means CockroachDB provides CAS semantics for free, without explicit version columns, for any UPDATE statement:
```sql
BEGIN;
SELECT status, version FROM drivers WHERE id = 'D-7741';
-- Compute new state...
UPDATE drivers SET status = 'ASSIGNED', ride_id = 'R-9901' WHERE id = 'D-7741';
COMMIT;
-- If another transaction committed a conflicting write concurrently,
-- CockroachDB aborts this transaction with error code 40001 (serialization_failure)
```
CockroachDB uses SAVEPOINT cockroach_restart for its recommended client-side retry loop:
```sql
BEGIN;
SAVEPOINT cockroach_restart;
-- your read + write logic --
RELEASE SAVEPOINT cockroach_restart;
COMMIT;
-- On 40001 error: ROLLBACK TO SAVEPOINT cockroach_restart; then retry
```
The SAVEPOINT cockroach_restart pattern is CockroachDB-specific: it marks the beginning of the retryable unit. On a serialization failure, rolling back to the savepoint (instead of to BEGIN) allows the connection to retry without a full reconnect.
Does CockroachDB make explicit version columns redundant? For preventing lost updates at the database level — yes, SSI handles it automatically. But application-level version columns are still useful when: (1) you need to propagate the version to your API layer for client-side optimistic locking (e.g., ETags in REST APIs), or (2) you want visible, auditable version history in the row data itself.
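A generic client-side retry loop looks like this (a simulated transaction callback stands in for a real driver; with a real PostgreSQL or CockroachDB driver you would match SQLSTATE 40001 instead of the stand-in exception):

```python
class SerializationFailure(Exception):
    """Stands in for SQLSTATE 40001 (serialization_failure)."""

def run_with_retries(txn_body, max_attempts=5):
    # Re-run the whole read + write body on serialization failures only;
    # any other error propagates to the caller immediately.
    for attempt in range(1, max_attempts + 1):
        try:
            return txn_body()
        except SerializationFailure:
            if attempt == max_attempts:
                raise  # retries exhausted - surface the conflict

attempts = []
def flaky_txn():
    # Simulated transaction: the first two commits hit a serialization
    # conflict, the third succeeds.
    attempts.append(1)
    if len(attempts) < 3:
        raise SerializationFailure()
    return "committed"

result = run_with_retries(flaky_txn)
print(result)  # committed (on the third attempt)
```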
Cassandra — Lightweight Transactions and the Paxos Cost
Cassandra's normal writes are last-writer-wins with no version checking. To achieve CAS semantics, Cassandra provides Lightweight Transactions (LWTs) using an IF clause on UPDATE and INSERT statements:
```sql
UPDATE drivers
SET status = 'ASSIGNED',
    ride_id = 'R-9901',
    version = 6
WHERE id = 'D-7741'
IF version = 5;
```
The response always includes an [applied] boolean column:
```
 [applied] | id     | version
-----------+--------+---------
 True      | D-7741 | 6        -- success: version was 5, now 6
```
If [applied] = False, the row was not modified and the response also returns the current values of all IF-clause columns — so you can immediately inspect the conflicting state without a separate read.
The performance cost of LWT is significant. A normal Cassandra write requires a single coordinator round-trip to a quorum of replicas — roughly 1–3ms in a well-tuned cluster. A LWT write requires a Paxos consensus round consisting of at minimum 2 phase-1 (Prepare/Promise) round-trips and 1 phase-2 (Accept/Accepted) round-trip across the replica set. In practice, this is 3–4 network round-trips instead of 1, making LWT writes approximately 4× slower than normal writes at median latency, and significantly worse at the 99th percentile under contention.
When NOT to use Cassandra LWT:
- High-throughput counters: Use Cassandra's native COUNTER column type, which uses a convergent CRDT, not Paxos.
- Idempotent operations: If your write is safe to apply multiple times (e.g., setting a boolean flag), prefer idempotent writes with application-level retry.
- Fan-out updates touching many rows: Each LWT row is a separate Paxos round. At scale, this serializes throughput badly.
- Time-series ingestion: Never use LWT on time-series tables. Use WRITETIME and TTL with monotonic partition keys instead.
Use LWT in Cassandra only when the operation is truly non-idempotent and exactly-once semantics are required at the database level — for example, a "create user account if username not taken" operation.
Performance Analysis: Latency and Throughput Cost Across Database CAS Implementations
Not all CAS implementations are equally expensive. The table below summarises the latency and throughput characteristics of each database's mechanism under typical production conditions.
| Database | CAS Mechanism | Typical Write Latency | Overhead vs. Normal Write | Notes |
| --- | --- | --- | --- | --- |
| PostgreSQL | WHERE version=? predicate | 1–5ms | ~0% overhead | Predicate evaluated inline with row lock; no extra round-trips |
| MySQL InnoDB | WHERE version=? + ROW_COUNT() | 1–5ms | ~0% overhead | Row-level lock for write duration only; no extra phases |
| DynamoDB | ConditionExpression | 1–5ms | ~0% overhead | Condition evaluated on storage node atomically; same latency as unconditional write |
| Redis | WATCH/MULTI/EXEC | <1ms | Minimal (~1 extra RTT for WATCH) | WATCH adds one round-trip; EXEC is atomic at server |
| Redis Lua | Lua script | <1ms | ~0% overhead | Fully atomic; no client-side retry needed |
| MongoDB | findOneAndUpdate with filter | 2–8ms | ~0% overhead | Document-level lock for write; no extra phases |
| CockroachDB | SSI abort + retry | 2–15ms | 10–30% at P99 under contention | SSI validation at commit adds latency; retry loop on conflict |
| Cassandra | LWT (IF clause) | 15–50ms | ~4× normal write | Paxos: 2 Prepare/Promise + 1 Accept/Accepted round-trips minimum |
The Cassandra LWT row is the most important takeaway: at 4× normal write latency and minimum 2 Paxos phase-1 round-trips, LWT at high request rates can saturate cluster coordinator threads and cause cascading latency degradation. CockroachDB's 10–30% P99 overhead is a reasonable price for automatic SSI — it only manifests under write contention, not under normal load.
🧪 Retry Strategy Patterns and Their Production Trade-offs
A CAS failure is not an error — it is a signal to retry. But how you retry determines whether your system recovers gracefully or degrades into a retry storm.
The decision flowchart below shows the complete retry logic for a production CAS write. Each branch represents a specific strategy with specific failure modes:
```mermaid
graph TD
    A[Attempt CAS write] --> B{Result?}
    B -->|Success| C[Done - proceed]
    B -->|Conflict - version mismatch| D[Increment retry counter]
    B -->|Hard error - network timeout| E[Classify error]
    D --> F{Retry count under max?}
    F -->|Yes| G[Re-read fresh state]
    F -->|No| H[Abort - return error to caller]
    G --> I[Recompute with fresh data]
    I --> J[Wait - exponential backoff with jitter]
    J --> A
    E --> K{Is error retryable?}
    K -->|Yes - transient| J
    K -->|No - permanent| H
```
This flowchart illustrates the four critical decisions in a retry loop: (1) always re-read before recomputing, (2) apply exponential backoff with jitter, (3) enforce a maximum retry limit, and (4) distinguish between conflict-retries (version mismatch) and hard-error-retries (network timeout). Mixing these two categories without distinction is a common production bug.
The four retry strategies ranked by correctness:
| Strategy | Mechanism | When to use | Risk |
| --- | --- | --- | --- |
| Immediate retry | Loop with no delay | Almost never | Thundering herd — all retriers hit the DB simultaneously |
| Exponential backoff | delay = base * 2^attempt (e.g., 100ms, 200ms, 400ms) | Moderate contention | Synchronized retry waves if all clients start at the same time |
| Exponential backoff + jitter | delay = random(0, base * 2^attempt) | Most production scenarios | Slight complexity; highly recommended by AWS and Google SRE |
| Max retry + circuit breaker | Stop retrying after N attempts; open circuit if failure rate exceeds threshold | High-contention scenarios | Requires circuit breaker library; prevents runaway retry storms |
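The recommended variant, exponential backoff with full jitter, is a one-line formula (base and cap values here are illustrative):

```python
import random

def backoff_delay(attempt, base=0.1, cap=5.0):
    """Full-jitter backoff: delay drawn uniformly from [0, min(cap, base * 2^attempt)].

    Drawing from the full range (rather than adding a small random offset)
    spreads retriers out and breaks up synchronized retry waves.
    """
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))

# Successive attempts widen the window: [0, 0.1], [0, 0.2], [0, 0.4], ...
delays = [backoff_delay(a) for a in range(5)]
```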
Idempotency keys make retries safe in distributed systems. Before the first attempt, generate a unique idempotency_key (a UUID). Include it in your write. If the write is interrupted mid-flight (network timeout), retry with the same key. The server can detect and deduplicate the retry without applying the operation twice. Stripe uses this pattern on every Payment Intent write.
🧭 When Optimistic Locking Fails — The High-Contention Decision Guide
Optimistic locking excels when conflicts are rare. But some workloads are inherently high-contention, and optimistic locking degrades badly under them.
Flash sale scenario: 50,000 users simultaneously attempt to decrement a product's inventory count from 1 to 0. Only one write can succeed. The other 49,999 all read version=100, compute new_quantity=0, and simultaneously attempt UPDATE ... WHERE version=100. One succeeds. 49,999 get rows_affected=0, re-read version=101, compute new_quantity=-1 (oops — needs a quantity > 0 guard), and retry. The retry wave generates 49,999 reads, 49,999 writes, and another 49,998 failures in rapid succession. This is a retry storm.
Decision guide — when to switch away from optimistic locking:
| Workload | Conflict rate | Recommended strategy |
| --- | --- | --- |
| User profile updates | < 1% | Optimistic locking (version column) |
| Product catalog reads with rare admin edits | < 1% | Optimistic locking |
| Shopping cart line-item updates | 1–5% | Optimistic locking with jitter retry |
| Seat reservation / ticket booking | 5–30% | Pessimistic locking (SELECT FOR UPDATE) or queue-based serialization |
| Flash sale inventory decrement | > 50% | Queue-based serialization (one writer at a time via Kafka/SQS) or Redis atomic DECR with floor check |
| Financial ledger debit/credit | Variable | Pessimistic locking with deadlock timeout, or append-only ledger pattern |
The queue-based approach deserializes the problem: instead of 50,000 concurrent CAS attempts, a single consumer processes inventory decrements from a queue, one at a time. No conflicts, no retries, predictable latency. The cost is additional infrastructure and slightly higher write latency — usually an acceptable trade for correctness guarantees.
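A minimal sketch of queue-based serialization for the flash-sale decrement, with a floor check in place of a version check (an in-process stand-in for Kafka/SQS; all names are illustrative):

```python
import queue
import threading

# One consumer drains decrement requests; inventory is only ever touched
# by this single thread, so no CAS and no retries are needed.
inventory = {"SKU-1": 1}          # one unit left - the flash-sale case
requests = queue.Queue()
results = {}

def consumer():
    while True:
        req_id = requests.get()
        if req_id is None:        # sentinel: shut down
            break
        # Floor check replaces the version check: never go below zero.
        if inventory["SKU-1"] > 0:
            inventory["SKU-1"] -= 1
            results[req_id] = "sold"
        else:
            results[req_id] = "sold_out"

worker = threading.Thread(target=consumer)
worker.start()
for i in range(5):                # five buyers race for one unit
    requests.put(i)
requests.put(None)
worker.join()
print(sorted(results.values()).count("sold"))  # exactly 1
```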
📊 The Complete CAS Write Flow — From Client Request to Committed State
The diagram below shows the complete request lifecycle for an optimistic write, from the client HTTP call through the service layer to the database, with explicit success and retry paths.
```mermaid
graph TD
    A[Client - HTTP PATCH /drivers/D-7741] --> B[Service Layer]
    B --> C[Read driver record and version]
    C --> D[Database Read Replica or Primary]
    D --> E[Return record with version=N]
    E --> F[Compute new state locally]
    F --> G[Attempt CAS write - UPDATE WHERE version=N]
    G --> H{Database CAS result}
    H -->|rows_affected=1 - success| I[Return 200 OK with new version]
    H -->|rows_affected=0 - conflict| J[Re-read fresh state]
    J --> K{Retry count under limit?}
    K -->|Yes| L[Apply backoff and jitter]
    L --> F
    K -->|No| M[Return 409 Conflict to client]
    I --> N[Emit audit event and downstream notifications]
```
This architecture shows three important design decisions. First, the service layer performs all computation locally — the database only executes the CAS predicate, not business logic. Second, the retry loop re-reads from the database before recomputing, ensuring each attempt operates on committed state. Third, exhausted retries surface as a 409 Conflict HTTP response — the client, not the server, decides whether to retry at the application level (e.g., ask the user to refresh and resubmit).
🌍 How Stripe, GitHub, and Amazon Use CAS in Production
Stripe uses version columns on PaymentIntent records. Every state transition (created → processing → succeeded → captured) is guarded by a CAS write against the current state value. If two webhook events try to advance the same intent simultaneously, only one succeeds. Stripe's idempotency key mechanism ensures safe retry of the losing event.
GitHub uses optimistic locking on repository metadata — fork counts, star counts, default branch pointers. The repos table carries a lock_version column (a Rails convention). Repository renames and visibility changes use CAS writes to prevent concurrent admin operations from corrupting the repository's canonical state.
Amazon DynamoDB's own documentation uses a shopping cart as the canonical optimistic locking example. Each cart item carries a version attribute. When a mobile client and a web client both load the cart and make independent additions, the ConditionExpression on the write ensures only the first write succeeds; the second receives ConditionalCheckFailedException and re-fetches the merged cart.
Atlassian Jira uses an OPTLOCK version column on issue records. Concurrent edits to the same issue (e.g., two users updating the description and the assignee simultaneously in the UI) are resolved through CAS writes on the Jira issue entity. The losing writer receives an "edit conflict" dialog prompting a manual merge — a user-visible manifestation of the CAS failure.
🛠️ OSS Config Reference — DDL and Query Syntax Across Databases
This section provides copy-ready DDL, query, and command syntax for each database's CAS implementation. These are configuration and query patterns — not application code.
PostgreSQL — Version Column DDL and UPDATE Pattern
```sql
-- DDL: add version column to existing table
ALTER TABLE drivers
  ADD COLUMN version INTEGER NOT NULL DEFAULT 1;

-- Trigger: auto-increment version on every UPDATE (optional but recommended)
CREATE OR REPLACE FUNCTION increment_version()
RETURNS TRIGGER AS $$
BEGIN
  NEW.version = OLD.version + 1;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_drivers_version
BEFORE UPDATE ON drivers
FOR EACH ROW EXECUTE FUNCTION increment_version();

-- CAS read
SELECT id, status, billing_cycle, version
FROM drivers
WHERE id = 'D-7741';

-- CAS write (explicit version management)
UPDATE drivers
SET status = 'ASSIGNED',
    ride_id = 'R-9901',
    version = version + 1
WHERE id = 'D-7741'
  AND version = 5
RETURNING id, version;
-- No rows returned = conflict (retry required)
-- Row returned = success (new version in result)
```
DynamoDB — ConditionExpression JSON Pattern
```json
{
  "TableName": "Drivers",
  "Key": {
    "id": { "S": "D-7741" }
  },
  "UpdateExpression": "SET #st = :new_status, #v = :new_version, #rid = :ride_id",
  "ConditionExpression": "attribute_exists(#v) AND #v = :expected_version",
  "ExpressionAttributeNames": {
    "#st": "status",
    "#v": "version",
    "#rid": "ride_id"
  },
  "ExpressionAttributeValues": {
    ":new_status": { "S": "ASSIGNED" },
    ":new_version": { "N": "6" },
    ":ride_id": { "S": "R-9901" },
    ":expected_version": { "N": "5" }
  },
  "ReturnValues": "ALL_NEW"
}
```
On `ConditionalCheckFailedException`: re-read the item using `GetItem`, inspect the current version, recompute, and retry with the new `expected_version` value.
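The request above can be derived mechanically from the version returned by the preceding read. The helper below is a sketch under this post's running example (the `Drivers` table and attribute names are assumptions); with boto3 the resulting dict is what you would splat into `client.update_item(**request)`:

```python
def build_conditional_update(driver_id, expected_version, ride_id):
    """Builds the UpdateItem request shown above. The invariant that
    :new_version is exactly :expected_version + 1 is encoded here, so
    two concurrent writers can never both satisfy the condition."""
    new_version = expected_version + 1
    return {
        "TableName": "Drivers",
        "Key": {"id": {"S": driver_id}},
        "UpdateExpression": "SET #st = :new_status, #v = :new_version, #rid = :ride_id",
        "ConditionExpression": "attribute_exists(#v) AND #v = :expected_version",
        "ExpressionAttributeNames": {"#st": "status", "#v": "version", "#rid": "ride_id"},
        "ExpressionAttributeValues": {
            ":new_status": {"S": "ASSIGNED"},
            ":new_version": {"N": str(new_version)},
            ":ride_id": {"S": ride_id},
            ":expected_version": {"N": str(expected_version)},
        },
        "ReturnValues": "ALL_NEW",
    }

# A read that observed version=5 produces the exact request shown above.
request = build_conditional_update("D-7741", 5, "R-9901")
```

Centralizing the request construction like this prevents the classic bug of bumping `:new_version` while forgetting to update `:expected_version` (or vice versa) on retry.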
Redis — WATCH/MULTI/EXEC Command Sequence
```
# Step 1: Watch the version key
WATCH driver:D-7741:version

# Step 2: Read current state
GET driver:D-7741:version
GET driver:D-7741:status

# Step 3: Begin optimistic transaction
MULTI
SET driver:D-7741:status "ASSIGNED"
SET driver:D-7741:ride_id "R-9901"
SET driver:D-7741:version "6"
EXEC

# EXEC returns:
# - Array of results if no watched key changed (success)
# - nil if any watched key changed (abort; retry from WATCH)
```
Cassandra — LWT IF Clause and Response Parsing
```sql
-- CAS write with LWT
UPDATE drivers
SET status = 'ASSIGNED',
    ride_id = 'R-9901',
    version = 6
WHERE id = 'D-7741'
IF version = 5;
-- Response columns:
-- [applied] = true  → success
-- [applied] = false → conflict; Cassandra also returns current column values

-- INSERT with uniqueness guard (create-if-not-exists)
INSERT INTO usernames (username, user_id)
VALUES ('alice', 'U-001')
IF NOT EXISTS;
-- [applied] = false → username already taken; response includes existing user_id
```
📚 Lessons Learned — What Production Systems Teach You About CAS
- **Always re-read before recomputing on retry.** A retry that reuses stale read data is not a retry — it is a repeated mistake. The whole point of detecting a conflict is that the world changed; your computation must incorporate those changes.
- **The ABA problem is real in database systems.** A version number that wraps around (e.g., a 32-bit counter with frequent updates) or a row that is deleted and re-inserted with the same primary key can produce ABA. Always use monotonically increasing, non-reusable version numbers — `BIGINT`, not `INT`, in high-write tables.
- **Cassandra LWT is a last resort, not a default.** The ~4× latency penalty compounds under load. Design Cassandra schemas to be idempotent and use LWT only where exactly-once semantics are truly non-negotiable. In most cases, idempotent writes with application-level deduplication are cheaper and more scalable.
- **CockroachDB's SSI is not free.** Serializable isolation has a latency cost at commit time — the SSI validation scan. Under high write contention, the abort-and-retry rate can approach that of explicit optimistic locking. Use explicit version columns in CockroachDB when you want application-visible conflict detection rather than opaque `40001` errors.
- **Optimistic locking and pessimistic locking are not mutually exclusive.** You can use `SELECT FOR UPDATE` for the high-contention "hot path" (e.g., the single inventory row for a flash-sale SKU) while using version-column optimistic locking for the low-contention "cold path" (e.g., product catalog metadata). Profile your workload and segment your locking strategy.
- **HTTP ETags are optimistic locking at the API layer.** When a REST API returns an `ETag` header containing the resource version, and requires clients to send `If-Match: <etag>` on updates, it is implementing CAS between the client and the server. The server's version column is the authoritative version; the ETag is the client's read receipt. Omitting the `If-Match` check on your REST endpoint exposes the same lost-update vulnerability as omitting the `AND version = ?` clause in your SQL update.
- **Idempotency keys make CAS safe to retry across network boundaries.** A CAS write that times out mid-flight may or may not have committed. Without an idempotency key, retrying risks applying the operation twice. With one, the server can detect and deduplicate the retry atomically.
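The idempotency-key lesson can be sketched in a few lines. Here a plain dict stands in for the idempotency table; the function and variable names are illustrative, and note that a real implementation must make the check-and-record step atomic (e.g., a unique-constraint insert in the same transaction as the CAS write), which a dict does not guarantee across processes:

```python
import uuid

# In-memory stand-in for an idempotency table keyed by client-supplied key.
_results = {}

def apply_once(idempotency_key, operation):
    """Runs `operation` at most once per key; a retry of the same key
    returns the cached result instead of re-applying the side effect."""
    if idempotency_key in _results:
        return _results[idempotency_key]      # duplicate request: deduplicated
    result = operation()
    _results[idempotency_key] = result
    return result

balance = {"value": 100}

def debit_10():
    balance["value"] -= 10
    return balance["value"]

key = str(uuid.uuid4())
first = apply_once(key, debit_10)    # times out on the wire, but committed
second = apply_once(key, debit_10)   # client retry: detected, not re-applied
print(first, second, balance["value"])  # 90 90 90
```

Without the key, the retry after an ambiguous timeout would debit twice — the exact failure mode the lesson describes.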
📌 Summary & Key Takeaways
Compare-and-Swap is the hardware foundation of all lock-free concurrency. The `LOCK CMPXCHG` instruction on x86 CPUs provides an atomic compare-and-swap as a single instruction; every database optimistic-locking primitive is this instruction expressed at a higher layer of abstraction.
Key takeaways for production engineers:
- CAS semantics are universal: PostgreSQL `WHERE version = ?`, DynamoDB `ConditionExpression`, Redis `WATCH`/`EXEC`, Cassandra `IF version = ?`, CockroachDB SSI aborts — all are the same CAS primitive at different layers.
- Retry correctly: re-read → recompute → retry with jitter. Never reuse stale reads. Always cap the retry count and emit metrics on the conflict rate.
- The ABA problem requires monotonic version counters: use `BIGINT`, increment only, never reset. This eliminates the false-success case.
- Match your locking strategy to your contention profile: optimistic locking wins below ~5% conflict rate; pessimistic locking or queue-based serialization wins above ~20%.
- Cassandra LWT is ~4× slower than normal writes because each LWT runs a Paxos consensus round (prepare/promise, read, propose/accept, commit) instead of a single quorum write. Use it sparingly.
- CockroachDB gives you SSI by default, but application-visible version columns are still valuable for API-level conflict surfacing and auditing.
- ETags are CAS at the HTTP layer. Always validate `If-Match` on mutating REST endpoints.
📝 Practice Quiz
1. **What is the three-argument signature of the CAS primitive, and what does each argument mean?**
   Correct Answer: `CAS(memory_location, expected_value, new_value)` — `memory_location` is the address to check and modify; `expected_value` is the value you expect to find there; `new_value` is the value to write if the comparison succeeds.
2. **Which x86-64 CPU instruction implements Compare-and-Swap, and what is the ARM equivalent?**
   Correct Answer: `CMPXCHG` (with the `LOCK` prefix) on x86-64; the exclusive-access pair `LDREX`/`STREX` (Load-Exclusive/Store-Exclusive) on ARM.
3. **Describe the ABA problem in one sentence and give one database scenario where it can occur.**
   Correct Answer: The ABA problem occurs when a value changes from A to B and back to A between a thread's read and its CAS attempt, making the CAS falsely succeed. A database scenario: a `version` column stored as a 32-bit `INT` that overflows and resets, allowing an old version number to reappear on a different logical record state.
4. **In the version column pattern, what does a `rows_affected = 0` result from an `UPDATE ... WHERE version = ?` indicate, and what must the application do next?**
   Correct Answer: It indicates a conflict — another transaction modified the row and incremented the version since the current transaction read it. The application must re-read the row to get the latest version and committed state, recompute its changes on that fresh state, and retry the CAS write.
5. **PostgreSQL offers `xmin` as an implicit version. Name two scenarios where using `xmin` for optimistic locking is not recommended.**
   Correct Answer: (1) When `xmin` values may wrap around due to XID wraparound after ~2 billion transactions. (2) When `VACUUM` freezing replaces `xmin` with a frozen XID on rows you did not logically modify, causing spurious conflicts.
6. **What is the approximate performance penalty for a Cassandra Lightweight Transaction (LWT) compared to a normal Cassandra write, and what protocol causes it?**
   Correct Answer: LWT writes are approximately 4× slower than normal writes, because each LWT requires a Paxos consensus round — prepare/promise, read, propose/accept, and commit phases across the replica set — versus a single round-trip quorum write for a normal write.
7. **What is the key difference between Redis `WATCH`/`MULTI`/`EXEC` and a Redis Lua script for implementing CAS?**
   Correct Answer: `WATCH`/`MULTI`/`EXEC` is optimistic at the client level — other clients can modify the watched key between `WATCH` and `EXEC`; the abort is detected at `EXEC` time. A Lua script is atomic at the server level — Redis guarantees no other command executes while the script runs, making it a true atomic CAS with no interleaved window.
8. **CockroachDB uses Serializable Snapshot Isolation by default. Does this make explicit version columns unnecessary? Explain.**
   Correct Answer: SSI prevents lost updates at the database level — any concurrent write that would produce a non-serializable outcome is aborted with error code `40001`. Explicit version columns are not necessary for database-level correctness. However, they remain useful for: (1) surfacing the version in API responses (e.g., ETags), (2) providing an auditable version history in the row, and (3) giving the application a human-readable conflict signal rather than a raw serialization error.
9. **In DynamoDB, what does `attribute_exists(#v) AND #v = :expected_version` in a `ConditionExpression` protect against, compared to just `#v = :expected_version` alone?**
   Correct Answer: The `attribute_exists(#v)` guard makes the "item must already exist" requirement explicit and protects against the case where the item was deleted between the read and the write — with `#v = :expected_version` alone, `ConditionalCheckFailedException` is raised for both "item deleted" and "version mismatch" reasons. With `attribute_exists` and a follow-up read, the caller can distinguish "the item no longer exists" from "the item exists but was modified".
10. **(Open-ended)** Your team is building a flight seat reservation system. Seats can be reserved by millions of concurrent users during peak booking windows. A seat can transition from `AVAILABLE` → `HELD` → `CONFIRMED` or `AVAILABLE` → `HELD` → `RELEASED`. Describe the concurrency strategy you would choose for the `HELD` state transition, justify your choice, and explain what would happen if you chose the opposite strategy instead.
🔗 Related Posts
- Lost Update Explained: When Two Writes Become One — The anomaly that CAS directly prevents; covers `FOR UPDATE`, optimistic locking, and the read-modify-write cycle in depth.
- Write Skew Explained: When Two Transactions Corrupt Consistent State — A related anomaly requiring serializable isolation; explains where optimistic locking alone is insufficient.
- Dirty Write Explained: When Uncommitted Data Gets Overwritten — Covers the baseline locking semantics that underpin CAS guarantees in distributed stores.
- Key Terms in Distributed Systems — Glossary reference for CAS, MVCC, isolation levels, and related distributed systems primitives.
- Isolation Levels in Databases — How Read Committed, Repeatable Read, and Serializable interact with optimistic locking strategies.
- ACID Transactions in DynamoDB, CosmosDB, and Spanner — How conditional writes fit into the broader ACID transaction model in distributed databases.
- Data Anomalies in Distributed Systems — Survey of all read and write anomalies; provides the full context for why CAS exists.

Written by Abstract Algorithms (@abstractalgorithms)