
Compare-and-Swap and Optimistic Locking: How Every Database Implements It

From the CMPXCHG CPU instruction to DynamoDB ConditionExpressions and Cassandra LWT — how CAS and optimistic locking work across every major database.

Abstract Algorithms · 34 min read

TLDR: Compare-and-Swap (CAS) is the CPU-level atomic instruction that makes lock-free concurrency possible. Optimistic locking builds on it at the database layer: read freely, compute locally, write only if the record has not changed. Every major database — PostgreSQL, MySQL, DynamoDB, Redis, MongoDB, CockroachDB, and Cassandra — implements CAS semantics differently. Choosing the wrong implementation, or retrying naïvely, turns a 10ms write into a retry storm. This post explains the full stack from CMPXCHG to ConditionExpression.

🚨 The 47-Second Window That Broke a Ride-Share Platform

At 08:14:32 UTC on a weekday morning, a ride-sharing platform's dispatch service and billing service both received an event for driver ID D-7741. The dispatch service read the driver record: status = AVAILABLE, version = 42. Simultaneously — 23 milliseconds later — the billing service read the same row: status = AVAILABLE, version = 42.

The dispatch service computed a new state: status = ASSIGNED, ride_id = R-9901. The billing service computed a parallel update: billing_cycle = OPEN, last_event_ts = 2026-03-14T08:14:32Z. Both services executed their UPDATE statements 47 seconds after the initial read — within 3 milliseconds of each other.

The dispatch write landed first. The billing write landed second. Because the billing service wrote its entire stale in-memory copy back with a plain UPDATE drivers SET status = 'AVAILABLE', billing_cycle = 'OPEN', last_event_ts = '2026-03-14T08:14:32Z' WHERE id = 'D-7741' — no version check — it overwrote the dispatch update with stale data. The driver's status reverted silently to AVAILABLE. No error was raised. No alert fired. The driver was now simultaneously "assigned" in the ride table and "available" in the dispatch queue.

For the next 4 minutes, the system tried to assign the driver to a second ride. The original passenger was stranded. Surge pricing calculations used stale availability data. Three downstream metrics pipelines ingested corrupt status events. The root cause was a single missing WHERE version = 42 clause.

This is the lost update problem — and Compare-and-Swap is the primitive that prevents it.


📖 The CPU Instruction Behind Every Database Concurrent Write

Compare-and-Swap (CAS) is an atomic CPU instruction. "Atomic" means it executes as a single, indivisible operation — no other CPU core can observe an intermediate state. The semantics are simple:

CAS(memory_location, expected_value, new_value) → boolean
  if *memory_location == expected_value:
    *memory_location = new_value
    return true
  else:
    return false   // nothing changed

On x86-64 CPUs, this maps to the CMPXCHG instruction (executed with the LOCK prefix). On ARM, the equivalent is an exclusive load/store pair: LDREX/STREX on ARMv7, LDXR/STXR on AArch64 (ARMv8.1 also adds a dedicated CAS instruction). Both approaches guarantee that the comparison and the swap happen without any other core touching the memory location in between.

The atomicity guarantee comes from the CPU's cache coherence protocol. When a core executes CMPXCHG, it asserts exclusive ownership of the cache line containing the memory location. Other cores must wait or invalidate their cache lines. This is not a software lock — it is a hardware guarantee enforced by the memory subsystem itself.
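Python exposes no user-level CAS instruction, so the sketch below only models the semantics: a threading.Lock stands in for the exclusive cache-line ownership the hardware asserts during CMPXCHG. The AtomicCell class and its method names are illustrative, not a real API.

```python
import threading

class AtomicCell:
    """A minimal model of hardware CAS semantics. A real CPU performs the
    compare and the swap in one CMPXCHG instruction; here a lock stands in
    for the cache-line ownership the memory subsystem enforces."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()  # models exclusive cache-line ownership

    def load(self):
        with self._lock:
            return self._value

    def compare_and_swap(self, expected, new):
        """Atomically: if cell == expected, set cell = new. True on swap."""
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False

cell = AtomicCell(42)
assert cell.compare_and_swap(42, 43) is True    # expectation matched: swap happens
assert cell.compare_and_swap(42, 99) is False   # stale expectation: no change
assert cell.load() == 43
```

The same shape appears in every real CAS API: Java's `AtomicInteger.compareAndSet(expected, new)` and Go's `atomic.CompareAndSwapInt64` take exactly these arguments and return exactly this boolean.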

CAS has exactly three outcomes:

| Outcome | Condition | Result |
|---------|-----------|--------|
| Success | Memory held expected value | Swap performed, returns true |
| Failure | Memory held a different value | No change, returns false |
| ABA | Memory changed A→B→A | Appears as success, may be logically incorrect |

The ABA problem (covered in depth shortly) is the critical pitfall that every production system using CAS must address.

The diagram below shows the CAS decision flow from the moment a thread issues the instruction to the moment it acts on the result.

graph TD
    A[Thread reads memory_location into register] --> B{Is register == expected_value?}
    B -->|Yes - atomically| C[Write new_value to memory_location]
    B -->|No| D[Return false - no change]
    C --> E[Return true - swap succeeded]
    D --> F{Should retry?}
    F -->|Yes - backoff| A
    F -->|No - abort| G[Propagate conflict to caller]

The key insight here is that the B diamond and C box execute as a single hardware instruction — nothing can interleave between the comparison and the write. This is what makes CAS the foundation of all lock-free algorithms. The retry loop at F is an application-level decision; the hardware only guarantees the atomic compare-and-swap itself.


🔍 How CAS Propagates from Hardware to Distributed Storage — The Abstraction Stack

CAS does not stay at the hardware level. Every layer of software above the CPU builds on it, abstracting the raw memory instruction into higher-level constructs. Understanding this stack tells you why database version columns and DynamoDB ConditionExpressions are not arbitrary design choices — they are CAS, two layers up.

graph TD
    A[x86 CMPXCHG / ARM LDREX+STREX] --> B[OS mutex and spinlock primitives]
    B --> C[Language atomics - Java AtomicInteger - Go sync/atomic]
    C --> D[Database version column + conditional UPDATE]
    D --> E[Distributed conditional write - DynamoDB ConditionExpression - Cassandra LWT]

Each layer in this diagram adds latency and abstraction, but preserves the same semantic: compare the current state to what you expect, and only write if they match. The OS mutex wraps CMPXCHG in a fair scheduling policy. Java's AtomicInteger.compareAndSet() calls through to the JVM intrinsic which compiles directly to CMPXCHG. The database version column translates the hardware guarantee into a SQL predicate. DynamoDB's ConditionExpression translates it into a distributed, network-round-trip conditional write across replicated storage nodes.

The practical implication: every optimistic locking pattern you write in SQL or NoSQL is a CAS at heart. The database engine is doing the compare-and-swap for you, with durability and replication layered on top.


The ABA Problem — When CAS Gives a False Positive

The ABA problem is the most dangerous pitfall in lock-free programming. Here is the exact scenario:

  1. Thread T1 reads memory location M and observes value A.
  2. Thread T2 changes M from A to B.
  3. Thread T2 changes M back from B to A.
  4. Thread T1 executes CAS(M, A, new_value). The comparison succeeds — M holds A. T1 believes nothing has changed.

The value looks unchanged but the state it represents may have changed entirely. In a memory allocator, A might be a pointer to a freed-and-reallocated node. In a database, it might be a version number that was incremented and then decremented (by a rollback), leaving the same number but completely different committed data.

Solutions in production systems:

| Solution | How it works | Where used |
|----------|--------------|------------|
| Version counter (monotonic) | Never reuse a version number — increment only, never decrement. Even after rollback, the sequence advances. | PostgreSQL xmin, most version-column patterns |
| Stamped reference | Combine value + counter in a single atomic word. Java's AtomicStampedReference&lt;V&gt; atomically swaps (value, stamp) pairs — the stamp is a monotonically increasing integer. | Java concurrent data structures |
| Hazard pointers | Each thread publishes pointers it is currently reading; garbage collection defers freeing until no hazard pointers reference the node. | Lock-free queues, memory-reclaim systems |
| MVCC epoch fencing | CockroachDB and PostgreSQL use transaction epoch/timestamp monotonicity — a version can never legally "go backward" in committed state. | CockroachDB SSI, PostgreSQL MVCC |

For database optimistic locking, the version counter solution is almost universally correct: use an INT or BIGINT version column that is only ever incremented, never set to a lower value, and never reset on rollback (rollbacks discard the entire in-flight row change, not the committed version number).
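The stamped-reference fix can be sketched in Python. This models the semantics of Java's AtomicStampedReference, which atomically replaces an immutable (value, stamp) pair; the StampedCell name and the lock standing in for the atomic pair swap are illustrative assumptions.

```python
import threading

class StampedCell:
    """CAS over a (value, stamp) pair. The stamp only ever increases, so an
    A->B->A value change still fails the CAS: the stamp has moved on even
    though the value itself looks unchanged."""

    def __init__(self, value):
        self._pair = (value, 0)        # (value, monotonically increasing stamp)
        self._lock = threading.Lock()  # stands in for the atomic pair swap

    def load(self):
        with self._lock:
            return self._pair

    def compare_and_swap(self, expected_value, expected_stamp, new_value):
        with self._lock:
            if self._pair == (expected_value, expected_stamp):
                self._pair = (new_value, expected_stamp + 1)
                return True
            return False

cell = StampedCell("A")
_, stamp = cell.load()               # observer captures ("A", stamp=0)
cell.compare_and_swap("A", 0, "B")   # another thread: A -> B  (stamp 1)
cell.compare_and_swap("B", 1, "A")   # ... and back:   B -> A  (stamp 2)
# A plain CAS on the value alone would now succeed and miss the history;
# the stamped CAS correctly fails because the stamp is 2, not 0:
assert cell.compare_and_swap("A", stamp, "C") is False
```

A monotonic database version column is exactly this stamp, persisted in the row.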


⚖️ Optimistic vs. Pessimistic Locking — Choosing the Right Concurrency Strategy

Before going database-by-database, it is worth internalising the trade-off between the two dominant concurrency strategies, because choosing the wrong one is a performance cliff.

Optimistic Locking assumes conflicts are rare. It reads without any lock, computes locally, and attempts to write with a CAS-style check. If the check fails (conflict detected), it retries. The cost of a conflict is a retry; the cost of no conflict is near-zero overhead.

Pessimistic Locking assumes conflicts are likely. It acquires a lock before reading, holds it through computation, and releases it after writing. The cost of every read is a lock acquisition; there are no retries, but there is potential deadlock and reduced throughput under low-contention workloads.

| | Optimistic Locking | Pessimistic Locking |
|---|---|---|
| Lock acquisition | None (read freely) | Before read (SELECT FOR UPDATE) |
| Conflict detection | At write time (CAS check) | At read time (lock blocks competing readers) |
| Throughput — low contention | High (no lock overhead) | Lower (lock round-trip on every read) |
| Throughput — high contention | Low (many retries, wasted work) | Higher (serialized, no wasted compute) |
| Deadlock risk | None | Yes (requires lock ordering or timeout) |
| Latency profile | Unpredictable under high contention | Predictable (wait time = lock hold time) |
| Best for | Read-heavy, rare conflicts: user profiles, product catalogs, config records | Write-heavy, frequent conflicts: inventory, financial ledgers, seat reservations |

The rule of thumb used at companies like Stripe and GitHub: if your conflict rate under production load is below ~5%, optimistic locking wins. Above ~20%, pessimistic locking or queue-based serialization often wins on throughput even though it looks "heavier" on paper.


⚙️ How Optimistic Locking Builds on CAS — The Version Column Pattern

The version column pattern is the canonical database implementation of CAS. The mechanics are straightforward once you see them as a CAS in disguise:

  1. Read: SELECT id, data, version FROM records WHERE id = ? — record the version number alongside the data.
  2. Compute: Apply your business logic locally. The row is not locked; other transactions can read and modify it freely.
  3. Write (CAS attempt): UPDATE records SET data = ?, version = version + 1 WHERE id = ? AND version = ? — the AND version = ? clause is the compare; the write is the swap.
  4. Check result: If rows_affected = 1, the CAS succeeded. If rows_affected = 0, a conflict occurred — another transaction wrote between your read and your write. Retry from step 1.
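The four steps can be run end-to-end against SQLite, whose cursor rowcount provides the same success/conflict signal; the drivers table here mirrors the running example, and the identical SQL works on PostgreSQL or MySQL.

```python
import sqlite3

# In-memory stand-in for the drivers table from the running example.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE drivers (id TEXT PRIMARY KEY, status TEXT, version INTEGER)")
db.execute("INSERT INTO drivers VALUES ('D-7741', 'AVAILABLE', 5)")

# Step 1 - read: capture the version alongside the data.
status, version = db.execute(
    "SELECT status, version FROM drivers WHERE id = ?", ("D-7741",)).fetchone()

# Step 3 - CAS write: the AND version = ? predicate is the compare,
# the SET is the swap.
cur = db.execute(
    "UPDATE drivers SET status = ?, version = version + 1 "
    "WHERE id = ? AND version = ?",
    ("ASSIGNED", "D-7741", version))
assert cur.rowcount == 1   # step 4: one row affected, the CAS succeeded

# A second writer still holding the stale version (5) loses the race:
cur = db.execute(
    "UPDATE drivers SET status = ?, version = version + 1 "
    "WHERE id = ? AND version = ?",
    ("OFFLINE", "D-7741", version))
assert cur.rowcount == 0   # conflict: the row is now at version 6, not 5
```

The losing writer must now go back to step 1 and re-read before recomputing.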

The sequence diagram below shows two concurrent transactions (T1 and T2) attempting to update the same driver record. T1 wins; T2 detects the conflict, re-reads the latest state, recomputes, and succeeds on its second attempt.

sequenceDiagram
    participant T1 as Transaction T1
    participant DB as Database
    participant T2 as Transaction T2

    T1->>DB: SELECT id, status, version WHERE id=D-7741
    DB-->>T1: status=AVAILABLE, version=5
    T2->>DB: SELECT id, status, version WHERE id=D-7741
    DB-->>T2: status=AVAILABLE, version=5

    T1->>DB: UPDATE drivers SET status=ASSIGNED, version=6 WHERE id=D-7741 AND version=5
    DB-->>T1: rows_affected=1 - success

    T2->>DB: UPDATE drivers SET billing_cycle=OPEN, version=6 WHERE id=D-7741 AND version=5
    DB-->>T2: rows_affected=0 - conflict detected

    Note over T2,DB: T2 retries - re-reads fresh state
    T2->>DB: SELECT id, status, version WHERE id=D-7741
    DB-->>T2: status=ASSIGNED, version=6
    T2->>DB: UPDATE drivers SET billing_cycle=OPEN, version=7 WHERE id=D-7741 AND version=6
    DB-->>T2: rows_affected=1 - success

Notice that T2 does not fail permanently — it re-reads the committed state (version=6, status=ASSIGNED) before recomputing its update. This re-read step is critical: T2 must incorporate T1's committed changes before writing, not blindly reapply its stale computation. A naive retry that reuses the stale read is a bug, not a fix.


🧠 How Each Database Implements CAS — Internals and Performance

Each database engine exposes CAS semantics through a different internal mechanism — some via SQL predicates, some via distributed consensus protocols, and some via atomic scripting. This section covers the internal machinery of each and closes with a comparative latency analysis.

CAS Internals: How Each Database Engine Handles the Compare-and-Swap

PostgreSQL — Explicit Version Columns and the Hidden xmin

PostgreSQL gives you two distinct approaches to optimistic locking, and choosing between them matters.

Approach 1: Explicit version column. Add a version INTEGER NOT NULL DEFAULT 1 column to your table. Every UPDATE increments it, and every write predicate checks it. The RETURNING clause confirms the new version in a single round-trip:

-- Schema
ALTER TABLE drivers ADD COLUMN version INTEGER NOT NULL DEFAULT 1;

-- Read
SELECT id, status, billing_cycle, version
FROM drivers
WHERE id = 'D-7741';

-- Write (CAS attempt)
UPDATE drivers
SET    status       = 'ASSIGNED',
       ride_id      = 'R-9901',
       version      = version + 1
WHERE  id           = 'D-7741'
AND    version      = 5
RETURNING version;
-- If RETURNING returns a row: success (new version = 6)
-- If RETURNING returns no rows: conflict (version was not 5)

Approach 2: xmin system column. Every PostgreSQL row carries an xmin column — the transaction ID of the last transaction that wrote the row. You can use xmin as an implicit version number without adding any application-managed column:

-- Read xmin alongside your data
SELECT id, status, xmin
FROM drivers
WHERE id = 'D-7741';

-- CAS write using xmin as the version check
UPDATE drivers
SET    status = 'ASSIGNED'
WHERE  id     = 'D-7741'
AND    xmin   = '3847291'::xid;

When to use xmin vs. explicit version: xmin is convenient for ad-hoc scripts and avoids schema changes, but it has two gotchas. First, xmin values wrap around after ~2 billion transactions (the XID wraparound problem). Second, xmin changes even on rows you did not logically modify — a VACUUM that freezes a row updates xmin. Use an explicit version INTEGER column for application-level optimistic locking. Reserve xmin for diagnostic queries and tooling.


MySQL InnoDB — Explicit Versions and the ROW_COUNT() Check

MySQL InnoDB does not expose an xmin-equivalent. Every optimistic locking implementation requires an explicit version column. The CAS check works identically to PostgreSQL in SQL syntax, but the conflict detection differs slightly — you check ROW_COUNT() after the update rather than RETURNING:

-- Schema
ALTER TABLE drivers ADD COLUMN version INT NOT NULL DEFAULT 1;

-- CAS write
UPDATE drivers
SET    status  = 'ASSIGNED',
       ride_id = 'R-9901',
       version = version + 1
WHERE  id      = 'D-7741'
AND    version = 5;

-- Conflict detection
SELECT ROW_COUNT();
-- 1 = success (version was 5, now 6)
-- 0 = conflict (someone else changed the row first)

One nuance with InnoDB: MySQL acquires a row-level lock during UPDATE execution even in optimistic locking scenarios. The lock is held only for the duration of the write statement itself (typically milliseconds), not for the entire read-modify-write cycle. This means InnoDB's optimistic locking does not eliminate lock overhead completely — it eliminates the long-held lock between your read and your write.

Table-level locks (LOCK TABLES) are incompatible with the optimistic pattern. Never mix LOCK TABLES with version-column optimistic locking on the same table — the table lock will block all concurrent reads, negating the throughput advantage.


DynamoDB — ConditionExpression as Distributed CAS

DynamoDB implements CAS through ConditionExpression — a predicate evaluated atomically on the storage node alongside the write. If the condition is false, DynamoDB rejects the write with a ConditionalCheckFailedException and makes no change.

The canonical version-based pattern:

{
  "TableName": "Drivers",
  "Key": { "id": { "S": "D-7741" } },
  "UpdateExpression": "SET #st = :new_status, #v = :new_version",
  "ConditionExpression": "attribute_exists(#v) AND #v = :expected_version",
  "ExpressionAttributeNames": {
    "#st": "status",
    "#v":  "version"
  },
  "ExpressionAttributeValues": {
    ":new_status":       { "S": "ASSIGNED" },
    ":new_version":      { "N": "6" },
    ":expected_version": { "N": "5" }
  }
}

The attribute_exists(#v) AND #v = :expected_version condition guards against two failure modes: (1) a concurrent writer changed the version, and (2) the item was deleted between your read and write (the attribute_exists guard). If either condition fails, you receive ConditionalCheckFailedException and retry.

DynamoDB's Java SDK v2 DynamoDbEnhancedClient supports version management via @DynamoDbVersionAttribute on your data class. This is a configuration-level annotation — it instructs the SDK to automatically add the ConditionExpression on every putItem and updateItem call. No application write path needs to manually construct the condition.

Important performance note: Standard DynamoDB writes use last-writer-wins with eventual consistency. ConditionExpression writes are still single-shard operations — they do not incur the Paxos round-trip cost that Cassandra LWTs do (covered below). A DynamoDB conditional write is approximately the same latency as an unconditional write on the same shard: typically 1–5ms at median.


Redis — WATCH/MULTI/EXEC and Lua Script CAS

Redis has no native version columns, but it provides two mechanisms to achieve CAS semantics: the WATCH/MULTI/EXEC transaction block and Lua scripts.

WATCH/MULTI/EXEC — Optimistic transaction block:

WATCH driver:D-7741:version
-- Read current state
GET driver:D-7741:status
GET driver:D-7741:version
-- Begin transaction
MULTI
SET driver:D-7741:status "ASSIGNED"
SET driver:D-7741:version 6
EXEC
-- If EXEC returns nil: the WATCHed key changed before EXEC; transaction aborted
-- If EXEC returns array of results: success

WATCH registers a key for optimistic monitoring. When EXEC is called, Redis checks whether any WATCHed key has been modified since the WATCH command. If so, the entire MULTI/EXEC block is aborted and EXEC returns nil. No partial writes occur. This is exactly the CAS semantic: compare (did the key change?) then swap (execute the block) atomically.

Lua scripts — Guaranteed atomic CAS:

-- Atomic CAS via Lua (executed as a single Redis command)
local current = redis.call('GET', KEYS[1])
if current == ARGV[1] then
    redis.call('SET', KEYS[1], ARGV[2])
    return 1
else
    return 0
end

Called as: EVAL <script> 1 driver:D-7741:version "5" "6"

WATCH/MULTI/EXEC vs. Lua script — key difference: WATCH/MULTI/EXEC is optimistic at the Redis client level — the abort check happens at EXEC time, but between WATCH and EXEC, other clients can read and modify the key. Lua scripts are stronger: Redis guarantees that no other command executes while a Lua script is running. The entire Lua script is atomic at the Redis server level. For a pure CAS operation, Lua is simpler and more reliable; use WATCH/MULTI/EXEC when you need to build a richer transaction across multiple keys.
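For illustration, here is a Python mirror of the Lua script's compare-then-set logic, with a plain dict standing in for the Redis keyspace. This models the semantics only; it has none of Redis's server-side atomicity, and cas_script is a hypothetical name.

```python
def cas_script(store, key, expected, new_value):
    """Line-for-line mirror of the Lua CAS script: GET, compare, SET.
    In Redis the whole script runs as one atomic server-side command;
    here `store` is just a dict, so only the logic is modelled."""
    current = store.get(key)      # redis.call('GET', KEYS[1])
    if current == expected:
        store[key] = new_value    # redis.call('SET', KEYS[1], ARGV[2])
        return 1
    return 0

store = {"driver:D-7741:version": "5"}
assert cas_script(store, "driver:D-7741:version", "5", "6") == 1  # swap applied
assert cas_script(store, "driver:D-7741:version", "5", "7") == 0  # stale expected
```

The return convention (1/0) matches the Lua script above, so a client retry loop can treat both identically.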


MongoDB — findOneAndUpdate with Version Predicate

MongoDB's optimistic locking relies on findOneAndUpdate (or updateOne) with a version field in the filter document. The atomicity guarantee is at the document level — MongoDB acquires a document-level lock for the duration of the update:

// CAS update: only updates if version matches
db.drivers.findOneAndUpdate(
  { _id: "D-7741", version: 5 },                   // filter = CAS check
  {
    $set:  { status: "ASSIGNED", ride_id: "R-9901" },
    $inc:  { version: 1 }                           // atomically increment version
  },
  { returnDocument: "after" }
)
// Returns the updated document on success
// Returns null if filter matched nothing (conflict or missing document)

Conflict detection is straightforward: if findOneAndUpdate returns null, either the document does not exist or the version did not match. Use returnDocument: "after" to get the new state in a single round-trip.

For updateOne, check result.matchedCount vs. result.modifiedCount. A matchedCount = 0 means the version predicate failed (conflict). A matchedCount = 1, modifiedCount = 0 should not happen with a $inc update but would indicate no-op semantics if it did.

Mongoose __v version key: Mongoose automatically adds a __v field (version key) to every schema and increments it on save(). The Mongoose middleware performs the version check transparently. However, Mongoose's version key only protects array field modifications by default — it does not protect arbitrary field updates unless you explicitly add the version predicate to your queries. For production-grade optimistic locking in MongoDB, use an explicit version field and the findOneAndUpdate pattern above rather than relying on __v.


CockroachDB — Serializable Isolation as Built-in CAS

CockroachDB runs Serializable Snapshot Isolation (SSI) by default — the strongest isolation level. Every transaction observes a consistent snapshot and is validated at commit time. If two transactions would produce a non-serializable execution (i.e., one transaction's read set overlaps another's write set in a way that changes the outcome), CockroachDB aborts one of them with a retryable error.

This means CockroachDB provides CAS semantics for free, without explicit version columns, for any UPDATE statement:

BEGIN;
SELECT status, version FROM drivers WHERE id = 'D-7741';
-- Compute new state...
UPDATE drivers SET status = 'ASSIGNED', ride_id = 'R-9901' WHERE id = 'D-7741';
COMMIT;
-- If another transaction committed a conflicting write concurrently,
-- CockroachDB aborts this transaction with error code 40001 (serialization_failure)

CockroachDB uses SAVEPOINT cockroach_restart for its recommended client-side retry loop:

BEGIN;
SAVEPOINT cockroach_restart;

-- your read + write logic --

RELEASE SAVEPOINT cockroach_restart;
COMMIT;
-- On 40001 error: ROLLBACK TO SAVEPOINT cockroach_restart; then retry

The SAVEPOINT cockroach_restart pattern is CockroachDB-specific: it marks the beginning of the retryable unit. On a serialization failure, rolling back to the savepoint (instead of to BEGIN) allows the connection to retry without a full reconnect.

Does CockroachDB make explicit version columns redundant? For preventing lost updates at the database level — yes, SSI handles it automatically. But application-level version columns are still useful when: (1) you need to propagate the version to your API layer for client-side optimistic locking (e.g., ETags in REST APIs), or (2) you want visible, auditable version history in the row data itself.


Cassandra — Lightweight Transactions and the Paxos Cost

Cassandra's normal writes are last-writer-wins with no version checking. To achieve CAS semantics, Cassandra provides Lightweight Transactions (LWTs) using an IF clause on UPDATE and INSERT statements:

UPDATE drivers
SET    status  = 'ASSIGNED',
       ride_id = 'R-9901',
       version = 6
WHERE  id = 'D-7741'
IF     version = 5;

The response always includes an [applied] boolean column:

 [applied] | id     | version
-----------+--------+---------
 True      | D-7741 |       6    -- success: version was 5, now 6

If [applied] = False, the row was not modified and the response also returns the current values of all IF-clause columns — so you can immediately inspect the conflicting state without a separate read.

The performance cost of LWT is significant. A normal Cassandra write requires a single coordinator round-trip to a quorum of replicas — roughly 1–3ms in a well-tuned cluster. An LWT write runs a full Paxos consensus round: Prepare/Promise, a read of the current value, Propose/Accept, and Commit, each requiring agreement from a quorum of replicas. In practice, this is 3–4 network round-trips instead of 1, making LWT writes approximately 4× slower than normal writes at median latency, and significantly worse at the 99th percentile under contention.

When NOT to use Cassandra LWT:

  • High-throughput counters: Use Cassandra's native COUNTER column type, which uses a convergent CRDT, not Paxos.
  • Idempotent operations: If your write is safe to apply multiple times (e.g., setting a boolean flag), prefer idempotent writes with application-level retry.
  • Fan-out updates touching many rows: Each LWT row is a separate Paxos round. At scale, this serializes throughput badly.
  • Time-series ingestion: Never use LWT on time-series tables. Use WRITETIME and TTL with monotonic partition keys instead.

Use LWT in Cassandra only when the operation is truly non-idempotent and exactly-once semantics are required at the database level — for example, a "create user account if username not taken" operation.


Performance Analysis: Latency and Throughput Cost Across Database CAS Implementations

Not all CAS implementations are equally expensive. The table below summarises the latency and throughput characteristics of each database's mechanism under typical production conditions.

| Database | CAS Mechanism | Typical Write Latency | Overhead vs. Normal Write | Notes |
|---|---|---|---|---|
| PostgreSQL | WHERE version=? predicate | 1–5ms | ~0% | Predicate evaluated inline with row lock; no extra round-trips |
| MySQL InnoDB | WHERE version=? + ROW_COUNT() | 1–5ms | ~0% | Row-level lock for write duration only; no extra phases |
| DynamoDB | ConditionExpression | 1–5ms | ~0% | Condition evaluated on storage node atomically; same latency as unconditional write |
| Redis | WATCH/MULTI/EXEC | &lt;1ms | Minimal (~1 extra RTT for WATCH) | WATCH adds one round-trip; EXEC is atomic at server |
| Redis Lua | Lua script | &lt;1ms | ~0% | Fully atomic; no client-side retry needed |
| MongoDB | findOneAndUpdate with filter | 2–8ms | ~0% | Document-level lock for write; no extra phases |
| CockroachDB | SSI abort + retry | 2–15ms | 10–30% at P99 under contention | SSI validation at commit adds latency; retry loop on conflict |
| Cassandra | LWT (IF clause) | 15–50ms | ~4× normal write | Paxos round: Prepare/Promise, read, Propose/Accept, Commit |

The Cassandra LWT row is the most important takeaway: at roughly 4× normal write latency and multiple Paxos round-trips per write, LWT at high request rates can saturate cluster coordinator threads and cause cascading latency degradation. CockroachDB's 10–30% P99 overhead is a reasonable price for automatic SSI — it only manifests under write contention, not under normal load.


🧪 Retry Strategy Patterns and Their Production Trade-offs

A CAS failure is not an error — it is a signal to retry. But how you retry determines whether your system recovers gracefully or degrades into a retry storm.

The decision flowchart below shows the complete retry logic for a production CAS write. Each branch represents a specific strategy with specific failure modes:

graph TD
    A[Attempt CAS write] --> B{Result?}
    B -->|Success| C[Done - proceed]
    B -->|Conflict - version mismatch| D[Increment retry counter]
    B -->|Hard error - network timeout| E[Classify error]
    D --> F{Retry count under max?}
    F -->|Yes| G[Re-read fresh state]
    F -->|No| H[Abort - return error to caller]
    G --> I[Recompute with fresh data]
    I --> J[Wait - exponential backoff with jitter]
    J --> A
    E --> K{Is error retryable?}
    K -->|Yes - transient| J
    K -->|No - permanent| H

This flowchart illustrates the four critical decisions in a retry loop: (1) always re-read before recomputing, (2) apply exponential backoff with jitter, (3) enforce a maximum retry limit, and (4) distinguish between conflict-retries (version mismatch) and hard-error-retries (network timeout). Mixing these two categories without distinction is a common production bug.

The four retry strategies ranked by correctness:

| Strategy | Mechanism | When to use | Risk |
|---|---|---|---|
| Immediate retry | Loop with no delay | Almost never | Thundering herd — all retriers hit the DB simultaneously |
| Exponential backoff | delay = base * 2^attempt (e.g., 100ms, 200ms, 400ms) | Moderate contention | Synchronized retry waves if all clients start at the same time |
| Exponential backoff + jitter | delay = random(0, base * 2^attempt) | Most production scenarios | Slight complexity; highly recommended by AWS and Google SRE |
| Max retry + circuit breaker | Stop retrying after N attempts; open circuit if failure rate exceeds threshold | High-contention scenarios | Requires circuit breaker library; prevents runaway retry storms |
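A minimal sketch of the jittered strategy, assuming a Conflict exception signals a CAS failure and that the supplied write callable re-reads fresh state internally on each attempt. All names here (Conflict, cas_retry, attempt_write) are hypothetical.

```python
import random
import time

class Conflict(Exception):
    """CAS failure: version mismatch. Retryable after a re-read."""

def cas_retry(attempt_write, max_retries=5, base_delay=0.1, sleep=time.sleep):
    """Retry a CAS write with full jitter: delay = random(0, base * 2^attempt).
    attempt_write must re-read fresh state on every call - blindly replaying
    a stale computation is a bug, not a retry."""
    for attempt in range(max_retries):
        try:
            return attempt_write()
        except Conflict:
            sleep(random.uniform(0, base_delay * 2 ** attempt))
    raise Conflict(f"gave up after {max_retries} attempts")

# Stub writer that conflicts twice, then succeeds on the third attempt:
attempts = []
def flaky_write():
    attempts.append(1)
    if len(attempts) < 3:
        raise Conflict()
    return "committed"

assert cas_retry(flaky_write, sleep=lambda _: None) == "committed"
assert len(attempts) == 3
```

Hard errors (network timeouts) would be classified separately before entering this loop; only retryable failures should reach the backoff path.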

Idempotency keys make retries safe in distributed systems. Before the first attempt, generate a unique idempotency_key (a UUID). Include it in your write. If the write is interrupted mid-flight (network timeout), retry with the same key. The server can detect and deduplicate the retry without applying the operation twice. Stripe uses this pattern on every Payment Intent write.
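A toy sketch of the server-side half of this pattern; IdempotentServer is a stand-in for a real service that would persist seen keys durably rather than in memory.

```python
import uuid

class IdempotentServer:
    """Server-side dedup keyed by idempotency key: the first write with a
    given key is applied, and any retry with the same key replays the
    cached result without applying the operation a second time."""

    def __init__(self):
        self._seen = {}    # idempotency_key -> result of first application
        self.applied = 0   # how many operations actually executed

    def write(self, idempotency_key, operation):
        if idempotency_key in self._seen:
            return self._seen[idempotency_key]   # duplicate: replay result
        result = operation()
        self.applied += 1
        self._seen[idempotency_key] = result
        return result

server = IdempotentServer()
key = str(uuid.uuid4())                  # generated once, before the first attempt
server.write(key, lambda: "debit $10")   # original request
server.write(key, lambda: "debit $10")   # network-timeout retry, same key
assert server.applied == 1               # the debit ran exactly once
```

The key must be generated before the first attempt, not per retry; otherwise the server cannot correlate the retry with the original write.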


🧭 When Optimistic Locking Fails — The High-Contention Decision Guide

Optimistic locking excels when conflicts are rare. But some workloads are inherently high-contention, and optimistic locking degrades badly under them.

Flash sale scenario: 50,000 users simultaneously attempt to decrement a product's inventory count from 1 to 0. Only one write can succeed. The other 49,999 all read version=100, compute new_quantity=0, and simultaneously attempt UPDATE ... WHERE version=100. One succeeds. 49,999 get rows_affected=0, re-read version=101, compute new_quantity=-1 (oops — needs a quantity > 0 guard), and retry. The retry wave generates 49,999 reads, 49,999 writes, and another 49,998 failures in rapid succession. This is a retry storm.

Decision guide — when to switch away from optimistic locking:

| Workload | Conflict rate | Recommended strategy |
|---|---|---|
| User profile updates | &lt; 1% | Optimistic locking (version column) |
| Product catalog reads with rare admin edits | &lt; 1% | Optimistic locking |
| Shopping cart line-item updates | 1–5% | Optimistic locking with jitter retry |
| Seat reservation / ticket booking | 5–30% | Pessimistic locking (SELECT FOR UPDATE) or queue-based serialization |
| Flash sale inventory decrement | &gt; 50% | Queue-based serialization (one writer at a time via Kafka/SQS) or Redis atomic DECR with floor check |
| Financial ledger debit/credit | Variable | Pessimistic locking with deadlock timeout, or append-only ledger pattern |

The queue-based approach serializes the problem: instead of 50,000 concurrent CAS attempts, a single consumer processes inventory decrements from a queue, one at a time. No conflicts, no retries, predictable latency. The cost is additional infrastructure and slightly higher write latency — usually an acceptable trade for correctness guarantees.
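The floor-check guard mentioned above can also be expressed directly in SQL, shown here against SQLite for runnability. Putting quantity > 0 in the WHERE clause makes the check and the decrement a single atomic statement, so stock can never go negative no matter how many buyers race on the row.

```python
import sqlite3

# In-memory stand-in for the flash-sale inventory table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE inventory (sku TEXT PRIMARY KEY, quantity INTEGER)")
db.execute("INSERT INTO inventory VALUES ('FLASH-1', 1)")

def try_buy(sku):
    # The guard and the decrement are one statement: the database only
    # applies the write if a unit is still in stock.
    cur = db.execute(
        "UPDATE inventory SET quantity = quantity - 1 "
        "WHERE sku = ? AND quantity > 0", (sku,))
    return cur.rowcount == 1   # True exactly when this buyer got a unit

assert try_buy("FLASH-1") is True    # last unit sold
assert try_buy("FLASH-1") is False   # sold out: guard fails, no write
```

Losing buyers get an immediate, definitive "sold out" instead of a read-compute-retry cycle, which is what defuses the retry storm.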


📊 The Complete CAS Write Flow — From Client Request to Committed State

The diagram below shows the complete request lifecycle for an optimistic write, from the client HTTP call through the service layer to the database, with explicit success and retry paths.

graph TD
    A[Client - HTTP PATCH /drivers/D-7741] --> B[Service Layer]
    B --> C[Read driver record and version]
    C --> D[Database Read Replica or Primary]
    D --> E[Return record with version=N]
    E --> F[Compute new state locally]
    F --> G[Attempt CAS write - UPDATE WHERE version=N]
    G --> H{Database CAS result}
    H -->|rows_affected=1 - success| I[Return 200 OK with new version]
    H -->|rows_affected=0 - conflict| J[Re-read fresh state]
    J --> K{Retry count under limit?}
    K -->|Yes| L[Apply backoff and jitter]
    L --> F
    K -->|No| M[Return 409 Conflict to client]
    I --> N[Emit audit event and downstream notifications]

This architecture shows three important design decisions. First, the service layer performs all computation locally — the database only executes the CAS predicate, not business logic. Second, the retry loop re-reads from the database before recomputing, ensuring each attempt operates on committed state. Third, exhausted retries surface as a 409 Conflict HTTP response — the client, not the server, decides whether to retry at the application level (e.g., ask the user to refresh and resubmit).


🌍 How Stripe, GitHub, and Amazon Use CAS in Production

Stripe uses version columns on PaymentIntent records. Every state transition (created → processing → succeeded → captured) is guarded by a CAS write against the current state value. If two webhook events try to advance the same intent simultaneously, only one succeeds. Stripe's idempotency key mechanism ensures safe retry of the losing event.

GitHub uses optimistic locking on repository metadata — fork counts, star counts, default branch pointers. The repos table carries a lock_version column (a Rails convention). Repository renames and visibility changes use CAS writes to prevent concurrent admin operations from corrupting the repository's canonical state.

Amazon DynamoDB's own documentation uses a shopping cart as the canonical optimistic locking example. Each cart item carries a version attribute. When a mobile client and a web client both load the cart and make independent additions, the ConditionExpression on the write ensures only the first write succeeds; the second receives ConditionalCheckFailedException and re-fetches the merged cart.

Atlassian Jira uses an OPTLOCK version column on issue records. Concurrent edits to the same issue (e.g., two users updating the description and the assignee simultaneously in the UI) are resolved through CAS writes on the Jira issue entity. The losing writer receives an "edit conflict" dialog prompting a manual merge — a user-visible manifestation of the CAS failure.


🛠️ OSS Config Reference — DDL and Query Syntax Across Databases

This section provides copy-ready DDL, query, and command syntax for each database's CAS implementation. These are configuration and query patterns — not application code.

PostgreSQL — Version Column DDL and UPDATE Pattern

-- DDL: add version column to existing table
ALTER TABLE drivers
  ADD COLUMN version INTEGER NOT NULL DEFAULT 1;

-- Trigger: auto-increment version on every UPDATE (optional but recommended)
CREATE OR REPLACE FUNCTION increment_version()
RETURNS TRIGGER AS $$
BEGIN
  NEW.version := OLD.version + 1;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_drivers_version
BEFORE UPDATE ON drivers
FOR EACH ROW EXECUTE FUNCTION increment_version();

-- CAS read
SELECT id, status, billing_cycle, version
FROM   drivers
WHERE  id = 'D-7741';

-- CAS write (explicit version management; with the trigger above installed,
-- the explicit "version = version + 1" is redundant but harmless, since the
-- trigger overwrites NEW.version with OLD.version + 1)
UPDATE drivers
SET    status       = 'ASSIGNED',
       ride_id      = 'R-9901',
       version      = version + 1
WHERE  id           = 'D-7741'
AND    version      = 5
RETURNING id, version;
-- No rows returned = conflict (retry required)
-- Row returned = success (new version in result)
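The same pattern can be exercised end to end with Python's built-in sqlite3 module — a sketch, not the PostgreSQL wire protocol, with `cursor.rowcount` playing the role of PostgreSQL's rows-affected count:

```python
# In-memory SQLite demo of the version-column CAS: rowcount == 1 means the
# version predicate matched (success); rowcount == 0 means conflict.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE drivers (id TEXT PRIMARY KEY, status TEXT, version INTEGER)"
)
conn.execute("INSERT INTO drivers VALUES ('D-7741', 'AVAILABLE', 5)")

def cas_write(new_status: str, expected_version: int) -> bool:
    cur = conn.execute(
        "UPDATE drivers SET status = ?, version = version + 1 "
        "WHERE id = 'D-7741' AND version = ?",
        (new_status, expected_version),
    )
    conn.commit()
    return cur.rowcount == 1        # 1 = success, 0 = conflict (retry)

ok = cas_write("ASSIGNED", expected_version=5)      # version matches → True
stale = cas_write("AVAILABLE", expected_version=5)  # version is now 6 → False
```

The second call is exactly the billing service's mistake from the incident at the top of this post — except here the stale write is rejected instead of silently clobbering the row.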

DynamoDB — ConditionExpression JSON Pattern

{
  "TableName": "Drivers",
  "Key": {
    "id": { "S": "D-7741" }
  },
  "UpdateExpression": "SET #st = :new_status, #v = :new_version, #rid = :ride_id",
  "ConditionExpression": "attribute_exists(#v) AND #v = :expected_version",
  "ExpressionAttributeNames": {
    "#st":  "status",
    "#v":   "version",
    "#rid": "ride_id"
  },
  "ExpressionAttributeValues": {
    ":new_status":       { "S": "ASSIGNED" },
    ":new_version":      { "N": "6" },
    ":ride_id":          { "S": "R-9901" },
    ":expected_version": { "N": "5" }
  },
  "ReturnValues": "ALL_NEW"
}

On ConditionalCheckFailedException: re-read the item using GetItem, inspect the current version, recompute, and retry with the new expected_version value.
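That re-read-and-retry flow can be sketched with an in-memory stand-in for the table. To be clear, `update_item` and `ConditionalCheckFailedException` below are local simulations of the DynamoDB semantics, not boto3 calls:

```python
# Simulation of DynamoDB's conditional write: the update applies only if the
# item exists and its version matches; otherwise the "client" re-reads
# (GetItem) and retries with the fresh expected version.
class ConditionalCheckFailedException(Exception):
    pass

table = {"D-7741": {"status": "AVAILABLE", "version": 5}}

def update_item(key, new_status, expected_version):
    item = table.get(key)
    # attribute_exists(#v) AND #v = :expected_version
    if item is None or item["version"] != expected_version:
        raise ConditionalCheckFailedException(key)
    table[key] = {"status": new_status, "version": expected_version + 1}

def assign_with_retry(key, new_status, max_retries=3):
    expected = table[key]["version"]          # initial read
    for _ in range(max_retries):
        try:
            update_item(key, new_status, expected)
            return True
        except ConditionalCheckFailedException:
            expected = table[key]["version"]  # re-read and retry
    return False

try:                                          # stale version → conflict path
    update_item("D-7741", "BILLING", expected_version=99)
    conflict_seen = False
except ConditionalCheckFailedException:
    conflict_seen = True

assigned = assign_with_retry("D-7741", "ASSIGNED")
```

In real code the re-read would be a `GetItem` call and the retry would add backoff and jitter, as described earlier.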

Redis — WATCH/MULTI/EXEC Command Sequence

# Step 1: Watch the version key
WATCH driver:D-7741:version

# Step 2: Read current state
GET driver:D-7741:version
GET driver:D-7741:status

# Step 3: Begin optimistic transaction
MULTI
SET driver:D-7741:status "ASSIGNED"
SET driver:D-7741:ride_id "R-9901"
SET driver:D-7741:version "6"
EXEC

# EXEC returns:
# - Array of results if no watched key changed (success)
# - nil if any watched key changed (abort; retry from WATCH)

Cassandra — LWT IF Clause and Response Parsing

-- CAS write with LWT
UPDATE drivers
SET    status  = 'ASSIGNED',
       ride_id = 'R-9901',
       version = 6
WHERE  id = 'D-7741'
IF     version = 5;

-- Response columns:
-- [applied] = true  → success
-- [applied] = false → conflict; Cassandra also returns current column values

-- INSERT with uniqueness guard (create-if-not-exists)
INSERT INTO usernames (username, user_id)
VALUES ('alice', 'U-001')
IF NOT EXISTS;

-- [applied] = false → username already taken; response includes existing user_id

📚 Lessons Learned — What Production Systems Teach You About CAS

  1. Always re-read before recomputing on retry. A retry that reuses stale read data is not a retry — it is a repeated mistake. The whole point of detecting a conflict is that the world changed; your computation must incorporate those changes.

  2. The ABA problem is real in database systems. A version number that wraps around (e.g., 32-bit counter with frequent updates) or a row that is deleted and re-inserted with the same primary key can produce ABA. Always use monotonically increasing, non-reusable version numbers — BIGINT, not INT, in high-write tables.

  3. Cassandra LWT is a last resort, not a default. The 4× latency penalty compounds under load. Design Cassandra schemas to be idempotent and use LWT only where exactly-once semantics are truly non-negotiable. In most cases, idempotent writes with application-level deduplication are cheaper and more scalable.

  4. CockroachDB's SSI is not free. Serializable isolation has a latency cost at commit time — the SSI validation scan. Under high write contention, the abort-and-retry rate can approach that of explicit optimistic locking. Use explicit version columns in CockroachDB when you want application-visible conflict detection rather than opaque 40001 errors.

  5. Optimistic locking and pessimistic locking are not mutually exclusive. You can use SELECT FOR UPDATE for the high-contention "hot path" (e.g., the single inventory row for a flash sale SKU) while using version-column optimistic locking for the low-contention "cold path" (e.g., product catalog metadata). Profile your workload and segment your locking strategy.

  6. HTTP ETags are optimistic locking at the API layer. When a REST API returns an ETag header containing the resource version, and requires clients to send If-Match: <etag> on updates, it is implementing CAS between the client and the server. The server's version column is the authoritative version; the ETag is the client's read receipt. Losing the If-Match check on your REST endpoint exposes the same lost update vulnerability as losing the AND version=? clause on your SQL update.

  7. Idempotency keys make CAS safe to retry across network boundaries. A CAS write that times out mid-flight may or may not have committed. Without an idempotency key, retrying risks applying the operation twice. With an idempotency key, the server can detect and deduplicate the retry atomically.
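The idempotency-key pattern from lesson 7 can be sketched as a server-side cache keyed by the client's idempotency key. The names (`IdempotentExecutor`, `debit_10`) are illustrative, and the single process-wide lock is a simplification of what would be an atomic claim in a real datastore:

```python
# Sketch of idempotency-key deduplication: the server records the outcome of
# each key atomically, so a client retry after a timeout replays the stored
# result instead of applying the operation a second time.
import threading

class IdempotentExecutor:
    def __init__(self):
        self._results = {}
        self._lock = threading.Lock()

    def execute(self, idempotency_key: str, operation):
        with self._lock:                  # claim-or-replay is atomic
            if idempotency_key in self._results:
                return self._results[idempotency_key]  # replay, no re-apply
            result = operation()
            self._results[idempotency_key] = result
            return result

balance = {"amount": 100}

def debit_10():
    balance["amount"] -= 10
    return balance["amount"]

executor = IdempotentExecutor()
first = executor.execute("req-abc-123", debit_10)  # applies the debit once
retry = executor.execute("req-abc-123", debit_10)  # deduplicated replay
```

Paired with a CAS write, this closes the last gap: CAS protects against concurrent writers, the idempotency key protects against the same writer retrying an operation that may have already committed.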


📌 Summary & Key Takeaways

Compare-and-Swap is the hardware foundation of all lock-free concurrency. The LOCK CMPXCHG instruction on x86 CPUs performs an atomic compare-and-swap as a single instruction; every database optimistic locking primitive is this instruction expressed at a higher layer of abstraction.

Key takeaways for production engineers:

  • CAS semantics are universal: PostgreSQL WHERE version=?, DynamoDB ConditionExpression, Redis WATCH/EXEC, Cassandra IF version=, CockroachDB SSI abort — all are the same CAS primitive at different layers.
  • Retry correctly: Re-read → recompute → retry with jitter. Never reuse stale reads. Always cap retry count and emit metrics on conflict rate.
  • The ABA problem requires monotonic version counters: Use BIGINT, increment only, never reset. This eliminates the false-success case.
  • Match your locking strategy to your contention profile: Optimistic locking wins below ~5% conflict rate. Pessimistic locking or queue-based serialization wins above ~20%.
  • Cassandra LWT is ~4× slower than normal writes because each LWT runs a Paxos round (prepare/promise, read, propose/accept, commit — four round-trips versus a single quorum round-trip for a normal write). Use it sparingly.
  • CockroachDB gives you SSI for free, but application-visible version columns are still valuable for API-level conflict surfacing and auditing.
  • ETags are CAS at the HTTP layer. Always validate If-Match on mutating REST endpoints.

📝 Practice Quiz

  1. What is the three-argument signature of the CAS primitive, and what does each argument mean?

    Correct Answer: CAS(memory_location, expected_value, new_value). memory_location is the address to check and modify; expected_value is the value you expect to find there; new_value is the value to write if the comparison succeeds.

  2. Which x86-64 CPU instruction implements Compare-and-Swap, and what is the ARM equivalent?

    Correct Answer: CMPXCHG on x86-64; LDREX/STREX (Load-Exclusive/Store-Exclusive) on 32-bit ARM, LDXR/STXR on AArch64 (ARMv8.1 also adds a native CAS instruction).

  3. Describe the ABA problem in one sentence and give one database scenario where it can occur.

    Correct Answer: The ABA problem occurs when a value changes from A to B and back to A between a thread's read and its CAS attempt, making the CAS falsely succeed. A database scenario: a version column with a 32-bit INT that overflows and resets, allowing an old version number to re-appear on a different logical record state.

  4. In the version column pattern, what does a rows_affected = 0 result from an UPDATE ... WHERE version = ? indicate, and what must the application do next?

    Correct Answer: It indicates a conflict — another transaction modified the row and incremented the version since the current transaction read it. The application must re-read the row to get the latest version and committed state, recompute its changes on that fresh state, and retry the CAS write.

  5. PostgreSQL offers xmin as an implicit version. Name two scenarios where using xmin for optimistic locking is not recommended.

    Correct Answer: (1) When xmin values may wrap around due to XID wraparound after ~2 billion transactions. (2) When VACUUM operations update xmin on rows you did not logically modify, causing spurious conflict false-positives.

  6. What is the approximate performance penalty for a Cassandra Lightweight Transaction (LWT) compared to a normal Cassandra write, and what protocol causes it?

    Correct Answer: LWT writes are approximately 4× slower than normal writes, because each LWT requires a Paxos consensus round — prepare/promise, a read of the current value, propose/accept, and commit — i.e. four round-trips across the replica set, versus a single quorum round-trip for a normal write.

  7. What is the key difference between Redis WATCH/MULTI/EXEC and a Redis Lua script for implementing CAS?

    Correct Answer: WATCH/MULTI/EXEC is optimistic at the client level — other clients can modify the watched key between WATCH and EXEC; the abort is detected at EXEC time. A Lua script is atomic at the server level — Redis guarantees no other command executes while the script runs, making it a true atomic CAS with no interleaved window.

  8. CockroachDB uses Serializable Snapshot Isolation by default. Does this make explicit version columns unnecessary? Explain.

    Correct Answer: SSI prevents lost updates at the database level — any concurrent write that would produce a non-serializable outcome is aborted with error code 40001. Explicit version columns are not necessary for database-level correctness. However, they remain useful for: (1) surfacing the version in API responses (e.g., ETags), (2) providing auditable version history in the row, and (3) giving the application a human-readable conflict signal rather than a raw serialization error.

  9. In DynamoDB, what does attribute_exists(#v) AND #v = :expected_version in a ConditionExpression protect against, compared to just #v = :expected_version alone?

    Correct Answer: The attribute_exists(#v) guard protects against the case where the item was deleted between the read and the write — if the item is gone, #v = :expected_version alone would fail with ConditionalCheckFailedException for both "item deleted" and "version mismatch" reasons. With attribute_exists, the caller can distinguish between "the item no longer exists" and "the item exists but was modified".

  10. (Open-ended) Your team is building a flight seat reservation system. Seats can be reserved by millions of concurrent users during peak booking windows. A seat can transition from AVAILABLE → HELD → CONFIRMED or AVAILABLE → HELD → RELEASED. Describe the concurrency strategy you would choose for the HELD state transition, justify your choice, and explain what would happen if you chose the opposite strategy instead.



Written by Abstract Algorithms (@abstractalgorithms)