Distributed Transactions: 2PC, Saga, and XA Explained
How to maintain data consistency across microservices when ACID transactions don't span databases.
Abstract Algorithms · Intermediate
For developers with some experience. Builds on fundamentals.
Estimated read time: 25 min
AI-assisted content. This post may have been written or enhanced with AI tools. Please verify critical information independently.
TLDR: Distributed transactions require you to choose a consistency model before choosing a protocol. 2PC and XA give atomic all-or-nothing commits but block all participants on coordinator failure. Saga gives eventual consistency with explicit compensation and survives coordinator crashes, at the cost of designing a compensating transaction for every step. Choose 2PC/XA when atomicity is non-negotiable and participants are co-managed; choose Saga when services are independent and eventual consistency is acceptable. Always pair Saga with the Outbox pattern and idempotency keys: they are what make Saga production-safe.
The Charge-Without-Order Incident: Why ACID Transactions Don't Cross Service Boundaries
In 2018, a major e-commerce platform discovered that customers had been charged for orders that were never created. The payment service completed its database write successfully. The order service received the request and started writing, but the process crashed before it committed. From the payment service's perspective, everything succeeded. From the customer's perspective, money was gone and no order existed.
In a monolith, this scenario cannot happen. You open a single database connection, write both rows inside the same BEGIN / COMMIT block, and the database's ACID guarantees enforce atomicity: either both rows land, or neither does. Rollback is automatic on failure.
In a microservices architecture, you cannot do this. The payment database and the order database are physically separate servers. They may be different vendors: PostgreSQL for orders, MySQL for payments, Cassandra for notifications. There is no single transaction coordinator that holds locks across all of them. When you make two separate service calls, you have two separate transactions. If the second call fails after the first has committed, you have a partial write, and no built-in mechanism to undo it.
This is the distributed transaction problem. Before choosing a solution, you must answer three questions that eliminate options:
| Design question | Why the answer eliminates options |
| --- | --- |
| Can all participants be online simultaneously? | 2PC requires every participant to respond synchronously; one timeout aborts the entire operation |
| Is the operation long-running (seconds to minutes)? | Holding database locks across network calls causes severe contention at scale |
| Can the business tolerate a short window of inconsistency? | If yes, Saga is viable; if no, 2PC or XA is the only choice |
Two-Phase Commit: How a Blocking Protocol Achieves Distributed Atomicity
Two-Phase Commit uses a coordinator (one service or middleware node) that orchestrates a global commit across multiple participants (individual databases or services). The protocol unfolds in two strict phases.
Phase 1 (Prepare): The coordinator sends a PREPARE message to every participant. Each participant writes its data to a durable write-ahead log, acquires all needed locks, and votes. A VOTE_COMMIT means "I can guarantee a successful commit if you tell me to." A VOTE_ABORT means "I cannot proceed."
Phase 2 (Commit or Abort): If every participant voted VOTE_COMMIT, the coordinator writes a commit record to its own durable log and broadcasts COMMIT. If any participant voted VOTE_ABORT, the coordinator broadcasts ABORT and all participants roll back.
The critical constraint: between voting VOTE_COMMIT in Phase 1 and receiving the coordinator's final decision in Phase 2, each participant holds all its acquired locks. If the coordinator crashes in that window, participants cannot proceed: they cannot commit unilaterally (another participant may have aborted) and cannot abort unilaterally (the coordinator may have written a commit record). This is the uncertainty window, the fundamental flaw of 2PC.
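The two phases can be sketched as a minimal in-memory simulation. This is illustrative Python only, not a production coordinator; the participant class and method names are invented for the example, and the Phase 2 broadcast is shown as reliable even though in reality it can be lost (which is exactly where the uncertainty window bites):

```python
# Minimal 2PC simulation (illustrative only).
# Phase 1: every participant votes. Phase 2: coordinator decides.

class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "IDLE"

    def prepare(self):
        # Write to WAL, acquire locks, then vote.
        if self.can_commit:
            self.state = "PREPARED"  # locks stay held until the Phase 2 decision
            return "VOTE_COMMIT"
        self.state = "ABORTED"
        return "VOTE_ABORT"

    def finish(self, decision):
        self.state = "COMMITTED" if decision == "COMMIT" else "ABORTED"


def two_phase_commit(participants):
    # Phase 1: collect votes from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2: COMMIT only if *all* voted commit; otherwise ABORT.
    decision = "COMMIT" if all(v == "VOTE_COMMIT" for v in votes) else "ABORT"
    for p in participants:
        p.finish(decision)  # in a real system this broadcast can be lost
    return decision


payment, order = Participant("payment-db"), Participant("order-db")
print(two_phase_commit([payment, order]))                    # COMMIT

payment2 = Participant("payment-db")
order2 = Participant("order-db", can_commit=False)
print(two_phase_commit([payment2, order2]))                  # ABORT
```

The single `all(...)` check is the heart of the protocol: one dissenting vote aborts everyone, which is why one slow or partitioned participant stalls the whole operation.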
| 2PC failure scenario | What happens to participants |
| --- | --- |
| Coordinator crashes before sending PREPARE | Transaction never starts; all participants are unaffected |
| Coordinator crashes after collecting all votes, before COMMIT | All prepared participants hold locks indefinitely (the blocking protocol) |
| One participant crashes before voting | Coordinator aborts after timeout; all others roll back cleanly |
| One participant crashes after voting COMMIT | Coordinator waits for ACK; participant recovers from its WAL and re-applies |
| Network partition between coordinator and one participant | That participant is stuck prepared; the coordinator's broadcast determines the others |
When 2PC is the right call:
- Two to three databases, same vendor, strong consistency is a non-negotiable requirement.
- Operation duration is milliseconds, never seconds.
- The coordinator is highly available (ZooKeeper, etcd, or database-native distributed coordinator).
- All participants speak the 2PC protocol (rules out most NoSQL stores).
XA Transactions: The Database Industry Standard for 2PC
XA is the X/Open distributed transaction standard that specifies how a transaction manager coordinates multiple resource managers (databases) using the 2PC protocol. In Java, the Java Transaction API (JTA) is the primary implementation. Most relational databases (PostgreSQL, MySQL, Oracle, IBM DB2) support XA.
In a JTA setup, the application server acts as the transaction manager, automatically detecting and enlisting each XADataSource connection opened within a UserTransaction.begin() / commit() block. Application code calls utx.begin(), performs normal JDBC operations against both the payment and order datasources, and calls utx.commit(). The JTA runtime translates this into a full two-phase commit: it calls xa.prepare() on every enlisted datasource in sequence; if all prepare calls succeed, it calls xa.commit() on each; if any prepare fails, it calls xa.rollback() on all enlisted resources. The application code never directly coordinates the two-phase protocol; JTA handles it transparently, but the coordinator and lock-duration constraints of 2PC apply in full.
This looks elegant at small scale, but XA carries critical limitations for microservices:
- Database vendor lock-in. All datasources must implement the XA protocol. Redis, DynamoDB, Cassandra, and most NoSQL stores do not.
- Not cross-process. XA only works when the transaction manager can directly enlist XADataSource connections. It cannot coordinate an HTTP call to a remote service or a Kafka publish.
- Performance at scale. XA prepare() is synchronous and holds row-level locks for the full duration of the global transaction. At high write throughput, this creates severe lock contention.
- Connection pool friction. HikariCP, the dominant Java connection pool, does not support XA connections in its default configuration. You must switch to a JTA-aware pool (Agroal, c3p0 XA), reintroducing complexity.
XA is a viable choice for Java EE monoliths with multiple datasources under a single DBA team's control. It is not a distributed transaction solution for independently-owned polyglot microservices.
Deep Dive: The Saga Pattern, Long-Running Transactions Without Global Locks
A Saga decomposes a distributed transaction into a sequence of local transactions, each fully committed by exactly one service. If a later step fails, the saga triggers compensating transactions in reverse order: new, forward-moving database writes that undo the business effect of prior steps.
A saga never holds locks across services. Each local transaction commits and releases its locks immediately. Consistency is eventual (the system may be momentarily inconsistent between steps), but well-designed compensation guarantees convergence to a coherent final state.
There are two coordination styles:
| Saga style | Coordinator | Failure detection | Best for |
| --- | --- | --- | --- |
| Orchestration | Central saga class sends commands | Orchestrator tracks step state explicitly | Complex conditional workflows, multi-branch logic, compliance auditability |
| Choreography | None; workflow is emergent from events | Each service reacts to its own failure events | High-throughput pipelines, independent team ownership, loose coupling |
The Saga Internals: Orchestration, Choreography, and Compensation Mechanics
In an Orchestration Saga, a central Saga Orchestrator holds the workflow definition. It sends a command to the first participant, waits for a success or failure event, and then decides the next command. On failure, it issues compensating commands in strict reverse order.
In a Choreography Saga, there is no central brain. Each service listens to domain events from a message broker and reacts: it performs its local transaction and emits the next event. A PaymentFailedEvent causes the Inventory Service to react with a ReleaseInventoryCommand, with no orchestrator needed. The workflow emerges from the event chain.
Compensation ordering is critical. Compensating transactions must reverse steps in the exact reverse sequence they were applied. If Payment fails after Inventory was reserved, the correct compensation order is:
- Step 3 (CreateShipment): did not execute; no compensation needed.
- Step 2 (AuthorizePayment): failed; issue ReleaseInventoryCommand first.
- Step 1 (ReserveInventory): was successful; issue CancelOrder only after inventory is released.
Reversing this order (cancelling the order before releasing inventory) leaves the system readable as "cancelled order with unreleased stock" during the compensation window, a different inconsistency introduced by the cleanup itself.
The compensability constraint. Every saga step must have a pre-designed compensating transaction. If a step's effect is physically irreversible (triggering a warehouse pick, sending an SMS), move it to the last position so it only fires after all prior steps confirm. Alternatively, design a semantic undo: "send cancellation confirmation" as the compensating action for "send order confirmation."
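Reverse-order compensation falls out naturally from a stack: each completed step pushes its compensator, and on failure the stack unwinds. A minimal sketch, with invented helper names and step names taken from the checkout example:

```python
# Saga executor sketch: run steps in order; on failure, run the
# compensators of already-completed steps in strict reverse order.

def run_saga(steps):
    """steps: list of (name, action, compensator). Returns an execution log."""
    log, completed = [], []
    for name, action, compensate in steps:
        if action():
            log.append(name)
            completed.append((name, compensate))  # push compensator
        else:
            # Unwind: compensate completed steps, last-completed first.
            for prior_name, prior_comp in reversed(completed):
                prior_comp()
                log.append(f"compensate:{prior_name}")
            break
    return log

ok = lambda: True      # stand-in for a local transaction that commits
fail = lambda: False   # stand-in for a declined payment, etc.
noop = lambda: None    # stand-in for a compensating write

trace = run_saga([
    ("ReserveInventory", ok, noop),    # succeeds, compensator pushed
    ("AuthorizePayment", fail, noop),  # fails -> unwind begins here
    ("CreateShipment", ok, noop),      # never runs
])
print(trace)  # ['ReserveInventory', 'compensate:ReserveInventory']
```

Note that CreateShipment contributes nothing to the log: steps after the failure point never execute, so they never need compensation, exactly as in the ordering rules above.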
Performance Analysis: Lock Duration, Throughput, and Latency Characteristics
The performance profile of each approach is determined by lock duration:
| Protocol | Lock scope | Lock duration | Impact at 1000 TPS |
| --- | --- | --- | --- |
| 2PC / XA | All participants simultaneously | Full duration of coordinator round-trip | Lock contention across services; throughput degrades non-linearly |
| Saga | One service at a time | Duration of that service's local transaction only | Each step's lock is released before the next step begins |
| Saga (failed path) | Compensating steps only | Duration of each compensating transaction | Compensation runs serially in reverse; adds latency proportional to steps |
At 1,000 writes per second across three services, 2PC holds locks on all three databases for the entire coordinator round-trip: at minimum two message rounds (PREPARE × 3 with votes, then COMMIT × 3 with acks). Saga holds locks on only one database at a time, each for the duration of a single local transaction. The throughput difference at scale is significant.
The trade-off: Saga's reduced locking comes at the cost of temporary inconsistency between steps. Design your system to surface intermediate saga states (e.g., PAYMENT_PENDING, INVENTORY_RESERVED) as meaningful business states rather than implementation artifacts.
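The lock-duration gap can be made concrete with a back-of-the-envelope model. The latency constants below are invented for illustration; the point is the shape of the formulas, not the numbers:

```python
# Rough model of concurrent lock-hold time per business transaction (ms).
# Assumed numbers for illustration: 2 ms local tx, 5 ms one-way network hop.

LOCAL_TX_MS = 2
HOP_MS = 5

def lock_time_2pc(n_participants):
    # Each participant holds locks from PREPARE until COMMIT arrives:
    # prepare, vote, commit, ack = four one-way hops, plus its local work.
    per_participant = LOCAL_TX_MS + 4 * HOP_MS
    # All participants hold their locks simultaneously.
    return n_participants * per_participant

def lock_time_saga(n_steps):
    # Each step holds only its own local-transaction locks, one service
    # at a time, so concurrent lock time never exceeds one local tx.
    return LOCAL_TX_MS

print(lock_time_2pc(3))   # 66: three DBs locked through the full round-trip
print(lock_time_saga(3))  # 2: one local transaction's locks at any instant
```

Under these assumptions the saga's concurrent lock footprint is independent of the number of steps, while 2PC's grows with both participant count and network latency, which is the non-linear contention the table above describes.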
Mathematical Model: The Theoretical Bound on Distributed Consistency
The FLP Impossibility result (Fischer, Lynch, Paterson, 1985) proves that in a fully asynchronous distributed system, no consensus algorithm can guarantee both safety (all participants agree on the same value) and liveness (the algorithm always terminates) in the presence of even a single process failure.
2PC violates liveness: when the coordinator fails in the uncertainty window, participants may wait indefinitely. This is not a bug in 2PC; it is provably unavoidable for any protocol that attempts strong consistency across distributed participants.
The CAP Theorem formalizes the consequence: you cannot simultaneously guarantee Consistency, Availability, and Partition Tolerance. 2PC and XA choose Consistency + Partition Tolerance, sacrificing availability on coordinator failure. Saga chooses Availability + Partition Tolerance, sacrificing immediate consistency in favor of eventual convergence through compensation.
In practice, the choice between 2PC and Saga is not an engineering preference; it is a direct consequence of your system's position on the CAP triangle, which in turn is a business decision about which failure mode is tolerable.
Visualizing Distributed Transaction Flows
The Two-Phase Commit Protocol in Sequence
sequenceDiagram
participant C as Coordinator
participant P1 as Payment DB
participant P2 as Order DB
C->>P1: PREPARE
C->>P2: PREPARE
P1-->>C: VOTE_COMMIT
P2-->>C: VOTE_COMMIT
Note over C: All votes = COMMIT. Write commit record to WAL
C->>P1: COMMIT
C->>P2: COMMIT
P1-->>C: ACK
P2-->>C: ACK
Note over P1,P2: Both participants commit and release locks
This sequence diagram shows the nominal path: both participants vote yes, the coordinator writes its commit decision to its own WAL (making the decision durable), then broadcasts COMMIT to all participants. The uncertainty window is the gap between the coordinator writing its commit record and each participant receiving the COMMIT message: if the coordinator crashes inside this window, participants are stuck with locks held and no safe unilateral action available.
The Saga Compensation Flow: Happy Path and Failure Recovery
flowchart TD
A["Order Placed"] --> B["Reserve Inventory (local tx commits immediately)"]
B -->|"InventoryReservedEvent"| C["Authorize Payment (local tx commits immediately)"]
C -->|"PaymentAuthorizedEvent"| D["Create Shipment (local tx commits immediately)"]
D --> E["Order Confirmed: Saga Complete"]
B -->|"OutOfStockEvent"| F["Cancel Order (no prior steps to compensate)"]
F --> I["Saga Ended: Insufficient Stock"]
C -->|"PaymentDeclinedEvent"| G["Release Inventory (compensating tx, reverses Step 1)"]
G --> H["Cancel Order (compensating tx, closes saga)"]
H --> J["Saga Compensated: Payment Declined"]
This flowchart shows both the happy path (all three local transactions succeed, saga ends cleanly) and the failure path (payment fails after inventory was already reserved). The compensation chain runs in strict reverse: ReleaseInventory fires before CancelOrder because the inventory reservation was the earlier step; releasing it first restores the earlier invariant before the order record is cancelled. The key insight is that compensation is not a rollback: it is forward-moving business logic that undoes a prior business effect.
Real-World Applications: How Major Systems Handle Distributed Transaction Consistency
Uber's Trip Lifecycle (Choreography Saga): Uber's core trip creation uses a choreography-based saga across Matching, Trip, Payment, and Driver Pay services. Each service emits domain events rather than calling upstream services directly. When a payment pre-authorization fails, the Trip service receives a PaymentFailedEvent and compensates by releasing the matched driver and cancelling the trip. There is no central orchestrator; the workflow emerges from event reactions. This approach allows Uber to update any individual service independently without modifying a central workflow definition.
Amazon's Order Fulfillment (Orchestration Saga): Amazon's order pipeline spans inventory reservation, payment authorization, warehouse pick instruction, and shipment label generation. Because the fulfillment workflow has conditional branching (multiple fulfillment centers, split shipments, pre-order holds), Amazon uses an orchestration model where a central Workflow Engine (similar to AWS Step Functions) tracks saga state and issues compensation commands explicitly. The branching logic and per-step retry policies live in one inspectable place.
Financial Services (2PC with XA): Core banking systems handling wire transfers between accounts in the same bank use XA transactions via JTA. Both the debit account and credit account are PostgreSQL datasources under the same DBA team's control. The amounts are small, the operations are milliseconds, and the strong consistency guarantee is a regulatory requirement. XA is appropriate here precisely because the constraints 2PC demands (same-vendor, co-managed, tight latency) are all satisfied.
The Booking.com Flight + Hotel Bundle (Two-Step Saga with hold/confirm): Booking.com's package booking uses a "hold-then-confirm" saga. Flight seats and hotel rooms are first held (reserved but not confirmed) via a local transaction in each vendor's system. Holds have a TTL. Only after both holds succeed does the saga send confirm commands. If a confirmation fails, the hold is allowed to expire: a time-bounded semantic compensation rather than an explicit undo call. This is a practical variant of compensation design: not all compensations require an active rollback call.
The Outbox Pattern: Atomically Publishing Events Alongside Database Writes
The Saga pattern immediately creates a dual-write problem: a service must both commit its local transaction to its own database and publish an event to a message broker (Kafka, RabbitMQ) so downstream saga participants can proceed. These are two separate writes. If the service crashes after the database commit but before the broker publish completes, the event is silently lost and the saga stalls permanently.
The Outbox pattern solves this by reducing the dual-write to a single atomic database transaction:
The outbox table lives in the same database as the service's business data. Each row captures everything the poller needs to forward the event: aggregate_type and aggregate_id identify which business entity changed, event_type names the domain event, and payload carries the event body as a JSON object. The published_at column acts as the pending/published flag: a null value means the event is waiting to be forwarded; a non-null timestamp means the poller has successfully delivered it to the broker. A CDC connector such as Debezium can tail the database write-ahead log to detect new outbox rows without polling, forwarding each event with low latency while keeping the publication step entirely outside the business transaction.
The service writes its business data (say, a stock decrement) and its outbox event record in the same database transaction: if the transaction commits, both writes land atomically; if it rolls back for any reason, neither survives. A separate outbox poller (or a Change Data Capture connector like Debezium reading the PostgreSQL WAL) reads rows with a null published_at timestamp, forwards each event to the broker, and marks them published.
Because the poller runs outside the business transaction and retries on failure, delivery is at-least-once: the same event may be published more than once if the poller crashes between forwarding and marking, or during a broker redelivery. This is expected and correct behavior, not a bug, but it means every downstream saga participant must handle duplicate messages idempotently.
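The single-transaction write and the poller can be demonstrated with SQLite standing in for the service database. The table and column names follow the description above; everything else (the broker callback, the stock table) is invented for the sketch:

```python
import json
import sqlite3

# In-memory DB standing in for the inventory service's own database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE stock (sku TEXT PRIMARY KEY, qty INTEGER)")
db.execute("""CREATE TABLE outbox (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    aggregate_type TEXT, aggregate_id TEXT,
    event_type TEXT, payload TEXT,
    published_at TEXT)""")
db.execute("INSERT INTO stock VALUES ('widget', 10)")
db.commit()

def reserve_inventory(order_id, sku, qty):
    # Business write and outbox write share ONE transaction: both commit
    # together or neither does -- there is no dual-write window.
    with db:
        db.execute("UPDATE stock SET qty = qty - ? WHERE sku = ?", (qty, sku))
        db.execute(
            "INSERT INTO outbox (aggregate_type, aggregate_id, event_type,"
            " payload, published_at) VALUES (?, ?, ?, ?, NULL)",
            ("Order", order_id, "InventoryReservedEvent",
             json.dumps({"orderId": order_id, "sku": sku, "qty": qty})))

def poll_outbox(publish):
    # Poller: forward unpublished rows, then mark them published.
    # A crash between publish() and the UPDATE re-sends the event on the
    # next poll -- at-least-once delivery, by design.
    rows = db.execute("SELECT id, event_type, payload FROM outbox"
                      " WHERE published_at IS NULL").fetchall()
    for row_id, event_type, payload in rows:
        publish(event_type, payload)  # a Kafka producer in production
        db.execute("UPDATE outbox SET published_at = datetime('now')"
                   " WHERE id = ?", (row_id,))
    db.commit()
    return len(rows)

reserve_inventory("X", "widget", 3)
sent = []
print(poll_outbox(lambda event_type, payload: sent.append(event_type)))  # 1
print(db.execute("SELECT qty FROM stock").fetchone()[0])                 # 7
```

The comment inside `poll_outbox` marks the exact crash point that produces a duplicate publish, which is why idempotent consumers (next section) are the mandatory companion to this pattern.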
Idempotency: Making Saga Retries and At-Least-Once Delivery Safe
Saga orchestrators retry failed steps. Message brokers redeliver messages on consumer failure. Outbox pollers may publish the same event twice during a polling window overlap. Without idempotency, a retried AuthorizePaymentCommand charges the customer twice, a real production failure at every company that skipped this step.
Every saga participant must be idempotent: processing the same request multiple times must produce exactly the same observable result as processing it once. The standard implementation is the idempotency key pattern:
Every saga participant checks an idempotency store before executing any business logic. The idempotency key (generated by the orchestrator and stable across all retries of that specific step) is looked up first; if a prior result exists, the participant returns it immediately without re-executing the operation. If no prior result exists, the participant executes the business logic and stores the result in the idempotency store within the same database transaction as the business write. This transactional coupling is critical: if the business write rolls back, the idempotency record must also roll back, ensuring the next retry correctly re-executes rather than finding a phantom "already completed" record from a failed attempt.
| Idempotency key design rule | Why it matters |
| --- | --- |
| Generated by the caller, not the recipient | The orchestrator controls the key; if recipients generated keys, two retries would produce two keys |
| Stable across all retries of the same step | Use a deterministic composite key: sagaId + ":" + stepName |
| TTL long enough to cover the retry window | After the window expires, the record can be garbage-collected |
| Stored in the same transaction as the business write | If the business write rolls back, the idempotency record must also roll back; no orphaned records |
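These rules reduce to a small amount of code. A sketch using SQLite, with invented table and function names; the key format follows the composite-key rule from the table:

```python
import sqlite3

# In-memory DB standing in for the payment service's database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE payments (order_id TEXT, amount REAL)")
db.execute("CREATE TABLE idempotency (key TEXT PRIMARY KEY, result TEXT)")
db.commit()

def authorize_payment(saga_id, order_id, amount):
    # Deterministic, caller-stable key: sagaId + ":" + stepName.
    key = f"{saga_id}:AuthorizePayment"
    prior = db.execute("SELECT result FROM idempotency WHERE key = ?",
                       (key,)).fetchone()
    if prior:
        return prior[0]  # duplicate delivery: replay the stored result
    # Business write and idempotency record in ONE transaction: if the
    # charge rolls back, the "already done" record rolls back with it.
    with db:
        db.execute("INSERT INTO payments VALUES (?, ?)", (order_id, amount))
        db.execute("INSERT INTO idempotency VALUES (?, ?)",
                   (key, "AUTHORIZED"))
    return "AUTHORIZED"

print(authorize_payment("saga-42", "X", 59.99))  # AUTHORIZED
print(authorize_payment("saga-42", "X", 59.99))  # AUTHORIZED (replayed)
print(db.execute("SELECT COUNT(*) FROM payments").fetchone()[0])  # 1
```

The second call performs no business write at all: the customer is charged exactly once no matter how many times the command is redelivered.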
Practical Walkthrough: Tracing a Failed Checkout Saga Step by Step
The following trace walks through an Orchestration Saga for a checkout flow using the Outbox pattern for event publishing. Payment is declined after inventory was reserved.
Step 1: Order Placed
- OrderService receives HTTP POST, writes Order(id=X, status=PENDING) to its DB.
- In the same transaction, writes OutboxEvent(InventoryReserveRequested, orderId=X) to its outbox table.
- Transaction commits. The outbox poller picks up the event and publishes to Kafka topic saga.commands.
Step 2: Inventory Service Receives Command
- InventoryService consumes ReserveInventoryCommand(orderId=X, quantity=3).
- Checks the idempotency store; not seen before.
- Writes Stock(-3) and OutboxEvent(InventoryReservedEvent, orderId=X) in one transaction. Commits.
- Poller publishes InventoryReservedEvent to saga.events.
Step 3: Orchestrator Advances Saga
- OrderFulfillmentSaga receives InventoryReservedEvent.
- Updates saga state to PAYMENT_PENDING in the saga store.
- Publishes AuthorizePaymentCommand(orderId=X, amount=59.99) to saga.commands.
Step 4: Payment Fails
- PaymentService consumes AuthorizePaymentCommand. Card declines.
- Writes OutboxEvent(PaymentDeclinedEvent, orderId=X) to its outbox. Commits.
- Poller publishes PaymentDeclinedEvent to saga.events.
Step 5: Orchestrator Triggers Compensation (reverse order)
- OrderFulfillmentSaga receives PaymentDeclinedEvent.
- Updates saga state to COMPENSATING.
- First compensation: sends ReleaseInventoryCommand(orderId=X); InventoryService restores stock.
- Second compensation: sends CancelOrderCommand(orderId=X, reason=PAYMENT_DECLINED); OrderService marks the order CANCELLED.
- Saga state moves to CANCELLED. Orchestrator marks the saga ended.
Why compensation order matters here: If CancelOrderCommand fired first, the Order record would enter CANCELLED state while the Inventory record still shows RESERVED. Any concurrent read of order state plus inventory state during that gap would see a cancelled order that still has stock reserved, the inverse inconsistency of the original failure. Releasing inventory first guarantees that the system moves from {order=PENDING, stock=RESERVED} → {order=PENDING, stock=AVAILABLE} → {order=CANCELLED, stock=AVAILABLE}, with each intermediate state being coherent.
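The orchestrator side of this trace can be replayed as a tiny state machine. The event, command, and state names come from the walkthrough; the dispatch logic is an invented simulation, not a framework API:

```python
# Orchestrator state machine for the checkout saga (simulation only).

class OrderFulfillmentSaga:
    def __init__(self, order_id):
        self.order_id = order_id
        self.state = "INVENTORY_PENDING"
        self.commands = []  # commands the orchestrator sends, in order

    def on_event(self, event):
        if event == "InventoryReservedEvent":
            self.state = "PAYMENT_PENDING"
            self.commands.append("AuthorizePaymentCommand")
        elif event == "PaymentAuthorizedEvent":
            self.state = "SHIPMENT_PENDING"
            self.commands.append("CreateShipmentCommand")
        elif event == "PaymentDeclinedEvent":
            self.state = "COMPENSATING"
            # Compensation in strict reverse order of completed steps:
            # release inventory (step 1's undo) before cancelling the order.
            self.commands.append("ReleaseInventoryCommand")
            self.commands.append("CancelOrderCommand")
            self.state = "CANCELLED"

saga = OrderFulfillmentSaga("X")
saga.on_event("InventoryReservedEvent")   # Step 3 of the trace
saga.on_event("PaymentDeclinedEvent")     # Steps 4-5 of the trace
print(saga.state)     # CANCELLED
print(saga.commands)
# ['AuthorizePaymentCommand', 'ReleaseInventoryCommand', 'CancelOrderCommand']
```

Replaying the two events produces exactly the command sequence of Steps 3 through 5, with the compensation pair in the order the walkthrough requires.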
Trade-offs and Failure Modes: 2PC, XA, Saga Orchestration, and Saga Choreography Compared
| Dimension | 2PC | XA (JTA) | Saga (Orchestration) | Saga (Choreography) |
| --- | --- | --- | --- | --- |
| Consistency model | Strong (atomic) | Strong (atomic) | Eventual | Eventual |
| Coordinator | External coordinator node | JTA transaction manager | Saga orchestrator class | None (emergent) |
| Availability on coordinator failure | Blocked (uncertainty window) | Blocked (JTA manager is SPOF) | Degraded but recoverable via saga store | Fully decoupled |
| Lock duration | Held across all participants for full 2PC duration | Held across all enlisted datasources | Per-step local tx only | Per-step local tx only |
| Participant requirements | Must implement 2PC protocol | Must use XA-capable datasource | Must handle command/event interface | Must align on shared event schema |
| Compensating transactions | Not needed (atomic rollback) | Not needed (JTA rollback) | Required (explicit per-step design) | Required (per-service reaction) |
| Polyglot datastores | No | No | Yes | Yes |
| Cross-HTTP service coordination | No | No | Yes | Yes |
| Observability | Transaction log | JTA transaction log | Centralized saga state store | Distributed tracing required |
| Debugging stuck transactions | Coordinator WAL | JTA recovery log | Saga state table (single queryable source) | Correlated trace IDs across all participants |
| Best use case | 2–3 same-vendor DBs, hard atomicity | Java EE multi-datasource, co-managed DBs | Complex conditional workflows, compliance audit | High-throughput event pipelines, independent teams |
Decision Guide: Choosing Your Distributed Transaction Strategy
| Situation | Recommendation |
| --- | --- |
| Financial reconciliation across 2–3 same-vendor databases, hard atomicity required | 2PC: use a highly-available coordinator; keep participant count minimal |
| Java EE / Spring app with multiple datasources under the same DBA team | XA (JTA): Atomikos or Bitronix; all datasources must be XA-capable |
| Long-running workflow (>100 ms), polyglot services, eventual consistency acceptable | Saga Orchestration: Axon Framework, Temporal, or AWS Step Functions |
| Services owned by independent teams, event-driven architecture already established | Saga Choreography: Kafka events; each service reacts to domain events |
| Service must publish events alongside DB writes without risk of message loss | Outbox Pattern: mandatory companion to any Saga that uses a message broker |
| Saga participant may receive duplicate commands on retry or broker redelivery | Idempotency Key: caller-generated, stored in same transaction as business write |
| Need to choose between orchestration and choreography | Orchestration if the workflow has conditional branches or compliance audit requirements; Choreography if teams are independent and distributed tracing tooling is mature |
When eventual consistency is acceptable: In most e-commerce, booking, and logistics workflows, a gap of seconds to minutes of cross-service inconsistency is a modeled, expected state, not data corruption. An order in PAYMENT_PENDING state while the payment service processes is a valid visible business state. Design intermediate saga states as first-class domain states and expose them in the UI meaningfully.
When eventual consistency is not acceptable: Real-time financial settlement with immediate debit/credit reconciliation, inventory reservation for strictly limited-quantity flash sales where oversell causes hard downstream consequences, and regulatory-mandated atomic audit trails. In these cases, accept 2PC or XA and the availability and performance trade-offs they impose.
Axon Framework: How It Implements Saga Orchestration in Java
Axon Framework is the leading Java framework for CQRS, event sourcing, and saga orchestration. An Axon Saga is a stateful event-handler class that tracks a long-running business process. Axon persists saga state between events, routes events to the correct saga instance, handles retry, and manages lifecycle; you focus on business logic and compensating commands.
An Axon Saga class is marked as a stateful orchestrator whose fields are automatically serialized and persisted to the saga store after every event handler runs. A @StartSaga annotation on one event handler tells Axon to create a new saga instance when that event type first arrives, typically the order-placed event that initiates the workflow. SagaLifecycle.associateWith() registers the correlation key (such as orderId) so Axon routes all subsequent events to the correct saga instance rather than creating new ones. Each step in the workflow maps to a separate saga event handler: on InventoryReservedEvent, the orchestrator issues the payment authorization command; on PaymentDeclinedEvent, it issues compensation commands in strict reverse order, releasing inventory before cancelling the order. Both the happy path and the compensation path terminate with an @EndSaga-annotated handler, at which point Axon serializes the final saga state and removes the instance from the active store. If the application restarts mid-saga, Axon deserializes the saved state and resumes from the last completed event; no custom checkpointing required.
| Annotation / API | What it does |
| --- | --- |
| @Saga | Marks the class as a saga; Axon persists its serialized state in the saga store between events |
| @StartSaga | Axon creates a new saga instance when this event type first arrives |
| SagaLifecycle.associateWith("key", value) | Registers an association so Axon routes future events to this specific instance |
| @SagaEventHandler(associationProperty = "orderId") | Handles an event for the saga instance whose association matches |
| @EndSaga | Axon marks the instance complete, serializes final state, and removes it from the active store |
Axon persists saga state after every handled event. If the application restarts mid-saga, the orchestrator resumes from the exact saved state; no custom state management required.
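The start/route/end lifecycle Axon manages can be approximated in a few lines to make the mechanics concrete. This is an illustrative re-implementation in plain Python, not Axon's API; all names below are invented:

```python
# Sketch of saga-instance routing: events carry an association key
# (here, an order id); the manager starts, resumes, and ends saga
# instances keyed by that association.

class SagaManager:
    def __init__(self, start_event, end_event):
        self.start_event = start_event  # plays the role of @StartSaga
        self.end_event = end_event      # plays the role of @EndSaga
        self.store = {}                 # association key -> saga state

    def handle(self, event_type, order_id):
        if event_type == self.start_event and order_id not in self.store:
            self.store[order_id] = {"events": []}  # new instance started
        saga = self.store.get(order_id)
        if saga is None:
            return False  # no instance for this association key
        saga["events"].append(event_type)  # state persisted between events
        if event_type == self.end_event:
            del self.store[order_id]       # instance ended and removed
        return True

mgr = SagaManager(start_event="OrderPlacedEvent",
                  end_event="OrderConfirmedEvent")
mgr.handle("OrderPlacedEvent", "X")
mgr.handle("InventoryReservedEvent", "X")
print("X" in mgr.store)                 # True: instance active, state kept
mgr.handle("OrderConfirmedEvent", "X")
print("X" in mgr.store)                 # False: instance ended
```

Because `store` is just a dict here, it survives only the process lifetime; Axon's value is that the equivalent state lives in a durable saga store, which is what makes mid-saga restarts recoverable.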
For a full deep-dive on Axon's CQRS and event sourcing architecture, see Saga Pattern: Coordinating Distributed Transactions with Compensation.
Lessons From Production Distributed Transaction Failures
Don't retrofit 2PC onto microservices. XA and 2PC were designed for a world of two or three co-located databases managed by the same team. Introducing them across independent services adds coordinator SPOF, prevents polyglot datastores, and makes each service's database upgrade a coordinated multi-team event. The operational cure is worse than the consistency problem it solves.
Compensating transactions must be total and pre-designed. You cannot write compensating transactions reactively when a failure first occurs in production. Design the compensation logic before the first line of saga code is written. For every step, answer: "If the next step fails, exactly what SQL statement undoes this step's business effect?" If you cannot answer it, the step is not compensable; redesign the saga boundary.
Idempotency is not optional, and it is not free. Every network call in a distributed system will eventually be retried. At-least-once delivery is the default contract for every durable message broker. Treating idempotency as an afterthought means discovering double-charges, duplicate shipments, or double-credited driver payments in production, after real money has moved.
The Outbox pattern is the only correct event publication mechanism. Any service that calls kafkaProducer.send() directly after a database commit has a silent data-loss bug on crash-between-commit-and-publish. This triggers on every application restart, deployment, or container eviction, not just during catastrophic failures.
Saga state must be observable. A saga spanning five services with no central state visibility means that any stuck or failed saga requires reading five separate service logs with correlated trace IDs to diagnose. Invest in saga state queryability from day one: Axon's saga store, a purpose-built saga state table, or Temporal's built-in workflow history. You will use it the first week a saga gets stuck in production.
Design intermediate states as first-class business states. The states visible during saga execution (INVENTORY_RESERVED, PAYMENT_PENDING, SHIPMENT_SCHEDULED) will be read by users, customer support, and downstream analytics. Model them explicitly in the domain. A customer who sees "Order Processing" for 30 seconds does not call support. A customer who sees a payment confirmation with no order reference calls within minutes.
TLDR: Summary & Key Takeaways
- 2PC delivers atomic all-or-nothing commits across participants but blocks indefinitely during the coordinator uncertainty window after all votes are collected. Use it only for small numbers of same-vendor databases with hard consistency requirements and a highly-available coordinator.
- XA (JTA) is the Java standard for 2PC across multiple datasources. Viable in Java EE monoliths with co-owned databases; too brittle and vendor-tied for polyglot independent microservices.
- Saga Orchestration replaces global locking with a sequence of local transactions coordinated by a central orchestrator. Choose this when workflows are conditional, multi-branch, or require compliance audit trails, and when your team owns all participants.
- Saga Choreography removes the central orchestrator entirely. Best for high-throughput event pipelines with services owned by independent teams, backed by mature distributed tracing.
- Outbox Pattern is mandatory for any Saga that publishes events to a broker. Writing to the outbox in the same database transaction as the business write is the only way to guarantee at-least-once event delivery without 2PC.
- Idempotency is required for every saga participant because at-least-once delivery and orchestrator retries guarantee duplicate commands in any production system. Design idempotency keys as caller-generated, deterministic, TTL-bounded, and stored transactionally.
- The core trade-off is not technical preference; it is a business consistency model decision driven by the CAP theorem. Make it explicitly, design compensating transactions before writing saga code, and make intermediate states visible from day one.
Written by
Abstract Algorithms
@abstractalgorithms