
ACID Properties Explained: How SQL Databases Guarantee Atomicity, Consistency, Isolation, and Durability

From WAL and MVCC to fsync and isolation levels — the internals behind every ACID guarantee in PostgreSQL, MySQL, Oracle, and SQL Server.

Abstract Algorithms · 44 min read


TLDR: ACID is four orthogonal guarantees that every SQL transaction must provide. Atomicity says all-or-nothing: MySQL InnoDB implements it with undo logs; PostgreSQL with MVCC, where an aborted transaction's row versions simply never become visible. Consistency says constraints always hold: the database rejects any write that violates a CHECK, FOREIGN KEY, or NOT NULL. Isolation says concurrent transactions cannot see each other's intermediate states: PostgreSQL uses MVCC + Serializable Snapshot Isolation; MySQL InnoDB uses MVCC + Two-Phase Locking. Durability says committed data survives crashes: every major database writes to a write-ahead log and calls fsync before acknowledging a commit. Know which property protects which failure class, and know the configuration knobs that let you trade durability for throughput when the workload permits.

📖 The $500 That Vanished: Why ACID Properties Exist

It is 09:14:33.412 UTC on a Friday. A banking platform is processing end-of-week payroll transfers. One of those transfers is transfer($500, account_A → account_B) — an employee's last paycheck.

Here is what the application code does:

  1. UPDATE accounts SET balance = balance - 500 WHERE id = 'account_A'; → executes successfully. Account A is debited. Balance: $2,300 → $1,800.
  2. The application server running the transfer logic suffers a kernel panic. The JVM process is killed mid-flight.
  3. The credit to Account B — UPDATE accounts SET balance = balance + 500 WHERE id = 'account_B'; — never executes.

Without any transaction guarantees, what has happened? Account A has lost $500 that Account B never received. The money does not exist anywhere in the database. The ledger no longer balances. There is no log entry proving the transfer completed, because it never did. And there is no automatic mechanism to reverse the debit, because the database treated each SQL statement as independent.

This is the failure class that Atomicity was designed to prevent. But it is not the only failure class. What if:

  • A second transfer concurrently reads Account A's balance mid-transfer, sees $1,800, and makes an overdraft decision based on a balance that is about to be rolled back — a dirty read?
  • The CHECK (balance >= 0) constraint is silently bypassed because two concurrent transactions both read the same balance and both proceed — classic write skew?
  • The transfer commits successfully, the user sees a confirmation screen, and then the database server loses power — and on restart, the committed transfer is gone?

Each of those scenarios is a different failure class. Each one is prevented by a different ACID property. This post explains what each property actually guarantees, what failure it prevents, and — most importantly — how your database implements it at the storage engine level using WAL, undo logs, MVCC, locking, and fsync.


🔍 ACID Is Four Guarantees, Not One Feature

ACID stands for Atomicity, Consistency, Isolation, and Durability. These are four orthogonal properties. A database that provides three of them but not the fourth does not provide ACID. They work as a complete set.

Each property targets a distinct failure class:

| Property | Failure Class Prevented | Core Mechanism |
| --- | --- | --- |
| Atomicity | Partial writes — half a transaction survives a crash | Undo logs / MVCC visibility |
| Consistency | Constraint violations — data enters an illegal state | Constraint enforcement |
| Isolation | Concurrency anomalies — transactions interfere mid-flight | MVCC, 2PL, SSI |
| Durability | Committed data loss — a confirmed write disappears after a crash | WAL + fsync |

A useful mental model: think of a transaction as a sealed envelope of changes. Atomicity says the envelope is either fully delivered or fully returned — never partially. Consistency says the contents must pass postal inspection before delivery. Isolation says the post office handles each envelope as if no other envelopes are in the system at the same time. Durability says once the delivery receipt is signed, the package cannot disappear from the recipient's mailbox.

Now let's examine how each property is actually built.


⚙️ Atomicity: How WAL and Undo Logs Guarantee All-or-Nothing

Atomicity means a transaction either commits completely — every write takes effect — or rolls back completely — no write takes effect. There is no in-between. If the application crashes between the debit and the credit in the bank transfer, the debit is undone as if it never happened.

The failure mode atomicity prevents is a partial write: a state where some operations of a multi-step transaction have been applied to durable storage but others have not. Without atomicity, every multi-step operation is a potential data corruption event waiting for a crash.

How PostgreSQL implements Atomicity — Write-Ahead Log

PostgreSQL uses a Write-Ahead Log (WAL) to guarantee atomicity. The rule is simple but absolute: before any change is written to a data page (the heap), the change must first be written as a WAL record on disk. WAL records are sequential, append-only, and durable.

When a transaction begins:

  1. Every INSERT, UPDATE, and DELETE produces one or more WAL records describing the change.
  2. Those WAL records are written to the WAL buffer, then flushed to disk.
  3. The actual heap pages (the table data) are modified in the shared buffer pool.
  4. On COMMIT, PostgreSQL flushes all pending WAL records for the transaction to disk and writes a commit WAL record. Only after this flush does the client receive the commit acknowledgment.
  5. On ROLLBACK (or when crash recovery finds an uncommitted transaction), PostgreSQL does not physically undo the heap changes. It marks the transaction as aborted in the commit log (pg_xact); the row versions that transaction wrote remain on disk but are invisible to every other transaction, and VACUUM reclaims them later. PostgreSQL's WAL is redo-only: there is no undo pass.

This is why a crash mid-transaction does not corrupt data in PostgreSQL: on restart, the recovery process replays the WAL to bring data pages up to date, and any transaction without a commit record is treated as aborted. Its row versions are never visible to subsequent transactions, so the partial changes are as good as gone.
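The recovery rule (replay what committed, ignore what did not) can be sketched as a toy redo log. This is an illustrative Python model, not PostgreSQL's actual code:

```python
# Toy model of write-ahead logging and crash recovery (illustrative only).
# Each record is ("write", txid, key, value) or ("commit", txid); a
# transaction's writes take effect at recovery only if its commit record
# also made it into the log before the crash.

def recover(wal_records):
    """Rebuild table state from the WAL after a crash."""
    committed = {txid for kind, txid, *_ in wal_records if kind == "commit"}
    table = {}
    for kind, txid, *payload in wal_records:
        if kind == "write" and txid in committed:
            key, value = payload
            table[key] = value  # redo the committed change
        # writes from uncommitted transactions are simply skipped:
        # their effects never become visible (all-or-nothing)
    return table

# txn-8821 debited account_A but crashed before its commit record was logged
wal = [
    ("write", "txn-8820", "account_B", 900),
    ("commit", "txn-8820"),
    ("write", "txn-8821", "account_A", 1800),  # debit landed in the WAL...
    # ...but the server crashed here: no ("commit", "txn-8821") record
]

state = recover(wal)
print(state)  # {'account_B': 900} -- account_A's debit never takes effect
```

Recovery is a pure function of the log: replaying the same WAL always reproduces the same committed state, which is exactly why the WAL, not the heap, is the source of truth.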

How MySQL InnoDB implements Atomicity — Undo Log

MySQL InnoDB uses a different but equivalent mechanism: the undo log (also called rollback segments). Before InnoDB modifies any row, it writes the before-image of that row to the undo log. If the transaction commits, the undo log entries for that transaction are eventually discarded. If the transaction rolls back — or if crash recovery detects an uncommitted transaction — InnoDB reads the undo log entries and applies them in reverse order, restoring every row to its pre-transaction state.

The sequence diagram below traces the bank transfer crash scenario and shows how WAL-based atomicity in PostgreSQL recovers the debited account:

sequenceDiagram
    participant App as Application Server
    participant WAL as WAL Buffer / Disk
    participant Heap as Data Pages (Heap)
    participant Rec as Recovery Process

    App->>WAL: Write WAL record: debit account_A by 500
    WAL-->>App: WAL record durable
    App->>Heap: Modify account_A page in buffer pool
    Note over App: Server crashes here - no COMMIT WAL record written
    Rec->>WAL: Scan WAL on restart
    Rec-->>Rec: No COMMIT record found for txn-8821
    Rec->>Heap: Replay WAL, mark txn-8821 aborted in pg_xact
    Note over Heap: txn-8821's row version invisible - account_A reads as 2300

The diagram shows the two-phase nature of WAL atomicity: first, the WAL records the intent; second, if no commit record exists at recovery time, the transaction is marked aborted and its changes never become visible. The application server crash between steps produces zero visible partial state: every reader sees the pre-transaction balance.

Transaction rollback mechanics also give applications fine-grained control. Both PostgreSQL and MySQL support SAVEPOINT — a named checkpoint within a transaction. ROLLBACK TO SAVEPOINT sp1 undoes only the work done since sp1 without aborting the entire transaction, which is useful for nested retry logic.
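These rollback mechanics can be exercised directly. A minimal sketch using Python's sqlite3 module as a stand-in for a server database (the SAVEPOINT syntax is the same in PostgreSQL and MySQL):

```python
import sqlite3

# SAVEPOINT demo: roll back only the work done since a named savepoint,
# keeping the rest of the transaction alive. sqlite3 stands in for a
# server database here; the SQL is identical in PostgreSQL/MySQL.
conn = sqlite3.connect(":memory:", isolation_level=None)  # manual txn control
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('account_A', 2300), ('account_B', 400)")

conn.execute("BEGIN")
conn.execute("UPDATE accounts SET balance = balance - 500 WHERE id = 'account_A'")
conn.execute("SAVEPOINT sp1")
conn.execute("UPDATE accounts SET balance = balance + 999 WHERE id = 'account_B'")
conn.execute("ROLLBACK TO SAVEPOINT sp1")  # undo only the bogus credit
conn.execute("UPDATE accounts SET balance = balance + 500 WHERE id = 'account_B'")
conn.execute("COMMIT")

print(dict(conn.execute("SELECT id, balance FROM accounts")))
# {'account_A': 1800, 'account_B': 900}
```

The rolled-back credit of 999 leaves no trace, while the debit issued before the savepoint survives to commit — exactly the nested-retry pattern described above.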


⚙️ Consistency: How Constraint Enforcement Guarantees Valid States

Consistency means that a transaction takes the database from one valid state to another valid state. A valid state is one in which all declared integrity constraints hold: NOT NULL, UNIQUE, PRIMARY KEY, FOREIGN KEY, and CHECK constraints are all satisfied.

The failure mode consistency prevents is a constraint violation reaching durable storage: a row with a null value in a NOT NULL column; a foreign key pointing to a non-existent parent row; an account balance that has gone negative in a system where CHECK (balance >= 0) is defined.

Consistency is the only ACID property that is shared between the database and the application. The database enforces the constraints it knows about. Business rules the database cannot express — such as "a transfer cannot exceed the sender's daily limit" — are the application's responsibility. If the application has a bug in its business logic, the database's consistency guarantees do not save it.

How SQL databases implement Consistency — Constraint enforcement

As each statement modifies rows, the database engine evaluates every constraint the change could violate. By default this happens immediately, statement by statement; deferred constraints (covered below) move the check to commit time. A violated constraint fails the statement with an error, and an unhandled violation rolls the whole transaction back, so no illegal state ever reaches durable storage.

graph TD
    A[Transaction issues COMMIT] --> B{NOT NULL check passes?}
    B -->|No| Z[Rollback + error to client]
    B -->|Yes| C{UNIQUE / PRIMARY KEY check passes?}
    C -->|No| Z
    C -->|Yes| D{FOREIGN KEY check passes?}
    D -->|No| Z
    D -->|Yes| E{CHECK constraint passes?}
    E -->|No| Z
    E -->|Yes| F[COMMIT succeeds - data durable]

The diagram shows constraint evaluation as a sequential gate: every declared constraint must pass before the commit is allowed through. In practice most engines check each constraint as the triggering statement runs rather than in a single pass at commit, and the order varies by database, but the effect is the same: any single unhandled violation rolls the entire transaction back.
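The rejection path can be demonstrated in a few lines, again using sqlite3 as a stand-in for a server database:

```python
import sqlite3

# A CHECK constraint rejecting an illegal write; the statement's effects
# are rolled back and the original balance survives untouched.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("""
    CREATE TABLE accounts (
        id      TEXT PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)
    )
""")
conn.execute("INSERT INTO accounts VALUES ('account_A', 200)")

conn.execute("BEGIN")
try:
    # would drive the balance to -300: the CHECK constraint fires
    conn.execute("UPDATE accounts SET balance = balance - 500 WHERE id = 'account_A'")
    conn.execute("COMMIT")
except sqlite3.IntegrityError as err:
    conn.execute("ROLLBACK")
    print("rejected:", err)

print(conn.execute("SELECT balance FROM accounts").fetchone()[0])  # 200
```

The database never held, even transiently on durable storage, a state where account_A was negative — the constraint and the rollback act as one unit.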

Deferred constraints in PostgreSQL

By default, constraints are evaluated immediately — at the time each row is modified, not just at commit. PostgreSQL allows constraints to be declared DEFERRABLE INITIALLY DEFERRED, which shifts evaluation to commit time. This is essential for operations where constraints would be temporarily violated mid-transaction but are guaranteed to be satisfied by commit — for example, re-ordering rows in a circular foreign key chain, or bulk-loading data where rows reference each other.

ALTER TABLE orders
    ADD CONSTRAINT fk_customer
    FOREIGN KEY (customer_id) REFERENCES customers(id)
    DEFERRABLE INITIALLY DEFERRED;
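The deferred-check behaviour itself can be sketched with sqlite3, whose PRAGMA defer_foreign_keys plays the same role as PostgreSQL's DEFERRABLE INITIALLY DEFERRED (an illustrative stand-in, not PostgreSQL syntax):

```python
import sqlite3

# Deferred foreign-key checking: the child row may reference a parent
# that does not exist yet, as long as it does exist by COMMIT.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY)")
conn.execute("""
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id)
    )
""")

conn.execute("BEGIN")
conn.execute("PRAGMA defer_foreign_keys = ON")     # check at COMMIT, not per statement
conn.execute("INSERT INTO orders VALUES (1, 42)")  # parent 42 missing -- allowed for now
conn.execute("INSERT INTO customers VALUES (42)")  # parent arrives before commit
conn.execute("COMMIT")                             # FK check passes here

print(conn.execute("SELECT customer_id FROM orders").fetchone()[0])  # 42
```

With immediate checking the first INSERT would fail outright; deferral only requires that the constraint hold at the commit boundary.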

The critical distinction: ACID Consistency ≠ CAP Consistency

This point causes significant confusion. The C in ACID refers to single-node database integrity — constraints, invariants, valid states. The C in CAP theorem refers to linearizability in a distributed system: whether all nodes see the same data at the same time. These are completely different properties. A database can provide ACID consistency (no constraint violations) while violating CAP consistency (different nodes see different values). Do not conflate them.


⚙️ Isolation: How MVCC and Locking Prevent Concurrent Transactions from Interfering

Isolation means that concurrent transactions execute as if they were running serially — one after the other. The intermediate state of one transaction is invisible to all other concurrent transactions. If two transfers are happening simultaneously to overlapping accounts, neither one should be able to observe the other in a half-finished state.

Isolation is the hardest ACID property to implement efficiently. It governs the correctness-throughput trade-off at the core of every database concurrency design. Because full serializability is expensive, SQL databases define an isolation spectrum with four standard levels:

| Isolation Level | Dirty Read | Non-Repeatable Read | Phantom Read | Write Skew |
| --- | --- | --- | --- | --- |
| Read Uncommitted | Possible | Possible | Possible | Possible |
| Read Committed | Prevented | Possible | Possible | Possible |
| Repeatable Read | Prevented | Prevented | Possible (MySQL: Prevented) | Possible |
| Serializable | Prevented | Prevented | Prevented | Prevented |

Each anomaly listed in the table has a dedicated deep-dive post (linked in the Related Posts section at the end). The important observation here is that each higher isolation level closes another class of anomaly, at increasing cost to concurrency.

Two-Phase Locking (2PL)

The traditional mechanism for isolation is Two-Phase Locking: a transaction acquires all the locks it needs (growing phase) and then releases them only after commit or rollback (shrinking phase). No lock is released until the transaction ends. This prevents other transactions from modifying rows that are in use, but it means readers can block writers and writers can block readers — reducing throughput under high concurrency.

SQL Server's default lock-based Read Committed uses 2PL directly, and MySQL InnoDB's REPEATABLE READ combines MVCC for plain reads with 2PL-style record and gap locks for writes and locking reads.

Multi-Version Concurrency Control (MVCC)

PostgreSQL, Oracle, and MySQL InnoDB all implement MVCC: instead of blocking readers when a writer modifies a row, the database keeps multiple versions of each row. A reading transaction sees the version that was current when its snapshot was taken (per statement at Read Committed, per transaction at Repeatable Read and above) — it never sees the version being written by a concurrent uncommitted transaction.

The diagram below illustrates how MVCC handles the bank transfer read isolation problem. T1 is a balance-read transaction; T2 is the transfer transaction. MVCC allows T1 to see a consistent snapshot without blocking T2:

graph TD
    A[T2 starts: writes new version of account_A - balance 1800] --> B[T2 is still uncommitted]
    C[T1 starts at snapshot time T_start] --> D[T1 reads account_A]
    D --> E{Row version committed before T_start?}
    E -->|Yes - old version balance 2300| F[T1 returns balance 2300]
    E -->|No - T2 new version not yet committed| F
    B --> G[T2 commits]
    G --> H[Future transactions see balance 1800]
    F --> I[T1 sees balance 2300 throughout its lifetime]

The key insight: T1 reads the old committed version (balance: $2,300) because T2's new version was not committed at T1's snapshot time. T2 is free to proceed concurrently. There is no blocking in either direction. This is why MVCC is often described as "readers don't block writers, writers don't block readers."
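A toy version-chain model makes the visibility rule concrete. This is an illustrative Python sketch, not any engine's actual code, and it simplifies real MVCC (which also consults in-progress transaction lists, not just xid ordering):

```python
# Minimal MVCC sketch: each row keeps a chain of versions tagged with the
# transaction id (xid) that wrote them; a reader sees the newest version
# committed by a transaction older than its snapshot xid.

versions = {
    # account_A's version chain, oldest first: (writer_xid, committed?, balance)
    "account_A": [(90, True, 2300)],
}

def read(key, snapshot_xid):
    """Return the newest version visible to the reader's snapshot."""
    visible = [v for xid, committed, v in versions[key]
               if committed and xid < snapshot_xid]
    return visible[-1]

# T2 (xid 101) writes a new, still-uncommitted version: balance 1800
versions["account_A"].append((101, False, 1800))

# T1 took its snapshot at xid 100: it sees 2300, and T2 never blocked
print(read("account_A", snapshot_xid=100))  # 2300

# T2 commits; transactions that snapshot later see 1800
versions["account_A"][-1] = (101, True, 1800)
print(read("account_A", snapshot_xid=200))  # 1800
print(read("account_A", snapshot_xid=100))  # still 2300 for T1's snapshot
```

Note that the writer appends a version rather than overwriting in place — that append-instead-of-overwrite choice is what lets readers and writers proceed without blocking each other.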

Serializable Snapshot Isolation (SSI) in PostgreSQL

PostgreSQL's SERIALIZABLE isolation level goes beyond MVCC. It implements Serializable Snapshot Isolation (SSI), which detects read-write anti-dependency cycles — the root cause of write skew anomalies — without acquiring heavy predicate locks. SSI tracks transaction dependencies at runtime and aborts a transaction if completing it would produce a cycle that is incompatible with serial execution. The aborted transaction can then be retried.

Database isolation defaults

| Database | Default Isolation Level |
| --- | --- |
| PostgreSQL | Read Committed |
| MySQL InnoDB | Repeatable Read |
| Oracle | Read Committed |
| SQL Server | Read Committed (lock-based) |

PostgreSQL and Oracle default to Read Committed because their MVCC implementations make it highly efficient. MySQL InnoDB defaults to Repeatable Read because its gap-lock implementation makes phantom prevention relatively cheap at that level.


⚙️ Durability: How WAL and fsync Guarantee Committed Data Survives Crashes

Durability means that once a transaction commits and the client receives the acknowledgment, the changes are permanent. A subsequent crash — power loss, kernel panic, disk controller failure — does not roll back a committed transaction. The user who clicked "Pay" and saw the confirmation screen is guaranteed their payment is recorded, even if the server dies one second later.

The failure mode durability prevents is committed data loss: a scenario where a transaction's commit acknowledgment was sent to the client, but the data existed only in volatile memory (DRAM) at the time of the crash and was never flushed to non-volatile storage.

How PostgreSQL implements Durability — WAL + fsync

PostgreSQL's durability guarantee is built on two mechanisms working together:

  1. WAL flush before commit acknowledgment: PostgreSQL's commit path flushes all WAL records for the committing transaction to disk (using fsync or fdatasync) before returning COMMIT to the client. The client cannot receive a success response until the WAL is durable.

  2. Checkpoints: Periodically (triggered by checkpoint_timeout or max_wal_size, with the writes spread out according to checkpoint_completion_target), PostgreSQL writes all modified ("dirty") pages from the shared buffer pool to the actual heap files on disk. This limits how far back the recovery process needs to replay the WAL after a crash — you only replay WAL records since the last completed checkpoint.

sequenceDiagram
    participant Client as Client Application
    participant PG as PostgreSQL
    participant WAL as WAL File (Disk)
    participant Heap as Heap Data Files (Disk)

    Client->>PG: COMMIT transaction txn-8821
    PG->>WAL: Write COMMIT record for txn-8821
    PG->>WAL: fsync - flush WAL to non-volatile storage
    WAL-->>PG: fsync confirmed
    PG-->>Client: COMMIT acknowledged
    Note over Client: Client considers the transaction durable
    Note over PG: Server crashes here
    PG->>WAL: Restart - replay WAL from last checkpoint
    WAL-->>PG: Replay COMMIT record for txn-8821
    PG->>Heap: Apply txn-8821 changes to heap files
    Note over Heap: Data intact - durability preserved

The sequence makes the durability contract concrete: the client receives the commit acknowledgment only after the WAL fsync completes. A crash after that point replays the WAL on restart and restores the committed state. The heap files may be stale — that is fine, the WAL is the source of truth.

How MySQL InnoDB implements Durability — Redo Log

MySQL InnoDB uses a redo log (analogous to WAL): a circular on-disk log of all changes made to InnoDB pages. The redo log is flushed to disk at commit time, governed by innodb_flush_log_at_trx_commit:

  • = 1 (default, fully durable): Redo log is flushed to disk with fsync at every commit. Survives OS crashes and power loss. Maximum safety.
  • = 2 (OS-level durability): Redo log is written to the OS file cache at commit, flushed to disk once per second. Survives application crashes; may lose up to 1 second of commits on OS crash or power loss.
  • = 0 (dangerous): Nothing is flushed at commit; InnoDB writes and flushes the log buffer to disk roughly once per second in the background. May lose up to 1 second of commits even on a plain mysqld crash.

Group Commit

Both PostgreSQL and MySQL InnoDB implement group commit: when multiple transactions commit near-simultaneously, their WAL records are batched into a single fsync call. This dramatically improves throughput under high concurrency — 10 transactions sharing one fsync pay the I/O cost of one, not ten. Group commit is transparent to applications.


🧠 Deep Dive: How the WAL Commit Path and MVCC Storage Actually Work

Understanding why ACID properties hold requires going one level below the property definitions — into the exact sequence of I/O calls, data structures, and version chains that make each guarantee mechanically enforceable.

The WAL Internals: LSNs, WAL Segments, and the Commit Pipeline

Every change in PostgreSQL is identified by a Log Sequence Number (LSN) — a monotonically increasing 64-bit integer representing a byte offset in the WAL stream. When a row is modified, PostgreSQL:

  1. Assigns the change an LSN (e.g., 0/1A3F2B0).
  2. Writes a WAL record containing: the LSN, the transaction ID (XID), the relation OID (table identifier), the block number being modified, and the new data for the change (plus a full-page image the first time a page is touched after a checkpoint).
  3. Appends the WAL record to the WAL buffer (an in-memory ring buffer sized by wal_buffers; by default 1/32 of shared_buffers, capped at 16MB).
  4. At commit time (or when the WAL buffer fills), flushes WAL segments to disk. Each WAL segment is a 16MB file in $PGDATA/pg_wal/. The flush uses fdatasync (the Linux default for wal_sync_method), which forces the kernel to push the dirty pages through the OS page cache onto non-volatile storage.
  5. Writes the COMMIT WAL record with the final LSN. Only after this record is fsync'd does PostgreSQL send the commit acknowledgment to the client.

The checkpoint process periodically writes all dirty shared buffer pool pages to the heap files and records the checkpoint LSN in the control file. After a checkpoint, all WAL records before that LSN are no longer needed for recovery. This bounds crash recovery time: on restart, PostgreSQL replays only WAL records newer than the last checkpoint LSN.

MySQL InnoDB's redo log follows the same logical structure. The redo log is a set of on-disk files (historically ib_logfile0 and ib_logfile1; MySQL 8.0.30+ keeps a set of #ib_redoN files in the #innodb_redo directory) written as a circular ring. Each redo log record contains the tablespace ID, page number, byte offset within the page, and new page content. At commit (innodb_flush_log_at_trx_commit = 1), InnoDB flushes the redo log ring buffer to the redo log files with fsync before returning success to the client.

Performance Analysis: The fsync Bottleneck and Group Commit

The single most expensive operation in the ACID commit path is the fsync call. On a SATA SSD, a single fsync takes approximately 0.1–1ms. On a spinning disk, it can take 3–10ms. This means a single-threaded workload performing fsync on every commit is limited to 100–10,000 commits per second by I/O alone — regardless of CPU speed or memory.

Group commit is the primary mechanism databases use to amortize this cost. When multiple transactions commit in rapid succession, their WAL records are batched and a single fsync covers all of them. A batch of 50 transactions shares one fsync call — effectively reducing the per-transaction I/O cost by 50×.
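The amortization can be sketched with the same syscall the commit path relies on. A toy group commit in Python (illustrative; real engines batch inside the WAL writer process):

```python
import os
import tempfile

# Group-commit sketch: N transactions' WAL records share ONE fsync call
# instead of paying one fsync each. os.fsync is the same durability
# barrier the real commit path uses.

def commit_batch(fd, records):
    """Append every pending record, then make them all durable with one fsync."""
    for rec in records:
        os.write(fd, rec)
    os.fsync(fd)  # single I/O barrier amortized across the whole batch

path = os.path.join(tempfile.mkdtemp(), "wal")
fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND)

pending = [f"commit txn-{i}\n".encode() for i in range(50)]
commit_batch(fd, pending)  # 50 commits, one fsync
os.close(fd)

with open(path) as f:
    print(sum(1 for _ in f))  # 50 durable commit records
```

Replacing `commit_batch` with a per-record write-plus-fsync loop would issue 50 barriers for the same durable result, which is precisely the cost group commit eliminates.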

PostgreSQL's commit_delay and commit_siblings parameters control group commit behaviour: if commit_siblings or more transactions are currently committing, PostgreSQL waits commit_delay microseconds before flushing, allowing more transactions to join the batch. At high concurrency, group commit is automatic without any delay — the WAL writer processes multiple transactions' WAL records in a single flush pass.

The practical implication: at low concurrency (one transaction at a time), synchronous_commit = on with a fast NVMe SSD provides adequate throughput for most OLTP workloads. At very high concurrency on spinning disks, synchronous_commit = off (PostgreSQL) or innodb_flush_log_at_trx_commit = 2 (MySQL) can be the difference between 500 commits/second and 50,000 commits/second.


🧪 Tracing All Four ACID Properties on the Bank Transfer

To make the four-property framework concrete, let's trace the transfer($500, account_A → account_B) scenario end-to-end, with each ACID property actively in play.

graph TD
    A[transfer 500 from A to B starts] --> B[Debit account_A by 500]
    B --> C{Consistency check: CHECK balance >= 0}
    C -->|Fails - account_A has only 200| D[Rollback entire transaction - Atomicity]
    C -->|Passes| E[Credit account_B by 500]
    E --> F{Consistency check: FOREIGN KEY, NOT NULL pass?}
    F -->|Fails| D
    F -->|Passes| G[Concurrent reader T2 reads account_A]
    G --> H[Isolation: T2 sees pre-transfer snapshot via MVCC]
    H --> I[COMMIT - WAL fsync called]
    I --> J[Durability: changes survive any subsequent crash]
    J --> K[Transfer complete - both changes durable]
    D --> L[account_A balance unchanged - no partial write]

The diagram traces the four ACID checkpoints on a single transaction. Atomicity acts as a safety net if anything fails mid-transfer. Consistency evaluates the CHECK (balance >= 0) constraint before any change is finalized. Isolation ensures that a concurrent reader running alongside the transfer sees a consistent pre-transfer or post-transfer snapshot, never a mid-transfer state. Durability ensures that the WAL is flushed before the client receives the confirmation.

Concretely:

  • A: If the credit to account_B fails (network error, constraint violation, application crash), the debit to account_A is rolled back. The ledger stays balanced.
  • C: CHECK (balance >= 0) on account_A fires if the transfer would send more than the available balance. The database rejects the transaction before any durable write occurs.
  • I: A concurrent reporting query running at the same time reads account_A's balance. Via MVCC, it sees the last committed snapshot — either the pre-transfer balance ($2,300) or the post-transfer balance ($1,800) — never the in-flight intermediate state.
  • D: Once COMMIT is acknowledged, even if the server loses power in the next millisecond, the WAL contains the full committed transaction and recovery will restore it.

📊 How PostgreSQL, MySQL, Oracle, and SQL Server Implement Each Property

| Property | PostgreSQL | MySQL InnoDB | Oracle | SQL Server |
| --- | --- | --- | --- | --- |
| Atomicity | WAL recovery + MVCC visibility (aborted XIDs never visible) | Undo log + redo log | Undo segments (rollback segments) | Transaction log (LDF file) |
| Consistency | Constraints + triggers + deferred constraints | Constraints + triggers | Constraints + triggers | Constraints + triggers |
| Isolation | MVCC + SSI (Serializable) | MVCC + 2PL gap locks | MVCC (statement-level read consistency) | 2PL + optional RCSI (MVCC variant) |
| Durability | WAL + fsync + checkpoints | InnoDB redo log + fsync | Online redo log + archivelog | Write-ahead log + VLF flushing |
| Default isolation | Read Committed | Repeatable Read | Read Committed | Read Committed |

Three observations from this table:

  1. All four databases use write-ahead logging for durability — the mechanism differs in naming but is functionally equivalent.
  2. Oracle's MVCC operates at statement level by default (each statement sees a snapshot as of its own start time), while PostgreSQL's MVCC operates at transaction level (the entire transaction sees the snapshot as of its start time). This makes Oracle more prone to non-repeatable reads within a single transaction.
  3. SQL Server's Read Committed Snapshot Isolation (RCSI) is an opt-in MVCC mode that turns Read Committed from a locking protocol into an MVCC protocol — readers no longer block writers, significantly improving throughput in read-heavy workloads.

🌍 Where ACID Guarantees Are Non-Negotiable in Production

ACID is not a checkbox for academic correctness — it directly governs the safety of specific real-world transaction classes. Understanding which systems require full ACID and which can relax individual properties is the practical skill that separates database-aware engineers from those who apply blanket rules.

Financial systems and payment processors require all four properties without compromise. Stripe processes millions of transactions per day on PostgreSQL. Debit-credit pairs must be atomic (no partial transfers), balances must satisfy non-negativity constraints (consistency), concurrent payment requests must not double-spend the same funds (isolation), and confirmed payments must survive node failures (durability). A single property violation at this scale produces regulatory exposure and customer harm.

E-commerce inventory and reservation systems require atomicity and isolation at minimum. When a customer reserves the last unit of a product, the reservation check and the quantity decrement must execute atomically and in isolation from concurrent requests. Netflix uses MySQL InnoDB and PostgreSQL across its content metadata services. Airbnb's reservation system runs on MySQL InnoDB with Repeatable Read isolation specifically to prevent double-booking anomalies in calendar slot reservations.

Healthcare record systems require atomicity and consistency: a patient record update that writes a new medication dosage but fails to link it to the prescribing physician must be rolled back entirely. Partial writes in healthcare data can cause misdiagnosis if downstream reads encounter half-updated records.

Session stores and analytics ingestion pipelines are the systems where relaxing durability is appropriate. A session token that expires in 15 minutes does not need synchronous_commit = on and fsync = on. The cost of losing a few seconds of session data on a crash is negligible. Similarly, an analytics pipeline ingesting 100,000 events per second from a click stream can accept innodb_flush_log_at_trx_commit = 2 — the volume and bounded loss window make it an acceptable trade.

Distributed ledger operations — including blockchain-adjacent systems and multi-region financial replication — require distributed ACID. Google Spanner uses TrueTime (GPS + atomic clock timestamps) to provide external consistency across global datacenters. CockroachDB implements distributed transactions using the Raft consensus protocol, providing serializable isolation across nodes. Both pay a latency premium over single-region deployments, but that premium is the price of global ACID.


⚖️ Durability vs. Throughput: The fsync Trade-off

Full ACID durability is not free. The fsync call that PostgreSQL and MySQL InnoDB issue at commit time is a synchronous flush to non-volatile storage — it blocks until the disk confirms the write. This is the single most expensive operation in the commit path, and it is why high-frequency OLTP workloads on spinning disks saturate quickly.

Database operators have several knobs to trade durability for throughput:

PostgreSQL: synchronous_commit

The synchronous_commit parameter in postgresql.conf controls how much of the WAL pipeline must complete before the commit acknowledgment is sent:

  • on (default): WAL flushed to disk before ack. Full durability.
  • remote_write (streaming replication): WAL written to standby OS cache before ack. Survives primary crash.
  • local (replication context): WAL flushed locally, standby not waited for. Useful when replica durability is secondary.
  • off: Commit is acknowledged after WAL is written to the WAL buffer — before fsync. On crash, up to three times wal_writer_delay (at most ~600ms at the default 200ms) of committed transactions may be lost. The database is not corrupted — no partial writes — but recent commits may vanish.

The synchronous_commit = off setting can deliver 2–3× throughput improvement on commit-intensive workloads. The risk is bounded and known: at most a few hundred milliseconds of committed transactions lost in a crash. For session data, rate limiting counters, or analytics ingestion pipelines, this is an acceptable trade. For financial ledgers, it is not.

MySQL InnoDB: innodb_flush_log_at_trx_commit

As described earlier, the three settings offer a spectrum:

| Setting | Durability | Throughput | Acceptable for |
| --- | --- | --- | --- |
| 1 | Full (survives power loss) | Baseline | Financial transactions, audit logs |
| 2 | OS-level (survives app crash) | ~2× improvement | Workloads where application-crash recovery is sufficient |
| 0 | InnoDB buffer only | Maximum | Ephemeral data, batch loads, analytics |

When to use these trade-offs

The principle is: match durability requirements to the cost of the data. A session token expiring in 15 minutes does not need the same durability guarantee as a bank debit. An analytics event ingestion pipeline writing 50,000 rows/second can absorb 200ms of potential loss far more easily than a payment processor committing 200 rows/second.

The right answer is never to disable durability globally — use it selectively, per workload, with SET synchronous_commit = off scoped to specific transactions or sessions rather than the entire instance.
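Per-workload scoping can be sketched with sqlite3's PRAGMA synchronous, a per-connection durability knob analogous to a per-session SET synchronous_commit (a stand-in for illustration, not PostgreSQL syntax):

```python
import os
import sqlite3
import tempfile

# Two connections to the same database, each with its own durability
# setting: the ledger keeps full fsync discipline, the session store
# trades durability for throughput. FULL reads back as 2, OFF as 0.
db = os.path.join(tempfile.mkdtemp(), "app.db")

ledger = sqlite3.connect(db)            # financial writes: full durability
ledger.execute("PRAGMA synchronous = FULL")

sessions = sqlite3.connect(db)          # session tokens: throughput over fsync
sessions.execute("PRAGMA synchronous = OFF")

print(ledger.execute("PRAGMA synchronous").fetchone()[0])    # 2 (FULL)
print(sessions.execute("PRAGMA synchronous").fetchone()[0])  # 0 (OFF)
```

The point of the sketch is the scoping, not the engine: the relaxed setting lives on the connection that handles expendable data, while the instance-wide default stays fully durable.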


🔄 ACID vs. BASE: When Full Guarantees Are Too Expensive to Distribute

ACID properties were designed for single-node (or tightly-coupled) relational databases. When you distribute a database across multiple nodes — sharding across datacenters, geo-replication across continents — providing full ACID becomes expensive, sometimes prohibitively so.

This is where the CAP theorem enters: under a network partition, a distributed system can provide either Consistency (every node returns the latest committed value) or Availability (every node responds to requests), but not both simultaneously. That trade-off is why distributed databases introduced BASE as an alternative consistency model:

  • Basically Available: the system responds to requests even during partial failures
  • Soft state: the system's state may change over time without new input (due to eventual consistency propagation)
  • Eventually consistent: given no new writes, all replicas converge to the same value

BASE is not unsafe — it is a deliberate trade-off. NoSQL databases like Cassandra, DynamoDB (in eventual consistency mode), and CouchDB choose availability and partition tolerance over strict consistency. For use cases where a slightly stale read is acceptable — social media timelines, product catalog searches, recommendation engines — BASE is the right choice.

For financial transactions, inventory management, reservations, and any domain where correctness trumps availability, ACID remains the correct choice.

The CAP theorem does not make ACID impossible in distributed systems — it makes it expensive. CockroachDB, Google Spanner, and FaunaDB provide distributed ACID using consensus protocols (Raft, Paxos); Spanner additionally relies on its TrueTime clock service to bound timestamp uncertainty. They are fully ACID-compliant but require careful latency management across nodes.


⚠️ Common Misconceptions About ACID That Lead to Production Bugs

"ACID means safe; BASE means unsafe"

False. BASE is a deliberate engineering choice for workloads where eventual consistency is acceptable and availability under partition is non-negotiable. Calling it "unsafe" mischaracterizes the design intent. The unsafety comes from using BASE guarantees for workloads that require ACID — not from BASE itself.

"Serializable isolation level = ACID"

Serializable is the strongest setting of the I in ACID: an isolation level, not the whole acronym. Isolation is only one of four properties. A database that provides Serializable isolation but lacks a write-ahead log and crashes without flushing committed data is not ACID. Isolation without durability is not ACID.

"NoSQL databases can't be ACID"

False. MongoDB 4.0+ provides multi-document ACID transactions. CockroachDB provides distributed ACID across nodes. FaunaDB was designed from the ground up as ACID-compliant. The common association of NoSQL with BASE is historical, not structural.

"The C in ACID and the C in CAP are the same thing"

Completely different. ACID Consistency = constraint integrity within a single database. CAP Consistency = linearizability across distributed nodes. A system can have one, both, or neither. Conflating them leads to category errors in distributed system design.

"Disabling fsync is just a performance setting, not a correctness setting"

Disabling fsync in PostgreSQL (fsync = off) disables the durability guarantee entirely. A crash with fsync = off can corrupt the database — not just lose recent commits, but corrupt the data files themselves, because partial writes to pages are not protected. This is the one configuration change in PostgreSQL that can cause data corruption, and it is documented as such.


🧭 Choosing the Right Isolation Level and Durability Setting for Your Workload

The right ACID configuration depends entirely on your workload's correctness requirements and throughput constraints. This decision guide maps workload patterns to specific database settings.

Isolation Level Decision Table

| Workload Pattern | Anomalies You Must Prevent | Recommended Isolation | Notes |
| --- | --- | --- | --- |
| Payment / debit-credit transfers | Dirty read, non-repeatable read, write skew | Serializable | Use SSI in PostgreSQL for write-skew prevention without heavy locking |
| Inventory reservation (single-item) | Dirty read, lost update | Repeatable Read | Or Read Committed + SELECT ... FOR UPDATE |
| Order status reads | Dirty read only | Read Committed | Default in PostgreSQL and Oracle; sufficient for most OLTP reads |
| Reporting / analytics | Tolerable staleness | Read Committed or Snapshot | MVCC snapshots give consistent point-in-time reads without blocking writers |
| Bulk data load (admin) | None — data is being re-created | Read Uncommitted | PostgreSQL accepts the syntax but runs it as Read Committed; honored in MySQL; use with care |
| Double-booking prevention (calendar/seats) | Phantom read, write skew | Serializable or explicit locking | SELECT ... FOR UPDATE on the target row + atomic check-and-insert |
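The explicit-locking approach in the double-booking row can be sketched as follows; the seats table and its columns are illustrative:

```sql
BEGIN;

-- Lock the target row so concurrent bookers serialize on it.
SELECT status FROM seats WHERE seat_id = 42 FOR UPDATE;

-- After the application verifies status = 'free', perform the
-- check-and-update atomically, guarding again in the WHERE clause:
UPDATE seats
   SET status = 'booked', booked_by = 1007
 WHERE seat_id = 42 AND status = 'free';

-- If the UPDATE reports 0 rows affected, the seat was taken: ROLLBACK.
COMMIT;
```

The redundant status = 'free' predicate in the UPDATE means that even a logic error in the application check cannot double-book the seat.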

Durability Setting Decision Table

| Workload | Data Loss Tolerance | Recommended Setting | Rationale |
| --- | --- | --- | --- |
| Financial transactions, audit logs | Zero | PostgreSQL: synchronous_commit = on / MySQL: innodb_flush_log_at_trx_commit = 1 | Full durability — no data loss under any failure |
| Application event logs, user activity | Sub-second window acceptable | PostgreSQL: synchronous_commit = off | ~2–3× throughput gain; bounded, non-corrupting loss |
| Session data, rate-limit counters | 1-second window acceptable | MySQL: innodb_flush_log_at_trx_commit = 2 | OS-level durability; survives app crash, not power loss |
| Analytics ingest, ETL scratch tables | Recoverable from source | MySQL: innodb_flush_log_at_trx_commit = 0 | Maximum throughput; only use if source data can be reloaded |
| Dev / staging environments | Any | Either relaxed setting | Never disable fsync entirely — even in staging |

The guiding principle: match the durability guarantee to the recovery cost of the data. Ask: "If this data were lost after a crash, how long would it take to recover it, and what is the business impact?" If the answer is "we can't recover it" or "regulatory violation," use synchronous_commit = on. If the answer is "we re-read from Kafka," use off.


🛠️ PostgreSQL and MySQL: Production ACID Configuration

PostgreSQL postgresql.conf — Durability and WAL settings

The following parameters govern how PostgreSQL implements Atomicity and Durability at the storage level. These are the first parameters any DBA or platform engineer should understand when tuning a PostgreSQL instance for production.

# postgresql.conf

# --- Durability controls ---

# Guarantee WAL is flushed to disk before commit ack.
# Set to 'off' only for non-critical, high-throughput sessions.
synchronous_commit = on

# Must be ON in production. Turning off WILL cause data corruption on crash.
fsync = on

# WAL flush method. 'fdatasync' is the safe default on Linux.
# 'open_sync' or 'open_datasync' for specific NFS/storage configs.
wal_sync_method = fdatasync

# Time between WAL writer flushes (relevant when synchronous_commit = off)
wal_writer_delay = 200ms

# --- Checkpoint controls (limit WAL replay distance after crash) ---
checkpoint_completion_target = 0.9
max_wal_size = 1GB

# --- Isolation level default ---
# Options: read committed, repeatable read, serializable
default_transaction_isolation = 'read committed'

PostgreSQL DDL — Consistency via CHECK and DEFERRABLE constraints

-- Enforce balance non-negativity at the database level
CREATE TABLE accounts (
    id          BIGSERIAL PRIMARY KEY,
    owner_id    BIGINT NOT NULL REFERENCES users(id),
    balance     NUMERIC(18, 2) NOT NULL DEFAULT 0,
    CONSTRAINT chk_balance_non_negative CHECK (balance >= 0)
);

-- Deferred foreign key: allows temporary violation within a transaction,
-- evaluated only at COMMIT time. Useful for bulk inserts with circular refs.
ALTER TABLE transfers
    ADD CONSTRAINT fk_source_account
    FOREIGN KEY (source_account_id) REFERENCES accounts(id)
    DEFERRABLE INITIALLY DEFERRED;
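A usage sketch of the deferred check (ids are illustrative, and the amount column on transfers is an assumption not shown in the DDL above): the child row may reference a parent that does not yet exist, provided it exists by COMMIT.

```sql
BEGIN;

-- Child first: this would fail immediately under a non-deferrable FK.
INSERT INTO transfers (id, source_account_id, amount) VALUES (1, 999, 50.00);

-- Parent second: satisfies the constraint before the deferred check runs.
-- (Assumes a users row with id 1 already exists for the owner_id FK.)
INSERT INTO accounts (id, owner_id, balance) VALUES (999, 1, 100.00);

COMMIT;  -- the deferred FK is evaluated here; the parent now exists
```

If the parent insert were omitted, the violation would surface only at COMMIT, which would then roll the whole transaction back.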

MySQL my.cnf — InnoDB Durability settings

# my.cnf (mysqld section)

# --- InnoDB Durability ---

# 1 = fully durable (default): redo log flushed to disk on every commit.
# 2 = OS-level: redo log to OS cache on commit, fsync once/second.
# 0 = dangerous: redo log to InnoDB buffer only, may lose up to 1 second on crash.
innodb_flush_log_at_trx_commit = 1

# Controls binary log sync. 1 = sync to disk on every commit (required for replication durability).
sync_binlog = 1

# InnoDB redo log capacity (MySQL 8.0.30+). Replaces innodb_log_file_size.
innodb_redo_log_capacity = 1073741824  # 1 GB

# Default transaction isolation for all connections.
transaction_isolation = REPEATABLE-READ

PostgreSQL: Setting isolation level per transaction

-- Set isolation level at the transaction level (overrides default)
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- ... critical operations requiring full serializability ...
COMMIT;

-- Or set for the current session only
SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL REPEATABLE READ;

-- Check current isolation level
SHOW transaction_isolation;
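One consequence of SERIALIZABLE worth sketching: under PostgreSQL's SSI, a transaction may be aborted with SQLSTATE 40001 (serialization_failure), and the client is expected to retry it from the top. The account ids below are illustrative:

```sql
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;
-- On "ERROR: could not serialize access ..." (SQLSTATE 40001),
-- issue ROLLBACK and re-run the entire transaction from BEGIN.
```

Retry loops belong in the application or driver layer; a transaction aborted with 40001 has had no effect, so re-running it is always safe.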

📚 Lessons Learned from Production ACID Failures and Near-Misses

  1. fsync = off in PostgreSQL is not a tuning knob — it is a data corruption switch. A recurring incident pattern is this setting being disabled for performance benchmarks and then accidentally left off in production. Unlike synchronous_commit = off, which risks losing recent commits, fsync = off risks corrupting the entire data directory on an unclean shutdown.

  2. The C in ACID only protects what you declared. A CHECK (balance >= 0) constraint blocks overdrafts at the database level. But a business rule like "a transfer cannot exceed $10,000 per day" lives in application code. If that application code has a concurrency bug, the database's consistency guarantees do not catch it. Application-level invariants need application-level enforcement — or stored procedures, or serializable transactions that read-then-write the daily total atomically.

  3. MVCC solves the reader-writer blocking problem but creates a vacuum obligation. In PostgreSQL, old row versions (needed for in-flight MVCC snapshots) accumulate in heap pages until VACUUM reclaims them. A long-running transaction holds a snapshot open, preventing VACUUM from reclaiming any row version newer than that snapshot. In high-write databases, a forgotten long transaction causes table bloat. Monitor pg_stat_activity for transactions older than 5 minutes.

  4. innodb_flush_log_at_trx_commit = 2 is not as safe as it looks. Setting 2 survives application crashes but not OS crashes or power loss. If your EC2 instance is unexpectedly terminated (not gracefully stopped), a setting of 2 may lose up to 1 second of committed transactions. For financial data, always use setting 1 and invest in fast NVMe storage instead of weakening durability.

  5. Isolation level defaults differ across databases — and this causes real bugs on migration. Moving an application from MySQL InnoDB (Repeatable Read) to PostgreSQL (Read Committed) changes the isolation semantics. Queries that relied on Repeatable Read's protection against non-repeatable reads will now see changed data within the same transaction. Always validate isolation requirements explicitly rather than relying on database defaults.

  6. SAVEPOINT is underused. Applications that implement retry logic by aborting and restarting entire transactions on transient errors waste significant throughput. Using SAVEPOINT to mark a retry boundary allows partial rollback without aborting the outer transaction — critical in long-running batch jobs where the cost of restarting from scratch is prohibitive.

  7. Serializable isolation is not prohibitively expensive in PostgreSQL with SSI. A common belief is that Serializable isolation requires heavy locking that kills concurrency. PostgreSQL's SSI implementation detects conflicts without locking — it aborts transactions only when a true serialization conflict is detected. For many OLTP workloads, the abort rate under SSI is low, and the correctness guarantee is absolute. Benchmark before dismissing Serializable as impractical.
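The SAVEPOINT pattern from lesson 6 can be sketched as follows; the table and column names are illustrative:

```sql
BEGIN;
INSERT INTO batch_runs (run_id, started_at) VALUES (17, now());

-- Mark a retry boundary before each risky row.
SAVEPOINT row_attempt;
INSERT INTO target_rows (id, payload) VALUES (1, 'a');  -- may hit a duplicate key
-- On error: ROLLBACK TO SAVEPOINT row_attempt;
-- This undoes only the failed row, keeping the outer transaction
-- (and all previously inserted rows) intact, so the job can skip or retry.
RELEASE SAVEPOINT row_attempt;

COMMIT;
```

Without the savepoint, a single bad row would abort the whole transaction and force the batch to restart from the beginning.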


📌 Summary: ACID in SQL Databases

  • Atomicity guarantees all-or-nothing execution. PostgreSQL relies on MVCC and crash recovery: a transaction with no commit record in the WAL is treated as aborted, and its row versions are never visible to any other transaction. MySQL InnoDB uses undo logs: before-images are applied to roll back changes if the transaction does not commit.

  • Consistency guarantees that constraint invariants — NOT NULL, UNIQUE, FOREIGN KEY, CHECK — are never violated by a committed transaction. The database checks each constraint as the statement executes (or at COMMIT, for deferred constraints) and rejects the operation if any fails. Application-layer business rules are outside this guarantee.

  • Isolation governs how concurrent transactions interact. The four standard isolation levels — Read Uncommitted, Read Committed, Repeatable Read, Serializable — represent increasing correctness at decreasing concurrency. PostgreSQL's MVCC ensures readers never block writers; its SSI-based Serializable level prevents write skew without heavy locking.

  • Durability guarantees that committed data survives crashes. WAL-based durability means: write the intent to an append-only log, flush with fsync, then acknowledge the commit. On crash, replay the WAL from the last checkpoint. PostgreSQL's synchronous_commit = off and MySQL's innodb_flush_log_at_trx_commit = 2 weaken this guarantee in exchange for throughput — acceptable only when bounded data loss is an acceptable trade-off.

  • The four properties work as a unit. Losing one breaks the model. Atomicity without durability can produce phantom commits. Isolation without consistency can allow anomalous states to become durable. Consistency without atomicity can produce partial constraint violations.

  • ACID vs. BASE is not a safety vs. performance binary. It is a correctness vs. availability trade-off under partition. Use ACID for financial, inventory, and reservation workloads. Use BASE for social timelines, analytics events, and recommendation pipelines where bounded staleness is acceptable.

  • Know your database's defaults. PostgreSQL and Oracle default to Read Committed. MySQL InnoDB defaults to Repeatable Read. SQL Server defaults to Read Committed with locking (not MVCC). These defaults have direct implications for the anomalies your application must guard against at the application layer.


📝 Practice Quiz: ACID Properties in SQL Databases

  1. A transfer($500, account_A, account_B) transaction debits account_A successfully, then the application server crashes before crediting account_B. PostgreSQL is running with default settings. What state does the database contain after crash recovery, and which ACID property is responsible?

    • A) account_A is debited $500, account_B is unchanged — partial write is durable
    • B) Both account_A and account_B are unchanged — the uncommitted transaction was rolled back, restoring the pre-transaction state
    • C) Both changes are applied because the WAL replay re-executes all SQL
    • D) The outcome depends on whether the transaction used a SAVEPOINT
    • Correct Answer: B
  2. Which of the following database configurations in PostgreSQL is most likely to cause on-disk data corruption (not just data loss) after an unclean shutdown?

    • A) synchronous_commit = off
    • B) fsync = off
    • C) wal_sync_method = fdatasync
    • D) default_transaction_isolation = 'read committed'
    • Correct Answer: B
  3. A CHECK (balance >= 0) constraint is defined on the accounts table. Transaction T1 reads account_A balance as $100 and issues UPDATE accounts SET balance = balance - 150 WHERE id = 'account_A'. What happens?

    • A) The update succeeds and balance becomes -$50, violating the constraint
    • B) The database raises a constraint violation error and rolls back the transaction
    • C) The database silently clamps the balance to 0
    • D) The behavior depends on the isolation level in use
    • Correct Answer: B
  4. PostgreSQL is running at READ COMMITTED isolation. Transaction T1 reads row R at timestamp T_start. Transaction T2 updates row R and commits at T_start + 100ms. T1 reads row R again at T_start + 200ms. What does T1 see on the second read?

    • A) The original value from T_start (Repeatable Read behavior)
    • B) T2's committed value (non-repeatable read — permitted at Read Committed)
    • C) An error because T2 has modified R
    • D) The result depends on whether T1 holds a shared lock on R
    • Correct Answer: B
  5. MySQL InnoDB is configured with innodb_flush_log_at_trx_commit = 2. A transaction commits successfully and the client receives acknowledgment. The operating system crashes 500ms later. What is the expected data loss?

    • A) No data loss — setting 2 provides full durability
    • B) Up to 1 second of committed transactions may be lost because the redo log was only in the OS cache
    • C) All data since the last InnoDB checkpoint is lost
    • D) Only the last committed transaction is lost
    • Correct Answer: B
  6. An engineer sets synchronous_commit = off in PostgreSQL for a high-throughput session logging service. A crash occurs. What is the correct characterization of the outcome?

    • A) The database is corrupted and requires a full restore
    • B) Up to three times wal_writer_delay (roughly 600ms at the default) of acknowledged commits may be missing; no corruption, no partial writes
    • C) All commits since the last checkpoint are lost
    • D) The database refuses to restart because WAL is incomplete
    • Correct Answer: B
  7. What is the key difference between the "C" in ACID and the "C" in the CAP theorem?

    • A) They are identical — both refer to the guarantee that all reads return the most recent write
    • B) ACID Consistency means constraint integrity within a single database; CAP Consistency means linearizability across distributed nodes
    • C) CAP Consistency is a stronger guarantee than ACID Consistency
    • D) ACID Consistency applies only to relational databases; CAP Consistency applies only to NoSQL
    • Correct Answer: B
  8. A PostgreSQL table has a FOREIGN KEY constraint declared as DEFERRABLE INITIALLY DEFERRED. Transaction T1 inserts a child row referencing a parent row that does not yet exist, then inserts the parent row, then commits. What happens?

    • A) The first insert immediately fails with a foreign key violation
    • B) The foreign key is evaluated only at COMMIT time; since the parent exists by then, the transaction succeeds
    • C) The transaction succeeds but the foreign key constraint is permanently disabled
    • D) The behavior is undefined — deferred constraints are not supported in PostgreSQL
    • Correct Answer: B
  9. A developer migrates an application from MySQL InnoDB (default Repeatable Read) to PostgreSQL (default Read Committed). Which anomaly is the application now exposed to that it was previously protected from?

    • A) Dirty reads — PostgreSQL permits dirty reads at Read Committed
    • B) Phantom reads — PostgreSQL Read Committed allows new rows to appear in repeated range queries within a transaction
    • C) Non-repeatable reads — a row read twice in the same transaction may return different values at Read Committed
    • D) Write skew — Repeatable Read does not prevent write skew in MySQL InnoDB
    • Correct Answer: C
  10. (Open-ended challenge) A financial platform runs PostgreSQL with default_transaction_isolation = 'read committed' and synchronous_commit = on. The engineering team proposes setting synchronous_commit = off for the payment confirmation write path — arguing that the 200ms loss window is acceptable because "payments are replicated to a standby anyway." Evaluate this argument. What failure scenarios does it fail to account for? What would you change about the proposal, and under what conditions — if any — would synchronous_commit = off be an acceptable configuration for a payment write path?



Written by Abstract Algorithms (@abstractalgorithms)