All Posts

Event Sourcing Pattern: Auditability, Replay, and Evolution of Domain State

Persist domain facts as immutable events and rebuild state predictably under change.

Abstract AlgorithmsAbstract Algorithms
ยทยท15 min read

AI-assisted content.

TLDR: Event sourcing pays off when regulatory audit history and replay are first-class requirements โ€” but it demands strict schema evolution, a snapshot strategy, and a framework that owns aggregate lifecycle. Spring Boot + Axon Framework is the fastest production-grade path on the JVM.

๐Ÿ“– Why Storing Events Instead of State Changes Everything

In 2017, a GitLab database administrator ran rm -rf on the wrong production server. They had no event log โ€” just nightly snapshots. Six hours of user data was lost permanently, and thousands of repositories were irrecoverable. Event sourcing would have made full replay possible from any point in that six-hour window. That one architectural choice โ€” append events instead of overwriting state โ€” is the difference between "we can restore to any second" and "we lost six hours and cannot get them back."

Most databases store the current state of a record.A subscription row has a status column. When billing suspends the account, you overwrite ACTIVE with SUSPENDED. Done โ€” but the why, when, and sequence of transitions that led there are gone.

Event sourcing flips the model. Instead of storing the latest snapshot of truth, you store every domain event that caused a state change as an append-only log. Current state is derived on demand by replaying those events in sequence. The log is the audit trail โ€” not a derived artefact built on top of it.

AspectTraditional CRUDEvent Sourcing
What is storedCurrent row stateOrdered sequence of immutable events
Audit historyRequires separate audit tableBuilt-in โ€” the event log is the record
Temporal queriesDifficult without CDC or snapshotsReplay the stream to any past position
Concurrent writesLast-write-wins risk without careOptimistic concurrency on stream version
Schema evolutionALTER TABLE migrationsEvent upcasting at read time

You gain a tamper-evident fact log, time-travel queries, and decoupled read models. You give up simple SELECT * queries and accept the operational cost of snapshot management and schema versioning.

๐Ÿ” The Four Building Blocks of an Event-Sourced System

Every production event-sourced system has four roles:

  1. Command โ€” an intent to change state; validated against current aggregate state before writing.
  2. Aggregate โ€” the consistency boundary; enforces invariants, emits events, and advances its internal state machine.
  3. Event Store โ€” the append-only log; events are immutable, each aggregate instance owns a stream by ID.
  4. Projection โ€” a read model rebuilt from the event stream; projections are disposable and always rebuildable.

โš™๏ธ How a Command Flows into an Auditable Event Stream

flowchart TD
    C[Client Command] --> CH[Command Handler (SubscriptionAggregate)]
    CH -->|"validates invariants applies event"| ES[(Event Store Append-Only Log)]
    ES -->|"event published on event bus"| P[BillingHistoryProjection (Event Handler)]
    P --> QM[(Query Model BillingHistoryRepository)]
    QM -->|"query response"| Q[GetBillingHistoryQuery]

    ES -. "token-based replay" .-> RP[Replay Processor (TrackingEventProcessor)]
    RP -. "rebuilds view for audit dispute" .-> QM

    style ES fill:#f5f5f5,stroke:#555
    style QM fill:#e8f4e8,stroke:#555
    style RP fill:#fff3e0,stroke:#f90,stroke-dasharray: 5 5

Solid arrows show the live command path. Dashed arrows show replay โ€” the TrackingEventProcessor resets its token to reconstruct the query model for audit at any historical timestamp.

The aggregate never writes directly to the query model. It emits events; projections consume them independently. A new projection โ€” say, a fraud-detection read model โ€” can be added without touching existing aggregate code.

๐Ÿ“Š Event-Sourcing Data Flow Overview

flowchart TD
    CMD[Command] --> AGG[Aggregate validates invariants]
    AGG -->|"emit event"| ES[(Event Store append-only)]
    ES -->|"project"| RM[Read Model]
    ES -. "replay" .-> AUDIT[Audit View]

This diagram shows the two primary data flows in an event-sourced system: the live command path where a Command drives an Aggregate to emit events into the append-only Event Store, which then projects a Read Model; and the dashed replay path where the same Event Store replays historical events to reconstruct an Audit View. The aggregate never writes directly to the read model, keeping write and read concerns fully separated. The key takeaway is that the Event Store is the single source of truth โ€” both the current state and any past state are derivable from it at any time.

๐Ÿ“Š Event Write: Command to Store

sequenceDiagram
    participant C as Client
    participant Agg as Aggregate
    participant ES as EventStore
    participant Proj as Projection
    C->>Agg: Send Command
    Agg->>Agg: Validate & decide
    Agg->>ES: Append events
    ES-->>Agg: Events persisted
    ES->>Proj: Notify new events
    Proj->>Proj: Rebuild read model
    Proj-->>C: Updated state

This sequence diagram zooms into the synchronous command path: the Client sends a command, the Aggregate validates it and decides to emit events, the EventStore appends them (enforcing version-based optimistic locking), and the Projection rebuilds the read model before confirming the updated state back to the Client. The notify step from EventStore to Projection can be synchronous or via an event bus depending on consistency requirements. The key takeaway is that the Aggregate never reads from the read model โ€” it only applies commands to its own event history, keeping the write path free of read-side dependencies.

๐Ÿง  Deep Dive: Inside the Aggregate: State Machines, Snapshots, and Schema Evolution

Internals: Aggregate State Reconstruction

An aggregate's state exists only in memory during command processing. Before handling a command, the framework loads the aggregate by replaying every past event for that aggregate ID in sequence. Each @EventSourcingHandler method advances internal state โ€” status flags, counters, IDs โ€” until the aggregate is fully current. The command handler then checks invariants against that reconstructed in-memory state.

This is powerful but carries a cost: if a subscription has 5,000 events, loading it means replaying 5,000 events before each command. Snapshots solve this. A snapshot captures the full aggregate state at event N; the next load starts from the snapshot and replays only the delta after N.

Schema Evolution Through Upcasting

Events are immutable, but their schemas change. Old stored events must be upcasted โ€” transformed at read time into the new schema without modifying stored data. Axon's EventUpcasterChain handles this transparently. The rule: always deploy upcasters before deploying new event versions.

Performance Analysis: Replay Cost Drivers

FactorImpactMitigation
Event stream lengthLinear aggregate load timeSnapshot every N events
Projection rebuildFull event store scanToken-based reset with parallel threads
Upcaster chain depthCPU overhead at deserializationKeep upcasters thin; version events early
Projection lagStale reads during backfillMonitor processor lag; dedicate a shadow DB for replay

๐Ÿ› ๏ธ Axon Framework and EventStoreDB: Event Sourcing on the JVM

Axon Framework is a Spring Boot-native Java framework that manages the full event-sourcing lifecycle: aggregate command handling, event persistence, snapshotting, replay, upcasting, and projection tracking. EventStoreDB is a purpose-built append-only database with server-side projections and persistent subscription support โ€” the recommended backend for production Axon deployments requiring audit-grade storage.

These tools solve the event-sourcing problem by owning the infrastructure that makes aggregates deterministic: Axon's @CommandHandler / @EventSourcingHandler pattern enforces the strict separation between command validation and state mutation; the TrackingEventProcessor manages checkpoints and replay; the EventUpcasterChain handles schema evolution transparently. Teams write domain logic; Axon owns the replay machinery.

The complete SubscriptionAggregate, BillingHistoryProjection, snapshot configuration, and replay code are shown in the ๐Ÿงช Subscription Billing section below. The minimal starting dependency:

<dependency>
  <groupId>org.axonframework</groupId>
  <artifactId>axon-spring-boot-starter</artifactId>
  <version>4.9.3</version>
</dependency>
<!-- Optional: EventStoreDB connector replaces the default JPA event store -->
<dependency>
  <groupId>org.axonframework.extensions.eventstored</groupId>
  <artifactId>axon-eventstoredb-spring-boot-starter</artifactId>
  <version>0.1.0</version>
</dependency>
FrameworkStrengthsBest fit
Axon Framework (Spring Boot)Spring-native, full ES + CQRS lifecycle, built-in snapshots, replay, and upcastingEnterprise Spring Boot teams wanting all pieces integrated
EventStoreDB Java clientPurpose-built append-only store, server-side projections, excellent audit semanticsTeams that want a best-in-class store and will wire their own projections
Spring Data + custom event tableLightweight, no new infrastructure; PostgreSQL append-only event table with OutboxSimple domains; teams wary of framework lock-in
Lagom (Akka-based)Reactive, high throughput, persistent entities, cluster shardingHigh-concurrency JVM services already on the Akka stack

For a full deep-dive on Axon Framework and EventStoreDB in production, a dedicated follow-up post is planned.

๐ŸŒ Real-World Applications

Event sourcing earns its complexity where audit trails and replay are first-class business requirements.

Company / IndustryDriverEvent sourcing advantage
LMAX Exchange โ€” finance6M+ orders/sec with full regulatory auditReplay market state to any timestamp for regulators
Shopify โ€” e-commerceFraud investigation, inventory disputesReplay order event stream to exact inventory at purchase time
Healthcare systemsConsent tracking, patient record disputesImmutable facts with time-travel replay; no separate audit table
InsuranceClaims and policy versioningFull decision trail; compensation events on reversals

๐Ÿงช Subscription Billing: Building the Aggregate, Projection, and Replay

Scenario: A billing platform tracks the lifecycle of each subscription โ€” CREATED โ†’ ACTIVATED โ†’ SUSPENDED โ†’ CANCELLED. Every state transition is an immutable domain event appended to the subscription's event stream. When a customer disputes a charge, the support team replays the event stream to reconstruct exactly what the account looked like at the moment of the disputed transaction.

Maven Dependency

<dependency>
  <groupId>org.axonframework</groupId>
  <artifactId>axon-spring-boot-starter</artifactId>
  <version>4.9.3</version>
</dependency>

Domain Events (Immutable Value Objects)

public record SubscriptionCreatedEvent(
    String subscriptionId, String tenantId, String planId, Instant occurredAt) {}

public record SubscriptionActivatedEvent(
    String subscriptionId, String tenantId, Instant occurredAt) {}

public record SubscriptionSuspendedEvent(
    String subscriptionId, String tenantId, String reason, Instant occurredAt) {}

public record SubscriptionCancelledEvent(
    String subscriptionId, String tenantId, String reason, Instant occurredAt) {}

Each event is a value object with no setters. The aggregate assigns IDs and timestamps at the command-handler boundary โ€” events never generate their own identity.

SubscriptionAggregate

@Aggregate(snapshotTriggerDefinition = "subscriptionSnapshotTrigger")
public class SubscriptionAggregate {

    @AggregateIdentifier
    private String subscriptionId;
    private SubscriptionStatus status;
    private String tenantId;

    protected SubscriptionAggregate() {} // required by Axon for event-sourced replay

    @CommandHandler
    public SubscriptionAggregate(CreateSubscriptionCommand cmd) {
        AggregateLifecycle.apply(new SubscriptionCreatedEvent(
            cmd.subscriptionId(), cmd.tenantId(), cmd.planId(), Instant.now()));
    }

    @EventSourcingHandler
    public void on(SubscriptionCreatedEvent event) {
        this.subscriptionId = event.subscriptionId();
        this.tenantId       = event.tenantId();
        this.status         = SubscriptionStatus.CREATED;
    }

    @CommandHandler
    public void handle(SuspendSubscriptionCommand cmd) {
        if (status != SubscriptionStatus.ACTIVE) {
            throw new IllegalStateException(
                "Only ACTIVE subscriptions can be suspended; current status: " + status);
        }
        AggregateLifecycle.apply(new SubscriptionSuspendedEvent(
            subscriptionId, tenantId, cmd.reason(), Instant.now()));
    }

    @EventSourcingHandler
    public void on(SubscriptionSuspendedEvent event) {
        this.status = SubscriptionStatus.SUSPENDED;
    }
}

@CommandHandler enforces invariants then calls AggregateLifecycle.apply(). @EventSourcingHandler is the only place state is mutated โ€” this strict separation is why replay is always deterministic regardless of how many times it runs.

Snapshot Configuration โ€” Preventing Cold-Start Replay Tax

@Configuration
public class AxonConfig {

    @Bean
    public SnapshotTriggerDefinition subscriptionSnapshotTrigger(Snapshotter snapshotter) {
        // Capture a snapshot after every 50 events.
        // Next load starts from the snapshot and replays only the delta (โ‰ค 49 events).
        return new EventCountSnapshotTriggerDefinition(snapshotter, 50);
    }
}

Without snapshots, a subscription with 500 billing events pays a 500-event replay cost on every command. With a threshold of 50, the worst-case delta is 49 events.

BillingHistoryProjection โ€” Read Model and Audit Query Handler

@Component
@ProcessingGroup("billing-history")
public class BillingHistoryProjection {

    private final BillingHistoryRepository repo;

    public BillingHistoryProjection(BillingHistoryRepository repo) {
        this.repo = repo;
    }

    @EventHandler
    public void on(SubscriptionCreatedEvent event, @Timestamp Instant eventTimestamp) {
        repo.save(new BillingHistoryEntry(
            event.subscriptionId(), event.tenantId(),
            "CREATED", event.planId(), eventTimestamp));
    }

    @EventHandler
    public void on(SubscriptionSuspendedEvent event, @Timestamp Instant eventTimestamp) {
        repo.updateStatus(
            event.subscriptionId(), "SUSPENDED", event.reason(), eventTimestamp);
    }

    @QueryHandler
    public List<BillingHistoryEntry> handle(GetBillingHistoryQuery query) {
        return repo.findBySubscriptionId(query.subscriptionId());
    }
}

Every @EventHandler must be idempotent โ€” replay will call these methods again during incident recovery and projection refactors. Use upsert semantics keyed on the event's sequence number to guarantee safety.

Replaying the Event Stream for Audit Disputes

When a customer disputes a charge and the team needs the account state at a specific past timestamp, reset the projection's tracking token to replay from the event store:

// Reset and replay the billing-history projection from the beginning of the event store
eventProcessingConfig
    .eventProcessorByProcessingGroup("billing-history", TrackingEventProcessor.class)
    .ifPresent(processor -> {
        processor.shutDown();
        processor.resetTokens(); // replays all events in stream order
        processor.start();
    });

To scope the replay to a specific timestamp window, filter inside the @EventHandler by comparing the injected @Timestamp Instant against the dispute window before persisting. The event store is immutable โ€” replay always produces the same result, making it a reliable audit mechanism.

๐Ÿ“Š Time-Travel Replay

sequenceDiagram
    participant Q as QueryClient
    participant ES as EventStore
    participant Agg as Aggregate
    Q->>ES: Query events from t0 to t1
    ES-->>Q: Return event slice
    Q->>Agg: Replay events t0t1
    Agg->>Agg: Apply each event
    Agg-->>Q: Past state at t1
    Note over Q,Agg: Audit snapshot restored

This sequence diagram illustrates time-travel replay: a QueryClient requests the event slice from t0 to t1, the EventStore returns exactly those events, and the Aggregate replays them in order to reconstruct the system's state at that historical point. Because the Event Store is append-only and immutable, replaying the same window always produces the same result โ€” making it a reliable foundation for audit investigations and dispute resolution. The key takeaway is that point-in-time state reconstruction is a first-class capability of event sourcing, not a special-case workaround.

โš–๏ธ Trade-offs & Failure Modes in Practice

Failure modeSymptomRoot causeFirst mitigation
Long aggregate streamsHigh command latency on warm-upNo snapshot strategyAdd EventCountSnapshotTriggerDefinition
Incompatible old eventsClassCastException during replay after deploySchema changed without upcasterAdd SingleEventUpcaster before deploying new event version
Projection lag under loadStale reads; audit disputes on in-flight dataInsufficient processor threadsIncrease TrackingEventProcessor thread count
Unbounded event store growthStorage cost; slow tail scansNo retention or archival policyArchive cold streams; keep hot window in fast storage tier
Non-idempotent projectionDuplicate rows after replay@EventHandler not safe to call twiceUse upsert keyed on aggregate ID + event sequence number

๐Ÿงญ Decision Guide: When Event Sourcing Earns Its Complexity

SituationRecommendation
Regulatory audit trail required (finance, healthcare, insurance)Strong fit โ€” the event log is the compliance record
Temporal queries: "what was the state at time T?"Strong fit โ€” replay to any past stream position
Simple CRUD with no audit or replay requirementsAvoid โ€” operational overhead is not justified
High write throughput (>10k events/sec per stream)Use with caution โ€” partition streams; evaluate Axon Server
Team unfamiliar with CQRS and aggregate designRun EventStorming workshops and model the domain first

๐Ÿ”ง Operator Field Note: Three Production Realities

1. Snapshot monitoring. Track axon_command_bus_handler_latency_seconds per aggregate type. Climbing latency with aggregate age signals snapshots are not firing. Query DomainEventEntry sorted by event count to find outliers.

2. Schema-incompatible old events. A ClassCastException during replay almost always means a missing upcaster. Safe sequence: write a SingleEventUpcaster (V1 โ†’ V2), deploy it before the new event version, then deploy the aggregate code. Never modify stored events in place.

3. Isolated projection replay. Each @ProcessingGroup owns its own tracking token. Resetting billing-history leaves all other processors unaffected. Route audit queries to a dedicated shadow query model so live billing traffic is never blocked during replay.

๐Ÿ“š Hard-Won Lessons from Production Event-Sourced Systems

  • Design events for readers, not writers. Rich, self-describing payloads survive upcasting; terse internal codes do not.
  • Snapshots are not optional at scale. An aggregate with 1,000 events pays a 1,000-event replay cost on every command without one. Define your threshold before going live.
  • Idempotent projections are mandatory. Every @EventHandler must be safe to call twice โ€” replay occurs during incident recovery and schema migration.
  • Schema evolution is the hardest operational problem. Deploy upcasters before new event versions, never after.
  • Replay is a first-class feature. Use it for analytics backfills, fraud investigation, and projection refactors.

๐Ÿ“Œ TLDR: Summary & Key Takeaways

  • Event sourcing stores immutable domain facts rather than mutable state rows; current state is always derivable by replaying the event log in order.
  • Aggregates are deterministic state machines: @CommandHandler enforces invariants; @EventSourcingHandler mutates state โ€” nowhere else. This separation makes replay reliable.
  • Snapshots are essential for long-lived aggregates. Without them, command latency grows linearly with stream length.
  • Projections are disposable read models. Because the event store is the source of truth, any query model can be rebuilt from the log at any time โ€” including for historical audit.
  • Schema evolution requires upcasters. Deploy the upcaster before the new event version; test replay in staging before promoting to production.
  • Audit trails, temporal queries, and replay-based dispute resolution are built-in features โ€” not bolt-ons.
Share

Test Your Knowledge

๐Ÿง 

Ready to test what you just learned?

AI will generate 4 questions based on this article's content.

Abstract Algorithms

Written by

Abstract Algorithms

@abstractalgorithms