Event Sourcing Pattern: Auditability, Replay, and Evolution of Domain State
Persist domain facts as immutable events and rebuild state predictably under change.
Abstract Algorithms
AI-assisted content. This post may have been written or enhanced with AI tools. Please verify critical information independently.
TLDR: Event sourcing pays off when regulatory audit history and replay are first-class requirements, but it demands strict schema evolution, a snapshot strategy, and a framework that owns aggregate lifecycle. Spring Boot + Axon Framework is the fastest production-grade path on the JVM.
Why Storing Events Instead of State Changes Everything
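In January 2017, a GitLab engineer ran rm -rf against the PostgreSQL data directory on the primary production database server while intending to clean up a replica. The regular backups turned out to be unusable, so the team restored from a snapshot taken roughly six hours earlier. About six hours of database data (issues, merge requests, comments, user accounts) was lost permanently; the Git repositories themselves were not affected. An append-only event log, stored and replicated independently of that server, would have allowed replay up to the moment of the deletion. That one architectural choice, appending events instead of overwriting state, is the difference between "we can restore to any second" and "we lost six hours and cannot get them back."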
Most databases store the current state of a record. A subscription row has a status column. When billing suspends the account, you overwrite ACTIVE with SUSPENDED. Done, but the why, when, and sequence of transitions that led there are gone.
Event sourcing flips the model. Instead of storing the latest snapshot of truth, you store every domain event that caused a state change as an append-only log. Current state is derived on demand by replaying those events in sequence. The log is the audit trail, not a derived artefact built on top of it.
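To make "current state is derived by replaying events" concrete, here is a minimal sketch in plain Java. It is not the Axon API: EventLogSketch, EventLog, Activated, Suspended, and replay are hypothetical names chosen for illustration, and the log lives in memory only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout: plain Java, not a framework API.
class EventLogSketch {

    enum Status { NONE, ACTIVE, SUSPENDED }

    // Events are immutable facts; records have no setters.
    interface Event {}
    record Activated(String subscriptionId) implements Event {}
    record Suspended(String subscriptionId, String reason) implements Event {}

    // Append-only log: events can be added and read, never updated or deleted.
    static final class EventLog {
        private final List<Event> events = new ArrayList<>();

        void append(Event event) {
            events.add(event);
        }

        List<Event> readAll() {
            return List.copyOf(events);
        }
    }

    // Current state is a left fold over the events, in order.
    static Status replay(List<Event> events) {
        Status status = Status.NONE;
        for (Event event : events) {
            if (event instanceof Activated) {
                status = Status.ACTIVE;
            } else if (event instanceof Suspended) {
                status = Status.SUSPENDED;
            }
        }
        return status;
    }

    public static void main(String[] args) {
        EventLog log = new EventLog();
        log.append(new Activated("sub-1"));
        log.append(new Suspended("sub-1", "payment failed"));

        System.out.println(replay(log.readAll()));               // SUSPENDED
        System.out.println(replay(log.readAll().subList(0, 1))); // ACTIVE
    }
}
```

Replaying a prefix of the log yields the state as it was at that point, which is the property the rest of this post builds on.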
| Aspect | Traditional CRUD | Event Sourcing |
| --- | --- | --- |
| What is stored | Current row state | Ordered sequence of immutable events |
| Audit history | Requires separate audit table | Built-in: the event log is the record |
| Temporal queries | Difficult without CDC or snapshots | Replay the stream to any past position |
| Concurrent writes | Last-write-wins risk without care | Optimistic concurrency on stream version |
| Schema evolution | ALTER TABLE migrations | Event upcasting at read time |
You gain a tamper-evident fact log, time-travel queries, and decoupled read models. You give up simple SELECT * queries and accept the operational cost of snapshot management and schema versioning.
The Four Building Blocks of an Event-Sourced System
Every production event-sourced system has four roles:
- Command: an intent to change state; validated against current aggregate state before writing.
- Aggregate: the consistency boundary; enforces invariants, emits events, and advances its internal state machine.
- Event Store: the append-only log; events are immutable, and each aggregate instance owns a stream keyed by its ID.
- Projection: a read model built from the event stream; projections are disposable and always rebuildable.
How a Command Flows into an Auditable Event Stream
flowchart TD
C[Client Command] --> CH["Command Handler (SubscriptionAggregate)"]
CH -->|"validates invariants, applies event"| ES[(Event Store Append-Only Log)]
ES -->|"event published on event bus"| P["BillingHistoryProjection (Event Handler)"]
P --> QM[(Query Model BillingHistoryRepository)]
QM -->|"query response"| Q[GetBillingHistoryQuery]
ES -. "token-based replay" .-> RP["Replay Processor (TrackingEventProcessor)"]
RP -. "rebuilds view for audit dispute" .-> QM
style ES fill:#f5f5f5,stroke:#555
style QM fill:#e8f4e8,stroke:#555
style RP fill:#fff3e0,stroke:#f90,stroke-dasharray: 5 5
Solid arrows show the live command path. Dashed arrows show replay: the TrackingEventProcessor resets its token and re-processes stored events to rebuild the query model for an audit.
The aggregate never writes directly to the query model. It emits events; projections consume them independently. A new projection, say a fraud-detection read model, can be added without touching existing aggregate code.
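The decoupling described above can be sketched without any framework. The example below is a hedged illustration in plain Java: EventStore, BillingHistoryView, and FraudSignalsView are hypothetical stand-ins, and the "bus" is just a list of in-process subscribers rather than Axon's event processors.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical names: an in-memory stand-in, not Axon's event bus.
class ProjectionFanOutSketch {

    record SubscriptionSuspended(String subscriptionId, String reason) {}

    // Appends to the log, then notifies every subscribed projection.
    static final class EventStore {
        private final List<SubscriptionSuspended> log = new ArrayList<>();
        private final List<Consumer<SubscriptionSuspended>> subscribers = new ArrayList<>();

        // A new subscriber first catches up on history, then receives live events.
        void subscribe(Consumer<SubscriptionSuspended> projection) {
            log.forEach(projection);
            subscribers.add(projection);
        }

        void append(SubscriptionSuspended event) {
            log.add(event);
            subscribers.forEach(subscriber -> subscriber.accept(event));
        }
    }

    // Read model 1: latest suspension reason per subscription.
    static final class BillingHistoryView {
        final Map<String, String> lastReason = new HashMap<>();

        void on(SubscriptionSuspended event) {
            lastReason.put(event.subscriptionId(), event.reason());
        }
    }

    // Read model 2, added later: number of suspensions per subscription.
    static final class FraudSignalsView {
        final Map<String, Integer> suspensions = new HashMap<>();

        void on(SubscriptionSuspended event) {
            suspensions.merge(event.subscriptionId(), 1, Integer::sum);
        }
    }

    public static void main(String[] args) {
        EventStore store = new EventStore();
        BillingHistoryView billing = new BillingHistoryView();
        store.subscribe(billing::on);
        store.append(new SubscriptionSuspended("sub-1", "payment failed"));

        // The fraud view arrives after events already exist; the write side is unchanged.
        FraudSignalsView fraud = new FraudSignalsView();
        store.subscribe(fraud::on);
        store.append(new SubscriptionSuspended("sub-1", "chargeback"));

        System.out.println(billing.lastReason.get("sub-1")); // chargeback
        System.out.println(fraud.suspensions.get("sub-1"));  // 2
    }
}
```

The late-joining view sees both events because it catches up from the log before going live; nothing on the write side had to know it exists.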
Event-Sourcing Data Flow Overview
flowchart TD
CMD[Command] --> AGG[Aggregate validates invariants]
AGG -->|"emit event"| ES[(Event Store append-only)]
ES -->|"project"| RM[Read Model]
ES -. "replay" .-> AUDIT[Audit View]
This diagram shows the two primary data flows in an event-sourced system: the live command path where a Command drives an Aggregate to emit events into the append-only Event Store, which then projects a Read Model; and the dashed replay path where the same Event Store replays historical events to reconstruct an Audit View. The aggregate never writes directly to the read model, keeping write and read concerns fully separated. The key takeaway is that the Event Store is the single source of truth: both the current state and any past state are derivable from it at any time.
Event Write: Command to Store
sequenceDiagram
participant C as Client
participant Agg as Aggregate
participant ES as EventStore
participant Proj as Projection
C->>Agg: Send Command
Agg->>Agg: Validate & decide
Agg->>ES: Append events
ES-->>Agg: Events persisted
ES->>Proj: Notify new events
Proj->>Proj: Update read model
Proj-->>C: Updated state
This sequence diagram zooms into the command path: the Client sends a command, the Aggregate validates it and decides which events to emit, the EventStore appends them (enforcing version-based optimistic locking), and the Projection updates the read model before the updated state is confirmed back to the Client. The notify step from EventStore to Projection can be synchronous or go through an event bus, depending on consistency requirements; with an asynchronous bus the read model is eventually consistent. The key takeaway is that the Aggregate never reads from the read model: it only applies commands to its own event history, keeping the write path free of read-side dependencies.
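The version-based optimistic locking mentioned above can be sketched as an "expected version" check on append. This is a plain-Java illustration under stated assumptions: InMemoryEventStore and ConcurrencyException are hypothetical names, events are reduced to strings, and a real store would enforce the check transactionally.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names: illustrates the expected-version check only.
class OptimisticAppendSketch {

    static final class ConcurrencyException extends RuntimeException {
        ConcurrencyException(String message) {
            super(message);
        }
    }

    // One ordered stream per aggregate ID; the stream length acts as its version.
    static final class InMemoryEventStore {
        private final Map<String, List<String>> streams = new HashMap<>();

        synchronized int version(String streamId) {
            return streams.getOrDefault(streamId, List.of()).size();
        }

        // The append succeeds only if the caller decided against the latest version.
        synchronized int append(String streamId, int expectedVersion, List<String> newEvents) {
            List<String> stream = streams.computeIfAbsent(streamId, id -> new ArrayList<>());
            if (stream.size() != expectedVersion) {
                throw new ConcurrencyException(
                        "expected version " + expectedVersion + " but stream is at " + stream.size());
            }
            stream.addAll(newEvents);
            return stream.size();
        }
    }

    public static void main(String[] args) {
        InMemoryEventStore store = new InMemoryEventStore();
        store.append("sub-1", 0, List.of("SubscriptionCreated"));

        // Two writers load the same stream at version 1.
        int seenByA = store.version("sub-1");
        int seenByB = store.version("sub-1");

        store.append("sub-1", seenByA, List.of("SubscriptionActivated")); // A wins
        try {
            store.append("sub-1", seenByB, List.of("SubscriptionCancelled"));
        } catch (ConcurrencyException e) {
            System.out.println("B must reload and retry: " + e.getMessage());
        }
        System.out.println(store.version("sub-1")); // 2
    }
}
```

The losing writer reloads the stream, re-validates its command against the newer state, and tries again, so no decision is ever made on stale state.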
Deep Dive: Inside the Aggregate: State Machines, Snapshots, and Schema Evolution
Internals: Aggregate State Reconstruction
An aggregate's state exists only in memory during command processing. Before handling a command, the framework loads the aggregate by replaying every past event for that aggregate ID in sequence. Each @EventSourcingHandler method advances internal state (status flags, counters, IDs) until the aggregate is fully current. The command handler then checks invariants against that reconstructed in-memory state.
This is powerful but carries a cost: if a subscription has 5,000 events, loading it means replaying 5,000 events before each command. Snapshots solve this. A snapshot captures the full aggregate state at event N; the next load starts from the snapshot and replays only the delta after N.
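A small sketch of that trade-off, in plain Java rather than Axon's snapshotting API: State, loadFromScratch, and loadFromSnapshot are hypothetical names, and the "events" are simple amounts so the arithmetic is easy to check.

```java
import java.util.List;
import java.util.stream.LongStream;

// Hypothetical names: illustrates snapshot-plus-delta loading only.
class SnapshotLoadSketch {

    // Toy aggregate state: a running balance plus the number of events applied so far.
    record State(long balance, int version) {
        State apply(long amount) {
            return new State(balance + amount, version + 1);
        }
    }

    // Full replay: cost grows with the length of the whole stream.
    static State loadFromScratch(List<Long> events) {
        State state = new State(0, 0);
        for (long amount : events) {
            state = state.apply(amount);
        }
        return state;
    }

    // Snapshot load: start at the snapshot and replay only the events after it.
    static State loadFromSnapshot(State snapshot, List<Long> events) {
        State state = snapshot;
        for (long amount : events.subList(snapshot.version(), events.size())) {
            state = state.apply(amount);
        }
        return state;
    }

    public static void main(String[] args) {
        List<Long> events = LongStream.rangeClosed(1, 5000).boxed().toList();

        State snapshot = loadFromScratch(events.subList(0, 4950)); // taken at event 4950
        State full = loadFromScratch(events);                      // replays 5000 events
        State fast = loadFromSnapshot(snapshot, events);           // replays 50 events

        System.out.println(fast.equals(full));                   // true
        System.out.println(events.size() - snapshot.version());  // 50
    }
}
```

Both loads end in the same state; only the number of events replayed differs.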
Schema Evolution Through Upcasting
Events are immutable, but their schemas change. Old stored events must be upcast: transformed at read time into the new schema without modifying stored data. Axon's EventUpcasterChain handles this transparently. The rule: always deploy upcasters before deploying new event versions.
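The idea behind an upcaster chain can be shown with a few lines of plain Java. This is a sketch of the technique, not Axon's EventUpcasterChain: StoredEvent, the two schema changes (adding a "reason" field, renaming "tenant" to "tenantId"), and the default value "UNKNOWN" are all assumptions made up for the example.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical names and schema changes: illustrates read-time upcasting only.
class UpcasterSketch {

    // A stored event as a serializer might see it: type, revision, and raw payload.
    record StoredEvent(String type, int revision, Map<String, Object> payload) {}

    // Assumed V1 -> V2 change: V2 requires a "reason" field, defaulted for old events.
    static StoredEvent suspendedV1toV2(StoredEvent event) {
        if (!event.type().equals("SubscriptionSuspended") || event.revision() != 1) {
            return event; // not ours: pass through untouched
        }
        Map<String, Object> payload = new HashMap<>(event.payload());
        payload.putIfAbsent("reason", "UNKNOWN");
        return new StoredEvent(event.type(), 2, Map.copyOf(payload));
    }

    // Assumed V2 -> V3 change: the "tenant" field is renamed to "tenantId".
    static StoredEvent suspendedV2toV3(StoredEvent event) {
        if (!event.type().equals("SubscriptionSuspended") || event.revision() != 2) {
            return event;
        }
        Map<String, Object> payload = new HashMap<>(event.payload());
        Object tenant = payload.remove("tenant");
        if (tenant != null) {
            payload.put("tenantId", tenant);
        }
        return new StoredEvent(event.type(), 3, Map.copyOf(payload));
    }

    // Order matters: each step lifts the event by exactly one revision.
    static final List<UnaryOperator<StoredEvent>> CHAIN =
            List.of(UpcasterSketch::suspendedV1toV2, UpcasterSketch::suspendedV2toV3);

    // Applied at read time; the stored event itself is never modified.
    static StoredEvent upcast(StoredEvent stored, List<UnaryOperator<StoredEvent>> chain) {
        StoredEvent current = stored;
        for (UnaryOperator<StoredEvent> step : chain) {
            current = step.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        StoredEvent v1 = new StoredEvent("SubscriptionSuspended", 1,
                Map.of("subscriptionId", "sub-1", "tenant", "t-9"));
        StoredEvent latest = upcast(v1, CHAIN);

        System.out.println(latest.revision());                 // 3
        System.out.println(latest.payload().get("reason"));    // UNKNOWN
        System.out.println(latest.payload().get("tenantId"));  // t-9
        System.out.println(v1.revision());                     // 1 (stored form unchanged)
    }
}
```

Keeping each step small and revision-specific is what lets old events of any age pass through the same chain.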
Performance Analysis: Replay Cost Drivers
| Factor | Impact | Mitigation |
| --- | --- | --- |
| Event stream length | Linear aggregate load time | Snapshot every N events |
| Projection rebuild | Full event store scan | Token-based reset with parallel threads |
| Upcaster chain depth | CPU overhead at deserialization | Keep upcasters thin; version events early |
| Projection lag | Stale reads during backfill | Monitor processor lag; dedicate a shadow DB for replay |
Axon Framework and EventStoreDB: Event Sourcing on the JVM
Axon Framework is a Java framework with Spring Boot integration that manages the full event-sourcing lifecycle: aggregate command handling, event persistence, snapshotting, replay, upcasting, and projection tracking. EventStoreDB is a purpose-built append-only database with server-side projections and persistent subscription support. Axon's own default backend is Axon Server, with JPA and JDBC event stores as alternatives; EventStoreDB is an option for teams that want a dedicated, audit-grade event database and are prepared to integrate it themselves.
These tools solve the event-sourcing problem by owning the infrastructure that makes aggregates deterministic: Axon's @CommandHandler / @EventSourcingHandler pattern enforces the strict separation between command validation and state mutation; the TrackingEventProcessor manages checkpoints and replay; the EventUpcasterChain handles schema evolution transparently. Teams write domain logic; Axon owns the replay machinery.
The complete SubscriptionAggregate, BillingHistoryProjection, snapshot configuration, and replay code are shown in the Subscription Billing section below. The minimal starting dependency:
<dependency>
    <groupId>org.axonframework</groupId>
    <artifactId>axon-spring-boot-starter</artifactId>
    <version>4.9.3</version>
</dependency>
<!-- Optional: EventStoreDB connector as the event storage backend.
     Confirm these coordinates on Maven Central before relying on them. -->
<dependency>
    <groupId>org.axonframework.extensions.eventstored</groupId>
    <artifactId>axon-eventstoredb-spring-boot-starter</artifactId>
    <version>0.1.0</version>
</dependency>
| Framework | Strengths | Best fit |
| --- | --- | --- |
| Axon Framework (Spring Boot) | Spring-native, full ES + CQRS lifecycle, built-in snapshots, replay, and upcasting | Enterprise Spring Boot teams wanting all pieces integrated |
| EventStoreDB Java client | Purpose-built append-only store, server-side projections, excellent audit semantics | Teams that want a best-in-class store and will wire their own projections |
| Spring Data + custom event table | Lightweight, no new infrastructure; PostgreSQL append-only event table with Outbox | Simple domains; teams wary of framework lock-in |
| Lagom (Akka-based) | Reactive, high throughput, persistent entities, cluster sharding | High-concurrency JVM services already on the Akka stack |
For a full deep-dive on Axon Framework and EventStoreDB in production, a dedicated follow-up post is planned.
Real-World Applications
Event sourcing earns its complexity where audit trails and replay are first-class business requirements.
| Company / Industry | Driver | Event sourcing advantage |
| --- | --- | --- |
| LMAX Exchange (finance) | 6M+ orders/sec with full regulatory audit | Replay market state to any timestamp for regulators |
| Shopify (e-commerce) | Fraud investigation, inventory disputes | Replay order event stream to exact inventory at purchase time |
| Healthcare systems | Consent tracking, patient record disputes | Immutable facts with time-travel replay; no separate audit table |
| Insurance | Claims and policy versioning | Full decision trail; compensation events on reversals |
Subscription Billing: Building the Aggregate, Projection, and Replay
Scenario: A billing platform tracks the lifecycle of each subscription: CREATED → ACTIVATED → SUSPENDED → CANCELLED. Every state transition is an immutable domain event appended to the subscription's event stream. When a customer disputes a charge, the support team replays the event stream to reconstruct exactly what the account looked like at the moment of the disputed transaction.
Maven Dependency
<dependency>
    <groupId>org.axonframework</groupId>
    <artifactId>axon-spring-boot-starter</artifactId>
    <version>4.9.3</version>
</dependency>
Domain Events (Immutable Value Objects)
import java.time.Instant;

// Immutable records: all fields are fixed at construction (one record per source file).
public record SubscriptionCreatedEvent(
        String subscriptionId, String tenantId, String planId, Instant occurredAt) {}

public record SubscriptionActivatedEvent(
        String subscriptionId, String tenantId, Instant occurredAt) {}

public record SubscriptionSuspendedEvent(
        String subscriptionId, String tenantId, String reason, Instant occurredAt) {}

public record SubscriptionCancelledEvent(
        String subscriptionId, String tenantId, String reason, Instant occurredAt) {}
Each event is a value object with no setters. The aggregate assigns IDs and timestamps at the command-handler boundary; events never generate their own identity.
SubscriptionAggregate
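import java.time.Instant;

import org.axonframework.commandhandling.CommandHandler;
import org.axonframework.eventsourcing.EventSourcingHandler;
import org.axonframework.modelling.command.AggregateIdentifier;
import org.axonframework.modelling.command.AggregateLifecycle;
import org.axonframework.spring.stereotype.Aggregate;

@Aggregate(snapshotTriggerDefinition = "subscriptionSnapshotTrigger")
public class SubscriptionAggregate {

    @AggregateIdentifier
    private String subscriptionId;
    private SubscriptionStatus status;
    private String tenantId;

    protected SubscriptionAggregate() {} // required by Axon for event-sourced replay

    // Creation command: the constructor handler emits the first event of the stream.
    @CommandHandler
    public SubscriptionAggregate(CreateSubscriptionCommand cmd) {
        AggregateLifecycle.apply(new SubscriptionCreatedEvent(
                cmd.subscriptionId(), cmd.tenantId(), cmd.planId(), Instant.now()));
    }

    @EventSourcingHandler
    public void on(SubscriptionCreatedEvent event) {
        this.subscriptionId = event.subscriptionId();
        this.tenantId = event.tenantId();
        this.status = SubscriptionStatus.CREATED;
    }

    // Without this handler the status never becomes ACTIVE and suspension could never succeed.
    // The command that applies SubscriptionActivatedEvent is not shown in this post.
    @EventSourcingHandler
    public void on(SubscriptionActivatedEvent event) {
        this.status = SubscriptionStatus.ACTIVE;
    }

    @CommandHandler
    public void handle(SuspendSubscriptionCommand cmd) {
        // The invariant is checked against state rebuilt from past events.
        if (status != SubscriptionStatus.ACTIVE) {
            throw new IllegalStateException(
                    "Only ACTIVE subscriptions can be suspended; current status: " + status);
        }
        AggregateLifecycle.apply(new SubscriptionSuspendedEvent(
                subscriptionId, tenantId, cmd.reason(), Instant.now()));
    }

    @EventSourcingHandler
    public void on(SubscriptionSuspendedEvent event) {
        this.status = SubscriptionStatus.SUSPENDED;
    }
}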
@CommandHandler enforces invariants, then calls AggregateLifecycle.apply(). @EventSourcingHandler is the only place state is mutated; this strict separation is why replay stays deterministic no matter how many times it runs.
Snapshot Configuration: Preventing Cold-Start Replay Tax
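import org.axonframework.eventsourcing.EventCountSnapshotTriggerDefinition;
import org.axonframework.eventsourcing.SnapshotTriggerDefinition;
import org.axonframework.eventsourcing.Snapshotter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class AxonConfig {
    // The bean name must match snapshotTriggerDefinition on the @Aggregate annotation.
    @Bean
    public SnapshotTriggerDefinition subscriptionSnapshotTrigger(Snapshotter snapshotter) {
        // Trigger a snapshot once about 50 events have accumulated since the last one.
        // The next load starts from the snapshot and replays only the events after it.
        return new EventCountSnapshotTriggerDefinition(snapshotter, 50);
    }
}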
Without snapshots, a subscription with 500 billing events pays a 500-event replay cost on every command. With a threshold of 50, each load replays at most about 50 events on top of the latest snapshot.
BillingHistoryProjection: Read Model and Audit Query Handler
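import java.time.Instant;
import java.util.List;

import org.axonframework.config.ProcessingGroup;
import org.axonframework.eventhandling.EventHandler;
import org.axonframework.eventhandling.Timestamp;
import org.axonframework.queryhandling.QueryHandler;
import org.springframework.stereotype.Component;

@Component
@ProcessingGroup("billing-history")
public class BillingHistoryProjection {

    private final BillingHistoryRepository repo;

    public BillingHistoryProjection(BillingHistoryRepository repo) {
        this.repo = repo;
    }

    // @Timestamp injects the time recorded on the event message, so replays see the original time.
    @EventHandler
    public void on(SubscriptionCreatedEvent event, @Timestamp Instant eventTimestamp) {
        repo.save(new BillingHistoryEntry(
                event.subscriptionId(), event.tenantId(),
                "CREATED", event.planId(), eventTimestamp));
    }

    @EventHandler
    public void on(SubscriptionSuspendedEvent event, @Timestamp Instant eventTimestamp) {
        repo.updateStatus(
                event.subscriptionId(), "SUSPENDED", event.reason(), eventTimestamp);
    }

    // Queries are served from the read model only; the aggregate is never consulted.
    @QueryHandler
    public List<BillingHistoryEntry> handle(GetBillingHistoryQuery query) {
        return repo.findBySubscriptionId(query.subscriptionId());
    }
}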
Every @EventHandler must be idempotent: replay will call these methods again during incident recovery and projection refactors. Use upsert semantics keyed on the aggregate ID plus the event's sequence number to guarantee safety.
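The upsert idea can be sketched in plain Java with an in-memory map standing in for the table. EntryKey, Entry, and BillingHistoryView are hypothetical names; in an Axon handler the sequence number would typically arrive as a @SequenceNumber parameter, which is worth verifying against the version you run.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical names: a map stands in for a table with a unique key.
class IdempotentProjectionSketch {

    // The natural key of a projected row: which aggregate, and which event in its stream.
    record EntryKey(String aggregateId, long sequenceNumber) {}

    record Entry(String status, String detail) {}

    static final class BillingHistoryView {
        private final Map<EntryKey, Entry> rows = new LinkedHashMap<>();

        // Upsert: handling the same event again overwrites the same row instead of adding one.
        void on(String aggregateId, long sequenceNumber, String status, String detail) {
            rows.put(new EntryKey(aggregateId, sequenceNumber), new Entry(status, detail));
        }

        int rowCount() {
            return rows.size();
        }
    }

    public static void main(String[] args) {
        BillingHistoryView view = new BillingHistoryView();

        // Live processing.
        view.on("sub-1", 0, "CREATED", "plan-basic");
        view.on("sub-1", 1, "SUSPENDED", "payment failed");

        // Replay delivers the same events again.
        view.on("sub-1", 0, "CREATED", "plan-basic");
        view.on("sub-1", 1, "SUSPENDED", "payment failed");

        System.out.println(view.rowCount()); // 2
    }
}
```

In a relational read model the same effect usually comes from a unique constraint on the two key columns combined with an insert-or-update statement.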
Replaying the Event Stream for Audit Disputes
When a customer disputes a charge and the team needs the account state at a specific past timestamp, reset the projection's tracking token to replay from the event store:
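// Reset and replay the billing-history projection from the beginning of the event store.
// The projection should clear its own tables first (for example in a @ResetHandler method),
// otherwise replayed events land on top of existing rows.
eventProcessingConfig
        .eventProcessorByProcessingGroup("billing-history", TrackingEventProcessor.class)
        .ifPresent(processor -> {
            processor.shutDown();    // a processor must be stopped before its tokens are reset
            processor.resetTokens(); // replays all events in stream order
            processor.start();
        });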
To scope the replay to a specific timestamp window, filter inside the @EventHandler by comparing the injected @Timestamp Instant against the dispute window before persisting. The event store is immutable, so replay always produces the same result, which makes it a reliable audit mechanism.
Time-Travel Replay
sequenceDiagram
participant Q as QueryClient
participant ES as EventStore
participant Agg as Aggregate
Q->>ES: Query events from t0 to t1
ES-->>Q: Return event slice
Q->>Agg: Replay events t0 to t1
Agg->>Agg: Apply each event
Agg-->>Q: Past state at t1
Note over Q,Agg: Audit snapshot restored
This sequence diagram illustrates time-travel replay: a QueryClient requests the event slice from t0 to t1, the EventStore returns exactly those events, and the Aggregate replays them in order to reconstruct the system's state at that historical point. Because the Event Store is append-only and immutable, replaying the same window always produces the same result, making it a reliable foundation for audit investigations and dispute resolution. The key takeaway is that point-in-time state reconstruction is a first-class capability of event sourcing, not a special-case workaround.
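Point-in-time reconstruction reduces to "fold only the events up to the cutoff". The sketch below is plain Java with hypothetical names (StatusChanged, statusAsOf) and made-up timestamps; it assumes the stream is already ordered by time of append.

```java
import java.time.Instant;
import java.util.List;

// Hypothetical names and sample data: illustrates an as-of query over one stream.
class TimeTravelSketch {

    record StatusChanged(String subscriptionId, String newStatus, Instant occurredAt) {}

    // State "as of" a moment: apply only events that occurred at or before it.
    static String statusAsOf(List<StatusChanged> stream, Instant asOf) {
        String status = "NONE";
        for (StatusChanged event : stream) {
            if (event.occurredAt().isAfter(asOf)) {
                break; // the stream is ordered, so everything after this is later still
            }
            status = event.newStatus();
        }
        return status;
    }

    public static void main(String[] args) {
        List<StatusChanged> stream = List.of(
                new StatusChanged("sub-1", "CREATED", Instant.parse("2024-01-01T00:00:00Z")),
                new StatusChanged("sub-1", "ACTIVE", Instant.parse("2024-01-02T00:00:00Z")),
                new StatusChanged("sub-1", "SUSPENDED", Instant.parse("2024-03-01T00:00:00Z")));

        // What did the account look like on the day of the disputed charge?
        System.out.println(statusAsOf(stream, Instant.parse("2024-02-15T00:00:00Z"))); // ACTIVE
        System.out.println(statusAsOf(stream, Instant.parse("2024-03-01T00:00:00Z"))); // SUSPENDED
        System.out.println(statusAsOf(stream, Instant.parse("2023-12-31T00:00:00Z"))); // NONE
    }
}
```

Because the stream never changes, the same cutoff always returns the same answer, however long after the fact the question is asked.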
Trade-offs & Failure Modes in Practice
| Failure mode | Symptom | Root cause | First mitigation |
| --- | --- | --- | --- |
| Long aggregate streams | High command latency on warm-up | No snapshot strategy | Add EventCountSnapshotTriggerDefinition |
| Incompatible old events | ClassCastException during replay after deploy | Schema changed without upcaster | Add SingleEventUpcaster before deploying new event version |
| Projection lag under load | Stale reads; audit disputes on in-flight data | Insufficient processor threads | Increase TrackingEventProcessor thread count |
| Unbounded event store growth | Storage cost; slow tail scans | No retention or archival policy | Archive cold streams; keep hot window in fast storage tier |
| Non-idempotent projection | Duplicate rows after replay | @EventHandler not safe to call twice | Use upsert keyed on aggregate ID + event sequence number |
Decision Guide: When Event Sourcing Earns Its Complexity
| Situation | Recommendation |
| --- | --- |
| Regulatory audit trail required (finance, healthcare, insurance) | Strong fit: the event log is the compliance record |
| Temporal queries: "what was the state at time T?" | Strong fit: replay to any past stream position |
| Simple CRUD with no audit or replay requirements | Avoid: operational overhead is not justified |
| High write throughput (>10k events/sec per stream) | Use with caution: partition streams; evaluate Axon Server |
| Team unfamiliar with CQRS and aggregate design | Run EventStorming workshops and model the domain first |
Operator Field Note: Three Production Realities
1. Snapshot monitoring. Track command-handler latency per aggregate type, for example a metric such as axon_command_bus_handler_latency_seconds (the exact name depends on your metrics configuration). Climbing latency with aggregate age signals snapshots are not firing. Count rows per aggregate identifier in the event table (DomainEventEntry in Axon's JPA store) to find outliers.
2. Schema-incompatible old events. A ClassCastException or deserialization error during replay almost always means a missing upcaster. Safe sequence: write a SingleEventUpcaster (V1 → V2), deploy it before the new event version, then deploy the aggregate code. Never modify stored events in place.
3. Isolated projection replay. Each @ProcessingGroup owns its own tracking token. Resetting billing-history leaves all other processors unaffected. Route audit queries to a dedicated shadow query model so live billing traffic is never blocked during replay.
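The isolation in point 3 follows from each processor keeping its own position in the shared log. Here is a minimal plain-Java sketch of that bookkeeping; Processors, catchUp, and the second processor name "invoicing" are hypothetical, and a position is modeled as a simple index rather than Axon's TrackingToken.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names: per-processor positions over one shared, read-only log.
class TrackingTokenSketch {

    static final class Processors {
        private final List<String> log;
        private final Map<String, Integer> tokens = new HashMap<>();
        private final Map<String, Integer> handled = new HashMap<>();

        Processors(List<String> log) {
            this.log = log;
        }

        // Handle everything between this processor's token and the end of the log.
        void catchUp(String processor) {
            int from = tokens.getOrDefault(processor, 0);
            handled.merge(processor, log.size() - from, Integer::sum);
            tokens.put(processor, log.size());
        }

        // A reset moves only this processor's token back to the start of the log.
        void reset(String processor) {
            tokens.put(processor, 0);
        }

        int handledCount(String processor) {
            return handled.getOrDefault(processor, 0);
        }
    }

    public static void main(String[] args) {
        Processors processors = new Processors(List.of("Created", "Activated", "Suspended"));
        processors.catchUp("billing-history");
        processors.catchUp("invoicing");

        // Replay billing-history only; invoicing keeps its position.
        processors.reset("billing-history");
        processors.catchUp("billing-history");
        processors.catchUp("invoicing");

        System.out.println(processors.handledCount("billing-history")); // 6
        System.out.println(processors.handledCount("invoicing"));       // 3
    }
}
```

Only the reset processor re-handles events, which is why a replay of one projection does not disturb the others.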
Hard-Won Lessons from Production Event-Sourced Systems
- Design events for readers, not writers. Rich, self-describing payloads survive upcasting; terse internal codes do not.
- Snapshots are not optional at scale. An aggregate with 1,000 events pays a 1,000-event replay cost on every command without one. Define your threshold before going live.
- Idempotent projections are mandatory. Every @EventHandler must be safe to call twice, because replay occurs during incident recovery and schema migration.
- Schema evolution is the hardest operational problem. Deploy upcasters before new event versions, never after.
- Replay is a first-class feature. Use it for analytics backfills, fraud investigation, and projection refactors.
TLDR: Summary & Key Takeaways
- Event sourcing stores immutable domain facts rather than mutable state rows; current state is always derivable by replaying the event log in order.
- Aggregates are deterministic state machines: @CommandHandler enforces invariants; @EventSourcingHandler mutates state, and nothing else does. This separation makes replay reliable.
- Snapshots are essential for long-lived aggregates. Without them, command latency grows linearly with stream length.
- Projections are disposable read models. Because the event store is the source of truth, any query model can be rebuilt from the log at any time, including for historical audit.
- Schema evolution requires upcasters. Deploy the upcaster before the new event version; test replay in staging before promoting to production.
- Audit trails, temporal queries, and replay-based dispute resolution are built-in features, not bolt-ons.