Webhooks Explained: Don't Call Us, We'll Call You
Polling is slow and wasteful. Webhooks are event-driven callbacks that deliver data the moment something happens.
Abstract Algorithms
TLDR: Webhooks let one system push event data to another the moment something happens. Instead of polling ("anything new?"), you expose an endpoint and the provider POSTs signed event payloads to you in near real-time. The key production requirements: signature verification, idempotency, async processing.
The Basics: HTTP Callbacks and Event-Driven Delivery
A webhook is an HTTP callback. When a specific event occurs, the event source (the provider) sends an HTTP POST request to a URL you specify: your webhook endpoint. No polling, no long connections.
The three-step registration flow:
- You register a URL with the provider (e.g., https://your-app.com/webhooks/stripe).
- An event occurs on the provider's system (a payment succeeds, a commit is pushed).
- The provider sends a POST with a JSON payload describing the event to your URL.
Your app receives the event nearly instantly: no repeated requests, no idle network traffic.
What makes webhooks different from REST APIs:
| Dimension | REST API (polling) | Webhook |
| Who initiates? | Your app | The provider |
| Latency | Up to poll interval | Near real-time |
| Network cost at idle | Constant (requests every N seconds) | Zero |
| Reliability | Deterministic | Depends on provider retry policy |
| Scalability | Linear with frequency | Driven by event rate |
The biggest practical implication: with webhooks, you never pay network and compute cost for silence. You only receive traffic when something actually happened.
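To put a rough number on that idle cost, here is a small sketch. The 10-second interval and the 12 events per day are made-up figures for illustration, and `pollsPerDay` and `emptyPollsPerDay` are hypothetical helpers, not part of any library.

```javascript
const SECONDS_PER_DAY = 24 * 60 * 60;

// Number of polling requests sent in one day at a fixed interval.
function pollsPerDay(intervalSeconds) {
  return Math.floor(SECONDS_PER_DAY / intervalSeconds);
}

// Requests that came back empty, assuming each event is picked up by exactly one poll.
function emptyPollsPerDay(intervalSeconds, eventsPerDay) {
  return Math.max(pollsPerDay(intervalSeconds) - eventsPerDay, 0);
}

console.log(pollsPerDay(10));          // 8640
console.log(emptyPollsPerDay(10, 12)); // 8628
```

Under these assumptions, nearly every polling request returns nothing, while a webhook integration would receive 12 requests in total.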
Stop Polling: Let the Provider Ring Your Doorbell
Polling means your app repeatedly asks a provider "anything new?", burning bandwidth and adding latency whether or not anything changed.
Webhooks invert the call: you register a URL, and the provider calls your endpoint the instant an event occurs.
| Model | Who initiates? | Event latency | Network cost at idle |
| Polling | Your app | Up to poll interval | High (constant requests) |
| Webhook | Provider | Near real-time | Very low |
Analogy: Polling is calling the courier every 5 minutes. A webhook is the courier ringing your doorbell the moment the package arrives.
What a Webhook Payload Looks Like
Most providers send a structured JSON body over HTTPS POST:
{
  "id": "evt_101",
  "type": "payment.succeeded",
  "created": 1772877602,
  "data": {
    "transaction_id": "txn_9001",
    "amount": 4999,
    "currency": "USD"
  }
}
Key fields found in most real-world webhooks:
- id: unique event identifier; use for deduplication.
- type: event name; drives routing in your handler.
- created: Unix timestamp; enables replay-window validation.
- data: the event payload; schema varies by event type.
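As a minimal sketch of how the `type` field can drive routing, the snippet below dispatches the sample payload to a handler. The handler table and the `routeEvent` function are hypothetical names chosen for illustration; real handlers would call into your domain logic rather than return strings.

```javascript
// Hypothetical handlers keyed by event type; the names are illustrative only.
const handlers = new Map([
  ['payment.succeeded', (data) => `mark ${data.transaction_id} as paid`],
  ['refund.created', (data) => `record refund for ${data.transaction_id}`],
]);

// Route an event to its handler. Unknown types are reported as unhandled
// instead of throwing, so new provider event types do not break the endpoint.
function routeEvent(event) {
  const handler = handlers.get(event.type);
  if (!handler) {
    return { handled: false, result: null };
  }
  return { handled: true, result: handler(event.data) };
}

const sample = {
  id: 'evt_101',
  type: 'payment.succeeded',
  created: 1772877602,
  data: { transaction_id: 'txn_9001', amount: 4999, currency: 'USD' },
};

console.log(routeEvent(sample));
```

A `Map` is used rather than a plain object so that an unexpected `type` value such as `constructor` cannot resolve to an inherited property.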
Webhook Delivery Flow
The complete lifecycle from event to processed business logic:
flowchart TD
A[Event occurs at Provider] --> B[Provider signs payload with HMAC]
B --> C[Provider POSTs to your endpoint over HTTPS]
C --> D{Signature valid?}
D -->|No| E[Return 401 - reject]
D -->|Yes| F{Event ID already seen?}
F -->|Yes| G[Return 200 - silently ignore]
F -->|No| H[Persist raw payload to DB]
H --> I[Enqueue job for async worker]
I --> J[Return 200 immediately]
J --> K[Worker executes business logic]
K --> L[Mark event as processed]
The critical design principle: your endpoint's only job is to acknowledge receipt (return 200) as fast as possible. All actual work happens asynchronously. This prevents provider retry storms.
Provider retry schedules (approximate; check each provider's current documentation):
| Provider | Retry window or attempts | Retry interval strategy |
| Stripe | Up to 3 days | Exponential backoff |
| GitHub | No automatic retries | Failed deliveries are redelivered manually or through the API |
| Twilio | Up to 11 hours | Exponential |
| Shopify | Up to 48 hours | Exponential backoff |
Providers consider a delivery successful only when they receive a 2xx response within their timeout window (usually 5 to 30 seconds).
The Production-Safe Webhook Handler
A naive endpoint that just processes inline is dangerous: it creates duplicate actions when providers retry. The correct pattern has 5 ordered steps:
flowchart TD
A[Provider POST] --> B{Valid HMAC signature?}
B -->|No| C[Return 401]
B -->|Yes| D{Duplicate event_id?}
D -->|Yes| E[Return 200 - ignore]
D -->|No| F[Persist event to durable store]
F --> G[Enqueue for async worker]
G --> H[Return 200 immediately]
H --> I[Worker processes business logic]
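// Node.js / Express: production-safe handler skeleton.
// isValidHmac, isDuplicate, persistEvent, and enqueueEvent are application helpers defined elsewhere.
const express = require('express');
const app = express();

app.post('/webhooks/provider', express.raw({ type: 'application/json' }), async (req, res) => {
  try {
    const sig = req.header('X-Signature');
    if (!isValidHmac(req.body, sig, process.env.WEBHOOK_SECRET)) {
      return res.status(401).send('invalid signature');
    }
    const event = JSON.parse(req.body.toString('utf8'));
    if (await isDuplicate(event.id)) {
      return res.status(200).send('duplicate ignored');
    }
    await persistEvent(event); // write to DB before ack
    await enqueueEvent(event); // hand off to async worker
    return res.status(200).send('accepted');
  } catch (err) {
    // A non-2xx response tells the provider to retry later
    return res.status(500).send('temporary failure');
  }
});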
Why return 200 before processing? Most providers retry if they don't receive a timely 2xx. If your business logic runs inline and exceeds the provider's timeout, the provider resends the same event and you risk processing it twice.
Deep Dive: Why At-Least-Once Delivery Demands Idempotency
Webhook providers guarantee delivery by retrying on failure, not by ensuring exactly-once delivery. A network timeout after your handler processes an event but before it returns 200 causes the same event to arrive again. Idempotency means processing an event twice produces the same result as processing it once. The key: store event.id on first receipt, and skip any event whose ID already exists in your store before executing any business logic.
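The idea can be sketched in a few lines. This is an in-memory illustration only: the `Set` stands in for a durable store (in practice, a database table with a unique constraint on the event ID), and `processOnce` and `balance` are hypothetical names.

```javascript
// In-memory stand-in for a durable store of processed event IDs.
const seenEventIds = new Set();
let balance = 0;

// Apply the event's side effect at most once per event ID.
function processOnce(event) {
  if (seenEventIds.has(event.id)) {
    return false; // duplicate delivery: skip business logic
  }
  seenEventIds.add(event.id);
  balance += event.data.amount; // the side effect that must not repeat
  return true;
}

const event = { id: 'evt_101', data: { amount: 4999 } };
processOnce(event); // first delivery is applied
processOnce(event); // retried delivery is ignored
console.log(balance); // 4999
```

Delivering the same event twice leaves `balance` at 4999, which is the property the handler needs under at-least-once delivery.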
Real-World Applications: Where Webhooks Power Real Systems
| Domain | Provider | Event examples |
| Payments | Stripe, PayPal | payment.succeeded, refund.created, dispute.opened |
| CI/CD | GitHub, GitLab | push, pull_request, deployment_status |
| Customer messaging | Twilio, Slack | message.received, channel.created |
| SaaS integrations | HubSpot, Salesforce | contact.created, deal.updated |
| Infrastructure | PagerDuty, Datadog | alert.triggered, incident.resolved |
Trade-offs & Failure Modes You Must Defend Against
Most webhook providers use at-least-once delivery: duplicates are normal, and ordering is not guaranteed.
| Failure mode | Symptom | Root cause | Fix |
| Duplicate processing | Double charge or duplicate action | Provider retry after network timeout | Idempotency key on event.id |
| Signature failure spike | Many 401 responses | Secret mismatch or clock drift | Secret rotation with overlap window + NTP |
| Queue backlog | Delayed domain updates | Worker under-capacity | Autoscale workers; backpressure control |
| Silent data loss | Missing domain updates | Returned 200 before persisting | Persist first, then ack |
| Replay storm | Millions of old events flood handler | Misconfigured replay | Timestamp window validation (reject events > 5 min old) |
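For the "secret rotation with overlap window" fix, one possible approach is to accept a signature produced by any currently active secret while both old and new secrets are valid. The sketch below assumes a hex-encoded HMAC-SHA256 signature; `sign`, `safeEqualHex`, and `verifyWithRotation` are hypothetical helpers, and the secrets are placeholders.

```javascript
const crypto = require('crypto');

// Compute a hex HMAC-SHA256 over the raw request body.
function sign(rawBody, secret) {
  return crypto.createHmac('sha256', secret).update(rawBody).digest('hex');
}

// Constant-time comparison of two hex strings; a length mismatch is a plain reject.
function safeEqualHex(a, b) {
  const bufA = Buffer.from(a, 'hex');
  const bufB = Buffer.from(b, 'hex');
  return bufA.length === bufB.length && crypto.timingSafeEqual(bufA, bufB);
}

// Accept a signature made with any secret that is active during the overlap window.
function verifyWithRotation(rawBody, signatureHex, activeSecrets) {
  return activeSecrets.some((secret) => safeEqualHex(sign(rawBody, secret), signatureHex));
}

const body = Buffer.from('{"id":"evt_101"}');
const oldSig = sign(body, 'old-secret');
console.log(verifyWithRotation(body, oldSig, ['new-secret', 'old-secret'])); // true
console.log(verifyWithRotation(body, oldSig, ['new-secret']));               // false
```

Once the provider has switched fully to the new secret, the old one is removed from the active list and its signatures stop verifying.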
Webhook Delivery Sequence
sequenceDiagram
participant PS as Provider System
participant WH as Webhook Endpoint
participant Q as Job Queue
participant W as Worker
PS->>PS: event occurs (payment.succeeded)
PS->>WH: HTTP POST signed payload
WH->>WH: validate HMAC signature
WH->>WH: check idempotency (event.id seen?)
WH->>Q: enqueue job
WH-->>PS: 200 OK (fast ack)
Q->>W: dispatch job
W->>W: execute business logic
The sequence above traces the complete lifecycle of a single webhook event from provider to worker. Notice the three critical checkpoints in the endpoint handler (signature validation, idempotency check, and job enqueue), all executed before returning the fast 200 OK acknowledgement. The key takeaway is that business logic never runs inside the HTTP handler itself; it is always delegated to an async worker, keeping the endpoint fast and the provider's retry counter at zero.
Webhook Retry on Failure
sequenceDiagram
participant P as Provider
participant E as Endpoint
P->>E: HTTP POST (attempt 1)
E-->>P: 500 Internal Server Error
Note over P: wait 30s (exponential backoff)
P->>E: HTTP POST (attempt 2)
E-->>P: 500 Internal Server Error
Note over P: wait 60s
P->>E: HTTP POST (attempt 3)
E-->>P: 200 OK
Note over P: delivery confirmed
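The doubling waits in the diagram (30s, then 60s) can be generated with a small function. This is only a sketch of a capped exponential schedule: the 30-second base and one-hour cap are illustrative defaults, not any provider's documented policy, and `backoffDelays` is a hypothetical name.

```javascript
// Delay in seconds before each retry: doubles from a base value, capped at a maximum.
function backoffDelays(retries, baseSeconds = 30, capSeconds = 3600) {
  const delays = [];
  for (let i = 0; i < retries; i++) {
    delays.push(Math.min(baseSeconds * 2 ** i, capSeconds));
  }
  return delays;
}

console.log(backoffDelays(5)); // [ 30, 60, 120, 240, 480 ]
```

Real providers often add random jitter to each delay so that many failed deliveries do not retry at the same instant.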
Estimating worker capacity:
$$\text{workers needed} \approx \lambda \cdot T$$
where $\lambda$ = incoming events/sec, $T$ = average processing time (sec). Add a 2× safety margin for bursts.
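As a worked example of the estimate, the sketch below plugs in assumed numbers: 50 events per second and 0.2 seconds of processing per event. Both figures and the `workersNeeded` helper are hypothetical.

```javascript
// Estimate concurrent workers as arrival rate x processing time x safety factor,
// rounded up to a whole worker.
function workersNeeded(eventsPerSecond, avgProcessingSeconds, safetyFactor = 2) {
  return Math.ceil(eventsPerSecond * avgProcessingSeconds * safetyFactor);
}

console.log(workersNeeded(50, 0.2));    // 20
console.log(workersNeeded(50, 0.2, 1)); // 10
```

With these inputs the base estimate is 10 busy workers on average, and the 2× margin raises the target to 20.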
Practical: Setting Up and Testing Webhooks
Local development with a tunnel:
Production webhook endpoints must be publicly reachable over HTTPS. During development, use a tunnel to expose your local server:
# Using ngrok: creates a public HTTPS URL for localhost:3000
ngrok http 3000
# Forwarding: https://abc123.ngrok.io -> http://localhost:3000
Register the ngrok URL with your provider's webhook settings. Events now flow to your local development server.
Testing your handler:
# Resend an existing event by ID with the Stripe CLI
stripe events resend evt_1234567890
# Trigger a test event via CLI
stripe trigger payment_intent.succeeded
Verifying your HMAC implementation is correct:
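const crypto = require('crypto');

function isValidHmac(rawBody, signatureHeader, secret) {
  if (typeof signatureHeader !== 'string') return false; // missing header
  const expected = crypto
    .createHmac('sha256', secret)
    .update(rawBody) // raw Buffer, not parsed JSON
    .digest();
  const received = Buffer.from(signatureHeader.replace('sha256=', ''), 'hex');
  // timingSafeEqual throws on a length mismatch, so check lengths first
  if (received.length !== expected.length) return false;
  // Constant-time comparison prevents timing attacks
  return crypto.timingSafeEqual(expected, received);
}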
Three common configuration mistakes:
- Parsing JSON before computing HMAC. The signature is computed over the raw bytes. If you call JSON.parse() first and re-serialize, the output may differ from the original bytes and the signature check will fail.
- Logging the raw secret. Treat WEBHOOK_SECRET like a password. Never log it, never commit it, rotate it if exposed.
- Using a long replay window. A 5-minute timestamp window blocks most replay attacks. A 24-hour window does not.
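A replay-window check along these lines might look like the sketch below. It assumes the timestamp is a Unix time in seconds and is covered by the signature (otherwise an attacker could simply change it); `isWithinReplayWindow` is a hypothetical helper, and the current time is injectable so the check can be tested deterministically.

```javascript
const FIVE_MINUTES_SECONDS = 5 * 60;

// Returns true when the signed timestamp is within the tolerance of "now".
// Math.abs also rejects timestamps too far in the future (clock skew or tampering).
function isWithinReplayWindow(
  timestampSeconds,
  nowSeconds = Math.floor(Date.now() / 1000),
  toleranceSeconds = FIVE_MINUTES_SECONDS
) {
  return Math.abs(nowSeconds - timestampSeconds) <= toleranceSeconds;
}

const created = 1772877602;
console.log(isWithinReplayWindow(created, created + 120)); // true
console.log(isWithinReplayWindow(created, created + 600)); // false
```

One caveat: a provider's legitimate retry can arrive hours after the event was created. If the provider signs a fresh timestamp on each delivery attempt, compare against that value rather than the event's creation time.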
Decision Guide: When Webhooks Are (and Aren't) the Answer
| Situation | Recommendation |
| Need instant event-driven updates with low idle load | Webhooks: the right default |
| Cannot expose a public HTTPS endpoint | Start with short-interval polling; migrate when infrastructure allows |
| High compliance / audit requirements | Persist raw payload + signature metadata before any processing |
| Multiple providers with different payload schemas | Build a normalized internal event model; translate at ingress |
| Provider doesn't support webhooks | Poll. Or check if they support long-polling / SSE instead |
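For the multi-provider row, "translate at ingress" could look roughly like the sketch below. The two payload shapes (`providerA`, `providerB`) and every field name in them are invented for illustration and do not correspond to any real provider's schema.

```javascript
// One translator per provider, each mapping a provider-specific payload
// to the same internal shape: { id, type, occurredAt, data }.
const translators = new Map([
  ['providerA', (p) => ({ id: p.id, type: p.type, occurredAt: p.created, data: p.data })],
  ['providerB', (p) => ({ id: p.event_id, type: p.event_name, occurredAt: p.timestamp, data: p.payload })],
]);

// Normalize an incoming payload; unknown providers are rejected explicitly.
function normalize(provider, payload) {
  const translate = translators.get(provider);
  if (!translate) {
    throw new Error(`unknown provider: ${provider}`);
  }
  return { provider, ...translate(payload) };
}

console.log(normalize('providerB', {
  event_id: 'b-7',
  event_name: 'contact.created',
  timestamp: 1772877602,
  payload: { email: 'a@example.com' },
}));
```

Downstream workers then depend only on the internal shape, so adding a provider means adding one translator rather than touching business logic.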
What to Learn Next
- System Design Protocols: REST, RPC, and TCP/UDP
- System Design Core Concepts
- API Gateway vs. Load Balancer vs. Reverse Proxy
Spring Boot + Svix: A Production-Safe Webhook Receiver
Spring Boot provides the @RestController and @RequestBody annotations needed to build a webhook endpoint in minutes, while Svix is an open-source webhook delivery platform that handles HMAC signing, retry scheduling, idempotency tracking, and an event portal, covering the delivery-side concerns from this post without building that infrastructure from scratch.
Together they cover the three production requirements in the TLDR: Spring Boot handles the receiver endpoint with signature verification, Svix handles the delivery-side guarantees (retries, exponential backoff, event portal for debugging), and Spring's @Async lets the handler return 200 without waiting for business logic to finish.
// pom.xml dependencies: spring-boot-starter-web, spring-boot-starter-data-jpa
// Note: @Async only takes effect when @EnableAsync is present on a configuration class.
import org.springframework.web.bind.annotation.*;
import org.springframework.http.*;
import org.springframework.scheduling.annotation.Async;
import org.springframework.stereotype.Service;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.data.jpa.repository.JpaRepository;
import jakarta.persistence.*;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.util.*;

// -- Domain: idempotency store --
@Entity
public class WebhookEvent {
    @Id String eventId;
    String type;
    @Lob String rawPayload;   // payloads can exceed the default 255-character column
    boolean processed = false;
}

// Spring Data repository keyed by event ID
public interface WebhookEventRepository extends JpaRepository<WebhookEvent, String> {}

// -- Signature verification (HMAC-SHA256) --
@Service
public class HmacVerifier {
    @Value("${webhook.secret}")   // loaded from application.properties
    private String webhookSecret;

    public boolean verify(byte[] rawBody, String signatureHeader) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(webhookSecret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            String expected = HexFormat.of().formatHex(mac.doFinal(rawBody));
            String received = signatureHeader.replace("sha256=", "");
            // Constant-time comparison prevents timing attacks
            return MessageDigest.isEqual(
                expected.getBytes(StandardCharsets.UTF_8),
                received.getBytes(StandardCharsets.UTF_8));
        } catch (Exception e) {
            return false;
        }
    }
}

// -- Webhook receiver controller --
@RestController
@RequestMapping("/webhooks")
public class WebhookController {
    private final HmacVerifier verifier;
    private final WebhookEventRepository eventRepo;
    private final WebhookWorker worker;

    public WebhookController(HmacVerifier v, WebhookEventRepository r, WebhookWorker w) {
        this.verifier = v;
        this.eventRepo = r;
        this.worker = w;
    }

    // Providers send JSON; binding to byte[] keeps the raw bytes needed for the HMAC check
    @PostMapping(value = "/events", consumes = MediaType.APPLICATION_JSON_VALUE)
    public ResponseEntity<String> receive(
            @RequestBody byte[] rawBody,
            @RequestHeader("X-Signature") String sig,
            @RequestHeader("X-Event-Id") String eventId,
            @RequestHeader("X-Event-Type") String eventType) {
        // Step 1: Verify HMAC signature
        if (!verifier.verify(rawBody, sig)) {
            return ResponseEntity.status(401).body("invalid signature");
        }
        // Step 2: Idempotency check
        if (eventRepo.existsById(eventId)) {
            return ResponseEntity.ok("duplicate ignored");
        }
        // Step 3: Persist raw payload BEFORE returning 200
        WebhookEvent event = new WebhookEvent();
        event.eventId = eventId;
        event.type = eventType;
        event.rawPayload = new String(rawBody, StandardCharsets.UTF_8);
        eventRepo.save(event);
        // Step 4: Hand off to async worker and return 200 immediately
        worker.process(eventId);
        return ResponseEntity.ok("accepted");
    }
}

// -- Async worker: business logic runs outside the HTTP thread --
@Service
public class WebhookWorker {
    private final WebhookEventRepository eventRepo;

    public WebhookWorker(WebhookEventRepository r) {
        this.eventRepo = r;
    }

    @Async   // runs on Spring's task executor, so the HTTP thread does not wait for it
    public void process(String eventId) {
        WebhookEvent event = eventRepo.findById(eventId).orElseThrow();
        // Execute business logic here (update order, send notification, etc.)
        System.out.println("Processing event: " + event.type + " / " + event.eventId);
        event.processed = true;
        eventRepo.save(event);
    }
}
The @Async annotation runs worker.process() on Spring's task executor thread pool (provided @EnableAsync is configured), so the HTTP thread returns 200 OK without waiting for business logic or downstream service calls. Combined with eventRepo.existsById(eventId) for deduplication, this matches the pattern from the production-safe handler diagram.
For a full deep-dive on Spring Boot webhook receivers and Svix for managed delivery, a dedicated follow-up post is planned.
Lessons from Webhook Systems in Production
Lesson 1: Idempotency is not optional; it is the foundation.
Every webhook system that goes to production eventually receives duplicate events. A provider network timeout, a retry on infrastructure restart, or an explicit replay will cause the same event.id to arrive twice. If your handler is not idempotent, you will double-charge customers, send duplicate notifications, or corrupt state. Build idempotency on day one.
Lesson 2: The return-200-fast pattern is more important than it looks. Teams that process inline get burned within weeks. A database slowdown causes a processing delay, the provider times out, retries fire, and you process the event twice despite having no explicit bug. The pattern (persist, enqueue, return 200) protects against the entire class of retry-induced duplicates.
Lesson 3: Build an event replay pipeline before you need it. At some point your worker will have a bug, your queue will fill, or your downstream service will go down. You need to be able to re-process events from your persistent store. Design that replay pipeline into the system from the start, not as a fire-drill.
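A replay pass over the persisted events might be sketched as follows. The array stands in for the persistent event table, and `replayUnprocessed` is a hypothetical helper; a real pipeline would page through the database and reuse the same idempotent handler the worker runs.

```javascript
// In-memory stand-in for the persisted event table.
const store = [
  { id: 'evt_1', type: 'payment.succeeded', processed: true },
  { id: 'evt_2', type: 'refund.created', processed: false },
  { id: 'evt_3', type: 'payment.succeeded', processed: false },
];

// Re-run the handler over every unprocessed event. Events whose handler
// throws stay unprocessed so that a later replay picks them up again.
function replayUnprocessed(events, handler) {
  const replayed = [];
  for (const event of events) {
    if (event.processed) continue;
    try {
      handler(event);
      event.processed = true;
      replayed.push(event.id);
    } catch (err) {
      // leave processed = false for the next replay run
    }
  }
  return replayed;
}

const replayedIds = replayUnprocessed(store, () => {});
console.log(replayedIds); // [ 'evt_2', 'evt_3' ]
```

Because the handler is assumed to be idempotent, running the replay twice is harmless: the second run finds nothing left to process.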
Lesson 4: Monitor signature failure rate as a security signal.
A sudden spike in signature_fail_rate means either your secret rotated without overlap, your endpoint is receiving spoofed requests, or there's a serialization mismatch. It is always worth investigating, because it rarely resolves on its own.
TLDR: Summary & Key Takeaways
- Webhooks invert the polling model: the provider pushes events to your endpoint the moment they occur.
- Every production webhook handler needs HMAC signature verification, an idempotency check, and async processing.
- Return 200 as fast as possible; inline processing causes duplicate deliveries on retries.
- At-least-once delivery means duplicates are normal; make your handler idempotent by design.
- Monitor signature_fail_rate, dedup_hit_rate, and queue_depth as the three core health signals.
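Two of these signals can be derived from simple counters in the endpoint, as in the sketch below. The counter names and the `record` and `rates` helpers are hypothetical; a real deployment would export the values to a metrics system, and queue_depth would come from the queue itself rather than the handler.

```javascript
// Minimal in-process counters for endpoint outcomes.
const counters = { received: 0, signatureFailures: 0, dedupHits: 0 };

// Record the outcome of one delivery: 'accepted', 'duplicate', or 'invalid_signature'.
function record(outcome) {
  counters.received += 1;
  if (outcome === 'invalid_signature') counters.signatureFailures += 1;
  if (outcome === 'duplicate') counters.dedupHits += 1;
}

// Convert counters to rates; the fallback to 1 avoids division by zero.
function rates() {
  const total = counters.received || 1;
  return {
    signature_fail_rate: counters.signatureFailures / total,
    dedup_hit_rate: counters.dedupHits / total,
  };
}

['accepted', 'duplicate', 'accepted', 'invalid_signature'].forEach(record);
console.log(rates()); // { signature_fail_rate: 0.25, dedup_hit_rate: 0.25 }
```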
Practice Quiz
Why must a webhook handler return 200 before completing its business logic?
- A) Providers reject any response that takes more than 100 ms
- B) Providers retry if they don't receive a timely 2xx, causing duplicate event delivery
- C) HTTP 200 signals the provider to send the next event
- D) Returning 200 prevents HMAC verification
Correct Answer: B. If your processing runs inline and exceeds the provider's timeout, the delivery is marked failed and the same event is retried, resulting in duplicate processing.
What is the purpose of HMAC signature verification in a webhook handler?
- A) To encrypt the event payload end-to-end
- B) To prove the request came from the legitimate provider using a shared secret, not a spoofed source
- C) To prevent duplicate events from being processed
- D) To compress the payload before storage
Correct Answer: B. The provider computes HMAC-SHA256 over the payload using a shared secret; verifying this proves authenticity and integrity without exposing the secret.
Your dedup_hit_rate metric spikes to 40%. What does this indicate?
- A) Your handler is processing events too slowly
- B) The provider is delivering a high proportion of duplicate events, which is expected under at-least-once delivery
- C) Your webhook secret has been rotated
- D) Your queue is backing up
Correct Answer: B. At-least-once delivery guarantees duplicates under retry conditions; a 40% dedup rate means your idempotency layer is working correctly and absorbing those duplicates.
Related Posts
- System Design Protocols: REST, RPC, and TCP/UDP
- System Design Core Concepts
- API Gateway vs. Load Balancer vs. Reverse Proxy
- System Design API Design for Interviews
- System Design Message Queues and Event-Driven Architecture
Written by
Abstract Algorithms
@abstractalgorithms