
Webhooks Explained: Don't Call Us, We'll Call You

Polling is slow and wasteful. Webhooks are event-driven callbacks that deliver data the moment something happens.

Abstract Algorithms/Feb 13, 2026/How It Works: Internals Explained


TLDR: Webhooks let one system push event data to another the moment something happens. Instead of polling ("anything new?"), you expose an endpoint and the provider POSTs signed event payloads to you in near real-time. The key production requirements: signature verification, idempotency, async processing.

🔍 The Basics: HTTP Callbacks and Event-Driven Delivery

A webhook is an HTTP callback. When a specific event occurs, the event source (provider) sends an HTTP POST request to a URL you specify — your webhook endpoint. No polling, no long connections.

The basic flow has three steps:

You register a URL with the provider (e.g., https://your-app.com/webhooks/stripe).
An event occurs on the provider's system (a payment succeeds, a commit is pushed).
The provider sends a POST with a JSON payload describing the event to your URL.

Your app receives the event nearly instantly — no repeated requests, no idle network traffic.

What makes webhooks different from polling a REST API:

Dimension	REST API (polling)	Webhook
Who initiates?	Your app	The provider
Latency	Up to poll interval	Near real-time
Network cost at idle	Constant (requests every N seconds)	Zero
Reliability	Under your control (you decide when to re-poll)	Depends on provider retry policy
Request volume scales with	Poll frequency × resources watched	Event rate

The biggest practical implication: with webhooks, you never pay network and compute cost for silence. You only receive traffic when something actually happened.
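To put rough numbers on "paying for silence", the sketch below compares daily request volume for polling and for webhooks. The 30-second interval, 500 watched resources, and 40 events/day are illustrative assumptions, not measurements.

```javascript
// Rough daily request volume: polling vs. webhooks (inputs are illustrative assumptions).
const SECONDS_PER_DAY = 24 * 60 * 60;

// Polling cost is set by the interval and the number of resources watched,
// no matter how many events actually happen.
function pollingRequestsPerDay(intervalSec, resources) {
  return Math.ceil(SECONDS_PER_DAY / intervalSec) * resources;
}

// Webhook cost tracks the event rate: one delivery per event (ignoring retries).
function webhookDeliveriesPerDay(eventsPerDay) {
  return eventsPerDay;
}

const polls = pollingRequestsPerDay(30, 500);   // 2,880 polls per resource per day
const pushes = webhookDeliveriesPerDay(40);
console.log({ polls, pushes });                 // { polls: 1440000, pushes: 40 }
```

With these inputs, at least 1,439,960 of the polls come back empty, while the webhook side sends only the 40 deliveries that carry news.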

📖 Stop Polling — Let the Provider Ring Your Doorbell

Polling means your app repeatedly asks a provider "anything new?" — burning bandwidth and adding latency whether or not anything changed.

Webhooks invert the call: you register a URL, and the provider calls your endpoint the instant an event occurs.

Model	Who initiates?	Event latency	Network cost at idle
Polling	Your app	Up to poll interval	High (constant requests)
Webhook	Provider	Near real-time	Very low

Analogy: Polling is calling the courier every 5 minutes. A webhook is the courier ringing your doorbell the moment the package arrives.

🔢 What a Webhook Payload Looks Like

Most providers send a structured JSON body over HTTPS POST:

{
  "id":      "evt_101",
  "type":    "payment.succeeded",
  "created": 1772877602,
  "data": {
    "transaction_id": "txn_9001",
    "amount":         4999,
    "currency":       "USD"
  }
}

Key fields in a typical webhook payload (exact names vary by provider):

id — unique event identifier; use for deduplication.
type — event name; drives routing in your handler.
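created — Unix timestamp of when the event occurred. It stays the same across retries, so replay-window checks should use a timestamp the provider signs per delivery (for example, the t= value in Stripe's Stripe-Signature header) rather than this field.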
data — the event payload; schema varies by event type.
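Since type drives routing, a handler typically dispatches through a lookup table. A minimal sketch for the payload above; the handler functions and routeEvent are hypothetical, and unknown types are acknowledged and ignored (a design choice) so that event types you never subscribed to handle do not trigger provider retries.

```javascript
// Minimal type-based router for the payload shape shown above.
// The handlers and their return strings are hypothetical placeholders.
const handlers = new Map([
  ['payment.succeeded', (data) => `mark ${data.transaction_id} paid (${data.amount} ${data.currency})`],
  ['refund.created', (data) => `record refund for ${data.transaction_id}`],
]);

function routeEvent(rawJson) {
  const event = JSON.parse(rawJson);
  // Validate the envelope fields every handler relies on
  if (typeof event.id !== 'string' || typeof event.type !== 'string') {
    throw new Error('malformed event envelope');
  }
  const handler = handlers.get(event.type);
  // Unknown types are acknowledged but ignored, so they never cause provider retries
  if (!handler) return { id: event.id, action: 'ignored' };
  return { id: event.id, action: handler(event.data) };
}

const sample = '{"id":"evt_101","type":"payment.succeeded","created":1772877602,' +
  '"data":{"transaction_id":"txn_9001","amount":4999,"currency":"USD"}}';
console.log(routeEvent(sample).action); // mark txn_9001 paid (4999 USD)
```

Using a Map rather than a plain object keeps a type string such as "constructor" from resolving to an inherited property.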

📊 Webhook Delivery Flow

The complete lifecycle from event to processed business logic:

flowchart TD
    A[Event occurs at Provider] --> B[Provider signs payload with HMAC]
    B --> C[Provider POSTs to your endpoint over HTTPS]
    C --> D{Signature valid?}
    D -->|No| E[Return 401 and reject]
    D -->|Yes| F{Event ID already seen?}
    F -->|Yes| G[Return 200 and ignore silently]
    F -->|No| H[Persist raw payload to DB]
    H --> I[Enqueue job for async worker]
    I --> J[Return 200 immediately]
    J --> K[Worker executes business logic]
    K --> L[Mark event as processed]

The critical design principle: your endpoint's only job is to acknowledge receipt (return 200) as fast as possible. All actual work happens asynchronously. This prevents provider retry storms.
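The ack-first principle can be sketched in-process: the request path only persists and enqueues, and a separate worker loop does the work later. acceptDelivery, runWorkerOnce, and the Map/array stand-ins are hypothetical; a real system would use a durable store and a message queue so events survive a restart.

```javascript
// In-memory stand-ins for a durable event store and a job queue (hypothetical).
const store = new Map();   // event.id -> raw JSON payload
const queue = [];          // event ids waiting for a worker

// Request path: persist, enqueue, and return a status immediately. No business logic here.
function acceptDelivery(event) {
  if (store.has(event.id)) return 200;        // duplicate delivery: ack and do nothing
  store.set(event.id, JSON.stringify(event)); // "persist" before acknowledging
  queue.push(event.id);
  return 200;
}

// Worker path: runs later, outside the request/response cycle.
function runWorkerOnce(processFn) {
  const id = queue.shift();
  if (id === undefined) return false;         // queue is empty
  processFn(JSON.parse(store.get(id)));
  return true;
}

const processed = [];
acceptDelivery({ id: 'evt_101', type: 'payment.succeeded' });
acceptDelivery({ id: 'evt_101', type: 'payment.succeeded' }); // provider retry
while (runWorkerOnce((e) => processed.push(e.id))) {}
console.log(processed); // [ 'evt_101' ]
```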

Provider retry schedules (published defaults at the time of writing; providers change these, so confirm against current docs):

Provider	Retry window	Retry interval strategy
Stripe	Up to 3 days	Exponential backoff
GitHub	No automatic retries	Failed deliveries are redelivered manually (UI or REST API)
Twilio	Up to 11 hours	Exponential
Shopify	Up to 48 hours	Exponential backoff

Providers consider a delivery successful only when they receive a 2xx response within their timeout window (usually 5–30 seconds).

⚙️ The Production-Safe Webhook Handler

A naive endpoint that processes events inline is dangerous: slow processing runs past the provider's timeout, and every retry repeats the action. The safer pattern has five ordered steps:

flowchart TD
    A[Provider POST] --> B{Valid HMAC signature?}
    B -->|No| C[Return 401]
    B -->|Yes| D{Duplicate event_id?}
    D -->|Yes| E[Return 200 and ignore]
    D -->|No| F[Persist event to durable store]
    F --> G[Enqueue for async worker]
    G --> H[Return 200 immediately]
    H --> I[Worker processes business logic]

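// Node.js / Express: production-safe handler skeleton.
// isValidHmac is shown later in this post; isDuplicate, persistEvent and
// enqueueEvent are placeholders for your own storage and queue code.
const express = require('express');
const app = express();

// express.raw keeps req.body as a Buffer, which HMAC verification needs
app.post('/webhooks/provider', express.raw({ type: 'application/json' }), async (req, res) => {
  const sig = req.header('X-Signature');
  if (!sig || !isValidHmac(req.body, sig, process.env.WEBHOOK_SECRET)) {
    return res.status(401).send('invalid signature');
  }

  let event;
  try {
    event = JSON.parse(req.body.toString('utf8'));
  } catch {
    return res.status(400).send('malformed JSON');
  }

  try {
    if (await isDuplicate(event.id)) {
      return res.status(200).send('duplicate ignored');
    }
    await persistEvent(event);   // write to DB before ack (a unique key on event.id closes the check-then-insert race)
    await enqueueEvent(event);   // hand off to async worker
  } catch (err) {
    return res.status(500).send('retry later');   // non-2xx: the provider will retry
  }
  return res.status(200).send('accepted');
});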

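Why return 200 before processing? Most providers retry if they don't receive a 2xx within their timeout. If your business logic runs inline and takes longer than that, the provider treats the delivery as failed and sends the same event again, even though your code already acted on it.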

🧠 Deep Dive: Why At-Least-Once Delivery Demands Idempotency

Webhook providers guarantee delivery by retrying on failure, not by ensuring exactly-once delivery. A network timeout after your handler processes an event but before the provider receives the 200 causes the same event to arrive again. Idempotency means processing an event twice produces the same result as processing it once. The key: record event.id on first receipt and, before executing any business logic, skip (while still acknowledging with 200) any event whose ID is already in your store.
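One way to make the check-then-act step safe is to claim the event ID in a single atomic operation before doing any work, and release the claim if processing fails so a later retry can try again. A minimal single-process sketch; IdempotencyStore and handleOnce are hypothetical, and across multiple instances the same idea is usually an INSERT guarded by a unique constraint on the event ID.

```javascript
// Hypothetical idempotency store: claim an event ID atomically before processing.
class IdempotencyStore {
  constructor() { this.state = new Map(); }   // eventId -> 'processing' | 'done'

  // Returns true only for the first caller. With no await between has() and set(),
  // this check-and-set cannot interleave with another call in single-threaded JS.
  claim(eventId) {
    if (this.state.has(eventId)) return false;
    this.state.set(eventId, 'processing');
    return true;
  }
  complete(eventId) { this.state.set(eventId, 'done'); }
  release(eventId) { this.state.delete(eventId); }   // let a retry try again
}

function handleOnce(store, event, businessLogic) {
  if (!store.claim(event.id)) return 'duplicate';
  try {
    businessLogic(event);
    store.complete(event.id);
    return 'processed';
  } catch (err) {
    store.release(event.id);   // not marked done, so the provider's retry is processed
    throw err;
  }
}

let charges = 0;
const store = new IdempotencyStore();
const evt = { id: 'evt_101', type: 'payment.succeeded' };
console.log(handleOnce(store, evt, () => { charges += 1; })); // processed
console.log(handleOnce(store, evt, () => { charges += 1; })); // duplicate
console.log(charges);                                          // 1
```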

🌍 Real-World Applications: Where Webhooks Power Real Systems

Domain	Provider	Event examples (illustrative; exact names vary by provider)
Payments	Stripe, PayPal	`payment.succeeded`, `refund.created`, `dispute.opened`
CI/CD	GitHub, GitLab	`push`, `pull_request`, `deployment_status`
Customer messaging	Twilio, Slack	`message.received`, `channel.created`
SaaS integrations	HubSpot, Salesforce	`contact.created`, `deal.updated`
Infrastructure	PagerDuty, Datadog	`alert.triggered`, `incident.resolved`

⚖️ Trade-offs & Failure Modes: What You Must Defend Against

Most webhook providers use at-least-once delivery — duplicates are normal, ordering is not guaranteed.

Failure mode	Symptom	Root cause	Fix
Duplicate processing	Double charge or duplicate action	Provider retry after network timeout	Idempotency key on `event.id`
Signature failure spike	Many 401 responses	Secret mismatch or clock drift	Secret rotation with overlap window + NTP
Queue backlog	Delayed domain updates	Worker under-capacity	Autoscale workers; backpressure control
Silent data loss	Missing domain updates	Returned 200 before persisting	Persist first, then ack
Replay storm or replay attack	Flood of old events hits the handler	Misconfigured bulk replay, or an attacker resending captured requests	Dedupe on `event.id`; reject deliveries whose signed timestamp is > 5 min old
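The "secret rotation with overlap window" fix means accepting signatures made with either the old or the new secret for a limited period. A minimal sketch, assuming hex-encoded HMAC-SHA256 signatures as in the Express handler; verifyWithRotation and the { secret, notAfter } shape are hypothetical.

```javascript
const crypto = require('crypto');

// Accepts a signature made with any secret that is still inside its validity window.
// secrets: [{ secret, notAfter }], notAfter in Unix seconds (Infinity = current secret).
function verifyWithRotation(rawBody, signatureHex, secrets, nowSec) {
  if (typeof signatureHex !== 'string') return false;
  const received = Buffer.from(signatureHex, 'hex');
  return secrets.some(({ secret, notAfter }) => {
    if (nowSec > notAfter) return false;   // old secret has expired
    const expected = crypto.createHmac('sha256', secret).update(rawBody).digest();
    return expected.length === received.length && crypto.timingSafeEqual(expected, received);
  });
}

const body = Buffer.from('{"id":"evt_101"}');
const sign = (secret) => crypto.createHmac('sha256', secret).update(body).digest('hex');
const secrets = [
  { secret: 'new-secret', notAfter: Infinity },
  { secret: 'old-secret', notAfter: 1_000_000 },   // overlap ends at this timestamp
];
console.log(verifyWithRotation(body, sign('old-secret'), secrets, 999_000));   // true (inside overlap)
console.log(verifyWithRotation(body, sign('old-secret'), secrets, 1_000_001)); // false (overlap over)
```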

📊 Webhook Delivery Sequence

sequenceDiagram
    participant PS as Provider System
    participant WH as Webhook Endpoint
    participant DB as Event Store
    participant Q as Job Queue
    participant W as Worker

    PS->>PS: event occurs (payment.succeeded)
    PS->>WH: HTTP POST signed payload
    WH->>WH: validate HMAC signature
    WH->>WH: check idempotency (event.id seen?)
    WH->>DB: persist raw payload
    WH->>Q: enqueue job
    WH-->>PS: 200 OK (fast ack)
    Q->>W: dispatch job
    W->>W: execute business logic
    W->>DB: mark event processed

The sequence above traces a single webhook event from provider to worker. The endpoint handler passes four checkpoints (signature validation, idempotency check, durable persist, and job enqueue) before returning the fast 200 OK. Business logic never runs inside the HTTP handler; it is delegated to an async worker, which keeps the endpoint well inside the provider's timeout so that no retries are triggered.

📊 Webhook Retry on Failure

sequenceDiagram
    participant P as Provider
    participant E as Endpoint

    P->>E: HTTP POST (attempt 1)
    E-->>P: 500 Internal Server Error
    Note over P: wait 30s (exponential backoff)
    P->>E: HTTP POST (attempt 2)
    E-->>P: 500 Internal Server Error
    Note over P: wait 60s
    P->>E: HTTP POST (attempt 3)
    E-->>P: 200 OK
    Note over P: delivery confirmed
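The waits in the diagram (30 s, then 60 s) follow an exponential backoff. A provider-side sketch of such a schedule; backoffSchedule and its defaults (30-second base, factor 2, one-hour cap) are illustrative assumptions, not any particular provider's policy.

```javascript
// Provider-side retry schedule: delay before retry n = min(base * factor^(n-1), cap).
// Base, factor and cap are illustrative; real providers typically add random jitter too.
function backoffSchedule(retries, baseSec = 30, factor = 2, capSec = 3600) {
  const delays = [];
  for (let n = 1; n <= retries; n++) {
    delays.push(Math.min(baseSec * factor ** (n - 1), capSec));
  }
  return delays;
}

console.log(backoffSchedule(5)); // [ 30, 60, 120, 240, 480 ]
```

The cap keeps late retries from drifting days apart; without it, the ninth retry alone would wait over two hours.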

Estimating worker capacity:

$$\text{workers needed} \approx \lambda \cdot T$$

where $\lambda$ is the incoming event rate (events/sec) and $T$ is the average processing time per event (sec). Add a 2× safety margin for bursts.
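The formula is Little's Law applied to the worker pool: the average number of busy workers equals the arrival rate times the time each event occupies a worker. A worked example with illustrative inputs (40 events/sec, 0.25 s per event), assuming each worker handles one event at a time:

```javascript
// Workers needed ≈ λ · T, then multiplied by a burst safety factor and rounded up.
function workersNeeded(eventsPerSec, avgProcessingSec, safetyFactor = 2) {
  const busyOnAverage = eventsPerSec * avgProcessingSec;   // Little's Law
  return Math.ceil(busyOnAverage * safetyFactor);
}

console.log(workersNeeded(40, 0.25)); // 20 (40 × 0.25 = 10 busy on average, doubled)
```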

🧪 Practical: Setting Up and Testing Webhooks

Local development with a tunnel:

Production webhook endpoints must be publicly reachable over HTTPS. During development, use a tunnel to expose your local server:

# Using ngrok — creates a public HTTPS URL for localhost:3000
ngrok http 3000
# Forwarding: https://abc123.ngrok.io -> http://localhost:3000

Testing your handler:

# Resend an existing event to your endpoint (Stripe CLI)
stripe events resend evt_1234567890

# Trigger a test event via CLI
stripe trigger payment_intent.succeeded

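Verifying the HMAC signature correctly:

const crypto = require('crypto');

// Returns true if signatureHeader (hex, optionally prefixed "sha256=")
// matches HMAC-SHA256(secret, rawBody).
function isValidHmac(rawBody, signatureHeader, secret) {
  if (typeof signatureHeader !== 'string' || !secret) return false;
  const expected = crypto
    .createHmac('sha256', secret)
    .update(rawBody)              // raw Buffer, not parsed JSON
    .digest();                    // 32-byte Buffer
  const received = Buffer.from(signatureHeader.replace(/^sha256=/, ''), 'hex');
  // timingSafeEqual throws on a length mismatch, so compare lengths first
  if (received.length !== expected.length) return false;
  // Constant-time comparison prevents timing attacks
  return crypto.timingSafeEqual(expected, received);
}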

Three common configuration mistakes:

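Parsing JSON before computing HMAC. The signature is computed over the raw bytes. If you parse the body first (for example with express.json() or JSON.parse()) and then re-serialize it, whitespace, key order, and number formatting can differ from the original bytes, and the signature will never match.
Logging the raw secret. Treat WEBHOOK_SECRET like a password. Never log it, never commit it, rotate it if exposed.
Using a long (or no) replay window. A 5-minute window on the provider's signed delivery timestamp blocks most replay attacks. A 24-hour window does not. Check the per-delivery signed timestamp, not the event's created field, which stays the same across legitimate retries.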
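A sketch of replay-window validation when the provider signs a per-delivery timestamp together with the body. The header format t=<unix seconds>,v1=<hex HMAC> and the signed string `${t}.${body}` are modeled on Stripe's scheme, but verifySignedDelivery is a hypothetical helper, not Stripe's library; check your provider's docs for its exact format. nowSec is passed in so the check is testable; in a handler it would be Math.floor(Date.now() / 1000).

```javascript
const crypto = require('crypto');

// Verifies a header like "t=1772877602,v1=<hex>" (format modeled on Stripe's; an assumption here).
function verifySignedDelivery(rawBody, header, secret, nowSec, toleranceSec = 300) {
  const parts = Object.fromEntries(
    String(header).split(',').map((kv) => kv.split('=', 2)));
  const t = Number(parts.t);
  if (!Number.isInteger(t) || typeof parts.v1 !== 'string') return false;
  // Replay window: reject deliveries signed too long ago (or too far in the future)
  if (Math.abs(nowSec - t) > toleranceSec) return false;
  // The timestamp is inside the signed string, so an attacker cannot refresh it
  const expected = crypto.createHmac('sha256', secret)
    .update(`${t}.`).update(rawBody).digest();
  const received = Buffer.from(parts.v1, 'hex');
  return received.length === expected.length && crypto.timingSafeEqual(expected, received);
}

const secret = 'test-secret';                      // hypothetical secret
const body = Buffer.from('{"id":"evt_101"}');
const t = 1772877602;
const sig = crypto.createHmac('sha256', secret).update(`${t}.${body}`).digest('hex');
const header = `t=${t},v1=${sig}`;
console.log(verifySignedDelivery(body, header, secret, t + 60));   // true: 1 minute old
console.log(verifySignedDelivery(body, header, secret, t + 3600)); // false: outside 5-minute window
```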

🧭 Decision Guide: When Webhooks Are (and Aren't) the Answer

Situation	Recommendation
Need instant event-driven updates with low idle load	Webhooks — the right default
Cannot expose a public HTTPS endpoint	Start with short-interval polling; migrate when infrastructure allows
High compliance / audit requirements	Persist raw payload + signature metadata before any processing
Multiple providers with different payload schemas	Build a normalized internal event model; translate at ingress
Provider doesn't support webhooks	Poll. Or check if they support long-polling / SSE instead
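Translating at ingress means each provider gets a small adapter that maps its payload into one internal shape. A minimal sketch; the internal fields (source, id, type, occurredAt, payload) and the adapters are hypothetical. The GitHub adapter relies on GitHub sending the event name in the X-GitHub-Event header and a delivery GUID in X-GitHub-Delivery, while the Stripe-style adapter reads id, type, and created from the body.

```javascript
// Hypothetical internal event model: { source, id, type, occurredAt, payload }
const adapters = new Map([
  // Stripe-style envelope: id, type and created (Unix seconds) live in the JSON body
  ['stripe', (headers, body) => ({
    source: 'stripe',
    id: body.id,
    type: body.type,
    occurredAt: new Date(body.created * 1000).toISOString(),
    payload: body.data,
  })],
  // GitHub sends the event name and delivery GUID as headers (Node lowercases header names)
  ['github', (headers, body, receivedAt) => ({
    source: 'github',
    id: headers['x-github-delivery'],
    type: body.action ? `${headers['x-github-event']}.${body.action}` : headers['x-github-event'],
    occurredAt: receivedAt.toISOString(),   // assumption: use receipt time for GitHub
    payload: body,
  })],
]);

function normalize(provider, headers, body, receivedAt = new Date()) {
  const adapter = adapters.get(provider);
  if (!adapter) throw new Error(`no adapter for provider: ${provider}`);
  return adapter(headers, body, receivedAt);
}

const fromStripe = normalize('stripe', {}, {
  id: 'evt_101', type: 'payment.succeeded', created: 1772877602, data: { amount: 4999 },
});
const fromGitHub = normalize('github',
  { 'x-github-event': 'pull_request', 'x-github-delivery': 'guid-123' },
  { action: 'opened', number: 42 }, new Date(0));
console.log(fromStripe.type, fromGitHub.type); // payment.succeeded pull_request.opened
```

Downstream workers then only ever see the internal shape, so adding a provider means adding one adapter.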


🛠️ Spring Boot + Svix: A Production-Safe Webhook Receiver

Spring Boot provides the @RestController, @RequestBody, and @RequestHeader annotations needed to build a webhook endpoint in minutes. Svix is an open-source webhook delivery platform for the sending side: it signs payloads, schedules retries with exponential backoff, and offers a portal for inspecting and debugging deliveries, so a provider does not have to build that infrastructure from scratch.

The receiver below covers the three production requirements from the TLDR using plain Spring: HMAC verification over the raw body, an idempotency check on the event ID, and @Async hand-off so the handler returns 200 before any business logic runs. It verifies a generic hex HMAC-SHA256 X-Signature header rather than calling the Svix SDK; Svix-signed deliveries use svix-id, svix-timestamp, and svix-signature headers instead, which Svix's own libraries can verify. The X-Signature, X-Event-Id, and X-Event-Type header names are placeholders for whatever your provider sends.

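// pom.xml dependencies: spring-boot-starter-web, spring-boot-starter-data-jpa
// Each public class/interface below belongs in its own .java file in the same package.

import org.springframework.web.bind.annotation.*;
import org.springframework.http.*;
import org.springframework.scheduling.annotation.Async;
import org.springframework.scheduling.annotation.EnableAsync;
import org.springframework.context.annotation.Configuration;
import org.springframework.stereotype.Service;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.data.jpa.repository.JpaRepository;
import jakarta.persistence.*;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.util.*;

// ── Async config: without @EnableAsync, @Async methods run synchronously ──────
@Configuration
@EnableAsync
public class AsyncConfig {}

// ── Domain: idempotency store ─────────────────────────────────────────────────
@Entity
public class WebhookEvent {
    @Id String eventId;          // provider's event ID doubles as the dedup key
    String type;
    @Lob String rawPayload;      // payloads can exceed a default VARCHAR(255) column
    boolean processed = false;
}

// Spring Data generates the implementation (existsById, findById, save)
public interface WebhookEventRepository extends JpaRepository<WebhookEvent, String> {}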

// ── Signature verification (HMAC-SHA256) ──────────────────────────────────────
@Service
public class HmacVerifier {

    @Value("${webhook.secret}")          // loaded from application.properties
    private String webhookSecret;

    public boolean verify(byte[] rawBody, String signatureHeader) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(webhookSecret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            String expected = HexFormat.of().formatHex(mac.doFinal(rawBody));
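            String received = signatureHeader.replace("sha256=", "").toLowerCase(Locale.ROOT);
            // Constant-time comparison prevents timing attacks (HexFormat emits lowercase hex)
            return MessageDigest.isEqual(expected.getBytes(StandardCharsets.UTF_8),
                                         received.getBytes(StandardCharsets.UTF_8));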
        } catch (Exception e) { return false; }
    }
}

// ── Webhook receiver controller ───────────────────────────────────────────────
@RestController
@RequestMapping("/webhooks")
public class WebhookController {

    private final HmacVerifier verifier;
    private final WebhookEventRepository eventRepo;
    private final WebhookWorker worker;

    public WebhookController(HmacVerifier v, WebhookEventRepository r, WebhookWorker w) {
        this.verifier = v; this.eventRepo = r; this.worker = w;
    }

    // Most providers send application/json; byte[] still receives the exact signed bytes
    @PostMapping(value = "/events", consumes = MediaType.APPLICATION_JSON_VALUE)
    public ResponseEntity<String> receive(
            @RequestBody byte[] rawBody,
            @RequestHeader("X-Signature") String sig,
            @RequestHeader("X-Event-Id")  String eventId,
            @RequestHeader("X-Event-Type") String eventType) {

        // Step 1: Verify HMAC signature
        if (!verifier.verify(rawBody, sig)) {
            return ResponseEntity.status(401).body("invalid signature");
        }

        // Step 2: Idempotency check
        if (eventRepo.existsById(eventId)) {
            return ResponseEntity.ok("duplicate ignored");
        }

        // Step 3: Persist raw payload BEFORE returning 200
        WebhookEvent event = new WebhookEvent();
        event.eventId = eventId; event.type = eventType;
        event.rawPayload = new String(rawBody, StandardCharsets.UTF_8);
        eventRepo.save(event);

        // Step 4: Hand off to async worker and return 200 immediately
        worker.process(eventId);
        return ResponseEntity.ok("accepted");
    }
}

// ── Async worker: business logic runs outside the HTTP thread ─────────────────
@Service
public class WebhookWorker {

    private final WebhookEventRepository eventRepo;
    public WebhookWorker(WebhookEventRepository r) { this.eventRepo = r; }

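    @Async                   // runs on Spring's task executor, not the HTTP thread (needs @EnableAsync)
    public void process(String eventId) {
        WebhookEvent event = eventRepo.findById(eventId).orElseThrow();
        if (event.processed) return;   // already handled (e.g. during a replay): skip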
        // Execute business logic here (update order, send notification, etc.)
        System.out.println("Processing event: " + event.type + " / " + event.eventId);
        event.processed = true;
        eventRepo.save(event);
    }
}

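The @Async annotation makes worker.process() run on Spring's task executor thread pool, so the HTTP thread returns 200 OK as soon as the event is persisted, without waiting for business logic or downstream service calls. Two caveats: @Async only takes effect when async support is enabled with @EnableAsync, and only for calls that go through the Spring proxy, which is why the worker is a separate bean. Combined with eventRepo.existsById(eventId) for deduplication, this mirrors the pattern from the production-safe handler diagram; because existsById followed by save is a check-then-act sequence, a plain INSERT that fails on the primary key is the stricter guard under concurrent retries.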

For a full deep-dive on Spring Boot webhook receivers and Svix for managed delivery, a dedicated follow-up post is planned.

📚 Lessons from Webhook Systems in Production

Lesson 1: Idempotency is not optional — it is the foundation. Every webhook system that goes to production eventually receives duplicate events. A provider network timeout, a retry on infrastructure restart, or an explicit replay will cause the same event.id to arrive twice. If your handler is not idempotent, you will double-charge customers, send duplicate notifications, or corrupt state. Build idempotency on day one.

Lesson 2: The return-200-fast pattern is more important than it looks. Teams that process inline get burned within weeks. A database slowdown causes a processing delay, the provider times out, retries fire, and you process the event twice despite having no explicit bug. The pattern — persist, enqueue, return 200 — protects against the entire class of retry-induced duplicates.

Lesson 3: Build an event replay pipeline before you need it. At some point your worker will have a bug, your queue will fill, or your downstream service will go down. You need to be able to re-process events from your persistent store. Design that replay pipeline into the system from the start, not as a fire-drill.
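A replay pipeline can start as simply as selecting stored events that were never marked processed (optionally within a time range) and re-enqueueing them; because handlers are idempotent, replaying an event twice is harmless. A minimal sketch over an in-memory array; replayUnprocessed and the stored-event shape are hypothetical.

```javascript
// Re-enqueue stored events that were never marked processed (hypothetical store shape:
// { id, receivedAt (Unix seconds), processed }). Returns how many were re-enqueued.
function replayUnprocessed(storedEvents, enqueue, { fromSec = 0, toSec = Infinity, limit = 1000 } = {}) {
  const candidates = storedEvents
    .filter((e) => !e.processed && e.receivedAt >= fromSec && e.receivedAt <= toSec)
    .sort((a, b) => a.receivedAt - b.receivedAt)   // replay oldest first
    .slice(0, limit);                              // cap the batch so workers are not flooded
  candidates.forEach((e) => enqueue(e.id));
  return candidates.length;
}

const stored = [
  { id: 'evt_1', receivedAt: 100, processed: true },
  { id: 'evt_2', receivedAt: 200, processed: false },
  { id: 'evt_3', receivedAt: 150, processed: false },
];
const requeued = [];
const count = replayUnprocessed(stored, (id) => requeued.push(id));
console.log(count, requeued); // 2 [ 'evt_3', 'evt_2' ]
```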

Lesson 4: Monitor signature failure rate as a security signal. A sudden spike in signature_fail_rate means either your secret rotated without overlap, your endpoint is receiving spoofed requests, or there's a serialization mismatch. It is always worth investigating — it rarely resolves on its own.
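Signature failure rate, together with the dedup hit rate and queue depth, can be derived from a handful of counters. A minimal sketch; WebhookMetrics, its counter names, and the 5% alert threshold are illustrative assumptions, and in practice the counters would be exported to your metrics system rather than computed in-process. Here dedup_hit_rate is measured over deliveries that passed the signature check (a design choice).

```javascript
// Hypothetical in-process counters for the three core webhook health signals.
class WebhookMetrics {
  constructor() {
    this.received = 0;
    this.signatureFailures = 0;
    this.duplicates = 0;
    this.queueDepth = 0;
  }

  recordDelivery({ validSignature, duplicate = false }) {
    this.received += 1;
    if (!validSignature) { this.signatureFailures += 1; return; }
    if (duplicate) { this.duplicates += 1; return; }
    this.queueDepth += 1;                       // accepted and enqueued
  }

  recordProcessed() { this.queueDepth = Math.max(0, this.queueDepth - 1); }

  snapshot() {
    const rate = (n, d) => (d === 0 ? 0 : n / d);
    return {
      signature_fail_rate: rate(this.signatureFailures, this.received),
      dedup_hit_rate: rate(this.duplicates, this.received - this.signatureFailures),
      queue_depth: this.queueDepth,
    };
  }
}

// Example: 17 good deliveries, 1 duplicate, 2 bad signatures (illustrative traffic)
const m = new WebhookMetrics();
for (let i = 0; i < 17; i++) m.recordDelivery({ validSignature: true });
m.recordDelivery({ validSignature: true, duplicate: true });
m.recordDelivery({ validSignature: false });
m.recordDelivery({ validSignature: false });
const s = m.snapshot();
console.log(s.signature_fail_rate > 0.05 ? 'ALERT: signature failures' : 'ok'); // ALERT: signature failures
```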

📌 TLDR: Summary & Key Takeaways

Webhooks invert the polling model — the provider pushes events to your endpoint the moment they occur.
Every production webhook handler needs: HMAC signature verification, idempotency check, async processing.
Return 200 as fast as possible — inline processing causes duplicate deliveries on retries.
At-least-once delivery means duplicates are normal; make your handler idempotent by design.
Monitor signature_fail_rate, dedup_hit_rate, and queue_depth as the three core health signals.
