
System Design HLD Example: Notification Service (Email, SMS, Push)

Design a multi-channel notification platform with retries, preferences, and provider failover.



TLDR: A notification platform routes events to per-channel Kafka queues, deduplicates with Redis, and tracks delivery via webhooks — ensuring that critical alerts like password resets never get blocked by marketing batches.

Uber sends over 1 million push notifications per minute during peak demand — surge pricing alerts, ride status updates, and driver assignments. When Twilio (a primary SMS provider) degraded in 2021, Uber needed to fail over to a secondary provider in seconds without users receiving duplicates or losing in-flight events. A synchronous, single-provider notification call with no queue buffer would have silently dropped every in-flight message during the switchover window.

Imagine a critical failure: your payment service sends a "Payment Failed" email. The email provider is slow, causing the payment service's thread pool to exhaust as it waits for a response. Now, your entire payment checkout is down because the notification system was synchronous and lacked failure isolation.

Designing a notification service teaches you how to build reliable fan-out over unreliable external providers — combining channel-isolated queues, retry budgets, idempotency keys, and graceful provider failover into a single coherent system. By the end of this guide, you will understand how to design for millions of events while maintaining strict user preferences and delivery guarantees.

📖 Notification Service: Use Cases & Requirements

A notification service is a high-volume bridge between internal services (the "Triggers") and external delivery channels (Email, SMS, Push).

Functional Requirements

  • Multi-Channel Delivery: Support Email (SendGrid/Mailgun), SMS (Twilio/Plivo), and Push (APNs/FCM).
  • Template Management: Store and render templates with dynamic variable substitution.
  • Preference Evaluation: Respect user "opt-out" settings for specific categories (marketing vs. transactional).
  • Prioritization: Ensure transactional alerts (OTPs, Security) take precedence over marketing blasts.
  • Deduplication: Prevent the same notification from being sent multiple times due to upstream retries.
  • Delivery Tracking: Track status from "Queued" to "Delivered" via provider webhooks.

Non-Functional Requirements

  • Reliability: No notifications should be lost. We use durable queues (Kafka) to guarantee at-least-once delivery.
  • Availability: 99.9% for transactional messages.
  • Scalability: Handle 10M+ notifications per day, with marketing-blast peaks of up to 10,000 requests per second.
  • Low Latency: Transactional p95 end-to-end delivery should be < 5 seconds.
  • Extensibility: Easily add new providers or channels (e.g., WhatsApp, Slack).

🔍 Basics: Baseline Architecture

The baseline approach moves from a synchronous call to an asynchronous pipeline. Instead of the "Checkout Service" calling "SendGrid" directly, it calls an internal "Notification Service" which buffers the request.

  1. Intake API: Receives requests, validates schema, and returns a 202 Accepted.
  2. Deduplicator: Uses Redis to check if this specific notification was already sent.
  3. Message Queue: Kafka serves as the backbone, storing messages in per-channel topics.
  4. Workers: Channel-specific consumers read from Kafka and call external delivery providers (SendGrid, Twilio, FCM, and the like).
  5. Status Store: A relational database (PostgreSQL) tracks the lifecycle of every notification.

⚙️ Mechanics: Key Logic

1. The Preference Filter

Before a message is even enqueued, the system must check the notification_preferences table.

  • If the category is MARKETING and the user has opted out, the request is dropped immediately.
  • If the category is SECURITY (e.g., password reset), the preference check is bypassed (a minimal filter is sketched below).
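
A minimal sketch of this filter, assuming a `prefs` lookup interface and category values that the article does not spell out:

```python
from enum import Enum

class Category(str, Enum):
    SECURITY = "security"
    TRANSACTIONAL = "transactional"
    MARKETING = "marketing"

def should_enqueue(user_id: str, channel: str, category: Category, prefs) -> bool:
    """Return True if the notification may be enqueued for this user and channel."""
    # Security-critical messages (password resets, fraud alerts) bypass opt-outs.
    if category == Category.SECURITY:
        return True
    # `prefs` is any lookup returning True when the user has opted out of this
    # channel + category combination (backed by the preferences table or its cache).
    return not prefs.is_opted_out(user_id, channel, category)
```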

2. Rendering the Template

We use a template engine like Mustache or Handlebars. Templates are stored in the database with placeholders like {{first_name}}.

  1. The worker fetches the template from the DB (or Redis cache).
  2. It substitutes variables from the request payload.
  3. It produces the final HTML or text body (see the rendering sketch below).
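
A simplified rendering sketch: a regex-based stand-in for a real Mustache/Handlebars engine (no escaping, sections, or partials), with the function name and error behavior chosen for illustration:

```python
import re

def render_template(template: str, variables: dict) -> str:
    """Replace {{placeholder}} tokens with values from the request payload."""
    def substitute(match: re.Match) -> str:
        key = match.group(1).strip()
        if key not in variables:
            # Failing loudly here (ideally at intake time) beats sending a blank placeholder.
            raise KeyError(f"missing template variable: {key}")
        return str(variables[key])

    return re.sub(r"\{\{(.*?)\}\}", substitute, template)

# Example:
# render_template("Hi {{first_name}}, your order {{order_id}} has shipped.",
#                 {"first_name": "Ada", "order_id": "A-123"})
```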

3. Exponential Backoff with DLQ

Providers often fail. We implement a retry policy:

  • Retry 1: after 5 seconds.
  • Retry 2: after 30 seconds.
  • Retry 3: after 5 minutes.
  • The backoff keeps growing on later attempts; after 5 failed attempts the message is moved to a Dead Letter Queue (DLQ) for manual inspection (a sketch of this loop follows the list).
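
A sketch of this retry loop, assuming `send` raises on provider errors and `dead_letter` enqueues to the DLQ (neither is a specific library's API); the backoff values beyond the third retry are illustrative:

```python
import time

# Wait times after each failed attempt (the fifth attempt is the last, so no wait follows it).
BACKOFF_SEC = [5, 30, 300, 1800]
MAX_ATTEMPTS = 5

def deliver_with_retries(send, payload, dead_letter) -> bool:
    """Call the provider with exponential backoff; move the message to the DLQ after MAX_ATTEMPTS failures."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            send(payload)
            return True
        except Exception as exc:                 # provider timeout / 5xx surfaced as an exception
            if attempt == MAX_ATTEMPTS:
                dead_letter(payload, reason=str(exc), attempts=attempt)
                return False
            # Real workers re-enqueue with a delay instead of blocking a consumer thread.
            time.sleep(BACKOFF_SEC[attempt - 1])
    return False
```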

📐 Estimations & Design Goals

  • Daily Volume: 10M notifications.
  • Payload Size: ~1KB per notification (Metadata + payload).
  • Storage: 10GB per day for notification logs. 90-day retention = 900GB.
  • Bandwidth: 10M * 1KB / 86400s ≈ 115 KB/sec average.
  • Peak Load: 500 TPS for transactional; marketing batches can scale to 10k TPS.

📊 High-Level Design

graph TD
    Trigger[Payment/Auth Service] -->|POST /notifications| API[Intake API]
    API -->|1. Dedup Check| Redis[(Redis)]
    API -->|2. Pref Check| DB[(PostgreSQL)]
    API -->|3. Enqueue| Kafka{Kafka}

    subgraph Workers
        Kafka -->|Topic: Email| EmailWorker[Email Worker]
        Kafka -->|Topic: SMS| SMSWorker[SMS Worker]
        Kafka -->|Topic: Push| PushWorker[Push Worker]
    end

    EmailWorker -->|API| SendGrid[SendGrid]
    SMSWorker -->|API| Twilio[Twilio]
    PushWorker -->|API| FCM[Firebase FCM]

    SendGrid -->|Webhook| WebhookHandler[Webhook Handler]
    WebhookHandler -->|Update Status| DB

The diagram maps the complete event flow from trigger to provider delivery. The Intake API is the single synchronous entry point: it performs deduplication and user preference checks before immediately returning a 202 Accepted response and enqueuing the request to Kafka. Kafka acts as the durable buffer separating intake from delivery — a slow email provider cannot back-pressure the intake API or starve the SMS channel. Each channel worker pool is independent, allowing Email, SMS, and Push to scale their consumer parallelism according to per-channel volume. The Webhook Handler closes the delivery loop by updating the notification lifecycle status in PostgreSQL based on provider callbacks — turning an asynchronous fire-and-forget into a fully tracked delivery record.

🧠 Deep Dive: Idempotency Keys, Priority Queues, and Provider Failover Architecture

Internals: How the Priority Router and Idempotency Layer Coordinate at Intake Time

When the Intake API receives a notification request, it executes three sequential checks before returning 202 Accepted. First, it computes the Redis idempotency key as dedup:{idempotency_key} and executes an atomic SET NX EX 86400 — this single Redis command atomically writes the key only if it does not exist and sets a 24-hour expiry. If the command returns nil (key already existed), the API immediately returns the original notification_id from the stored value. This prevents duplicate processing even under concurrent retries from multiple caller instances.
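A sketch of this claim-or-return step using the redis-py client; the key format matches the prose above, while the function name and connection setup are assumptions:

```python
import redis

r = redis.Redis()  # connection details omitted

def claim_idempotency_key(idempotency_key: str, notification_id: str) -> str:
    """Atomically claim the key; return the notification_id that owns this request."""
    key = f"dedup:{idempotency_key}"
    # SET NX EX writes only if the key is absent and attaches a 24-hour expiry in one command.
    was_set = r.set(key, notification_id, nx=True, ex=86_400)
    if was_set:
        return notification_id                    # first time this request has been seen
    existing = r.get(key)                         # duplicate: return the original notification_id
    return existing.decode() if existing else notification_id
```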

Second, the API checks the user's preference for the requested channel and category. Preferences are cached in Redis as pref:{user_id}:{channel}:{category} with a 1-hour TTL, meaning the Postgres preference table is queried at most once per hour per user per channel. This cache typically achieves a 90–95% hit rate, significantly reducing Postgres load during high-volume notification bursts.

Third, based on the category field (critical, transactional, or marketing), the API selects the appropriate Kafka topic and publishes the notification payload. This three-step sequence — dedup check, preference check, topic routing — ensures that every successfully enqueued notification is unique, welcome (user opted in), and routed to the correct priority worker pool. (Only the dedup check itself is atomic; the preference lookup and the publish are ordinary operations.)
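
A routing sketch using the kafka-python client as one possible producer; topic names follow the per-channel per-priority pattern shown later (the push channel's single merged topic is ignored here for brevity), and the broker address is a placeholder:

```python
import json
from kafka import KafkaProducer  # kafka-python client; one of several possible choices

TOPIC_BY_CATEGORY = {
    "critical":      "notif.{channel}.critical",
    "transactional": "notif.{channel}.transactional",
    "marketing":     "notif.{channel}.marketing",
}

producer = KafkaProducer(
    bootstrap_servers="kafka:9092",              # placeholder address
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def enqueue(notification: dict) -> None:
    """Route a validated, deduplicated request to its per-channel per-priority topic."""
    topic = TOPIC_BY_CATEGORY[notification["category"]].format(channel=notification["channel"])
    # Keying by recipient keeps a given user's notifications on one partition (in-order processing).
    producer.send(topic, key=notification["recipient_id"].encode(), value=notification)
```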

Performance Analysis: Throughput Sizing Across Email, SMS, and Push Channels

| Metric | Value | Sizing Notes |
| --- | --- | --- |
| Daily notification volume | 10 million | Across all channels combined |
| Peak throughput (marketing blast) | 10,000 TPS | Marketing batch to 10M users over ~16.7 minutes |
| Transactional throughput | 500 TPS | Steady state, 24×7 |
| Kafka partition count (email topic) | 50 partitions | Supports 50 concurrent email workers |
| Redis dedup key storage | 10M keys × 150 bytes | ~1.5 GB for 24-hour dedup window |
| Worker instances (email) | 20 (normal) → 200 (peak) | Auto-scaled by Kafka consumer lag metric |
| Provider API rate limit (SendGrid) | 1,000 emails/sec per API key | Use 10 rotating API keys for 10,000 emails/sec |

The peak marketing blast scenario (10,000 TPS) is 20× the transactional steady-state throughput. Email workers must auto-scale based on Kafka consumer group lag: when lag exceeds 10,000 messages, auto-scaling triggers additional worker instances. Worker count scales linearly with Kafka partition count — adding more partitions at topic creation time is the primary lever for increasing peak throughput capacity.
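
An illustrative scaling heuristic for translating consumer lag into a worker count; the lag threshold and worker cap come from the figures above, while the per-worker throughput and drain window are assumed numbers:

```python
def desired_worker_count(consumer_lag: int,
                         current_workers: int,
                         msgs_per_worker_per_sec: int = 100,   # assumed per-worker throughput
                         drain_target_sec: int = 60,           # assumed time budget to clear the lag
                         max_workers: int = 200) -> int:
    """Back-of-the-envelope rule: add workers until the current lag can drain within the target window."""
    if consumer_lag <= 10_000:                   # lag threshold from the sizing discussion above
        return current_workers
    needed = -(-consumer_lag // (msgs_per_worker_per_sec * drain_target_sec))  # ceiling division
    return min(max(needed, current_workers), max_workers)
```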

A notification service has three internal correctness challenges that naive implementations get wrong: sending the same notification twice, letting a batch marketing blast starve a critical OTP delivery, and dropping in-flight messages when a provider degrades. Each requires a specific architectural solution.

Idempotency: Preventing Duplicate Sends with Redis

Upstream services frequently retry on timeout. If the payment service calls the notification API and receives a network timeout (while the API had already enqueued the request), a naive system sends the "Payment Confirmed" email twice. The idempotency key pattern eliminates this:

Every inbound request includes an idempotency_key (typically a UUID generated by the caller). The Intake API performs an atomic SET NX in Redis — "set this key only if it does not exist." If the key already exists, the API returns the original result immediately without re-enqueuing.

| Redis Key Pattern | Value | TTL | Purpose |
| --- | --- | --- | --- |
| dedup:{idempotency_key} | {notification_id} (original result) | 24 hours | Prevents duplicate processing on upstream retries |
| pref:{user_id}:{channel}:{category} | opted_out / opted_in | 1 hour (cache) | Caches preference DB lookups to reduce load |
| template:{template_id}:{version} | Rendered template string | 30 minutes | Avoids repeated DB reads for high-volume templates |
| rate:{user_id}:{channel}:{hour} | Integer count | 1 hour | Per-user per-channel rate limiting to prevent notification spam |

Priority Queues: Ensuring OTPs Are Never Blocked by Marketing Blasts

A critical operational requirement: a password reset OTP email must never be delayed because 5 million marketing batch emails are draining the same worker pool. The solution is per-priority Kafka topics combined with separate worker groups:

| Kafka Topic | Priority | Consumer Group | Throughput SLA |
| --- | --- | --- | --- |
| notif.email.critical | P0 — Security alerts, OTPs | email-critical-workers (dedicated) | < 2 seconds end-to-end |
| notif.email.transactional | P1 — Order confirmations, receipts | email-txn-workers | < 30 seconds |
| notif.email.marketing | P2 — Newsletters, promotions | email-mktg-workers (lower concurrency) | < 5 minutes |
| notif.sms.critical | P0 — OTP, fraud alerts | sms-critical-workers (dedicated) | < 5 seconds |
| notif.push.all | P1/P2 mixed | push-workers | < 60 seconds |

The Intake API routes to the correct topic based on the category field in the request payload. Critical and transactional notifications use dedicated worker groups that are never shared with marketing traffic.

Provider Failover: Switching from SendGrid to Mailgun in Seconds

Notification services call external providers that fail. When Twilio degraded in 2021, companies with a single-provider model silently dropped every in-flight SMS. The failover architecture prevents this:

graph TD
    Worker[Email Worker] --> Primary[Primary: SendGrid]
    Primary -->|Success| Done[Mark Delivered]
    Primary -->|Timeout / 5xx| Secondary[Failover: Mailgun]
    Secondary -->|Success| Done
    Secondary -->|Failure| DLQ[Dead Letter Queue]
    DLQ --> Alert[PagerDuty Alert]

The failover diagram shows the two-provider cascade. The Email Worker always tries the primary provider (SendGrid) first. On a timeout or 5xx error, it immediately retries against the secondary provider (Mailgun) without incrementing the retry counter. Only if both providers fail does the message move to the Dead Letter Queue with an alert. This pattern guarantees that a single provider degradation never causes message loss — the system transparently switches providers.
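
A sketch of that cascade, assuming `send_primary` and `send_secondary` wrap the SendGrid and Mailgun API calls and raise on timeouts or 5xx responses:

```python
def send_email_with_failover(payload, send_primary, send_secondary, dead_letter) -> str:
    """Primary -> secondary cascade; a provider-level failure does not consume the retry budget."""
    try:
        send_primary(payload)
        return "delivered_via_primary"
    except Exception:
        pass                                      # fall through to the secondary without counting a retry
    try:
        send_secondary(payload)
        return "delivered_via_secondary"
    except Exception as exc:
        dead_letter(payload, reason=str(exc))     # both providers failed: park for operator review
        return "dead_lettered"
```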

Notification Lifecycle Data Model

| Column | Type | Constraint | Purpose |
| --- | --- | --- | --- |
| notification_id | UUID | PRIMARY KEY | Unique notification record |
| idempotency_key | UUID | UNIQUE | Deduplication key from caller |
| recipient_id | UUID | NOT NULL, FK → users | Target user |
| template_id | VARCHAR(64) | FK → templates | Template reference |
| channel | ENUM | NOT NULL | email / sms / push |
| category | ENUM | NOT NULL | critical / transactional / marketing |
| status | ENUM | NOT NULL | queued / sent / delivered / failed |
| provider | VARCHAR(32) | nullable | Which provider handled delivery |
| provider_message_id | VARCHAR(128) | nullable | External tracking ID for webhook matching |
| created_at | TIMESTAMPTZ | DEFAULT NOW() | Intake timestamp |
| sent_at | TIMESTAMPTZ | nullable | Provider submission timestamp |
| delivered_at | TIMESTAMPTZ | nullable | Provider-confirmed delivery timestamp |
| retry_count | SMALLINT | DEFAULT 0 | Tracks retry attempts for DLQ threshold |

🌍 Real-World Applications: Uber, Duolingo, and Airbnb at Notification Scale

Uber sends over 1 million push notifications per minute during peak demand — surge pricing alerts, driver assignments, ride status updates, and receipt confirmations. Uber's notification service routes through a priority system similar to what is described above, with driver assignment push notifications classified as P0 (delivered via dedicated worker pools) and marketing promotions classified as P2. After the 2021 Twilio degradation, Uber implemented automatic provider failover that switches to a secondary SMS provider within 30 seconds of detecting provider-level error rate spikes.

Duolingo sends over 500 million push notifications per week — primarily daily streak reminders and lesson nudges. Duolingo's challenge is timing optimization: their "Spaced Repetition Notification" system uses machine learning to determine the optimal send time per user based on historical open-rate patterns. The notification service must support scheduled delivery with per-user offsets across 150 time zones. Their architecture uses a Redis Sorted Set as a "scheduled queue" — notifications are stored with a delivery timestamp as the score, and a scheduler process polls for ready notifications.
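
A sketch of such a scheduled queue with redis-py sorted-set commands; the key name and batch size are assumptions, and a production version would make the fetch-and-remove step atomic (for example, via a Lua script):

```python
import time
import redis

r = redis.Redis()

def schedule(notification_id: str, deliver_at_epoch: float) -> None:
    """Store the notification with its delivery time as the sorted-set score."""
    r.zadd("scheduled:notifications", {notification_id: deliver_at_epoch})

def pop_due(limit: int = 100) -> list:
    """Fetch notifications whose delivery time has passed, then remove them from the schedule."""
    now = time.time()
    due = r.zrangebyscore("scheduled:notifications", 0, now, start=0, num=limit)
    if due:
        r.zrem("scheduled:notifications", *due)   # non-atomic with the read; fine for a sketch
    return [member.decode() for member in due]
```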

Airbnb manages a multi-channel notification system where each user action (booking confirmation, host message, guest inquiry) triggers cross-channel coordination. A booking confirmation generates an email, an SMS, and a push notification — but only if the user has not opted out of each specific channel. Airbnb's preference evaluation layer caches opt-out decisions in Redis with a 1-hour TTL, dramatically reducing preference database load during high-volume booking events like holiday season peaks.

⚖️ Trade-offs and Failure Scenarios Every Interviewer Expects You to Handle

Synchronous Preference Check vs. Asynchronous Filtering

The system performs the preference check synchronously at intake time (before enqueuing). An alternative is to check preferences asynchronously in the worker, just before calling the provider. The synchronous approach saves Kafka storage and worker processing time by dropping opted-out notifications immediately, but it adds latency to the intake API call. The asynchronous approach gives faster intake acknowledgment but wastes queue capacity on notifications that will ultimately be dropped. For most systems, synchronous preference filtering at intake is the correct choice: opt-outs are relatively rare (< 5% of notifications), so the latency cost is minimal and queue savings are meaningful.

At-Least-Once Delivery and the Duplication Window

Kafka's at-least-once delivery guarantee means a worker can process the same Kafka message twice — for example, when a consumer crashes after processing a message but before committing its offset. If the provider call already succeeded before the crash, the replay will attempt a second send. The idempotency key stored in Redis prevents duplicate intake, but the provider call itself may be non-idempotent. Mitigation: check provider_message_id in Postgres before calling the provider; if a delivery record already exists for this notification, skip the provider call (a minimal guard is sketched below).
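
A minimal guard along those lines, with `db.get_delivery`, `db.mark_sent`, and `send` as assumed interfaces rather than a concrete ORM or provider SDK:

```python
def deliver_once(notification_id: str, payload: dict, db, send) -> None:
    """Skip the provider call when a replayed Kafka message was already delivered."""
    record = db.get_delivery(notification_id)
    if record and record.provider_message_id:    # a prior attempt already reached the provider
        return
    provider_message_id = send(payload)          # provider returns its external tracking ID
    db.mark_sent(notification_id, provider_message_id)
```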

Dead Letter Queue: The Failure Sink with a Recovery Path

The DLQ is not a discard pile — it is a recovery queue. Every message in the DLQ must be inspectable (what was the payload, what was the failure reason) and replayable (an operator or automated system can re-enqueue after fixing the underlying issue). A DLQ with no replay mechanism is just delayed data loss. Production systems include a DLQ management UI that shows failure reasons, groups failures by category (provider error vs. invalid recipient vs. template render failure), and enables bulk or selective replay.

🧭 Decision Guide: Notification Architecture Choices

| Design Decision | Simpler Option | Robust Option | Choose Robust When |
| --- | --- | --- | --- |
| Queueing | Single Kafka topic for all channels | Per-channel per-priority topics | Marketing volume risks starving critical alerts |
| Idempotency | DB unique constraint on idempotency_key | Redis NX check + DB constraint | Upstream retry rate > 5% of traffic |
| Provider failover | Retry same provider 3× | Primary → secondary provider cascade | SLA requires < 5-second recovery from provider failure |
| Template rendering | Inline in worker | Dedicated template service with Redis cache | > 50 distinct templates or > 10K renders per second |
| Preference check | Worker-side (async) | Intake API (sync) | Queue capacity is constrained or opt-out rate is high |
| Delivery tracking | Status enum in notification table | Full lifecycle events table | Audit requirements or per-step analytics are needed |

🧪 Interview Delivery Example: Narrating a Notification Service Design in 45 Minutes

Minute 1–5: Requirements scoping. Ask: "Which channels are in scope — email, SMS, push, or all three? Is real-time delivery tracking required? What are the SLAs for transactional versus marketing notifications? Are user notification preferences centrally managed?" These questions establish the scope and immediately reveal whether the interviewer wants a simple single-channel system or a full multi-channel platform.

Minute 6–15: Core architecture decision. Establish the asynchronous pattern early: "I would not have the calling service (payment, auth) directly invoke the email provider. Instead, a Notification Service exposes a single POST /notifications endpoint that validates the request, checks deduplication and user preferences, and enqueues to Kafka — returning 202 Accepted immediately. This isolates the caller from provider latency and failure."

Minute 16–30: Channel isolation and priority. Explain the per-channel per-priority Kafka topic model. Say: "A password reset OTP must never be delayed by a 5-million-email marketing batch. I'd route critical notifications to a dedicated topic with a dedicated high-concurrency consumer group, and marketing to a separate topic with a lower-concurrency consumer group that can fall behind without impacting critical delivery."

Minute 31–40: Reliability patterns. Walk through idempotency keys, exponential backoff retries, and the DLQ. Address provider failover: "If SendGrid returns 5xx errors, I do not increment the retry counter — I immediately try Mailgun as the secondary. Only after both providers fail does the notification enter the DLQ for operator review."

Minute 41–45: Trade-offs. Discuss synchronous vs. asynchronous preference checking, the at-least-once delivery duplication window and how to handle it, and the DLQ replay mechanism. Demonstrating that you have thought about failure recovery paths is the mark of a senior-level answer.

🛠️ Kafka, Redis, and SendGrid: The Production Notification Stack

Apache Kafka is the backbone of every production notification service. Its durable, replayable log ensures that a notification is never lost due to a worker crash mid-processing. Topic partitioning by user_id ensures that all notifications for a given user are processed in order by the same consumer, preventing race conditions on delivery status updates. Kafka's consumer group model allows independent scaling of Email, SMS, and Push workers — each channel's throughput scales without affecting the others.

Redis performs three roles in the notification service: deduplication key storage (atomic SET NX for idempotency), preference caching (reducing preference DB read load by 10–100×), and scheduled notification queuing (Sorted Set with delivery timestamp as score for time-based send optimization). Redis's sub-millisecond response time makes it ideal for the synchronous checks performed at intake API time.

SendGrid, Twilio, and Firebase FCM are the external delivery providers. Each exposes a REST API for sending and a webhook endpoint for delivery status callbacks. Production systems register webhooks for delivered, opened, bounced, and unsubscribed events. The Webhook Handler in the architecture receives these callbacks and updates the notification record's delivered_at and status fields, giving operators a complete delivery audit trail per notification.

📚 Lessons Learned from Production Notification Services

Template Rendering Is a Reliability Risk. A template engine processing a Handlebars or Mustache template against a malformed payload can throw an exception and crash the worker. Always validate the payload schema against the template's required variables at intake time — not in the worker. A failed render in the worker after a Kafka commit wastes a delivery slot and may require manual DLQ replay.

Per-User Rate Limiting Prevents Notification Spam. A bug in an upstream service can cause it to emit thousands of "Your order has shipped" notifications for the same order. Without per-user per-channel rate limiting, this floods the user's inbox and may trigger email provider spam filters that affect your domain reputation. Store a per-user per-channel hourly counter in Redis and reject or throttle requests that exceed the threshold.
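
A fixed-window counter sketch with redis-py, matching the rate:{user_id}:{channel}:{hour} key pattern from the table earlier; the hourly limit is an assumed threshold:

```python
import time
import redis

r = redis.Redis()
HOURLY_LIMIT = 20   # assumed per-user per-channel threshold

def within_rate_limit(user_id: str, channel: str) -> bool:
    """Allow at most HOURLY_LIMIT sends per user per channel within the current hour."""
    hour_bucket = int(time.time() // 3600)
    key = f"rate:{user_id}:{channel}:{hour_bucket}"
    count = r.incr(key)
    if count == 1:
        r.expire(key, 3600)    # first hit in this window sets the TTL so stale counters disappear
    return count <= HOURLY_LIMIT
```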

Webhook Reliability Is Lower Than You Expect. SendGrid, Twilio, and FCM webhook delivery is not guaranteed. Webhooks can arrive out of order, be delayed by minutes, or fail to arrive at all if your webhook endpoint returns an error. Build a background reconciliation job that periodically queries provider APIs for delivery status of notifications that remain in sent status for more than 10 minutes — this catches webhook delivery gaps before they affect SLA reporting.

DLQ Hygiene Is an Operational Discipline. A DLQ that grows without being drained is a ticking clock — eventually it contains so many messages that replay takes hours and the operational team cannot identify which failures are fresh versus stale. Set a maximum DLQ age policy (e.g., messages older than 7 days are permanently discarded with a logged alert) and assign ownership for DLQ review to a specific on-call rotation.

📌 Key Takeaways: Notification Service System Design

  • A notification service is an asynchronous fan-out platform that decouples internal event producers from unreliable external delivery channels via Kafka-backed per-channel queues.
  • Idempotency keys with Redis SET NX prevent duplicate sends when upstream services retry on timeout — this is the most common correctness bug in naive notification systems.
  • Per-channel per-priority Kafka topics ensure that marketing batch volume never starves critical OTP or security alert delivery — channel and priority isolation is non-negotiable for production systems.
  • Provider failover (primary → secondary cascade) is the difference between a notification service that survives provider degradations and one that silently drops messages. Plan for every external provider to fail.
  • The DLQ is a recovery queue, not a discard bin. Every DLQ message must be inspectable, categorized by failure type, and replayable after the underlying issue is resolved.
  • Webhook delivery from providers is unreliable. Always implement a background reconciliation job to poll provider APIs for delivery status of notifications that remain un-acknowledged.