System Design Advanced: Security, Rate Limiting, and Reliability
How do you protect your API from hackers and traffic spikes? We cover Rate Limiting algorithms (T...
TLDR: Three reliability tools every backend system needs: Rate Limiting prevents API spam and DDoS, Circuit Breakers stop cascading failures when downstream services degrade, and Bulkheads isolate failure blast radius. Knowing when and how to combine them separates junior from senior system design.
The TLDR: A Layered Defense for Distributed Systems
A house electrical panel has three layers of protection:
- Fuse/breaker per circuit: no single appliance can knock out the house.
- Main breaker: kills everything if the total load becomes dangerous.
- Surge protector: absorbs voltage spikes before they reach appliances.
Distributed systems need the same layered defense at the API gateway, service-to-service, and individual thread pool level. Rate limiting is your surge protector at the edge. Circuit breakers are the fuse on each service call. Bulkheads are the per-circuit isolation that keeps one slow dependency from tripping everything else.
Rate Limiting and Circuit Breaking: The Two Inbound Guards
Rate limiting is enforced at the API Gateway or reverse proxy layer before requests reach your application.
Token Bucket Algorithm
Each client gets a bucket of tokens. One token equals one request. Tokens refill at a fixed rate.
```text
Bucket capacity = 100 requests
Refill rate    = 10 tokens/second
If tokens > 0:  allow request, decrement token count
If tokens == 0: return HTTP 429 Too Many Requests
```
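The logic above fits in a few lines of Python. This is an illustrative, single-threaded sketch (no locking, lazy refill on each check), not a production limiter:

```python
import time

class TokenBucket:
    """Token bucket: capacity caps bursts, refill_rate sets sustained throughput."""

    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity
        self.refill_rate = refill_rate      # tokens per second
        self.tokens = capacity              # start full
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Lazy refill: credit tokens for the elapsed interval, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1                # spend one token for this request
            return True
        return False                        # caller should return HTTP 429

bucket = TokenBucket(capacity=100, refill_rate=10)
```

Refilling lazily on each `allow()` call avoids a background timer thread, which is why most in-process limiters are implemented this way.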
| Algorithm | Burst Handling | Use Case |
|---|---|---|
| Token Bucket | Allows small bursts up to bucket size | API rate limits per user |
| Leaky Bucket | No bursts, constant output rate | Smoothing traffic, QoS |
| Fixed Window | Large bursts possible at window boundary | Simple, low-overhead admin limits |
| Sliding Window | Smooth rate, no boundary spikes | Production API gateways (most common) |
DDoS Defense: Layered Response
```mermaid
flowchart LR
    Internet[Internet Traffic] --> CDN["CDN<br/>(absorb volumetric attacks)"]
    CDN --> WAF["WAF<br/>(block malicious patterns)"]
    WAF --> RL["Rate Limiter<br/>(per-IP / per-token limits)"]
    RL --> App[Application Servers]
    RL -->|IP repeatedly violates| BH["Blackholing<br/>(drop to /dev/null)"]
```

Blackholing routes the attacker's traffic to a null interface: no response, minimal server overhead. It is used by ISPs and CDN providers against volumetric attacks.
Circuit Breaker States
Without a circuit breaker, a slow downstream service blocks all your threads, fills your thread pool, and the cascade propagates upward. A circuit breaker short-circuits this by tracking error rates and failing fast once a threshold is crossed.
```mermaid
stateDiagram-v2
    [*] --> CLOSED : System healthy
    CLOSED --> OPEN : Error rate > threshold (e.g., 50% in 10s)
    OPEN --> HALF_OPEN : After timeout (e.g., 30s)
    HALF_OPEN --> CLOSED : Probe request succeeds
    HALF_OPEN --> OPEN : Probe request fails
```
| State | Behavior | When |
|---|---|---|
| CLOSED | All requests pass through | Normal operation |
| OPEN | All requests fail fast (no actual call) | After too many failures |
| HALF-OPEN | One probe request allowed | After recovery timeout |
```python
import requests
from circuitbreaker import circuit

PAYMENT_URL = "https://payments.internal/api/charge"  # placeholder endpoint

@circuit(failure_threshold=5, recovery_timeout=30)
def call_payment_service(order_id: str):
    return requests.post(PAYMENT_URL, json={"order_id": order_id}, timeout=2)
```

When call_payment_service() fails 5 times, subsequent calls raise CircuitBreakerError immediately: no actual network call, no blocked threads.
Deep Dive: Reliability Patterns Under the Hood
The Bulkhead pattern is named after ship hull compartments: if one compartment floods, the rest stay dry. In software, give different traffic types separate thread pools and separate connection pools.
```text
Critical Payments Pool: 20 threads (isolated)
Analytics Pool:          5 threads (isolated)
Background Job Pool:    10 threads (isolated)
```
If the analytics pool saturates, the payment pool is unaffected. Without bulkheads, one slow operation starves everything.
Internals: How Circuit Breakers Track State
A circuit breaker's core data structure is a sliding window counter โ a ring buffer of the last N request outcomes. Each completion (success or failure) is written into the current slot. The window slides periodically and the oldest slot is discarded.
Error rate is computed as:
error_rate = failures_in_window / total_in_window
When error_rate crosses the threshold (commonly 50%), the state transitions to OPEN and all requests short-circuit without touching the downstream service. The half-open probe is a time-delayed single request that asks "is it safe to close again?" without flooding a recovering service.
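A count-based sketch of that bookkeeping, using a bounded `deque` as the ring buffer (names are illustrative; real libraries also track time-based windows and the OPEN-to-HALF_OPEN timer):

```python
from collections import deque

class WindowedBreakerState:
    """Track the last N call outcomes and decide when to trip OPEN."""

    def __init__(self, window_size: int = 20, threshold: float = 0.5,
                 min_calls: int = 10):
        self.outcomes = deque(maxlen=window_size)  # ring buffer: oldest slot auto-evicted
        self.threshold = threshold
        self.min_calls = min_calls
        self.state = "CLOSED"

    def record(self, success: bool):
        self.outcomes.append(success)
        total = len(self.outcomes)
        failures = total - sum(self.outcomes)
        # Only evaluate once the window has enough samples (the min_calls floor)
        if total >= self.min_calls and failures / total >= self.threshold:
            self.state = "OPEN"
```

The `maxlen` on the deque is what makes it a sliding window: writing the newest outcome implicitly discards the oldest one.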
Bulkhead internals are simpler: a bounded ThreadPoolExecutor per service tier. Calls exceeding the pool queue depth are rejected immediately with a BulkheadFullException rather than queueing indefinitely.
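In Python terms, that bounded pool plus immediate rejection can be sketched with a semaphore guarding a `ThreadPoolExecutor`. `BulkheadFullException` is defined locally here to mirror the name such libraries use:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class BulkheadFullException(Exception):
    """Raised when the pool and its queue are saturated."""

class Bulkhead:
    """Bounded pool: fail fast instead of queueing indefinitely."""

    def __init__(self, max_concurrent: int, max_queued: int):
        self.executor = ThreadPoolExecutor(max_workers=max_concurrent)
        # One permit per running slot plus per queue slot
        self.slots = threading.Semaphore(max_concurrent + max_queued)

    def submit(self, fn, *args):
        if not self.slots.acquire(blocking=False):
            raise BulkheadFullException("bulkhead saturated, failing fast")
        future = self.executor.submit(fn, *args)
        future.add_done_callback(lambda _: self.slots.release())
        return future
```

The non-blocking `acquire` is the whole pattern: a saturated bulkhead costs the caller one failed semaphore check, not a blocked thread.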
Performance Analysis: Overhead Per Reliability Layer
| Pattern | Typical Overhead | Notes |
|---|---|---|
| Rate Limiter (in-process) | < 0.05 ms | Atomic counter + timestamp |
| Rate Limiter (Redis) | 1-2 ms | 2x Redis round trips per request |
| Circuit Breaker (local) | ~0.1 ms | Lock-free ring buffer read/write |
| Bulkhead (thread pool) | < 0.01 ms | queue.offer() + thread dispatch |
Use local in-process rate limiting for high-throughput endpoints and reserve Redis for cross-node consistency across a fleet of servers.
Mathematical Model: Token Bucket Formalism
Let C = bucket capacity (tokens), r = refill rate (tokens/second), t0 = time of last refill, and tokens(t0) = token count at t0.
At time t, the available token count is:
tokens(t) = min(C, tokens(t0) + r * (t - t0))
A request is allowed if tokens(t) >= 1, and tokens(t) is then decremented by 1. A request is rejected (HTTP 429) if tokens(t) < 1.
Worked example: C = 10, r = 5 tokens/sec, tokens(t0) = 0 at t0 = 0.
- At t = 1s: tokens = min(10, 5) = 5, so 5 requests proceed.
- At t = 2s after those 5 are consumed: tokens = 5 again.
- Rapid burst of 6 requests at t = 0.1s with empty bucket: all 6 rejected with 429.
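The formula and the worked example can be checked mechanically:

```python
def tokens_at(t: float, t0: float, tokens_t0: float,
              capacity: float, rate: float) -> float:
    """tokens(t) = min(C, tokens(t0) + r * (t - t0))"""
    return min(capacity, tokens_t0 + rate * (t - t0))

# C = 10, r = 5 tokens/sec, empty bucket at t0 = 0
print(tokens_at(1.0, 0.0, 0, 10, 5))   # 5 tokens: 5 requests can proceed
print(tokens_at(0.1, 0.0, 0, 10, 5))   # 0.5 tokens: a burst of 6 is fully rejected
print(tokens_at(5.0, 0.0, 0, 10, 5))   # capped at capacity C = 10
```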
For circuit breakers, the error rate over sliding window W is:
error_rate(t) = errors_in_[t-W, t] / total_calls_in_[t-W, t]
Trip the breaker when error_rate >= threshold AND total_calls >= min_calls. The min_calls floor prevents false opens during low-traffic periods.
Token Bucket Rate Limiting: Request Flow
```mermaid
sequenceDiagram
    participant C as Client
    participant RL as Rate Limiter
    participant B as Token Bucket
    participant API as API Server
    C->>RL: Request (API key: user_42)
    RL->>B: Check tokens for user_42
    B-->>RL: tokens = 5 (bucket not empty)
    RL->>B: Decrement token count
    RL->>API: Forward request
    API-->>C: 200 OK
    C->>RL: Request (burst: 10 rapid calls)
    RL->>B: Check tokens for user_42
    B-->>RL: tokens = 0 (bucket empty)
    RL-->>C: 429 Too Many Requests (Retry-After: 1s)
    Note over B: Tokens refill at r/sec
    B->>B: Refill +10 tokens (1s elapsed)
    C->>RL: Retry after 1s
    RL->>B: Check tokens
    B-->>RL: tokens = 10
    RL->>API: Forward request
    API-->>C: 200 OK
```
This sequence diagram walks through two distinct token bucket scenarios for the same client. In the first, tokens are available and the request is forwarded normally. In the second, a burst of 10 rapid calls exhausts the bucket and the rate limiter returns 429 Too Many Requests with a Retry-After header, a critical UX detail that tells the client exactly when to retry. After 1 second of token refill, the next request succeeds. The takeaway: a well-implemented rate limiter communicates when to try again, not just whether to block.
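On the client side, honoring that header is a small loop. A sketch where the `send` callable stands in for the real HTTP call (e.g. `lambda: requests.get(url)`); `sleep` is injectable only to keep the sketch testable:

```python
import time

def request_with_retry_after(send, max_attempts: int = 5, sleep=time.sleep):
    """Retry on 429, waiting exactly as long as the server's Retry-After asks.
    `send` is a zero-arg callable returning an object with .status_code and .headers."""
    resp = send()
    attempts = 1
    while resp.status_code == 429 and attempts < max_attempts:
        # Server-specified backoff; default to 1s if the header is missing
        sleep(float(resp.headers.get("Retry-After", 1)))
        resp = send()
        attempts += 1
    return resp
```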
Advanced Concepts: Combining Patterns for Defense in Depth
Each pattern in isolation solves one failure mode. Together they form defense in depth: no single layer is the only thing standing between your system and failure.
The retry storm problem illustrates why combining patterns matters. Add exponential backoff retries to every service call. A downstream service degrades. All upstream clients retry with growing delays: a thundering herd. A circuit breaker solves this: once OPEN, retries stop entirely until the half-open probe confirms recovery.
Backpressure signals upstream callers to slow their send rate rather than dropping requests silently. Bulkheads emit BulkheadFullException; Kafka consumers use lag metrics to throttle producers; gRPC uses HTTP/2 flow control.
Jitter on circuit close prevents thundering herd on recovery. Instead of all clients retrying at t + 30s, each uses t + 30s + random(0, 5s), smoothing the re-entry spike across a 5-second window.
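The backoff-plus-jitter recipe is essentially a one-liner. This sketch implements the "full jitter" variant, which samples uniformly over the whole capped exponential interval:

```python
import random

def backoff_with_full_jitter(attempt: int, base: float = 0.1,
                             cap: float = 30.0) -> float:
    """Full jitter: delay is uniform in [0, min(cap, base * 2**attempt)].
    Spreads retries across the entire interval instead of synchronizing clients."""
    return random.uniform(0, min(cap, base * 2 ** attempt))
```

Compared with adding a small fixed jitter on top of a deterministic delay, full jitter de-correlates clients more aggressively, which is why it is the usual recommendation for retry storms.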
| Failure Mode | Pattern Combination | Key Config |
|---|---|---|
| DDoS / API spam | Rate Limiter + WAF | Per-IP limits, burst cap |
| Cascading slow dependency | Circuit Breaker + timeout | 50% error rate, 200ms timeout |
| Thread starvation | Bulkhead + Circuit Breaker | Isolated pools, fail-fast on full |
| Retry storms | Circuit Breaker + backoff + jitter | Base 100ms, max 30s, full jitter |
| Thundering herd on recovery | Jitter on recovery timeout | +/-5s random spread per client |
System Flow: Request Through the Reliability Stack
Every inbound request traverses all three layers. The order matters: rate limit first, then bulkhead, then circuit breaker.
```mermaid
flowchart LR
    Client[Client Request] --> RL["Rate Limiter<br/>Token Bucket"]
    RL -->|429 if limit hit| Reject[Reject: 429]
    RL --> BH["Bulkhead<br/>Thread Pool"]
    BH -->|503 if pool full| Full[Reject: 503]
    BH --> CB["Circuit Breaker<br/>CLOSED / OPEN / HALF-OPEN"]
    CB -->|OPEN: fail fast| FB[Fallback Response]
    CB -->|CLOSED: call through| Svc[Downstream Service]
    Svc -->|success| CB
    Svc -->|failure threshold hit| CB
```
At each gate, a rejection is cheap and deterministic. By the time a request reaches the downstream service it has already proven: within rate limits, a thread is available, and the target is believed healthy. This is fail early, fail fast.
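The three gates compose into one function. An illustrative sketch where the callables stand in for real limiter, pool, and breaker objects:

```python
def handle_request(request, rate_limit_ok, acquire_thread,
                   breaker_state, call_downstream):
    """Order matters: rate limit first (cheapest check), then bulkhead, then breaker."""
    if not rate_limit_ok(request):
        return (429, "Too Many Requests")            # cheap, deterministic rejection
    if not acquire_thread(request):
        return (503, "Bulkhead full")                # no thread available, fail fast
    if breaker_state() == "OPEN":
        return (503, "Fallback: downstream unavailable")  # no network call made
    return (200, call_downstream(request))           # request has earned a real call
```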
Real-World Applications: Where These Patterns Run in Production
Rate Limiting in the wild:
- GitHub API enforces 5,000 requests/hour per authenticated token. Unauthenticated requests are limited to 60/hour to force authentication.
- Stripe limits to 100 requests/second per secret key. Exceeding this returns 429 with a Retry-After header; Stripe SDKs implement automatic exponential backoff.
- AWS API Gateway offers per-method throttling and account-level throttling with separate burst limits per deployment stage.
Circuit Breakers in the wild:
- Netflix Hystrix was the original widely-adopted circuit breaker library, used to isolate microservices so a recommendations failure would not take down video streaming. Succeeded by Resilience4j.
- Istio service mesh implements circuit breaking at the Envoy sidecar proxy so every service-to-service call is guarded without application code changes.
Bulkheads in the wild:
- Kubernetes resource limits and namespace quotas act as infrastructure-level bulkheads preventing rogue jobs from consuming all cluster CPU.
- Payment processing APIs universally use isolated thread pools so payment authorization never competes with analytics or background jobs.
Trade-offs and Failure Modes: Choosing the Right Pattern
| Scenario | Pattern | Notes |
|---|---|---|
| Public API with free and paid tiers | Rate Limiting (sliding window, per API key) | Redis-backed for cross-node consistency |
| Microservice calling an unreliable external API | Circuit Breaker | Set timeout <= your SLA budget |
| High-value transaction isolation | Bulkhead (dedicated thread + connection pool) | Payment pool must never share with analytics |
| Protecting origin from DDoS | CDN + WAF + Rate Limiter layered | Blackholing for repeat offenders |
| Service-to-service timeout cascade | Circuit Breaker + timeout (aggressive: 200ms) | Timeout must be less than circuit breaker window |
| Queue consumer falling behind | Backpressure | Consumer signals producer to slow down |
Decision Guide: Which Pattern for Which Problem
| Problem | Pattern | Configuration |
|---|---|---|
| Protect API from spam / DDoS | Rate Limiting | Token bucket, per-IP + per-token limits |
| Prevent cascading failure | Circuit Breaker | 50% error rate threshold, 30s timeout |
| Isolate critical from non-critical work | Bulkhead | Separate thread pools per service tier |
| Handle burst traffic gracefully | Token Bucket | Capacity = max burst, refill = sustained rate |
| Retry safely without overloading | Exponential backoff + Jitter | Base 100ms, max 30s, full jitter |
| Circuit breaker keeps false-opening | Raise min_calls floor | Require 20+ calls before evaluating error rate |
| Recovery thundering herd | Jitter on half-open timeout | +/-5s random spread per client |
Non-obvious edge cases:
- Set circuit breaker recovery_timeout <= your SLA budget or users wait the full timeout before recovery is even attempted.
- Rate limiting without a Retry-After header is UX-hostile: clients cannot back off intelligently without it.
- A bulkhead with an unbounded queue is a time bomb. Always set queue_capacity explicitly.
Practical: Configuring Resilience4j in Spring Boot
Resilience4j provides composable annotations that layer circuit breaking, rate limiting, and bulkhead isolation directly on service methods.
```yaml
resilience4j:
  circuitbreaker:
    instances:
      paymentService:
        slidingWindowSize: 20
        minimumNumberOfCalls: 10
        failureRateThreshold: 50
        waitDurationInOpenState: 30s
  ratelimiter:
    instances:
      paymentService:
        limitForPeriod: 100
        limitRefreshPeriod: 1s
  bulkhead:
    instances:
      paymentService:
        maxConcurrentCalls: 20
        maxWaitDuration: 10ms
```
```java
@Service
public class PaymentService {

    @Bulkhead(name = "paymentService", fallbackMethod = "bulkheadFallback")
    @CircuitBreaker(name = "paymentService", fallbackMethod = "circuitFallback")
    @RateLimiter(name = "paymentService", fallbackMethod = "rateFallback")
    public PaymentResult processPayment(Order order) {
        return paymentGatewayClient.charge(order);
    }

    // Referenced by the @Bulkhead annotation above
    public PaymentResult bulkheadFallback(Order order, BulkheadFullException ex) {
        return PaymentResult.retry("Payment capacity saturated. Please retry.");
    }

    public PaymentResult circuitFallback(Order order, CallNotPermittedException ex) {
        return PaymentResult.retry("Payment gateway temporarily unavailable.");
    }

    public PaymentResult rateFallback(Order order, RequestNotPermitted ex) {
        return PaymentResult.error("Rate limit exceeded. Retry-After: 1s");
    }
}
```
Testing the circuit breaker: simulate failures by mocking the gateway to throw 500 errors. After `minimumNumberOfCalls` (10) with a >= 50% failure rate, the circuit opens. Verify via Actuator: `curl localhost:8080/actuator/health | jq '.components.circuitBreakers'`. The `paymentService` entry shows `state: OPEN`. After 30 seconds it transitions to HALF_OPEN and probe calls determine whether it closes.
Micrometer metrics to monitor:
| Metric | What It Signals |
|---|---|
| resilience4j_circuitbreaker_state | Current CB state (0=CLOSED, 1=OPEN, 2=HALF_OPEN) |
| resilience4j_ratelimiter_waiting_threads | Requests blocked waiting for a token |
| resilience4j_bulkhead_available_concurrent_calls | Remaining bulkhead capacity |
Spring Security, Bucket4j, and Resilience4j: A Complete Spring Boot Defense Stack
Spring Security is the standard authentication and authorization framework for Spring Boot, providing a filter chain that intercepts every HTTP request. Bucket4j is a Java rate-limiting library built on the Token Bucket algorithm, with optional Redis-backed distributed buckets for fleet-wide enforcement. Together with Resilience4j (shown in the practical section above), these three libraries form a complete, annotation-driven reliability stack for Spring Boot microservices.
```java
// Spring Security JWT filter + Bucket4j per-client rate limiter
@Component
@RequiredArgsConstructor
public class JwtRateLimitingFilter extends OncePerRequestFilter {

    private final JwtTokenProvider tokenProvider;
    private final BucketRepository buckets; // Bucket4j + Redis backend

    @Override
    protected void doFilterInternal(HttpServletRequest request,
                                    HttpServletResponse response,
                                    FilterChain chain)
            throws ServletException, IOException {
        // 1. Validate JWT: authenticate the client
        String token = resolveToken(request);
        if (token == null || !tokenProvider.validateToken(token)) {
            response.sendError(HttpServletResponse.SC_UNAUTHORIZED, "Invalid token");
            return;
        }
        String clientId = tokenProvider.getSubject(token);

        // 2. Enforce per-client rate limit via Bucket4j Token Bucket
        Bucket bucket = buckets.getOrCreate(clientId,
                Bandwidth.classic(100, Refill.greedy(100, Duration.ofMinutes(1))));
        ConsumptionProbe probe = bucket.tryConsumeAndReturnRemaining(1);
        if (!probe.isConsumed()) {
            // Round up so clients never retry before the refill actually lands
            long retryAfterSec =
                    (probe.getNanosToWaitForRefill() + 999_999_999) / 1_000_000_000;
            response.setHeader("X-RateLimit-Remaining", "0");
            response.setHeader("Retry-After", String.valueOf(retryAfterSec));
            response.sendError(429, "Rate limit exceeded. Retry-After: " + retryAfterSec + "s");
            return;
        }
        response.setHeader("X-RateLimit-Remaining",
                String.valueOf(probe.getRemainingTokens()));
        chain.doFilter(request, response);
    }

    private String resolveToken(HttpServletRequest req) {
        String bearer = req.getHeader("Authorization");
        return (bearer != null && bearer.startsWith("Bearer "))
                ? bearer.substring(7) : null;
    }
}
```
The filter executes in the Spring Security filter chain: JWT validation runs first (authentication gate), then Bucket4j checks the per-client token bucket before any request reaches business logic. Combine it with the Resilience4j @CircuitBreaker and @Bulkhead from the Practical section above for complete defense in depth: authentication, then rate limiting, then bulkhead, then circuit breaker.
For a full deep-dive on Spring Security OAuth2 resource server configuration and Bucket4j distributed Redis mode, a dedicated follow-up post is planned.
Key Lessons from Reliability Pattern Failures in Production
- Rate limiting without a Retry-After header breaks clients: they cannot back off intelligently without knowing when to retry. Always return the header.
- Circuit breakers need careful threshold tuning: too sensitive (low minimumNumberOfCalls) causes false opens during rolling deployments; too loose means cascading failures still propagate.
- Bulkheads only work if you correctly identify which work is critical: equal-priority pools for payments and analytics defeat the purpose.
- Never set circuit breaker recovery_timeout longer than your SLA: a 60-second recovery_timeout against a 500ms SLA means your fallback runs for a full minute before recovery is even attempted.
- The order of Resilience4j annotations matters: Bulkhead, then Circuit Breaker, then Rate Limiter (outermost first). Inverted, a rate limit rejection counts as a circuit breaker failure.
- Combine patterns at both the service mesh level (Istio/Envoy) for zero-code-change protection and at the library level for per-method control.
TLDR: Summary & Key Takeaways
- Token Bucket enforces per-client rate limits with allowance for small bursts; tokens(t) = min(C, tokens(t0) + r*(t-t0)).
- Circuit Breaker (CLOSED to OPEN to HALF-OPEN) short-circuits failing calls using a sliding window error rate; set min_calls to avoid false opens.
- Bulkhead compartmentalizes thread pools so slow dependencies cannot starve critical paths; always configure bounded queues.
- DDoS defense is layered: CDN absorbs volume, WAF filters patterns, rate limiter blocks persistent abusers, blackholing drops the worst offenders.
- Combine all three for defense in depth and add jitter to prevent thundering herd on circuit recovery.
- Operational discipline matters as much as the patterns: monitor CB state, rate limit hits, and bulkhead rejections via Micrometer; tune thresholds against production traffic, not guesses.
Related Posts
- System Design: Caching and Asynchronism
- System Design: Sharding Strategy
- System Design: Replication and Failover
- Capacity Estimation Guide

Written by
Abstract Algorithms
@abstractalgorithms