
System Design Advanced: Security, Rate Limiting, and Reliability

How do you protect your API from hackers and traffic spikes? We cover Rate Limiting algorithms (T...

Abstract Algorithms · 14 min read

AI-assisted content.

TLDR: Three reliability tools every backend system needs: Rate Limiting prevents API spam and DDoS, Circuit Breakers stop cascading failures when downstream services degrade, and Bulkheads isolate failure blast radius. Knowing when and how to combine them separates junior from senior system design.


📖 The TLDR: A Layered Defense for Distributed Systems

A house electrical panel has three layers of protection:

  1. Fuse/breaker per circuit: no single appliance can knock out the house.
  2. Main breaker โ€” kills everything if total load is too dangerous.
  3. Surge protector โ€” absorbs voltage spikes before they reach appliances.

Distributed systems need the same layered defense at the API gateway, service-to-service, and individual thread pool level. Rate limiting is your surge protector at the edge. Circuit breakers are the fuse on each service call. Bulkheads are the per-circuit isolation that keeps one slow dependency from tripping everything else.


โš™๏ธ Rate Limiting and Circuit Breaking: The Two Inbound Guards

Rate limiting is enforced at the API Gateway or reverse proxy layer before requests reach your application.

Token Bucket Algorithm

Each client gets a bucket of tokens. One token equals one request. Tokens refill at a fixed rate.

Bucket capacity = 100 requests
Refill rate = 10 tokens/second
If tokens > 0: allow request, decrement token
If tokens == 0: return HTTP 429 Too Many Requests
| Algorithm | Burst Handling | Use Case |
| --- | --- | --- |
| Token Bucket | Allows small bursts up to bucket size | API rate limits per user |
| Leaky Bucket | No bursts, constant output rate | Smoothing traffic, QoS |
| Fixed Window | Large bursts possible at window boundary | Simple, low-overhead admin limits |
| Sliding Window | Smooth rate, no boundary spikes | Production API gateways (most common) |
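The pseudocode above translates almost line-for-line into Python. The sketch below is illustrative (the class and method names are our own, not a library's), with an injectable clock so the refill behavior is testable:

```python
import time

class TokenBucket:
    """Minimal token bucket: capacity C tokens, refilled at r tokens/second."""

    def __init__(self, capacity: float, refill_rate: float, clock=time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.clock = clock
        self.tokens = float(capacity)   # start full
        self.last_refill = clock()

    def allow(self) -> bool:
        """Consume one token and return True, or return False (caller sends 429)."""
        now = self.clock()
        # Continuous refill: tokens(t) = min(C, tokens(t0) + r * (t - t0))
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + self.refill_rate * elapsed)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

With capacity=100 and refill_rate=10, a client can burst up to 100 requests and then sustain 10 per second; anything beyond that should be answered with HTTP 429.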

DDoS Defense: Layered Response

flowchart LR
    Internet[Internet Traffic] --> CDN["CDN<br/>(absorb volumetric attacks)"]
    CDN --> WAF["WAF<br/>(block malicious patterns)"]
    WAF --> RL["Rate Limiter<br/>(per-IP / per-token limits)"]
    RL --> App[Application Servers]
    RL -->|IP repeatedly violates| BH["Blackholing<br/>(drop to /dev/null)"]

Blackholing routes the attacker's traffic to a null interface: no response, minimal server overhead. Used by ISPs and CDN providers against volumetric attacks.

Circuit Breaker States

Without a circuit breaker, a slow downstream service blocks all your threads, fills your thread pool, and the cascade propagates upward. A circuit breaker short-circuits this by tracking error rates and failing fast once a threshold is crossed.

stateDiagram-v2
    [*] --> CLOSED : System healthy
    CLOSED --> OPEN : Error rate > threshold (e.g., 50% in 10s)
    OPEN --> HALF_OPEN : After timeout (e.g., 30s)
    HALF_OPEN --> CLOSED : Probe request succeeds
    HALF_OPEN --> OPEN : Probe request fails
| State | Behavior | When |
| --- | --- | --- |
| CLOSED | All requests pass through | Normal operation |
| OPEN | All requests fail fast (no actual call) | After too many failures |
| HALF-OPEN | One probe request allowed | After recovery timeout |
import requests
from circuitbreaker import circuit

# Placeholder endpoint; substitute your payment gateway URL.
PAYMENT_URL = "https://payments.example.com/charge"

@circuit(failure_threshold=5, recovery_timeout=30)
def call_payment_service(order_id: str):
    return requests.post(PAYMENT_URL, json={"order_id": order_id}, timeout=2)

When call_payment_service() fails 5 times, subsequent calls raise CircuitBreakerError immediately: no actual network call, no blocked threads.


🧠 Deep Dive: Reliability Patterns Under the Hood

The Bulkhead pattern is named after ship hull compartments: if one compartment floods, the rest stay dry. In software, give different traffic types separate thread pools and separate connection pools.

Critical Payments Pool:   20 threads  (isolated)
Analytics Pool:            5 threads  (isolated)
Background Job Pool:      10 threads  (isolated)

If the analytics pool saturates, the payment pool is unaffected. Without bulkheads, one slow operation starves everything.

Internals: How Circuit Breakers Track State

A circuit breaker's core data structure is a sliding window counter: a ring buffer of the last N request outcomes. Each completion (success or failure) is written into the current slot. The window slides periodically and the oldest slot is discarded.

Error rate is computed as:

error_rate = failures_in_window / total_in_window

When error_rate crosses the threshold (commonly 50%), the state transitions to OPEN and all requests short-circuit without touching the downstream service. The half-open probe is a time-delayed single request that asks "is it safe to close again?" without flooding a recovering service.
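A minimal sketch of that bookkeeping, with illustrative names and a deque standing in for the ring buffer (its maxlen discards the oldest outcome automatically):

```python
from collections import deque

class WindowedBreaker:
    """Track the last N call outcomes and trip OPEN past an error-rate threshold."""

    def __init__(self, window_size: int = 20, threshold: float = 0.5,
                 min_calls: int = 10):
        self.window = deque(maxlen=window_size)  # ring buffer of success booleans
        self.threshold = threshold
        self.min_calls = min_calls
        self.state = "CLOSED"

    def record(self, success: bool) -> str:
        """Record one outcome and return the resulting breaker state."""
        self.window.append(success)
        failures = self.window.count(False)
        total = len(self.window)
        # min_calls floor: don't evaluate the rate on a near-empty window
        if total >= self.min_calls and failures / total >= self.threshold:
            self.state = "OPEN"
        return self.state
```

Note the min_calls guard: two failures out of two calls is a 100% error rate, but tripping on it would false-open the breaker during quiet periods.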

Bulkhead internals are simpler: a bounded ThreadPoolExecutor per service tier. Calls exceeding the pool queue depth are rejected immediately with a BulkheadFullException rather than queueing indefinitely.
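Python's standard library has no bulkhead primitive, and its ThreadPoolExecutor queues tasks unboundedly by default, so this sketch bounds admission with a semaphore. Bulkhead and BulkheadFullException here are our own illustrative classes, not library types:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class BulkheadFullException(Exception):
    """Raised when the pool has no free slot: fail fast instead of queueing."""

class Bulkhead:
    def __init__(self, name: str, max_concurrent: int):
        self.name = name
        self._executor = ThreadPoolExecutor(max_workers=max_concurrent)
        self._slots = threading.Semaphore(max_concurrent)

    def submit(self, fn, *args):
        # Non-blocking acquire: reject immediately when every slot is busy.
        if not self._slots.acquire(blocking=False):
            raise BulkheadFullException(f"{self.name} pool saturated")

        def wrapped():
            try:
                return fn(*args)
            finally:
                self._slots.release()   # free the slot when the call completes

        return self._executor.submit(wrapped)
```

A saturated pool rejects in microseconds, which is exactly the point: the caller gets an immediate 503 instead of a thread parked on an unbounded queue.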

Performance Analysis: Overhead Per Reliability Layer

| Pattern | Typical Overhead | Notes |
| --- | --- | --- |
| Rate Limiter (in-process) | < 0.05 ms | Atomic counter + timestamp |
| Rate Limiter (Redis) | 1-2 ms | 2x Redis round trips per request |
| Circuit Breaker (local) | ~0.1 ms | Lock-free ring buffer read/write |
| Bulkhead (thread pool) | < 0.01 ms | queue.offer() + thread dispatch |

Use local in-process rate limiting for high-throughput endpoints and reserve Redis for cross-node consistency across a fleet of servers.

Mathematical Model: Token Bucket Formalism

Let C = bucket capacity (tokens), r = refill rate (tokens/second), t0 = time of last refill, and tokens(t0) = token count at t0.

At time t, the available token count is:

tokens(t) = min(C, tokens(t0) + r * (t - t0))

A request is allowed if tokens(t) >= 1, in which case tokens(t) is decremented by 1; it is rejected (HTTP 429) if tokens(t) < 1.

Worked example: C = 10, r = 5 tokens/sec, tokens(t0) = 0 at t0 = 0.

  • At t = 1s: tokens = min(10, 5) = 5, so 5 requests proceed.
  • At t = 2s after those 5 are consumed: tokens = 5 again.
  • Rapid burst of 6 requests at t = 0.1s with empty bucket: all 6 rejected with 429.

For circuit breakers, the error rate over sliding window W is:

error_rate(t) = errors_in_[t-W, t] / total_calls_in_[t-W, t]

Trip the breaker when error_rate >= threshold AND total_calls >= min_calls. The min_calls floor prevents false opens during low-traffic periods.

📊 Token Bucket Rate Limiting: Request Flow

sequenceDiagram
    participant C as Client
    participant RL as Rate Limiter
    participant B as Token Bucket
    participant API as API Server

    C->>RL: Request (API key: user_42)
    RL->>B: Check tokens for user_42
    B-->>RL: tokens = 5 (bucket not empty)
    RL->>B: Decrement token count
    RL->>API: Forward request
    API-->>C: 200 OK

    C->>RL: Request (burst: 10 rapid calls)
    RL->>B: Check tokens for user_42
    B-->>RL: tokens = 0 (bucket empty)
    RL-->>C: 429 Too Many Requests (Retry-After: 1s)
    Note over B: Tokens refill at r/sec
    B->>B: Refill +10 tokens (1s elapsed)
    C->>RL: Retry after 1s
    RL->>B: Check tokens
    B-->>RL: tokens = 10
    RL->>API: Forward request
    API-->>C: 200 OK

This sequence diagram walks through two distinct token bucket scenarios for the same client. In the first scenario, tokens are available and the request is forwarded normally. In the second, a burst of 10 rapid calls exhausts the bucket and the rate limiter returns 429 Too Many Requests with a Retry-After header, critical UX that tells the client exactly when to retry. After 1 second of token refill, the next request succeeds. The takeaway is that a well-implemented rate limiter communicates when to try again, not just whether to block.
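On the client side, honoring Retry-After takes only a few lines. This sketch is illustrative: the helper names and the (status, headers, body) request shape are assumptions for testability, not a real HTTP library's API:

```python
import time

def retry_after_seconds(headers: dict, default: float = 1.0) -> float:
    """Parse a Retry-After header (delay-seconds form), falling back to a default."""
    value = headers.get("Retry-After")
    try:
        return max(0.0, float(value))
    except (TypeError, ValueError):
        return default

def call_with_retry(do_request, max_attempts: int = 3, sleep=time.sleep):
    """Retry on 429, sleeping exactly as long as the server asks."""
    for _ in range(max_attempts):
        status, headers, body = do_request()
        if status != 429:
            return status, body
        sleep(retry_after_seconds(headers))
    return status, body   # still rate limited after max_attempts
```

A client that sleeps for the server-specified interval converges quickly; one that retries immediately just burns its remaining budget on more 429s.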


๐Ÿ—๏ธ Advanced Concepts: Combining Patterns for Defense in Depth

Each pattern in isolation solves one failure mode. Together they form defense in depth: no single layer is the only thing standing between your system and failure.

The retry storm problem illustrates why combining patterns matters. Suppose every service call retries with exponential backoff. When a downstream service degrades, every upstream client piles on retries, and even with growing delays the aggregate load becomes a thundering herd. A circuit breaker solves this: once OPEN, retries stop entirely until the half-open probe confirms recovery.

Backpressure signals upstream callers to slow their send rate rather than dropping requests silently. Bulkheads emit BulkheadFullException; Kafka consumers use lag metrics to throttle producers; gRPC uses HTTP/2 flow control.

Jitter on circuit close prevents thundering herd on recovery. Instead of all clients retrying at t + 30s, each uses t + 30s + random(0, 5s), smoothing the re-entry spike across a 5-second window.

| Failure Mode | Pattern Combination | Key Config |
| --- | --- | --- |
| DDoS / API spam | Rate Limiter + WAF | Per-IP limits, burst cap |
| Cascading slow dependency | Circuit Breaker + timeout | 50% error rate, 200ms timeout |
| Thread starvation | Bulkhead + Circuit Breaker | Isolated pools, fail-fast on full |
| Retry storms | Circuit Breaker + backoff + jitter | Base 100ms, max 30s, full jitter |
| Thundering herd on recovery | Jitter on recovery timeout | +/-5s random spread per client |
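The "base 100ms, max 30s, full jitter" configuration maps directly onto the standard full-jitter formula. A sketch, with illustrative names:

```python
import random

def full_jitter_delay(attempt: int, base: float = 0.1, cap: float = 30.0,
                      rng=random.random) -> float:
    """Full jitter: sleep a uniform random amount in [0, min(cap, base * 2**attempt))."""
    return rng() * min(cap, base * (2 ** attempt))
```

By attempt 10 the uncapped ceiling would be base * 2**10 = 102.4s, so the cap clamps it to 30s; the actual sleep is uniform in [0, 30s), which spreads retries out instead of synchronizing every client on the same instant.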

📊 System Flow: Request Through the Reliability Stack

Every inbound request traverses all three layers. The order matters: rate limit first, then bulkhead, then circuit breaker.

flowchart LR
    Client[Client Request] --> RL["Rate Limiter<br/>Token Bucket"]
    RL -->|429 if limit hit| Reject[Reject: 429]
    RL --> BH["Bulkhead<br/>Thread Pool"]
    BH -->|503 if pool full| Full[Reject: 503]
    BH --> CB["Circuit Breaker<br/>CLOSED / OPEN / HALF-OPEN"]
    CB -->|OPEN: fail fast| FB[Fallback Response]
    CB -->|CLOSED: call through| Svc[Downstream Service]
    Svc -->|success| CB
    Svc -->|failure threshold hit| CB

At each gate, a rejection is cheap and deterministic. By the time a request reaches the downstream service it has already proven: within rate limits, a thread is available, and the target is believed healthy. This is fail early, fail fast.
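The gate ordering can be sketched as a single function; all names and status strings here are illustrative:

```python
def handle_request(rate_limiter_allows: bool,
                   bulkhead_has_slot: bool,
                   breaker_state: str,
                   call_downstream):
    """Gate order: rate limit -> bulkhead -> circuit breaker -> downstream call."""
    if not rate_limiter_allows:
        return (429, "rate limit exceeded")            # cheapest rejection first
    if not bulkhead_has_slot:
        return (503, "bulkhead full")
    if breaker_state == "OPEN":
        return (503, "fallback: downstream unavailable")
    return (200, call_downstream())                     # all gates passed
```

Each check is a local, in-memory decision; only the final branch touches the network, which is why rejections at the earlier gates stay cheap under overload.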


๐ŸŒ Real-World Applications: Where These Patterns Run in Production

Rate Limiting in the wild:

  • GitHub API enforces 5,000 requests/hour per authenticated token. Unauthenticated requests are limited to 60/hour to force authentication.
  • Stripe limits to 100 requests/second per secret key. Exceeding this returns 429 with a Retry-After header; Stripe SDKs implement automatic exponential backoff.
  • AWS API Gateway offers per-method throttling and account-level throttling with separate burst limits per deployment stage.

Circuit Breakers in the wild:

  • Netflix Hystrix was the original widely-adopted circuit breaker library, used to isolate microservices so a recommendations failure would not take down video streaming. Succeeded by Resilience4j.
  • Istio service mesh implements circuit breaking at the Envoy sidecar proxy so every service-to-service call is guarded without application code changes.

Bulkheads in the wild:

  • Kubernetes resource limits and namespace quotas act as infrastructure-level bulkheads preventing rogue jobs from consuming all cluster CPU.
  • Payment processing APIs universally use isolated thread pools so payment authorization never competes with analytics or background jobs.

โš–๏ธ Trade-offs & Failure Modes: Trade-offs and Failure Modes: Choosing the Right Pattern

| Scenario | Pattern | Notes |
| --- | --- | --- |
| Public API with free and paid tiers | Rate Limiting (sliding window, per API key) | Redis-backed for cross-node consistency |
| Microservice calling an unreliable external API | Circuit Breaker | Set timeout <= your SLA budget |
| High-value transaction isolation | Bulkhead (dedicated thread + connection pool) | Payment pool must never share with analytics |
| Protecting origin from DDoS | CDN + WAF + Rate Limiter layered | Blackholing for repeat offenders |
| Service-to-service timeout cascade | Circuit Breaker + timeout (aggressive: 200ms) | Timeout must be less than circuit breaker window |
| Queue consumer falling behind | Backpressure | Consumer signals producer to slow down |

🧭 Decision Guide: Which Pattern for Which Problem

| Problem | Pattern | Configuration |
| --- | --- | --- |
| Protect API from spam / DDoS | Rate Limiting | Token bucket, per-IP + per-token limits |
| Prevent cascading failure | Circuit Breaker | 50% error rate threshold, 30s timeout |
| Isolate critical from non-critical work | Bulkhead | Separate thread pools per service tier |
| Handle burst traffic gracefully | Token Bucket | Capacity = max burst, refill = sustained rate |
| Retry safely without overloading | Exponential backoff + Jitter | Base 100ms, max 30s, full jitter |
| Circuit breaker keeps false-opening | Raise min_calls floor | Require 20+ calls before evaluating error rate |
| Recovery thundering herd | Jitter on half-open timeout | +/-5s random spread per client |

Non-obvious edge cases:

  • Set circuit breaker recovery_timeout <= your SLA budget or users wait the full timeout before recovery is even attempted.
  • Rate limiting without a Retry-After header is UX-hostile: clients cannot back off intelligently without it.
  • A bulkhead with an unbounded queue is a time bomb. Always set queue_capacity explicitly.

🧪 Practical: Configuring Resilience4j in Spring Boot

Resilience4j provides composable annotations that layer circuit breaking, rate limiting, and bulkhead isolation directly on service methods.

resilience4j:
  circuitbreaker:
    instances:
      paymentService:
        slidingWindowSize: 20
        minimumNumberOfCalls: 10
        failureRateThreshold: 50
        waitDurationInOpenState: 30s
  ratelimiter:
    instances:
      paymentService:
        limitForPeriod: 100
        limitRefreshPeriod: 1s
  bulkhead:
    instances:
      paymentService:
        maxConcurrentCalls: 20
        maxWaitDuration: 10ms
@Service
public class PaymentService {

    @Bulkhead(name = "paymentService", fallbackMethod = "bulkheadFallback")
    @CircuitBreaker(name = "paymentService", fallbackMethod = "circuitFallback")
    @RateLimiter(name = "paymentService", fallbackMethod = "rateFallback")
    public PaymentResult processPayment(Order order) {
        return paymentGatewayClient.charge(order);
    }

    public PaymentResult bulkheadFallback(Order order, BulkheadFullException ex) {
        return PaymentResult.retry("Payment service at capacity. Please retry.");
    }

    public PaymentResult circuitFallback(Order order, CallNotPermittedException ex) {
        return PaymentResult.retry("Payment gateway temporarily unavailable.");
    }

    public PaymentResult rateFallback(Order order, RequestNotPermitted ex) {
        return PaymentResult.error("Rate limit exceeded. Retry-After: 1s");
    }
}

Testing the circuit breaker: Simulate failures by mocking the gateway to throw 500 errors. After minimumNumberOfCalls (10) with >= 50% failure rate, the circuit opens. Verify via Actuator: curl localhost:8080/actuator/health | jq '.components.circuitBreakers'. The paymentService entry shows state: OPEN. After 30 seconds it transitions to HALF_OPEN and probe calls determine whether it closes.

Micrometer metrics to monitor:

| Metric | What It Signals |
| --- | --- |
| resilience4j_circuitbreaker_state | Current CB state (0=CLOSED, 1=OPEN, 2=HALF_OPEN) |
| resilience4j_ratelimiter_waiting_threads | Requests blocked waiting for a token |
| resilience4j_bulkhead_available_concurrent_calls | Remaining bulkhead capacity |

๐Ÿ› ๏ธ Spring Security, Bucket4j, and Resilience4j: A Complete Spring Boot Defense Stack

Spring Security is the standard authentication and authorization framework for Spring Boot, providing a filter chain that intercepts every HTTP request. Bucket4j is a Java rate-limiting library built on the Token Bucket algorithm, with optional Redis-backed distributed buckets for fleet-wide enforcement. Together with Resilience4j (shown in the practical section above), these three libraries form a complete, annotation-driven reliability stack for Spring Boot microservices.

// Spring Security JWT filter + Bucket4j per-client rate limiter
@Component
@RequiredArgsConstructor
public class JwtRateLimitingFilter extends OncePerRequestFilter {

    private final JwtTokenProvider    tokenProvider;
    private final BucketRepository    buckets;        // Bucket4j + Redis backend

    @Override
    protected void doFilterInternal(HttpServletRequest  request,
                                    HttpServletResponse response,
                                    FilterChain         chain)
            throws ServletException, IOException {

        // 1. Validate JWT โ€” authenticate the client
        String token = resolveToken(request);
        if (token == null || !tokenProvider.validateToken(token)) {
            response.sendError(HttpServletResponse.SC_UNAUTHORIZED, "Invalid token");
            return;
        }
        String clientId = tokenProvider.getSubject(token);

        // 2. Enforce per-client rate limit via Bucket4j Token Bucket
        Bucket bucket = buckets.getOrCreate(clientId,
                Bandwidth.classic(100, Refill.greedy(100, Duration.ofMinutes(1))));
        ConsumptionProbe probe = bucket.tryConsumeAndReturnRemaining(1);

        if (!probe.isConsumed()) {
            long retryAfterSec = probe.getNanosToWaitForRefill() / 1_000_000_000;
            response.setHeader("X-RateLimit-Remaining", "0");
            response.setHeader("Retry-After", String.valueOf(retryAfterSec));
            response.sendError(429, "Rate limit exceeded. Retry-After: " + retryAfterSec + "s");
            return;
        }
        response.setHeader("X-RateLimit-Remaining",
                String.valueOf(probe.getRemainingTokens()));

        chain.doFilter(request, response);
    }

    private String resolveToken(HttpServletRequest req) {
        String bearer = req.getHeader("Authorization");
        return (bearer != null && bearer.startsWith("Bearer "))
                ? bearer.substring(7) : null;
    }
}

The filter executes in the Spring Security filter chain: JWT validation runs first (authentication gate), then Bucket4j checks the per-client token bucket before any request reaches business logic. Combine with the Resilience4j @CircuitBreaker and @Bulkhead from the Practical section above for complete defense in depth: authentication → rate limiting → bulkhead → circuit breaker.

For a full deep-dive on Spring Security OAuth2 resource server configuration and Bucket4j distributed Redis mode, a dedicated follow-up post is planned.


📚 Key Lessons from Reliability Pattern Failures in Production

  • Rate limiting without a Retry-After header breaks clients: they cannot back off intelligently without knowing when to retry. Always return the header.
  • Circuit breakers need careful threshold tuning: too sensitive (low minimumNumberOfCalls) causes false opens during rolling deployments; too loose means cascading failures still propagate.
  • Bulkheads only work if you correctly identify which work is critical โ€” equal-priority pools for payment and analytics defeats the purpose.
  • Never set circuit breaker recovery_timeout longer than your SLA: a 60-second recovery_timeout on a 500ms SLA means your fallback runs for a full minute before recovery is attempted.
  • The order of Resilience4j annotations matters: Bulkhead, then Circuit Breaker, then Rate Limiter (outermost first). Inverted order means a rate limit rejection counts as a circuit breaker failure.
  • Combine patterns at both the service mesh level (Istio/Envoy) for zero-code-change protection and at the library level for per-method control.

📌 TLDR: Summary & Key Takeaways

  • Token Bucket enforces per-client rate limits with allowance for small bursts; tokens(t) = min(C, tokens(t0) + r*(t-t0)).
  • Circuit Breaker (CLOSED → OPEN → HALF-OPEN) short-circuits failing calls using a sliding window error rate; set min_calls to avoid false opens.
  • Bulkhead compartmentalizes thread pools so slow dependencies cannot starve critical paths; always configure bounded queues.
  • DDoS defense is layered: CDN absorbs volume, WAF filters patterns, rate limiter blocks persistent abusers, blackholing drops the worst offenders.
  • Combine all three for defense in depth and add jitter to prevent thundering herd on circuit recovery.
  • Operational discipline matters as much as the patterns: monitor CB state, rate limit hits, and bulkhead rejections via Micrometer; tune thresholds against production traffic, not guesses.

