API Gateway vs. Load Balancer vs. Reverse Proxy: What's the Difference?
They all sit in front of your servers. But do you need Nginx, HAProxy, or Kong? We clarify the roles of each component.
Abstract Algorithms
TLDR: A Reverse Proxy hides your servers and handles caching/SSL. A Load Balancer spreads traffic across server instances. An API Gateway manages API concerns: auth, rate limiting, routing, and protocol translation. Modern tools (Nginx, AWS ALB, Kong) often combine all three, but understanding what each layer does independently is essential for system design.
The Basics: What Each Layer Does
Three terms (reverse proxy, load balancer, API gateway) describe components that all sit in front of your backend, yet serve distinct purposes. Confusing them leads to misconfigurations that are expensive to debug in production.
The one-sentence role of each:
| Component | Core job |
|---|---|
| Reverse Proxy | Hides backend servers; handles SSL, caching, and compression at the edge |
| Load Balancer | Distributes connections across multiple identical server instances |
| API Gateway | Enforces API-level policy: auth, rate limits, routing, and protocol translation |
The confusion arises because modern tools collapse all three. Nginx can act as a reverse proxy, load balancer, and primitive gateway simultaneously. AWS ALB is an L7 load balancer with some gateway features. Kong is a full API gateway with load balancing built in.
Understanding the layers independently makes it possible to diagnose issues correctly: is the problem at the SSL termination layer, the traffic distribution layer, or the auth enforcement layer?
Three Guards, One Door: The Traffic Handling Hierarchy
When a user's request leaves their browser, it goes through multiple intermediary layers before hitting your application. These layers have overlapping but distinct responsibilities:
Client
  │
  ▼
Reverse Proxy   ← "Who asked? Cache it if possible. SSL here."
  │
  ▼
Load Balancer   ← "Which server instance gets this?"
  │
  ▼
API Gateway     ← "Is this user authorized? Rate-limited? Which microservice?"
  │
  ▼
Application Servers
All three components sit in front of your servers. The confusion arises because modern tools blur the boundaries: Nginx can be all three simultaneously.
Reverse Proxy: Hiding Your Servers and Doing the Boring Work
A Reverse Proxy intercepts incoming requests and forwards them to backend servers on behalf of the client. The client never knows the backend server's address.
What reverse proxies do:
| Responsibility | Why it matters |
|---|---|
| SSL/TLS termination | Offloads encryption from app servers |
| Static content caching | Reduces backend load for repeated requests |
| Compression (gzip, brotli) | Reduces response size to client |
| IP masking | Hides backend topology from clients |
| DDoS absorption | First line of defense against volumetric attacks |
Example (Nginx as reverse proxy):
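server {
    listen 443 ssl;
    server_name api.example.com;
    ssl_certificate     /path/to/cert.pem;
    ssl_certificate_key /path/to/key.pem;
    location / {
        # backend_servers must be defined in an upstream block elsewhere in the config
        proxy_pass http://backend_servers;
    }
}
Every request from the internet hits Nginx first. At the TCP level, backend servers see only Nginx's IP, never the client's; the original address reaches them only if the proxy forwards it in a header such as X-Forwarded-For.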
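Because the TCP peer is always the proxy, a backend that needs the real client address has to read it from a forwarded header. The following is a minimal sketch of that lookup, assuming the proxy appends the address it saw to X-Forwarded-For. The function name `client_ip` and the `TRUSTED_PROXIES` set are hypothetical names chosen for illustration.

```python
from ipaddress import ip_address
from typing import Optional

# Hypothetical addresses of proxies we operate; anything else is untrusted.
TRUSTED_PROXIES = {"10.0.0.5"}

def client_ip(peer_ip: str, x_forwarded_for: Optional[str]) -> str:
    """Return the best-known client address for a request.

    peer_ip is the TCP peer (the proxy, when one sits in front).
    The header is honoured only when the peer is a trusted proxy, and it is
    walked right to left so entries supplied by the client cannot win.
    """
    if peer_ip not in TRUSTED_PROXIES or not x_forwarded_for:
        return peer_ip
    hops = [h.strip() for h in x_forwarded_for.split(",") if h.strip()]
    for hop in reversed(hops):
        ip_address(hop)  # raises ValueError on a malformed entry
        if hop not in TRUSTED_PROXIES:
            return hop
    return peer_ip
```

Reading the header from the right matters: anything to the left of the entry added by your own proxy was supplied by the caller and can be forged.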
Load Balancer: Distributing Traffic So No Server Burns
A Load Balancer distributes incoming connections across a pool of identical server instances. The goal is to prevent any single instance from being overwhelmed.
Distribution algorithms:
| Algorithm | How it works | Best for |
|---|---|---|
| Round Robin | Each server in sequence | Uniform, stateless requests |
| Least Connections | Route to server with fewest active connections | Variable request duration |
| IP Hash | Hash client IP → sticky server | Sessions requiring same-server affinity |
| Weighted Round Robin | Assign proportionally more traffic to stronger servers | Mixed-capacity fleets |
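The four algorithms in the table are small enough to sketch directly. This is an illustrative Python version, not how any particular load balancer implements them; the function names and server labels are made up for the example.

```python
import hashlib
import itertools

def round_robin(servers):
    """Yield servers in a fixed repeating order."""
    return itertools.cycle(servers)

def weighted_round_robin(weights):
    """Yield servers in proportion to integer weights, e.g. {"big": 2, "small": 1}."""
    expanded = [name for name, w in weights.items() for _ in range(w)]
    return itertools.cycle(expanded)

def least_connections(active):
    """Pick the server with the fewest active connections (ties: first listed)."""
    return min(active, key=active.get)

def ip_hash(client_ip, servers):
    """Map a client IP to a stable server using a deterministic hash."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]
```

The weighted version here simply repeats each server by its weight, which sends traffic in bursts; production balancers usually interleave the picks more smoothly. Note also that `ip_hash` remaps most clients whenever the server list changes size.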
Health checks: Load balancers continuously probe backends. If /health returns non-200 or times out (typically for a configured number of consecutive probes), the server is removed from rotation automatically, without manual intervention.
graph LR
C[Client] --> LB[Load Balancer]
LB --> S1[Server 1]
LB --> S2[Server 2]
LB --> S3[Server 3]
S1 -->|health 200| LB
S2 -->|health 503| LB
LB -.-> S2
style S2 stroke-dasharray: 5 5
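The removal behaviour in the diagram can be sketched as a small bookkeeping class. This is a simplified model: the `unhealthy_threshold` parameter (how many consecutive failed probes remove a server) is an assumption for the example, as are the class and method names.

```python
class Pool:
    """Tracks which backends are in rotation based on health probe results."""

    def __init__(self, servers, unhealthy_threshold=2):
        self.failures = {s: 0 for s in servers}   # consecutive failed probes
        self.unhealthy_threshold = unhealthy_threshold

    def record_probe(self, server, status_code):
        """Record one probe; status_code is None when the probe timed out."""
        if status_code == 200:
            self.failures[server] = 0             # one success restores the server
        else:
            self.failures[server] += 1

    def in_rotation(self):
        """Servers that have not yet reached the failure threshold."""
        return [s for s, f in self.failures.items() if f < self.unhealthy_threshold]
```

Requiring more than one failed probe avoids ejecting a server over a single dropped packet, at the cost of sending it traffic for a little longer after it really fails.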
L4 vs. L7 load balancers:
- L4 (transport layer): Routes based on IP and TCP port. Extremely fast. Cannot see HTTP headers. Example: AWS NLB.
- L7 (application layer): Can route based on URL path, HTTP headers, cookies. Slightly slower but far more flexible. Example: AWS ALB.
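The difference comes down to what information the balancer has when it picks a backend. A rough sketch, with hypothetical pool names and a made-up `x-canary` header rule:

```python
import hashlib

def l4_pick(src_ip, src_port, dst_port, backends):
    """L4 view: only addresses and ports are visible, so hash the tuple."""
    key = f"{src_ip}:{src_port}:{dst_port}".encode()
    idx = int.from_bytes(hashlib.sha256(key).digest()[:8], "big") % len(backends)
    return backends[idx]

def l7_pick(path, headers, rules, default):
    """L7 view: the HTTP request is parsed, so routing can use path and headers."""
    if headers.get("x-canary") == "1":        # hypothetical header-based rule
        return "canary-pool"
    for prefix, pool in rules:                # first matching path prefix wins
        if path.startswith(prefix):
            return pool
    return default
```

`l4_pick` cannot send /v1/orders to a different pool than /v1/users because it never sees the path; `l7_pick` can, but only after parsing the request.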
API Gateway: The Smart Layer That Knows About Your APIs
An API Gateway goes beyond traffic routing: it enforces API-level concerns that are too application-specific for a reverse proxy.
Core API Gateway responsibilities:
| Feature | What it means |
|---|---|
| Authentication/Authorization | Validate JWT, OAuth2, API key before reaching the backend |
| Rate Limiting | Reject requests over the configured threshold (e.g., 100 req/min per user) |
| Request Transformation | Convert REST to gRPC, add/remove headers, rename fields |
| Routing | Route /v1/users to user-service, /v1/orders to order-service |
| Analytics & Logging | Track per-endpoint latency, error rates, usage by API key |
| Circuit Breaking | Stop forwarding to unhealthy downstream services |
Example (Kong route + rate-limit plugin):
_format_version: "3.0"
services:
  - name: user-service
    url: http://users:8080
    routes:
      - name: users-route
        paths: ["/v1/users"]
    plugins:
      - name: rate-limiting
        config:
          minute: 100
          policy: local
      - name: jwt
No app code changes needed: auth and rate limiting are enforced at the gateway.
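To make a limit such as "100 requests per minute per consumer" concrete, here is a minimal fixed-window counter, one common way to implement this kind of limit. It is a sketch for illustration rather than a description of Kong's internals, and the class name is hypothetical.

```python
from collections import defaultdict

class FixedWindowLimiter:
    def __init__(self, limit, window_seconds=60):
        self.limit = limit
        self.window_seconds = window_seconds
        self.counts = defaultdict(int)    # (consumer, window index) -> requests seen

    def allow(self, consumer, now):
        """Return True if the request fits in the consumer's current window.

        now is a timestamp in seconds, passed in so the logic is testable.
        """
        window = int(now // self.window_seconds)
        key = (consumer, window)
        if self.counts[key] >= self.limit:
            return False                  # a gateway would answer 429 Too Many Requests
        self.counts[key] += 1
        return True
```

This in-memory version never evicts old windows and only works on a single gateway instance; sharing counters across instances needs an external store.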
API Gateway Request Flow
sequenceDiagram
participant C as Client
participant GW as API Gateway
participant AU as Auth Service
participant MS as Microservice
C->>GW: HTTP Request
GW->>AU: Authenticate token
AU-->>GW: Token valid
GW->>GW: Rate limit check
GW->>MS: Route to service
MS-->>GW: Response
GW-->>C: Response
Core Mechanics: What Each Layer Actually Does to Requests
Understanding the mechanics helps you choose the right tool and configure it correctly.
Reverse proxy mechanics: The proxy receives the client's request and opens a new connection to the backend on the client's behalf. At the network level the backend only ever sees the proxy's IP. SSL is terminated at the proxy, so the internal connection to the backend can be plain HTTP.
Load balancer mechanics: The load balancer maintains a pool of backend instances and a health check loop. Every incoming connection is assigned to one instance based on the configured algorithm. If a health check fails, the instance is removed from rotation automatically.
API gateway mechanics: The gateway applies a policy pipeline to every request: authenticate → authorize → rate-limit check → request transformation → route to backend → response transformation. Each step can be configured independently per route. The gateway can talk to multiple downstream services within a single client request (request fan-out).
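The policy pipeline can be modelled as an ordered list of stages where the first rejection short-circuits the rest. The sketch below shows only three of the stages (authenticate, transform, route) to stay short. The API key table, header names, and route list are hypothetical.

```python
class Reject(Exception):
    """Raised by a stage to stop the pipeline with an HTTP status."""
    def __init__(self, status, reason):
        super().__init__(reason)
        self.status = status

def authenticate(req):
    # Hypothetical lookup table standing in for JWT or API-key validation.
    user = {"key-alice": "alice"}.get(req["headers"].get("x-api-key"))
    if user is None:
        raise Reject(401, "unknown API key")
    req["user"] = user

def transform(req):
    req["headers"]["x-gateway-user"] = req["user"]   # pass identity downstream

ROUTES = [("/v1/users", "user-service"), ("/v1/orders", "order-service")]

def route(req):
    for prefix, service in ROUTES:
        if req["path"].startswith(prefix):
            req["upstream"] = service
            return
    raise Reject(404, "no route")

PIPELINE = [authenticate, transform, route]

def handle(req):
    """Run each stage in order; the first rejection short-circuits the rest."""
    try:
        for stage in PIPELINE:
            stage(req)
    except Reject as r:
        return r.status, str(r)
    return 200, req["upstream"]
```

Because each stage is an independent function, a per-route configuration amounts to choosing a different `PIPELINE` list for that route.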
The key mechanical difference:
| Mechanism | Operates at | Aware of HTTP content? | Per-user state? |
|---|---|---|---|
| Reverse Proxy | L4/L7 | Partially (headers, URL) | No |
| Load Balancer L4 | L4 (TCP/IP) | No | No |
| Load Balancer L7 | L7 (HTTP) | Yes (headers, paths) | Via sticky sessions |
| API Gateway | L7 | Fully | Yes (per API key / per user) |
This table explains why you cannot enforce per-user rate limiting at the load balancer: it does not maintain per-user state by default.
L4 vs L7 vs Proxy Routing
flowchart LR
TR[Incoming Traffic] --> LB[Load Balancer L4]
TR --> RP[Reverse Proxy L7]
TR --> GW[API Gateway L7+]
LB --> SV[Backend Servers]
RP --> SV
GW --> AU[Auth + Rate Limit]
AU --> MS[Microservices]
Deep Dive: Why Rate Limiting Belongs at the Gateway, Not the Load Balancer
Reverse proxies and L4 load balancers hold no per-user state: they forward connections without knowing who is making the request. An API Gateway is stateful per consumer: it tracks request counts against rate-limit quotas, validates JWT claims, and maps API keys to consumer identities. This per-user state is what makes rate limiting possible at the gateway layer but not at the load balancer, which sees only connections, not identities.
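A small example shows why counting by connection source is not the same as counting by identity. The request list below is invented for illustration: two consumers share one NAT address, and a third consumer sends from two addresses.

```python
from collections import Counter

def count_by(requests, key):
    """Count requests per value of the chosen key ("ip" or "api_key")."""
    return Counter(r[key] for r in requests)

# Two consumers behind one NAT address, plus one consumer using two addresses.
requests = [
    {"ip": "198.51.100.1", "api_key": "alice"},
    {"ip": "198.51.100.1", "api_key": "bob"},
    {"ip": "198.51.100.1", "api_key": "bob"},
    {"ip": "203.0.113.9", "api_key": "carol"},
    {"ip": "203.0.113.10", "api_key": "carol"},
]

by_ip = count_by(requests, "ip")             # what a connection-level layer can see
by_consumer = count_by(requests, "api_key")  # what the gateway can see
```

Counting by IP lumps alice and bob together and splits carol in two, so an IP-based limit would throttle the wrong parties in both cases.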
How a Request Travels Through All Three Layers
flowchart TD
Client[Browser / Mobile Client] --> RP[Reverse Proxy Nginx SSL, cache, compress]
RP -->|Cache miss or dynamic| LB[Load Balancer AWS ALB distribute traffic]
LB --> GW[API Gateway Kong auth, rate limit, route]
GW --> US[User Service : 8080]
GW --> OS[Order Service : 8081]
GW --> PS[Payment Service : 8082]
US -->|health 200| LB
OS -->|health 200| LB
PS -->|health 503| LB
LB -.->|removed from pool| PS
The three layers handle distinctly different concerns as the request descends:
- Reverse Proxy handles transport: SSL decryption, static caching, DDoS absorption.
- Load Balancer handles distribution: which instance handles this request, and is that instance healthy?
- API Gateway handles application policy: who is this user, are they allowed, which service should respond?
Real-World Applications: Deployment Patterns
Most production stacks do not cleanly separate these three layers into distinct products. They use tools that collapse two or three roles, but the logical separation still matters for debugging and ownership:
| Tool | Acts as |
|---|---|
| Nginx | Reverse proxy + basic load balancer |
| AWS ALB | L7 load balancer + some gateway features |
| AWS API Gateway | Full gateway + rate limiting + auth |
| Kong | API gateway + rate limit + plugin ecosystem |
| Envoy | Sidecar proxy + LB + service mesh component |
Netflix uses Nginx at the edge for SSL termination and static asset caching, AWS Global Accelerator for regional traffic routing, and Zuul (its open-source gateway) for per-service auth and rate limiting. Each layer is independently operable: an Nginx config change doesn't touch auth policy, and a Zuul filter update doesn't affect load-balancing weights.
Stripe runs a similar topology: Nginx terminates SSL and absorbs volumetric floods, HAProxy distributes to API pods, and their internal gateway enforces the API key validation and per-customer rate limits that power their public API offering. The clean separation means their platform team can own rate-limiting policy without touching the Nginx configurations owned by the infra team.
Practical Configuration Guide
Scenario: Adding rate limiting to a new endpoint without touching application code.
# Kong declarative config: add rate limiting to /v1/orders
_format_version: "3.0"
services:
  - name: order-service
    url: http://orders:8081
    routes:
      - name: orders-route
        paths: ["/v1/orders"]
    plugins:
      - name: rate-limiting
        config:
          minute: 60        # 60 requests per minute per consumer
          policy: redis     # shared state across gateway instances; also needs Redis connection settings
      - name: jwt           # runs before rate limiting (Kong orders plugins by priority, not list position)
No code change to the Order Service. The gateway enforces both auth and rate limiting declaratively.
Scenario: Checking if a server is actually removed from the load balancer pool.
# AWS ALB: list target health for a target group
aws elbv2 describe-target-health \
  --target-group-arn arn:aws:elasticloadbalancing:us-east-1:...
# Output shows each target's state, such as "healthy", "unhealthy", or "draining"
Scenario: Verifying SSL terminates at the reverse proxy, not the app.
# Check the certificate on the public-facing endpoint
openssl s_client -connect api.example.com:443 -brief
# Check the internal connection (should be plain HTTP)
curl -v http://internal-backend:8080/health
Decision Guide: Choosing the Right Layer
| You need... | Use |
|---|---|
| Hide backend server IPs, SSL offload, edge caching | Reverse Proxy (Nginx, Cloudflare) |
| Spread traffic across identical server instances | Load Balancer (ALB, HAProxy) |
| Auth, rate limiting, per-route routing | API Gateway (Kong, AWS API GW) |
| All three in a single tool | Nginx with modules, or a cloud-managed API Gateway |
| Low-latency L4 routing for TCP services | L4 Load Balancer (NLB) |
| Path/header-based routing | L7 Load Balancer (ALB) |
What to Learn Next
- System Design Networking: DNS, CDNs, and Load Balancers
- System Design Core Concepts: Scalability, CAP, and Consistency
- Webhooks Explained: Don't Call Us, We'll Call You
Spring Cloud Gateway: Routing, Rate Limiting, and Auth Filters in Java
Spring Cloud Gateway (SCG) is a reactive, non-blocking API gateway built on Spring WebFlux. It implements the API Gateway layer described in this post (routing, rate limiting, authentication, and circuit breaking) without any code changes in downstream services.
Routes are declared in application.yml (or a Java DSL); filters attach cross-cutting policy to each route. SCG integrates natively with Spring Security for JWT validation, Redis for distributed rate-limit counters, and Resilience4j for circuit breaking.
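For readers new to circuit breaking, the idea can be sketched in a few lines. This is a simplified count-based breaker written in Python for illustration; it is not Resilience4j's algorithm, and the class name, parameters, and the injected `now` timestamp are assumptions made for the example.

```python
class CircuitBreaker:
    """Minimal closed/open/half-open breaker driven by an injected clock."""

    def __init__(self, failure_threshold=3, open_seconds=30.0):
        self.failure_threshold = failure_threshold
        self.open_seconds = open_seconds
        self.failures = 0
        self.opened_at = None            # None means the circuit is closed

    def call(self, fn, fallback, now):
        if self.opened_at is not None:
            if now - self.opened_at < self.open_seconds:
                return fallback()        # open: skip the downstream call entirely
            # wait elapsed: half-open, let one trial call through below
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = now     # trip, or re-open after a failed trial
            return fallback()
        self.failures = 0
        self.opened_at = None            # success closes the circuit
        return result
```

While the circuit is open the failing service receives no traffic at all, which gives it room to recover and keeps gateway threads from piling up on timeouts.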
# application.yml: Spring Cloud Gateway declarative route configuration
spring:
  cloud:
    gateway:
      routes:
        # Route 1: user-service (auth + rate limiting)
        - id: user-service
          uri: lb://user-service          # lb:// resolves via Eureka/Kubernetes
          predicates:
            - Path=/v1/users/**
          filters:
            - name: RequestRateLimiter
              args:
                redis-rate-limiter.replenishRate: 100   # 100 req/sec baseline
                redis-rate-limiter.burstCapacity: 200   # allows short bursts
                key-resolver: "#{@userKeyResolver}"     # rate-limit per user ID
        # Route 2: order-service (circuit breaker + header rewrite)
        - id: order-service
          uri: lb://order-service
          predicates:
            - Path=/v1/orders/**
          filters:
            - name: CircuitBreaker
              args:
                name: orderCB
                fallbackUri: forward:/fallback/orders
            - AddRequestHeader=X-Gateway-Source, spring-cloud-gateway
            - StripPrefix=1               # remove /v1 before forwarding
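// RateLimitConfig.java: rate-limit by the authenticated principal (populated from the JWT by Spring Security)
import java.security.Principal;
import org.springframework.cloud.gateway.filter.ratelimit.KeyResolver;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class RateLimitConfig {
    @Bean
    public KeyResolver userKeyResolver() {
        // Unauthenticated callers all share the single "anonymous" bucket.
        return exchange -> exchange.getPrincipal()
                .map(Principal::getName)
                .defaultIfEmpty("anonymous");
    }
}

// FallbackController.java: called when the circuit breaker trips
import java.util.Map;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class FallbackController {
    // @RequestMapping (rather than @GetMapping) so forwarded POST/PUT requests also get the fallback
    @RequestMapping("/fallback/orders")
    public ResponseEntity<Map<String, String>> orderFallback() {
        return ResponseEntity.status(HttpStatus.SERVICE_UNAVAILABLE)
                .body(Map.of("error", "Order service is temporarily unavailable"));
    }
}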
The Order Service and User Service require zero code changes: rate limiting, circuit breaking, and JWT extraction are enforced entirely at the gateway layer via configuration.
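The replenishRate and burstCapacity settings describe a token bucket: tokens refill at a steady rate, and the bucket size caps how large a burst can be. The Python sketch below models that behaviour in memory for illustration; the real limiter keeps its state in Redis, and the class and parameter names here are hypothetical.

```python
class TokenBucket:
    def __init__(self, replenish_rate, burst_capacity, now=0.0):
        self.replenish_rate = replenish_rate      # tokens added per second
        self.burst_capacity = burst_capacity      # maximum tokens the bucket holds
        self.tokens = float(burst_capacity)       # start full
        self.last = now

    def allow(self, now, cost=1):
        """Refill for the elapsed time, then spend `cost` tokens if available."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst_capacity, self.tokens + elapsed * self.replenish_rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

With a rate of 100 and a capacity of 200, a client that has been idle can send 200 requests at once, after which it is held to 100 per second until the bucket refills.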
NGINX as a Reverse Proxy (the layer before the gateway):
# nginx.conf: SSL termination + compression before traffic reaches the gateway
server {
    listen 443 ssl http2;
    server_name api.example.com;
    ssl_certificate     /etc/ssl/certs/api.crt;
    ssl_certificate_key /etc/ssl/private/api.key;
    gzip on;
    gzip_types application/json text/plain;
    location / {
        proxy_pass http://spring-cloud-gateway:8080;
        proxy_set_header X-Forwarded-For $remote_addr;
        proxy_set_header X-Forwarded-Proto https;
    }
}
NGINX terminates SSL and compresses responses at the edge; Spring Cloud Gateway enforces auth and routing at the API layer. Each component owns exactly one layer.
For a full deep-dive on Spring Cloud Gateway's filter chain, Spring Security JWT integration, and Resilience4j circuit breaker configuration, a dedicated follow-up post is planned.
Lessons from Production Deployments
Lesson 1: L4 vs. L7 is a latency vs. flexibility trade-off. L4 load balancers route by IP/port and are extremely fast because they don't inspect HTTP content. L7 load balancers parse headers and paths, enabling smart routing, but add roughly 0.5 to 2 ms per request. For most applications the flexibility of L7 is worth the cost. For extremely latency-sensitive TCP services (databases, game servers), L4 is preferable.
Lesson 2: The health check endpoint is a critical dependency. If /health returns 200 even when the service is degraded (database connection down, dependencies unavailable), the load balancer keeps sending traffic to a broken instance. Implement deep health checks that validate actual functionality, not just HTTP reachability.
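A deep health check can be sketched as a function that runs each dependency check and reports 503 if any of them fails. The check names and the `ping_database` stub below are hypothetical; a real check might run a trivial query with a short timeout.

```python
def health(checks):
    """Run every dependency check and return (status_code, per-check details).

    checks maps a name to a zero-argument callable that returns normally when
    the dependency is usable and raises otherwise.
    """
    details = {}
    for name, check in checks.items():
        try:
            check()
            details[name] = "ok"
        except Exception as exc:
            details[name] = f"failed: {exc}"
    healthy = all(v == "ok" for v in details.values())
    return (200 if healthy else 503), details

def ping_database():
    # Hypothetical stand-in that simulates an unreachable database.
    raise ConnectionError("db unreachable")
```

One design caution: if every instance depends on the same database, a deep check can eject the whole pool at once, so it is worth deciding which dependencies should actually fail the check.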
Lesson 3: Gateway plugins accumulate and slow down requests. Each gateway plugin (auth, rate-limit, logging, transformation) adds processing time to every request. Profile your gateway pipeline regularly and remove plugins that are no longer needed. A gateway with 8 plugins enabled on every route can add 20 to 50 ms of latency that appears as application slowness.
Lesson 4: Don't put your entire auth logic in the gateway. The gateway validates tokens (JWT signature, expiry, audience). It does not know your application's permission model. Fine-grained authorization (can this user edit this order?) belongs in the service itself.
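The split between the two kinds of check can be sketched as follows. The gateway-side function verifies an HS256 token's signature, expiry, and audience using only the standard library; the service-side function needs order data the gateway never sees. This is an illustration of where each check lives, not production code: a maintained JWT library should do the validation in practice, and the secret, audience, and function names are hypothetical.

```python
import base64
import hashlib
import hmac
import json

def _b64url_decode(part):
    return base64.urlsafe_b64decode(part + "=" * (-len(part) % 4))

def _b64url_encode(raw):
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()

def sign_hs256(claims, secret):
    """Build a compact HS256 JWT (used here only to create example tokens)."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    sig = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(sig)}"

def gateway_validate(token, secret, audience, now):
    """Gateway-level checks: signature, expiry, audience. Returns claims or None."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        claims = json.loads(_b64url_decode(payload_b64))
        sig = _b64url_decode(sig_b64)
    except ValueError:
        return None                       # wrong shape, bad base64, or bad JSON
    if not isinstance(header, dict) or not isinstance(claims, dict):
        return None
    if header.get("alg") != "HS256":
        return None
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):
        return None
    if claims.get("exp", 0) <= now or claims.get("aud") != audience:
        return None
    return claims

def service_can_edit(claims, order):
    """Service-level check: needs domain data the gateway does not have."""
    return order["owner"] == claims["sub"]
```

The gateway can answer "is this a valid token for this API?" from the token alone, while "may this user edit this order?" requires loading the order, which is why that check stays in the service.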
TLDR: Summary & Key Takeaways
- Reverse proxy: hides backends, offloads SSL, caches static content. Client never knows your server's real address.
- Load balancer: distributes connections across instances using round-robin, least-connections, or IP hash.
- API Gateway: enforces auth, rate limits, routing, and transformation at the HTTP/API layer.
- Health checks are the load balancer's mechanism to remove failed instances without human intervention.
- L4 load balancers route by IP/port (fast); L7 route by headers/paths (flexible).