API Gateway vs. Load Balancer vs. Reverse Proxy: What's the Difference?
They all sit in front of your servers. But do you need Nginx, HAProxy, or Kong? We clarify the roles of each component.
Abstract Algorithms
TLDR: A Reverse Proxy hides your servers and handles caching/SSL. A Load Balancer spreads traffic across server instances. An API Gateway manages API concerns: auth, rate limiting, routing, and protocol translation. Modern tools (Nginx, AWS ALB, Kong) often combine all three, but understanding what each layer does independently is essential for system design.
The Basics: What Each Layer Does
The terms reverse proxy, load balancer, and API gateway describe components that all sit in front of your backend yet serve distinct purposes. Confusing them leads to misconfigurations that are expensive to debug in production.
The one-sentence role of each:
| Component | Core job |
| --- | --- |
| Reverse Proxy | Hides backend servers; handles SSL, caching, and compression at the edge |
| Load Balancer | Distributes connections across multiple identical server instances |
| API Gateway | Enforces API-level policy: auth, rate limits, routing, and protocol translation |
The confusion arises because modern tools collapse all three. Nginx can act as a reverse proxy, load balancer, and primitive gateway simultaneously. AWS ALB is an L7 load balancer with some gateway features. Kong is a full API gateway with load balancing built in.
Understanding the layers independently makes it possible to diagnose issues correctly: is the problem at the SSL termination layer, the traffic distribution layer, or the auth enforcement layer?
Three Guards, One Door: The Traffic Handling Hierarchy
When a user's request leaves their browser, it goes through multiple intermediary layers before hitting your application. These layers have overlapping but distinct responsibilities:
Client
  │
  ▼
Reverse Proxy    "Who asked? Cache it if possible. SSL here."
  │
  ▼
Load Balancer    "Which server instance gets this?"
  │
  ▼
API Gateway      "Is this user authorized? Rate-limited? Which microservice?"
  │
  ▼
Application Servers
All three components sit in front of your servers. The confusion arises because modern tools blur the boundaries: Nginx can be all three simultaneously.
Reverse Proxy: Hiding Your Servers and Doing the Boring Work
A Reverse Proxy intercepts incoming requests and forwards them to backend servers on behalf of the client. The client never knows the backend server's address.
What reverse proxies do:
| Responsibility | Why it matters |
| --- | --- |
| SSL/TLS termination | Offloads encryption from app servers |
| Static content caching | Reduces backend load for repeated requests |
| Compression (gzip, brotli) | Reduces response size to client |
| IP masking | Hides backend topology from clients |
| DDoS absorption | First line of defense against volumetric attacks |
Example (Nginx as reverse proxy):
upstream backend_servers {              # pool referenced by proxy_pass below
    server 10.0.0.11:8080;              # illustrative backend addresses
    server 10.0.0.12:8080;
}

server {
    listen 443 ssl;
    server_name api.example.com;

    ssl_certificate     /path/to/cert.pem;
    ssl_certificate_key /path/to/key.pem;

    location / {
        proxy_pass http://backend_servers;  # forwards to the upstream pool
    }
}
Every request from the internet hits Nginx first. Backend servers only see Nginx's IP, never the real client's address.
Load Balancer: Distributing Traffic So No Server Burns
A Load Balancer distributes incoming connections across a pool of identical server instances. The goal is to prevent any single instance from being overwhelmed.
Distribution algorithms:
| Algorithm | How it works | Best for |
| --- | --- | --- |
| Round Robin | Each server in sequence | Uniform, stateless requests |
| Least Connections | Route to server with fewest active connections | Variable request duration |
| IP Hash | Hash client IP → same server every time | Sessions requiring same-server affinity |
| Weighted Round Robin | Assign proportionally more traffic to stronger servers | Mixed-capacity fleets |
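Three of these algorithms can be sketched in a few lines of Java. This is an illustrative sketch, not any specific load balancer's implementation; the `Balancer` and `Backend` names are hypothetical, and real load balancers track connection counts from live connection state:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative selection algorithms; names are hypothetical.
public class Balancer {
    public static class Backend {
        public final String host;
        public int activeConnections;   // updated by a connection tracker in real systems
        public Backend(String host) { this.host = host; }
    }

    private final List<Backend> pool;
    private final AtomicInteger cursor = new AtomicInteger(0);

    public Balancer(List<Backend> pool) { this.pool = pool; }

    // Round Robin: each server in sequence, wrapping at the end of the pool.
    public Backend roundRobin() {
        return pool.get(Math.floorMod(cursor.getAndIncrement(), pool.size()));
    }

    // Least Connections: pick the backend with the fewest active connections.
    public Backend leastConnections() {
        Backend best = pool.get(0);
        for (Backend b : pool) {
            if (b.activeConnections < best.activeConnections) best = b;
        }
        return best;
    }

    // IP Hash: the same client IP always maps to the same backend (affinity).
    public Backend ipHash(String clientIp) {
        return pool.get(Math.floorMod(clientIp.hashCode(), pool.size()));
    }
}
```

Note that IP hash affinity breaks when the pool size changes, which is why session-heavy systems often prefer consistent hashing or cookie-based stickiness.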
Health checks: Load balancers continuously probe backends. If /health returns non-200 or times out, the server is removed from rotation automatically, without manual intervention.
graph LR
C[Client] --> LB[Load Balancer]
LB --> S1[Server 1]
LB --> S2[Server 2]
LB --> S3[Server 3]
S1 -->|health 200| LB
S2 -->|health 503| LB
LB -.-> S2
style S2 stroke-dasharray: 5 5
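The health-check loop behind this diagram can be sketched as follows. The probe is injected as a function so the add/remove logic is visible without real network calls; all names are hypothetical:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Sketch of a load balancer's health-check loop; a real one runs this on a
// timer against each backend's /health endpoint.
public class HealthChecker {
    private final Set<String> healthy = ConcurrentHashMap.newKeySet();

    // probe returns the HTTP status of GET /health for a given backend.
    public void runChecks(Set<String> pool, Function<String, Integer> probe) {
        for (String backend : pool) {
            int status = probe.apply(backend);
            if (status == 200) {
                healthy.add(backend);     // (re)admit to rotation
            } else {
                healthy.remove(backend);  // removed automatically, no human needed
            }
        }
    }

    public Set<String> inRotation() { return Set.copyOf(healthy); }
}
```

The same loop also re-admits a backend once its probe returns 200 again, which is how recovered instances rejoin the pool without operator action.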
L4 vs. L7 load balancers:
- L4 (transport layer): Routes based on IP and TCP port. Extremely fast. Cannot see HTTP headers. Example: AWS NLB.
- L7 (application layer): Can route based on URL path, HTTP headers, cookies. Slightly slower but far more flexible. Example: AWS ALB.
API Gateway: The Smart Layer That Knows About Your APIs
An API Gateway goes beyond traffic routing: it enforces API-level concerns that are too application-specific for a reverse proxy.
Core API Gateway responsibilities:
| Feature | What it means |
| --- | --- |
| Authentication/Authorization | Validate JWT, OAuth2, API key before reaching the backend |
| Rate Limiting | Reject requests over the configured threshold (e.g., 100 req/min per user) |
| Request Transformation | Convert REST to gRPC, add/remove headers, rename fields |
| Routing | Route /v1/users to user-service, /v1/orders to order-service |
| Analytics & Logging | Track per-endpoint latency, error rates, usage by API key |
| Circuit Breaking | Stop forwarding to unhealthy downstream services |
Example (Kong route + rate-limit plugin):
_format_version: "3.0"    # required by Kong's declarative config loader
services:
  - name: user-service
    url: http://users:8080
    routes:
      - name: users-route
        paths: ["/v1/users"]
    plugins:
      - name: rate-limiting
        config:
          minute: 100
          policy: local
      - name: jwt
No app code changes needed: auth and rate limiting are enforced at the gateway.
API Gateway Request Flow
sequenceDiagram
participant C as Client
participant GW as API Gateway
participant AU as Auth Service
participant MS as Microservice
C->>GW: HTTP Request
GW->>AU: Authenticate token
AU-->>GW: Token valid
GW->>GW: Rate limit check
GW->>MS: Route to service
MS-->>GW: Response
GW-->>C: Response
Core Mechanics: What Each Layer Actually Does to Requests
Understanding the mechanics helps you choose the right tool and configure it correctly.
Reverse proxy mechanics: The proxy receives the client's request and opens a new connection to the backend on the client's behalf. The backend only ever sees the proxy's IP. SSL is terminated at the proxy; the backend connection can be plain HTTP internally.
Load balancer mechanics: The load balancer maintains a pool of backend instances and a health check loop. Every incoming connection is assigned to one instance based on the configured algorithm. If a health check fails, the instance is removed from rotation automatically.
API gateway mechanics: The gateway applies a policy pipeline to every request: authenticate → authorize → rate-limit check → request transformation → route to backend → response transformation. Each step can be configured independently per route. The gateway can talk to multiple downstream services within a single client request (request fan-out).
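This policy pipeline can be modeled as a chain of short-circuiting checks. The sketch below is an illustrative model, not any real gateway's API; the class names are hypothetical and the transformation steps are omitted for brevity:

```java
import java.util.Map;
import java.util.function.Predicate;

// Illustrative per-request policy pipeline: each stage can short-circuit
// with an error status before the request ever reaches a backend.
public class GatewayPipeline {
    public record Request(String token, String path, String userId) {}
    public record Result(int status, String routedTo) {}

    private final Map<String, String> routes;            // path prefix -> service name
    private final Predicate<String> tokenValidator;      // authenticate
    private final Predicate<String> rateLimiter;         // per-user quota check

    public GatewayPipeline(Map<String, String> routes,
                           Predicate<String> tokenValidator,
                           Predicate<String> rateLimiter) {
        this.routes = routes;
        this.tokenValidator = tokenValidator;
        this.rateLimiter = rateLimiter;
    }

    public Result handle(Request req) {
        if (!tokenValidator.test(req.token())) return new Result(401, null); // auth fails
        if (!rateLimiter.test(req.userId()))   return new Result(429, null); // over quota
        for (var e : routes.entrySet()) {                                    // route
            if (req.path().startsWith(e.getKey())) return new Result(200, e.getValue());
        }
        return new Result(404, null);                                        // no route
    }
}
```

The ordering matters: authentication runs first so that rate limiting can key on a verified identity rather than a spoofable header.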
The key mechanical difference:
| Mechanism | Operates at | Aware of HTTP content? | Per-user state? |
| --- | --- | --- | --- |
| Reverse Proxy | L4/L7 | Partially (headers, URL) | No |
| Load Balancer L4 | L4 (TCP/IP) | No | No |
| Load Balancer L7 | L7 (HTTP) | Yes (headers, paths) | Via sticky sessions |
| API Gateway | L7 | Fully | Yes (per API key / per user) |
This table explains why you cannot enforce per-user rate limiting at the load balancer: it does not maintain per-user state by default.
L4 vs L7 vs Proxy Routing
flowchart LR
TR[Incoming Traffic] --> LB[Load Balancer L4]
TR --> RP[Reverse Proxy L7]
TR --> GW[API Gateway L7+]
LB --> SV[Backend Servers]
RP --> SV
GW --> AU[Auth + Rate Limit]
AU --> MS[Microservices]
Deep Dive: Why Rate Limiting Belongs at the Gateway, Not the Load Balancer
Reverse proxies and L4 load balancers are stateless: they route packets without knowing anything about the user making the request. An API Gateway is stateful per consumer: it tracks request counts against rate-limit quotas, validates JWT claims, and maps API keys to consumer identities. This per-user state is what makes rate limiting possible at the gateway layer but not at the load balancer, which sees only connections, not identities.
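That per-consumer state can be as simple as one counter per user per time window. A minimal in-memory fixed-window sketch follows; production gateways keep this state in Redis so all gateway instances share it, and the class name here is hypothetical:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative fixed-window rate limiter keyed by user/API key.
// In production this state lives in Redis; in-memory is for illustration only.
public class PerUserRateLimiter {
    private final int limitPerWindow;
    private final long windowMillis;
    // user -> {windowStart, count}
    private final Map<String, long[]> counters = new ConcurrentHashMap<>();

    public PerUserRateLimiter(int limitPerWindow, long windowMillis) {
        this.limitPerWindow = limitPerWindow;
        this.windowMillis = windowMillis;
    }

    // Returns true if the request is allowed; `now` is injected for testability.
    public synchronized boolean allow(String userId, long now) {
        long[] c = counters.computeIfAbsent(userId, k -> new long[]{now, 0});
        if (now - c[0] >= windowMillis) { c[0] = now; c[1] = 0; }  // new window
        if (c[1] >= limitPerWindow) return false;                  // over quota
        c[1]++;
        return true;
    }
}
```

An L4 load balancer has nowhere to hang this map: it never learns which user a connection belongs to, so the quota key simply does not exist at that layer.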
How a Request Travels Through All Three Layers
flowchart TD
Client[Browser / Mobile Client] --> RP[Reverse Proxy<br/>Nginx: SSL, cache, compress]
RP -->|Cache miss or dynamic| LB[Load Balancer<br/>AWS ALB: distribute traffic]
LB --> GW[API Gateway<br/>Kong: auth, rate limit, route]
GW --> US[User Service : 8080]
GW --> OS[Order Service : 8081]
GW --> PS[Payment Service : 8082]
US -->|health 200| LB
OS -->|health 200| LB
PS -->|health 503| LB
LB -.->|removed from pool| PS
The three layers handle distinctly different concerns as the request descends:
- Reverse Proxy handles transport: SSL decryption, static caching, DDoS absorption.
- Load Balancer handles distribution: which instance handles this request, and is that instance healthy?
- API Gateway handles application policy: who is this user, are they allowed, which service should respond?
Real-World Deployment Patterns
Most production stacks do not cleanly separate these three layers into distinct products; they use tools that collapse two or three roles, but the logical separation still matters for debugging and ownership:
| Tool | Acts as |
| --- | --- |
| Nginx | Reverse proxy + basic load balancer |
| AWS ALB | L7 load balancer + some gateway features |
| AWS API Gateway | Full gateway + rate limiting + auth |
| Kong | API gateway + rate limit + plugin ecosystem |
| Envoy | Sidecar proxy + LB + service mesh component |
Netflix uses Nginx at the edge for SSL termination and static asset caching, AWS Global Accelerator for regional traffic routing, and Zuul (their internal gateway) for per-service auth and rate limiting. Each layer is independently operable: an Nginx config change doesn't touch auth policy, and a Zuul plugin update doesn't affect load-balancing weights.
Stripe runs a similar topology: Nginx terminates SSL and absorbs volumetric floods, HAProxy distributes to API pods, and their internal gateway enforces the API key validation and per-customer rate limits that power their public API offering. The clean separation means their platform team can own rate-limiting policy without touching the Nginx configurations owned by the infra team.
Practical Configuration Guide
Scenario: Adding rate limiting to a new endpoint without touching application code.
# Kong declarative config: add rate limiting to /v1/orders
_format_version: "3.0"    # required by Kong's declarative config loader
services:
  - name: order-service
    url: http://orders:8081
    routes:
      - name: orders-route
        paths: ["/v1/orders"]
    plugins:
      - name: rate-limiting
        config:
          minute: 60        # 60 requests per minute per consumer
          policy: redis     # shared state across gateway instances
      - name: jwt           # auth runs before rate limiting
No code change to the Order Service. The gateway enforces both auth and rate limiting declaratively.
Scenario: Checking if a server is actually removed from the load balancer pool.
# AWS ALB โ list target health for a target group
aws elbv2 describe-target-health \
--target-group-arn arn:aws:elasticloadbalancing:us-east-1:...
# Output shows which instances are "healthy", "unhealthy", or "draining"
Scenario: Verifying SSL terminates at the reverse proxy, not the app.
# Check the certificate on the public-facing endpoint
openssl s_client -connect api.example.com:443 -brief
# Check the internal connection (should be plain HTTP)
curl -v http://internal-backend:8080/health
Decision Guide: Choosing the Right Layer
| You need... | Use |
| --- | --- |
| Hide backend server IPs, SSL offload, edge caching | Reverse Proxy (Nginx, Cloudflare) |
| Spread traffic across identical server instances | Load Balancer (ALB, HAProxy) |
| Auth, rate limiting, per-route routing | API Gateway (Kong, AWS API GW) |
| All three in a single tool | Nginx with plugins, or a cloud-managed API Gateway |
| Low-latency L4 routing for TCP services | L4 Load Balancer (NLB) |
| Path/header-based routing | L7 Load Balancer (ALB) |
What to Learn Next
- System Design Networking: DNS, CDNs, and Load Balancers
- System Design Core Concepts: Scalability, CAP, and Consistency
- Webhooks Explained: Don't Call Us, We'll Call You
Spring Cloud Gateway: Routing, Rate Limiting, and Auth Filters in Java
Spring Cloud Gateway (SCG) is a reactive, non-blocking API gateway built on Spring WebFlux that implements the API Gateway layer described in this post (routing, rate limiting, authentication, and circuit breaking) without any code changes in downstream services.
Routes are declared in application.yml (or a Java DSL); filters attach cross-cutting policy to each route. SCG integrates natively with Spring Security for JWT validation, Redis for distributed rate-limit counters, and Resilience4j for circuit breaking.
# application.yml: Spring Cloud Gateway declarative route configuration
spring:
  cloud:
    gateway:
      routes:
        # Route 1: user-service, auth + rate limiting
        - id: user-service
          uri: lb://user-service            # lb:// resolves via Eureka/Kubernetes
          predicates:
            - Path=/v1/users/**
          filters:
            - name: RequestRateLimiter
              args:
                redis-rate-limiter.replenishRate: 100   # 100 req/sec baseline
                redis-rate-limiter.burstCapacity: 200   # allows short bursts
                key-resolver: "#{@userKeyResolver}"     # rate-limit per user ID
        # Route 2: order-service, circuit breaker + header rewrite
        - id: order-service
          uri: lb://order-service
          predicates:
            - Path=/v1/orders/**
          filters:
            - name: CircuitBreaker
              args:
                name: orderCB
                fallbackUri: forward:/fallback/orders
            - AddRequestHeader=X-Gateway-Source, spring-cloud-gateway
            - StripPrefix=1                 # remove /v1 before forwarding
// KeyResolver: rate-limit by authenticated user extracted from JWT claim
@Bean
public KeyResolver userKeyResolver() {
    return exchange -> exchange.getPrincipal()
            .map(java.security.Principal::getName)
            .defaultIfEmpty("anonymous");
}
// Fallback controller: called when the circuit breaker trips
@RestController
public class FallbackController {
    @GetMapping("/fallback/orders")
    public ResponseEntity<Map<String, String>> orderFallback() {
        return ResponseEntity.status(HttpStatus.SERVICE_UNAVAILABLE)
                .body(Map.of("error", "Order service is temporarily unavailable"));
    }
}
The Order Service and User Service require zero code changes: rate limiting, circuit breaking, and JWT extraction are enforced entirely at the gateway layer via configuration.
NGINX as a Reverse Proxy (the layer before the gateway):
# nginx.conf: SSL termination + compression before traffic reaches the gateway
server {
    listen 443 ssl http2;
    server_name api.example.com;

    ssl_certificate     /etc/ssl/certs/api.crt;
    ssl_certificate_key /etc/ssl/private/api.key;

    gzip on;
    gzip_types application/json text/plain;

    location / {
        proxy_pass http://spring-cloud-gateway:8080;
        proxy_set_header X-Forwarded-For   $remote_addr;
        proxy_set_header X-Forwarded-Proto https;
    }
}
NGINX terminates SSL and compresses responses at the edge; Spring Cloud Gateway enforces auth and routing at the API layer. Each component owns exactly one layer.
For a full deep-dive on Spring Cloud Gateway's filter chain, Spring Security JWT integration, and Resilience4j circuit breaker configuration, a dedicated follow-up post is planned.
Lessons from Production Deployments
Lesson 1: L4 vs. L7 is a latency vs. flexibility trade-off. L4 load balancers route by IP/port and are extremely fast because they don't inspect HTTP content. L7 load balancers parse headers and paths, enabling smart routing, but add roughly 0.5–2 ms per request. For most applications the flexibility of L7 is worth the cost. For extremely latency-sensitive TCP services (databases, game servers), L4 is preferable.
Lesson 2: The health check endpoint is a critical dependency.
If /health returns 200 even when the service is degraded (database connection down, dependencies unavailable), the load balancer keeps sending traffic to a broken instance. Implement deep health checks that validate actual functionality, not just HTTP reachability.
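A deep health check in this spirit aggregates dependency probes and returns 503 if any of them fails, so the load balancer pulls degraded instances from rotation. The dependency names and wiring below are illustrative, not a specific framework's API:

```java
import java.util.Map;
import java.util.function.BooleanSupplier;

// Sketch of a "deep" health check: 200 only when critical dependencies
// actually work, not merely when the process accepts HTTP connections.
public class DeepHealthCheck {
    private final Map<String, BooleanSupplier> dependencies;

    public DeepHealthCheck(Map<String, BooleanSupplier> dependencies) {
        this.dependencies = dependencies;
    }

    // 200 if every dependency check passes, 503 otherwise.
    public int status() {
        for (BooleanSupplier check : dependencies.values()) {
            if (!check.getAsBoolean()) return 503;
        }
        return 200;
    }
}
```

One caution: each probe should be cheap and bounded by a timeout, or the health check itself becomes a load on the very dependencies it is testing.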
Lesson 3: Gateway plugins accumulate and slow down requests. Each gateway plugin (auth, rate-limit, logging, transformation) adds processing time to every request. Profile your gateway pipeline regularly and remove plugins that are no longer needed. A gateway with 8 plugins enabled on every route can add 20–50 ms of latency that appears as application slowness.
Lesson 4: Don't put your entire auth logic in the gateway. The gateway validates tokens (JWT signature, expiry, audience). It does not know your application's permission model. Fine-grained authorization (can this user edit this order?) belongs in the service itself.
TLDR: Summary & Key Takeaways
- Reverse proxy: hides backends, offloads SSL, caches static content. Client never knows your server's real address.
- Load balancer: distributes connections across instances using round-robin, least-connections, or IP hash.
- API Gateway: enforces auth, rate limits, routing, and transformation at the HTTP/API layer.
- Health checks are the load balancer's mechanism to remove failed instances without human intervention.
- L4 load balancers route by IP/port (fast); L7 route by headers/paths (flexible).
Practice Quiz
What is the primary job of an L7 load balancer over an L4 load balancer?
- A) L7 is faster because it operates at a lower level
- B) L7 can route based on HTTP headers and URL paths, not just IP and port
- C) L7 handles SSL termination while L4 does not
- D) L7 monitors health checks while L4 does not
Correct Answer: B. L7 inspects HTTP content, enabling path-based and header-based routing. L4 routes purely on IP address and TCP port, which is faster but less flexible.
A user repeatedly hitting the same shopping cart endpoint exceeds the allowed rate. Which component should enforce the limit?
- A) Reverse Proxy: it can block requests before they reach the backend
- B) API Gateway: it enforces per-user or per-endpoint rate limiting policy
- C) Load Balancer: it drops excess connections
- D) The application service itself
Correct Answer: B. API Gateways maintain per-consumer state (via Redis or local counters) and are the correct layer for rate limiting. Reverse proxies lack per-user context; load balancers distribute rather than restrict.
Your team deploys a new microservice for search. What is the minimal addition needed to route /v1/search traffic to it without changing existing services?
- A) Add a new server to the load balancer pool
- B) Add a route rule in the API Gateway mapping /v1/search to the new service
- C) Update the reverse proxy SSL certificate
- D) Deploy a new load balancer instance
Correct Answer: B. The API Gateway's routing configuration is the right place to add new route mappings. Other services and infrastructure remain untouched.