The 8 Fallacies of Distributed Systems
The Network is Reliable. Latency is Zero. Bandwidth is Infinite. If you believe these, your system will fail. We debunk the 8 fallacies.
Abstract Algorithms
TLDR: In 1994, L. Peter Deutsch at Sun Microsystems listed seven false assumptions that developers make about distributed systems; James Gosling added the eighth a few years later. Believing them leads to hard-to-reproduce bugs, timeout cascades, and security holes. Knowing them is a prerequisite for designing systems that actually work at scale.
The Eight Assumptions That Will Break Your System
These are not theoretical warnings. They are a field guide to the most common production bugs in distributed software.
The eight fallacies:
- The network is reliable.
- Latency is zero.
- Bandwidth is infinite.
- The network is secure.
- Topology doesn't change.
- There is one administrator.
- Transport cost is zero.
- The network is homogeneous.
Network Fallacies 1–4: Reliability, Latency, Bandwidth, and Security
Fallacy 1: The network is reliable.
Packets are dropped. Connections are reset. Load balancers time out. A single function call that succeeds 99.9% of the time fails once every 1,000 requests, and distributed systems call each other thousands of times per second.
Design response: Retry with exponential backoff. Use circuit breakers (Hystrix, Resilience4j). Design for idempotency.
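The retry-with-backoff advice can be sketched in a few lines of Python. The function and parameter names are illustrative, not from any particular library, and the wrapped call must be idempotent because it may run more than once:

```python
import random
import time

def retry_with_backoff(call, max_attempts=5, base_delay=0.1, max_delay=5.0):
    """Retry a flaky zero-argument call with exponential backoff and full jitter.

    Retries only on exception, so `call` must be idempotent
    (safe to execute more than once).
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure
            # full jitter: sleep a random amount up to the capped exponential delay
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(random.uniform(0, delay))
```

The jitter matters: without it, all callers that failed at the same moment retry at the same moment, recreating the spike that caused the failure.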
Fallacy 2: Latency is zero.
A function call on the same machine takes nanoseconds. A call to a service in the same data center takes ~0.5 ms. A call across regions takes 50–200 ms. At 100 chained calls, that is 20+ seconds.
Design response: Avoid deep synchronous call chains. Use async messaging. Parallelize independent calls.
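A minimal sketch of parallelizing independent calls with Python's asyncio. The service names and delays below are invented to model round-trip latency; the point is that the total wait is the maximum of the delays, not their sum:

```python
import asyncio

async def fetch(name, delay):
    # stand-in for a network call; `delay` models round-trip latency
    await asyncio.sleep(delay)
    return name

async def main():
    # sequential awaits would cost the SUM of the three round trips;
    # gather overlaps them, so the total cost is the MAX
    return await asyncio.gather(
        fetch("users", 0.05),
        fetch("orders", 0.05),
        fetch("inventory", 0.05),
    )

print(asyncio.run(main()))  # prints ['users', 'orders', 'inventory']
```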
Fallacy 3: Bandwidth is infinite.
Sending large JSON payloads is cheap in development where teams work on fast LANs. In production, AWS cross-AZ bandwidth costs money; serializing large object graphs creates GC pressure.
Design response: Use binary serialization (Protobuf, Avro). Filter fields at the API boundary. Use compression for large payloads.
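Compression alone often shrinks repetitive JSON dramatically, since field names repeat in every record. A quick sketch with Python's standard gzip module, using an invented payload:

```python
import gzip
import json

# a hypothetical verbose payload: the keys repeat in every record,
# which is exactly what general-purpose compression exploits
payload = json.dumps(
    [{"user_id": i, "status": "active", "region": "us-east-1"}
     for i in range(1000)]
).encode("utf-8")

compressed = gzip.compress(payload)
ratio = len(compressed) / len(payload)
print(f"{len(payload)} bytes -> {len(compressed)} bytes ({ratio:.0%})")
```

Binary formats like Protobuf go further by dropping the field names from the wire entirely, but compression is the one-line change you can ship today.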
Fallacy 4: The network is secure.
Traffic between services inside your VPC is not automatically encrypted or authenticated. An attacker who gains access to your network can intercept or inject requests.
Design response: mTLS between services. Zero-trust network model. Never pass secrets in logs.
Infrastructure Fallacies 5–8: Topology, Administration, Cost, and Compatibility
Fallacy 5: Topology doesn't change.
Servers fail. Auto-scaling adds and removes instances. Kubernetes restarts pods. Hardcoding IP addresses breaks as soon as a node is replaced.
Design response: Use service discovery (Consul, Kubernetes DNS, AWS Cloud Map). Never hardcode IPs.
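One way to avoid hardcoded IPs is to resolve the service name at call time (or with a short TTL cache) and let DNS-based discovery do its job. A sketch using Python's standard socket module; the service name is whatever your discovery layer (Kubernetes DNS, Consul, etc.) serves:

```python
import socket

def resolve(service_name, port):
    """Look up the current addresses for a service name at call time.

    Resolving per call means a replaced node is picked up automatically;
    a hardcoded IP would keep pointing at the dead one.
    """
    infos = socket.getaddrinfo(service_name, port, type=socket.SOCK_STREAM)
    # getaddrinfo yields (family, type, proto, canonname, sockaddr) tuples;
    # sockaddr[0] is the IP address
    return [info[4][0] for info in infos]
```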
Fallacy 6: There is one administrator.
Real systems involve the platform team, the security team, the application team, and the database team. A schema migration "owned" by the app team may be blocked by the DBA team for a week.
Design response: Design for backward and forward compatibility. Feature flags for deployments. Self-service infra via IaC.
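Backward and forward compatibility often comes down to a tolerant reader: default the fields old producers don't send yet, and ignore the fields new producers have added. A sketch with hypothetical field names:

```python
def read_order(msg: dict) -> dict:
    """Tolerant reader for an order message (field names are illustrative).

    Backward compatible: `currency` was added in a later schema version,
    so messages from old producers get a default instead of a KeyError.
    Forward compatible: unknown fields from newer producers are ignored.
    """
    return {
        "order_id": msg["order_id"],             # required in every version
        "currency": msg.get("currency", "USD"),  # later addition; defaulted
    }
```

This is what lets the app team and the DBA team deploy on different weeks without breaking each other.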
Fallacy 7: Transport cost is zero.
Serializing a Java object to JSON, compressing it, encrypting it, sending it over a socket, deserializing it on the other side: all of this costs CPU cycles, memory, and money (cloud egress charges).
Design response: Right-size payloads. Batch small messages. Cache at the boundary.
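Batching is the cheapest lever here: per-message overhead (headers, serialization, syscalls) is paid once per batch instead of once per message. A minimal sketch of the grouping step:

```python
def batch(messages, max_batch=100):
    """Group small messages so each network round trip carries many.

    `max_batch` caps batch size so one send cannot grow unbounded;
    the value is illustrative and should be tuned per workload.
    """
    for i in range(0, len(messages), max_batch):
        yield messages[i:i + max_batch]
```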
Fallacy 8: The network is homogeneous.
Mobile clients, desktop browsers, IoT devices, and internal services all speak different protocols, have different MTUs, and fail in different ways. Expecting all consumers to behave like your tested Java client will lead to interoperability bugs.
Design response: Use standard protocols (HTTP/1.1, HTTP/2, gRPC). Handle content negotiation. Test with diverse client types.
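Content negotiation can be as simple as walking the client's Accept header until you find a media type you can serve. A deliberately simplified sketch that ignores q-weights and treats the supported list as server preference order:

```python
def negotiate(accept_header,
              supported=("application/json", "application/x-protobuf")):
    """Pick the first media type in the Accept header that we can serve.

    Simplified: ignores quality (q=) weights; a real implementation
    should honor them per RFC 9110.
    """
    for media in (part.split(";")[0].strip()
                  for part in accept_header.split(",")):
        if media in supported:
            return media
        if media == "*/*":
            return supported[0]  # wildcard: serve our preferred format
    return None  # nothing acceptable -> respond 406 Not Acceptable
```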
The Practical Antidote: Designing for Failure
```mermaid
flowchart LR
    Call[Service A calls B] --> Retry{Retry logic?}
    Retry -- No --> Crash[Hard failure\nno retry = cascading outage]
    Retry -- Yes --> CB{Circuit breaker?}
    CB -- No --> Flood[B is down\nA floods with retries]
    CB -- Yes --> Timeout[Fail fast\nReturn fallback]
```
The minimal production checklist for every service call:
- Timeout set (never rely on OS default)
- Retry with exponential backoff + jitter
- Circuit breaker to stop cascade
- Bulkhead (limit concurrent calls per downstream)
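The circuit-breaker item from the checklist can be sketched as a small state machine. The thresholds are illustrative; in production you would reach for a library like Resilience4j rather than rolling your own:

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker sketch (thresholds are illustrative).

    After `max_failures` consecutive failures the circuit opens and calls
    fail fast for `reset_after` seconds instead of flooding a down service.
    """

    def __init__(self, max_failures=3, reset_after=30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, fn):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # half-open: let one probe call through
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()  # trip the breaker
            raise
        self.failures = 0  # success resets the failure count
        return result
```

Failing fast is the point: callers get an immediate error (or fallback) instead of tying up threads waiting on a service that cannot answer.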
Why These Fallacies Still Bite Senior Engineers
These fallacies are taught in university, and still violated in code every week, because:
- Local development masks network problems (everything runs on localhost)
- Unit tests don't simulate network partitions or latency spikes
- Monolith-to-microservices migrations often copy in-process assumptions to network calls
The most common production outage pattern: a service that worked fine in staging fails under load in production because no retry logic handles the 1-in-1,000 packet drop rate.
Key Takeaways
- All 8 fallacies are false assumptions developers make about networks that cause production bugs.
- The four network fallacies: reliability, latency, bandwidth, security.
- The four infrastructure fallacies: topology, administration, transport cost, homogeneity.
- Every service call needs: timeout, retry with backoff, circuit breaker, and idempotency.
- These bugs appear in production, not in local development, because localhost masks all of them.
Test Your Understanding
- Your service has no retry logic on a downstream call. Which fallacy are you relying on?
- A developer hardcodes a database IP after migration tests pass. Which fallacy does this violate?
- Cross-AZ traffic in AWS is not free. Which fallacy does billing prove false?
- You add mTLS between services. Which fallacy are you addressing?