Multi-Agent Systems in LangGraph: Supervisor Pattern, Handoffs, and Agent Networks
Build multi-agent systems in LangGraph: supervisor routing, worker handoffs, subgraphs, and the Send API for parallel agents.
TL;DR: Split work across specialist agents: supervisor routing beats one overloaded generalist every time.
The Context Ceiling: Why One Agent Can't Do Everything
Your research agent is writing a 20-page report. It has 15 tools. Its context window is full by page 3. The last 17 pages are hallucinated.
This is not a model quality problem. It is a structural problem: one agent trying to do everything hits three hard limits.
1. The context window ceiling. Every tool call, intermediate result, and reasoning trace consumes tokens. A GPT-4o window of 128k tokens sounds large until you add search results (3,000 tokens each), a growing message history, tool schemas (500 tokens each), and five iterations of reflection. A realistic research task exhausts that budget inside 10–15 tool calls. Once the window is full, the model truncates early context, silently discarding the facts it needs most.
2. The specialization gap. A single generalist agent uses a single system prompt, a single temperature setting, and a single model. A research subtask demands precision. A copywriting subtask demands creativity. An SQL analysis subtask demands structured output parsing. Optimizing one hurts the others. Specialists tuned per task consistently outperform generalists across domains.
3. The parallelism bottleneck. A single agent is inherently sequential. Three independent subtasks (literature review, competitor analysis, financial summary) could run in parallel, but a single-agent loop serializes them. Wall-clock latency multiplies with each step.
| Single-agent failure mode | Root cause | Effect |
| --- | --- | --- |
| Context overflow | Token budget exhaustion | Hallucination / truncation |
| Quality degradation | No task-specific tuning | Errors at task boundaries |
| Serial latency | No parallelism primitive | 3× slower than necessary |
| Tool sprawl | 15+ tools in one prompt | Incorrect tool selection |
The solution is not a bigger context window. The solution is decomposition: break the work into specialized agents, each with a focused context, purpose-built tools, and an explicit coordination mechanism.
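As a back-of-envelope check, the budget arithmetic above can be sketched in a few lines. Every per-item token count below is an illustrative assumption mirroring the figures in the text, not a measurement:

```python
# Back-of-envelope context budget for a single generalist agent.
# All per-item counts are illustrative assumptions.
CONTEXT_WINDOW = 128_000  # GPT-4o class window

def tokens_used(n_tool_calls: int,
                results_per_call: int = 2,        # search hits kept per call
                search_result_tokens: int = 3_000,
                n_tools: int = 15,
                tool_schema_tokens: int = 500,
                history_per_call: int = 1_500,    # growing message history
                reflection_tokens: int = 10_000) -> int:
    """Rough total context consumed after n_tool_calls agent iterations."""
    schemas = n_tools * tool_schema_tokens
    results = n_tool_calls * results_per_call * search_result_tokens
    history = n_tool_calls * history_per_call
    return schemas + results + history + reflection_tokens

# The window survives 10 tool calls but not 15:
print(tokens_used(10), tokens_used(15))  # 92500 130000
```

Under these assumptions the budget collapses somewhere between the 10th and 15th tool call, which is exactly the failure window described above.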
Multi-Agent Architectures: Supervisor, Swarm, and Pipeline Compared
LangGraph supports three multi-agent coordination patterns. Each solves a different kind of decomposition problem.
graph TD
    A["Supervisor Pattern<br/>(hierarchical)"] --> A1["Router LLM decides<br/>which worker to call"]
    A1 --> A2["Worker completes task,<br/>returns to supervisor"]
    A2 --> A1
    B["Agent Swarm<br/>(peer-to-peer)"] --> B1["Agent A hands off<br/>directly to Agent B"]
    B1 --> B2["Agent B hands off<br/>to Agent C or back to A"]
    C["Pipeline<br/>(sequential)"] --> C1["Agent 1 → Agent 2<br/>→ Agent 3 → output"]
Supervisor + Workers (hierarchical) is the most flexible. A routing LLM sits at the top, inspects the current task, and delegates to one of several specialist agents. Each worker returns control to the supervisor after completing its task. The supervisor decides whether to delegate again, to a different worker, or to finalize the output. This pattern handles dynamic, branching workflows where the next step depends on what was just learned.
Agent swarm (peer-to-peer handoffs) is better when agents need to negotiate directly. Agent A does its part and explicitly hands off to Agent B with a Command(goto="agent_b"). No central coordinator exists. The network is a graph where any node can route to any other node. This works well for workflows with well-defined responsibility boundaries: a triage agent hands off to a medical-records agent, which hands off to a billing agent.
Pipeline (sequential delegation) is the simplest pattern. Each agent receives the output of the previous one, does its job, and passes downstream. This is deterministic, easy to trace, and appropriate when every stage must run in order without branching. A data cleaning → enrichment → summarization pipeline is a classic example.
| Pattern | Best for | Control flow | Parallelism |
| --- | --- | --- | --- |
| Supervisor | Dynamic, branching tasks | Centralized | Via Send from supervisor |
| Swarm | Defined responsibility lanes | Distributed | If agents do not depend on each other |
| Pipeline | Ordered, predictable stages | Sequential | No native parallelism |
Building the Supervisor Pattern: Router LLM, Worker Agents, and Handoffs
The supervisor pattern in LangGraph has three components: the shared state, the worker agents as graph nodes, and a supervisor node that routes using an LLM call.
Shared State and Worker Nodes
Every agent in the system reads from and writes to a shared TypedDict state. Workers append their results to the message list. The supervisor reads the full message history and decides what to do next.
from typing import Annotated, Literal
from typing_extensions import TypedDict
from langgraph.graph import StateGraph, END
from langgraph.graph.message import add_messages
from langchain_core.messages import HumanMessage, AIMessage, BaseMessage

class ResearchState(TypedDict):
    messages: Annotated[list[BaseMessage], add_messages]
    next_agent: str    # supervisor writes this to route
    final_report: str  # writer agent writes this
Each worker is a plain LangGraph node β a function that receives state, runs a specialized agent loop, and returns a state patch.
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool

@tool
def web_search(query: str) -> str:
    """Search the web for current information."""
    # real implementation uses Tavily / SerpAPI
    return f"[Search results for: {query}]"

@tool
def fact_check(claim: str) -> str:
    """Verify a factual claim against trusted sources."""
    return f"[Fact check result for: {claim}]"

researcher_llm = ChatOpenAI(model="gpt-4o", temperature=0).bind_tools([web_search])
fact_checker_llm = ChatOpenAI(model="gpt-4o", temperature=0).bind_tools([fact_check])
writer_llm = ChatOpenAI(model="gpt-4o", temperature=0.3)

def researcher_node(state: ResearchState) -> dict:
    """Runs the researcher agent with its tool loop."""
    response = researcher_llm.invoke(state["messages"])
    return {"messages": [response]}

def fact_checker_node(state: ResearchState) -> dict:
    """Runs the fact-checker against claims in the message history."""
    response = fact_checker_llm.invoke(state["messages"])
    return {"messages": [response]}

def writer_node(state: ResearchState) -> dict:
    """Drafts a final report from everything gathered."""
    response = writer_llm.invoke(state["messages"])
    return {"messages": [response], "final_report": response.content}
Supervisor Routing with Command
The supervisor node calls the LLM with a structured prompt that asks it to choose the next agent. In LangGraph, the routing decision is expressed as a Command: an explicit instruction to move to a named node and optionally update state.
from langgraph.types import Command

SUPERVISOR_PROMPT = """You are a research supervisor managing three specialist agents:
- researcher: finds information and sources
- fact_checker: verifies claims and catches errors
- writer: synthesizes findings into a final report
Given the current conversation, decide which agent to invoke next,
or respond with 'FINISH' if the report is complete.
Respond with exactly one word: researcher | fact_checker | writer | FINISH
"""

def supervisor_node(state: ResearchState) -> Command[Literal["researcher", "fact_checker", "writer", "__end__"]]:
    messages = [{"role": "system", "content": SUPERVISOR_PROMPT}] + state["messages"]
    response = ChatOpenAI(model="gpt-4o", temperature=0).invoke(messages)
    decision = response.content.strip().lower()
    if decision == "finish":
        return Command(goto=END)
    return Command(
        goto=decision,
        update={"next_agent": decision}
    )
Command(goto="researcher") tells LangGraph to route to the researcher node. update={...} writes into the shared state before the next node runs. The type annotation Command[Literal["researcher", ...]] gives LangGraph the edge set it needs to build the graph.
Wiring the Graph
builder = StateGraph(ResearchState)
builder.add_node("supervisor", supervisor_node)
builder.add_node("researcher", researcher_node)
builder.add_node("fact_checker", fact_checker_node)
builder.add_node("writer", writer_node)

# All workers return to the supervisor after completing
builder.add_edge("researcher", "supervisor")
builder.add_edge("fact_checker", "supervisor")
builder.add_edge("writer", "supervisor")
builder.set_entry_point("supervisor")
graph = builder.compile()

# Run
result = graph.invoke({
    "messages": [HumanMessage(content="Write a report on the state of quantum computing in 2025.")]
})
print(result["final_report"])
The graph topology is: supervisor routes to a worker → worker completes → supervisor evaluates → routes again or finishes. This loop continues until the supervisor emits Command(goto=END).
Deep Dive: State Isolation, Subgraphs, and the Send API
The Internals
How subgraph state boundaries work. When you wrap a specialist agent as a subgraph instead of a plain node, it gets its own private TypedDict. The parent graph and the subgraph communicate only at explicit boundary points: the entry edge (parent state → subgraph input mapping) and the exit edge (subgraph output → parent state mapping). This means a subgraph can have its own tool_calls, scratchpad, or intermediate_steps fields that never pollute the parent graph's state.
from langgraph.graph import StateGraph

class ResearcherState(TypedDict):
    messages: Annotated[list[BaseMessage], add_messages]
    search_queries: list[str]  # private to researcher
    sources_found: list[str]   # private to researcher

# Build the researcher as a standalone graph
researcher_builder = StateGraph(ResearcherState)
# ... add nodes, tools, loops ...
researcher_subgraph = researcher_builder.compile()

# Mount the subgraph as a node in the parent
def researcher_node(state: ResearchState) -> dict:
    # Map parent state → subgraph input
    sub_result = researcher_subgraph.invoke({
        "messages": state["messages"],
        "search_queries": [],
        "sources_found": []
    })
    # Map subgraph output → parent state
    return {"messages": sub_result["messages"]}
How Command(goto=) routes between agents. Under the hood, Command is a return value that LangGraph's runtime intercepts before the next edge resolution step. Instead of reading static edge definitions, the runtime reads the goto field from the Command object returned by the current node. The update dict is merged into the channel state before the target node runs, guaranteeing the target receives fresh state. This is what makes supervisor routing dynamic: the edge set is fixed at compile time, but which edge fires is decided at runtime by the LLM.
How the supervisor's tool calls map to agent invocations. An alternative to structured text output is to have the supervisor use tool calling where each worker agent is registered as a tool. The supervisor LLM emits a tool call (e.g., call_researcher(task="find quantum computing breakthroughs")), the runtime interprets that as a Command(goto="researcher"), and the task argument seeds the worker's context. This approach improves reliability because the LLM is constrained to a typed schema rather than free-form text routing.
sequenceDiagram
participant S as Supervisor
participant LLM as Router LLM
participant R as Researcher
participant FC as Fact Checker
participant W as Writer
S->>LLM: Current state + routing prompt
LLM-->>S: Command(goto="researcher")
S->>R: invoke(state)
R-->>S: updated messages (sources found)
S->>LLM: Updated state + routing prompt
LLM-->>S: Command(goto="fact_checker")
S->>FC: invoke(state)
FC-->>S: updated messages (verified claims)
S->>LLM: Updated state + routing prompt
LLM-->>S: Command(goto="writer")
S->>W: invoke(state)
W-->>S: final_report
S->>LLM: Updated state + routing prompt
LLM-->>S: Command(goto=END)
Performance Analysis
Latency of multi-hop delegation. Each supervisor → worker → supervisor round trip adds two LLM calls: one supervisor invocation plus the worker's own inference. For a three-worker sequential workflow, total latency is approximately:
T_total ≈ T_supervisor × (N_hops + 1) + Σ T_worker_i
With GPT-4o at ~1.5s per call and three sequential workers, a supervisor loop adds ~6s of routing overhead on top of the work itself (four supervisor invocations at ~1.5s each). This is acceptable for research tasks but too slow for real-time user interactions; prefer single agents or cached routing there.
Parallelism with the Send API. The Send API lets the supervisor fan out to multiple workers simultaneously instead of sequentially. This eliminates serial overhead when workers are independent.
from langgraph.types import Send

# Attached as a routing function, not a plain node:
# builder.add_conditional_edges("supervisor", supervisor_fanout_node,
#                               ["researcher", "fact_checker"])
def supervisor_fanout_node(state: ResearchState) -> list[Send]:
    """Fan out to researcher and fact_checker in parallel."""
    tasks = [
        Send("researcher", {**state, "messages": state["messages"] + [
            HumanMessage(content="Find primary sources on quantum computing 2025")
        ]}),
        Send("fact_checker", {**state, "messages": state["messages"] + [
            HumanMessage(content="Verify: quantum supremacy was claimed in 2024")
        ]}),
    ]
    return tasks
Returning a list of Send objects from a routing function (attached with add_conditional_edges) makes LangGraph execute all of them as concurrent branches. Results are collected and merged before the next node runs. A three-parallel-worker setup reduces wall-clock time from Σ T_i to max(T_i): a 3× speedup when tasks take equal time.
Token cost of shared message history. The shared messages list grows with every agent turn. By the third worker invocation, it carries the full conversation β including the researcher's raw search results. A researcher output of 2,000 tokens, fed into the fact-checker's context, costs those tokens at every subsequent call. In long chains, shared message history becomes the dominant token cost driver.
Mitigation: use summarization nodes between workers to compress verbose intermediate outputs before they enter the supervisor's next prompt.
Mathematical Model
When do N specialized agents outperform one generalist?
Define the following variables for a task decomposed into $N$ subtasks:
- $C$ = context window capacity (tokens)
- $T_i$ = tokens consumed by subtask $i$ including tool calls and intermediate reasoning
- $S_i \geq 1$ = specialization quality multiplier for subtask $i$ (1 = no benefit; 2 = the specialist makes half as many errors as the generalist)
- $O$ = per-handoff coordination overhead in tokens
- $Q_{\text{generalist}}$ = product quality score for a single-agent run
- $Q_{\text{multi}}$ = product quality score for the multi-agent run
Context overflow condition. The generalist fails (truncates context) when:
$$\sum_{i=1}^{N} T_i + N \cdot O > C$$
Each specialized agent only sees its own subtask, so individual context usage is $T_i + O$, which must satisfy $T_i + O < C$: a much weaker constraint.
Specialization gain. Define the quality ratio as:
$$\frac{Q_{\text{multi}}}{Q_{\text{generalist}}} = \prod_{i=1}^{N} S_i$$
If each specialist is 20% better than the generalist on its subtask ($S_i = 1.2$), three specialists yield a compound quality ratio of $1.2^3 \approx 1.73$: a 73% quality improvement.
Net benefit condition. Multi-agent decomposition is beneficial when either (or both) of these hold:
$$\text{(1) Context overflow: } \sum_{i=1}^{N} T_i > C - N \cdot O$$
$$\text{(2) Quality gain: } \prod_{i=1}^{N} S_i > 1 + \frac{N \cdot O \cdot \lambda}{C}$$
where $\lambda$ converts token cost to quality penalty (empirically ~0.0005 per token for GPT-4o class models).
Worked example. Three subtasks: researcher ($T_1 = 40\text{k}$), fact-checker ($T_2 = 20\text{k}$), writer ($T_3 = 30\text{k}$). Context window $C = 128\text{k}$. Coordination overhead $O = 2\text{k}$ per hop.
- Generalist total: $40 + 20 + 30 + 3 \times 2 = 96\text{k}$, which fits, but barely. Adding one more research cycle overflows.
- Specialization gain (each agent 25% better): $1.25^3 \approx 1.95$, nearly a 2× quality improvement.
- Conclusion: decompose, even without overflow, because compound specialization beats the coordination cost.
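The worked example checks out with a few lines of arithmetic; the helper names below are illustrative:

```python
def generalist_tokens(subtask_tokens: list, overhead: int) -> int:
    """Context a single generalist consumes: all subtasks plus per-hop overhead."""
    return sum(subtask_tokens) + len(subtask_tokens) * overhead

def quality_ratio(multipliers: list) -> float:
    """Compound specialization gain: the product of per-subtask multipliers."""
    ratio = 1.0
    for s in multipliers:
        ratio *= s
    return ratio

C = 128_000                    # context window
T = [40_000, 20_000, 30_000]   # researcher, fact-checker, writer
O = 2_000                      # per-hop coordination overhead

total = generalist_tokens(T, O)
print(total, total > C)                     # 96000 False  (fits, barely)
print(round(quality_ratio([1.25] * 3), 2))  # 1.95  (nearly 2x quality)
```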
Scaling Multi-Agent Systems: Edge Cases, Optimizations, and What Teams Get Wrong
Scaling to 10+ Agents
Adding more agents is not free. Every new specialist adds a routing option that the supervisor LLM must evaluate. Beyond 7–10 agents, the routing prompt grows long enough to degrade routing quality: the model starts making errors about which agent is appropriate. Mitigations:
- Hierarchical supervisors: group agents into sub-teams, each managed by a sub-supervisor. The top-level supervisor routes to sub-supervisors, not individual agents. This keeps each routing decision to ≤5 options.
- Semantic routing cache: if the same task type recurs, cache the routing decision by task embedding similarity. Skip the LLM call for known task types.
- Agent capability registry: store each agent's capabilities as a structured schema, and use a lightweight classifier (not an LLM) for first-pass routing.
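A sketch of such a first-pass router, with an assumed keyword taxonomy standing in for a real classifier or embedding index (the keyword sets are illustrative):

```python
from typing import Optional

# Illustrative capability registry; the keyword sets are assumptions,
# not a real taxonomy.
AGENT_REGISTRY = {
    "researcher":   {"search", "find", "sources", "literature"},
    "fact_checker": {"verify", "check", "claim", "accurate"},
    "writer":       {"write", "draft", "report", "summarize"},
}

def route_first_pass(task: str) -> Optional[str]:
    """Cheap first-pass router: keyword overlap instead of an LLM call.
    Returns None for ambiguous tasks, deferring to the supervisor LLM."""
    words = set(task.lower().split())
    scores = {agent: len(words & kw) for agent, kw in AGENT_REGISTRY.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

print(route_first_pass("verify this claim about qubit counts"))  # fact_checker
print(route_first_pass("hello"))                                 # None
```

Only tasks that fall through to None pay for an LLM routing call.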
Edge Cases: Cycles, Deadlocks, and Starvation
| Edge Case | When It Occurs | Detection | Fix |
| --- | --- | --- | --- |
| Routing cycle | Supervisor routes A→B→A indefinitely | Visited-node counter in state | recursion_limit + cycle guard |
| Agent deadlock | Two agents wait for each other's output | No deadlock in single-threaded graph | Only occurs in async multi-process setups; use timeouts |
| Worker starvation | Supervisor always picks the same agent | Routing histogram in traces | Add routing diversity constraint to supervisor prompt |
| Empty state entry | Worker invoked with no relevant messages | Worker receives ambiguous task | Validate preconditions before routing in supervisor node |
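The visited-node counter from the table can be sketched as a small guard the supervisor consults before emitting a routing decision. The per-agent ceiling of 3 and the human_review escape hatch are assumptions to tune per workflow:

```python
MAX_VISITS = 3  # per-agent visit ceiling; an assumption, tune per workflow

def guard_route(decision: str, visit_counts: dict) -> str:
    """Divert to human review when an agent is visited too often,
    breaking A->B->A routing cycles before recursion_limit trips."""
    visit_counts[decision] = visit_counts.get(decision, 0) + 1
    if visit_counts[decision] > MAX_VISITS:
        return "human_review"
    return decision

counts: dict = {}
routes = [guard_route("researcher", counts) for _ in range(5)]
print(routes)
# ['researcher', 'researcher', 'researcher', 'human_review', 'human_review']
```

In a LangGraph supervisor, visit_counts would live in the shared state so the guard survives across supersteps.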
Common Misconceptions
"More agents = better quality." False. Each agent boundary introduces a communication loss: the downstream agent only sees what the upstream agent wrote, not what it reasoned. Unnecessary decomposition discards implicit reasoning context. Only decompose when a hard limit (context, specialization, parallelism) requires it.
"The supervisor needs GPT-4 to route reliably." Not always. Routing is a classification task, not a reasoning task. A well-structured routing prompt with enumerated options and explicit state flags often performs reliably with GPT-4o-mini at 10Γ lower cost. Reserve the large model for the workers that do actual synthesis.
"Subgraphs automatically isolate token cost." Subgraph state isolation is logical, not token-cost isolation. If you pass the full parent messages list into a subgraph, the subgraph still processes all those tokens. True token isolation requires passing only a summary or a scoped slice of the message history.
The Deep Research System Architecture
The following diagram shows the full deep research system: the supervisor fans out to the researcher and fact-checker in parallel, then sequentially routes to the writer, and finally exits.
graph TD
    User(["User<br/>Research Request"]) --> Supervisor
    Supervisor["Supervisor<br/>(Router LLM)"]
    Supervisor -- "Send: parallel fan-out" --> Researcher["Researcher Agent<br/>+ web_search tool<br/>+ arxiv_search tool"]
    Supervisor -- "Send: parallel fan-out" --> FactChecker["Fact-Checker Agent<br/>+ fact_check tool<br/>+ citation_lookup tool"]
    Researcher -- "messages: sources + findings" --> Merge["Merge Node<br/>(collect parallel results)"]
    FactChecker -- "messages: verified claims" --> Merge
    Merge --> Supervisor
    Supervisor -- "Command(goto='writer')" --> Writer["Writer Agent<br/>+ format_report tool"]
    Writer -- "final_report" --> Supervisor
    Supervisor -- "Command(goto=END)" --> Output(["Final Report"])
    style Supervisor fill:#4A90D9,color:#fff
    style Researcher fill:#27AE60,color:#fff
    style FactChecker fill:#E67E22,color:#fff
    style Writer fill:#8E44AD,color:#fff
    style Merge fill:#555,color:#fff
Reading the diagram. The supervisor first fans out using Send: the researcher and fact-checker run in parallel, each with their own tools. Their results merge back into the shared state. The supervisor then evaluates the verified findings and routes sequentially to the writer. The writer produces the final report, control returns to the supervisor once more, and the supervisor exits.
Real-World Applications: Multi-Agent Systems in Production
Case Study 1: Enterprise Document Intelligence at Scale
A financial services firm built a multi-agent pipeline for regulatory document analysis. The system decomposes each filing into four parallel workstreams: a table extraction agent, a risk clause agent, a cross-reference agent, and a summarization agent. Each agent runs on a focused 20k-token context window rather than feeding the 400-page document to one model.
Input: 400-page SEC 10-K filing (PDF, ~200k tokens extracted).
Architecture: Supervisor routes each document section to the appropriate specialist. The cross-reference agent runs in parallel with risk clause extraction.
Output: Structured JSON with flagged clauses, risk scores, and referenced exhibits.
Scaling note: Parallel fan-out reduced P90 latency from 4 minutes (sequential) to 68 seconds. The supervisor makes ~15 routing decisions per document. Token cost increased by 12% due to coordination overhead but accuracy improved by 31%.
Case Study 2: AI-Powered Customer Support Escalation
A SaaS company built a swarm-pattern multi-agent system where a triage agent classifies the ticket, then hands off directly to a billing agent, technical agent, or escalation agent using Command(goto=). There is no central supervisor; each agent knows its completion condition and routes onward.
Input: Customer support ticket (text + attachment).
Architecture: Triage → (billing | technical | escalation) via direct handoffs.
Output: Resolved ticket with audit trail of which agent handled each step.
Operational lesson: Agent disagreement emerged when the billing agent re-routed to technical because it detected a product bug in the billing data. The system needed explicit cycle-detection: if the same ticket visits the same agent twice, escalate to human review.
Case Study 3: Code Review Multi-Agent Network
A developer tools team runs three agents in a pipeline: a static analysis agent (linting, complexity metrics), a security agent (OWASP checks, dependency scanning), and a review agent (style, architecture suggestions). Each receives the previous agent's annotations as part of its context.
Scaling note: The pipeline pattern here is intentional: security review must happen after static analysis flags dead code paths. Adding parallelism would cause the security agent to miss coverage gaps that static analysis identifies.
Trade-offs and Failure Modes: Coordination Overhead, Agent Disagreement, and Cascade Failures
Performance vs. Coordination Cost
Every inter-agent handoff adds at minimum one LLM call (the supervisor making a routing decision) plus state serialization overhead. For tasks that fit comfortably in a single context window with a coherent tool set, multi-agent decomposition is net-negative: it adds latency and token spend without quality benefit. The threshold is roughly: if your single-agent workflow uses fewer than 8 tools and stays under 50k tokens, stay single-agent.
Agent Disagreement
In a fact-checker + writer pair, the fact-checker may flag a claim as unverified while the writer proceeds to include it. This disagreement is invisible in a system with no arbitration mechanism. Mitigations:
- Add explicit reconciliation nodes between conflicting agents
- Use structured output schemas (Pydantic) so agents express confidence levels, not just content
- Route disagreements back to the supervisor with a conflict: true flag in state
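A sketch of a confidence-bearing schema and one possible reconciliation rule. The field names and the 0.7 confidence threshold are assumptions for illustration, not recommendations:

```python
from pydantic import BaseModel, Field

class VerifiedClaim(BaseModel):
    """Structured fact-checker output: content plus explicit confidence."""
    claim: str
    verified: bool
    confidence: float = Field(ge=0.0, le=1.0)

def reconcile(check: VerifiedClaim, writer_included: bool) -> dict:
    """Flag a conflict when the writer includes a claim the
    fact-checker doubts; the 0.7 threshold is an assumption."""
    if writer_included and (not check.verified or check.confidence < 0.7):
        return {"conflict": True, "action": "route_to_supervisor"}
    return {"conflict": False, "action": "keep"}

doubtful = VerifiedClaim(claim="Quantum supremacy was claimed in 2024",
                         verified=False, confidence=0.4)
print(reconcile(doubtful, writer_included=True))
# {'conflict': True, 'action': 'route_to_supervisor'}
```

Because the schema is typed, the disagreement is machine-detectable instead of buried in free-form prose.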
Cascade Failures and Infinite Loops
A supervisor that cannot decide emits the same routing decision repeatedly. Without a loop-detection mechanism, this spins indefinitely. LangGraph provides recursion_limit in the RunnableConfig; set it to N_expected_hops × 2 as a safety ceiling.
result = graph.invoke(
    {"messages": [HumanMessage(content="Research quantum computing")]},
    config={"recursion_limit": 25}  # 25 hops max
)
A harder failure mode is partial completion: the researcher agent succeeds, but the fact-checker hits a timeout. The supervisor receives an incomplete state and may not detect the gap. Design workers to write explicit completion markers into state (researcher_done: bool) so the supervisor can validate completeness before routing to the writer.
Context Contamination
Shared message history means every agent sees every other agent's outputs. If the researcher returns verbose raw HTML from a web scrape, that content enters the fact-checker's context and wastes tokens on irrelevant content. Solution: use structured state fields (not just the messages list) to carry only processed data between agents.
| Failure Mode | Trigger | Mitigation |
| --- | --- | --- |
| Infinite supervisor loop | LLM indecision | recursion_limit config |
| Partial completion | Worker timeout / error | Completion flags in state |
| Context contamination | Raw outputs in shared messages | Separate structured state fields |
| Agent disagreement | Conflicting outputs | Reconciliation nodes |
| Cascade timeout | Parallel worker failure | Timeout + fallback nodes |
Decision Guide: Single Agent vs Supervisor vs Swarm
| Situation | Recommendation |
| --- | --- |
| Task fits in one context window, fewer than 8 tools | Single agent: decomposition adds overhead without benefit |
| Task requires dynamic routing based on intermediate findings | Supervisor pattern: centralized control handles branching |
| Task has well-defined responsibility lanes with no ambiguity at boundaries | Agent swarm: direct handoffs are simpler than a supervisor |
| Subtasks are independent and can run in parallel | Supervisor + Send fan-out: captures the parallelism benefit |
| Tasks must run in strict order with no branching | Pipeline pattern: sequential, predictable, easy to trace |
| Each subtask needs a different model or temperature | Always decompose: a generalist config cannot simultaneously optimize all subtasks |
| Real-time user interaction (< 1s latency required) | Single agent or cached routing: multi-hop latency is prohibitive |
| Regulatory audit trail required per decision | Supervisor pattern: each routing Command is a traceable event in LangGraph state history |
Practical Example: Deep Research System with Three Specialist Agents
This example demonstrates a deep research workflow coordinated by a supervisor. The deep-research scenario was chosen because it has three fundamentally different agent roles (researcher, fact-checker, writer) that map cleanly to three routing decisions. As you read the supervisor function, watch the explicit completion flags: the router LLM sees researcher_done and fact_checker_done in its prompt, so each routing decision reduces to a state-driven classification instead of open-ended reasoning. The researcher-then-fact-checker-then-writer ordering is the entire coordination contract, and a note at the end shows how to parallelize the first two steps with Send.
from typing import Annotated, Literal
from typing_extensions import TypedDict
from langgraph.graph import StateGraph, END
from langgraph.graph.message import add_messages
from langgraph.types import Command, Send
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, BaseMessage, SystemMessage
from langchain_core.tools import tool

# ── State ──────────────────────────────────────────────────────────────────────
class ResearchState(TypedDict):
    messages: Annotated[list[BaseMessage], add_messages]
    researcher_done: bool
    fact_checker_done: bool
    final_report: str

# ── Tools ──────────────────────────────────────────────────────────────────────
@tool
def web_search(query: str) -> str:
    """Search the web for up-to-date information on a topic."""
    return f"[Web results for '{query}': Found 5 relevant sources about recent developments.]"

@tool
def fact_check(claim: str) -> str:
    """Verify whether a specific claim is supported by cited sources."""
    return f"[Fact check for '{claim}': Supported by 3 sources, 1 contested.]"

# ── Agent LLMs ─────────────────────────────────────────────────────────────────
researcher_llm = ChatOpenAI(model="gpt-4o", temperature=0).bind_tools([web_search])
fact_checker_llm = ChatOpenAI(model="gpt-4o", temperature=0).bind_tools([fact_check])
writer_llm = ChatOpenAI(model="gpt-4o", temperature=0.4)
router_llm = ChatOpenAI(model="gpt-4o", temperature=0)

# ── Worker Nodes ───────────────────────────────────────────────────────────────
# Note: each worker makes a single LLM call for brevity; a production worker
# would run a full tool-execution loop (e.g. via create_react_agent).
def researcher_node(state: ResearchState) -> dict:
    sys_prompt = SystemMessage(content=(
        "You are a research specialist. Use web_search to gather current, "
        "accurate information. Be thorough and cite your sources."
    ))
    response = researcher_llm.invoke([sys_prompt] + state["messages"])
    return {
        "messages": [response],
        "researcher_done": True
    }

def fact_checker_node(state: ResearchState) -> dict:
    sys_prompt = SystemMessage(content=(
        "You are a fact-checking specialist. Review all claims in the conversation "
        "and use fact_check to verify each one. Flag unverified claims explicitly."
    ))
    response = fact_checker_llm.invoke([sys_prompt] + state["messages"])
    return {
        "messages": [response],
        "fact_checker_done": True
    }

def writer_node(state: ResearchState) -> dict:
    sys_prompt = SystemMessage(content=(
        "You are a professional report writer. Using the research findings and "
        "fact-check results in this conversation, write a comprehensive, "
        "well-structured final report. Be clear, concise, and accurate."
    ))
    response = writer_llm.invoke([sys_prompt] + state["messages"])
    return {
        "messages": [response],
        "final_report": response.content
    }

# ── Supervisor Node ────────────────────────────────────────────────────────────
ROUTING_PROMPT = """You are a research supervisor. Workers available:
- researcher: gathers information (run first)
- fact_checker: verifies claims (run after researcher)
- writer: writes the final report (run after fact_checker)
- FINISH: end the workflow (run after writer has produced a report)

Current state:
- researcher_done: {researcher_done}
- fact_checker_done: {fact_checker_done}
- has_final_report: {has_final_report}

What is the next step? Reply with exactly one word.
"""

def supervisor_node(state: ResearchState) -> Command[
    Literal["researcher", "fact_checker", "writer", "__end__"]
]:
    prompt = ROUTING_PROMPT.format(
        researcher_done=state.get("researcher_done", False),
        fact_checker_done=state.get("fact_checker_done", False),
        has_final_report=bool(state.get("final_report")),
    )
    response = router_llm.invoke(
        [SystemMessage(content=prompt)] + state["messages"]
    )
    decision = response.content.strip().lower()
    if decision == "finish":
        return Command(goto=END)
    valid = {"researcher", "fact_checker", "writer"}
    if decision not in valid:
        # Fallback: if the LLM returns unexpected output, force-finish
        return Command(goto=END)
    # `next_agent` is not a field of this state, so route without an update
    return Command(goto=decision)

# ── Graph Assembly ─────────────────────────────────────────────────────────────
builder = StateGraph(ResearchState)
builder.add_node("supervisor", supervisor_node)
builder.add_node("researcher", researcher_node)
builder.add_node("fact_checker", fact_checker_node)
builder.add_node("writer", writer_node)

# Workers return to supervisor after completing
builder.add_edge("researcher", "supervisor")
builder.add_edge("fact_checker", "supervisor")
builder.add_edge("writer", "supervisor")
builder.set_entry_point("supervisor")
graph = builder.compile()

# ── Run ────────────────────────────────────────────────────────────────────────
if __name__ == "__main__":
    initial_state = {
        "messages": [HumanMessage(content=(
            "Write a comprehensive report on the current state of quantum computing "
            "in 2025, focusing on recent hardware breakthroughs and enterprise adoption."
        ))],
        "researcher_done": False,
        "fact_checker_done": False,
        "final_report": "",
    }
    result = graph.invoke(
        initial_state,
        config={"recursion_limit": 20}
    )
    print("=== FINAL REPORT ===")
    print(result["final_report"])
What this demonstrates:
- The supervisor inspects explicit state flags (`researcher_done`, `fact_checker_done`) rather than relying on the LLM to infer completion, which makes routing deterministic and auditable.
- Each worker has its own system prompt tuned for its role.
- The `recursion_limit` guard prevents runaway loops if the router LLM gives unexpected output.
- The fallback in `supervisor_node` forces a clean exit rather than raising an unhandled exception.

To extend this with parallel fan-out, replace the first supervisor call with a `Send`-based fan-out node that fires the researcher and fact-checker simultaneously, then merge their results before routing to the writer.
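The fan-out node described above can be sketched as follows. In a real graph `Send` comes from `langgraph.types`; the stand-in dataclass here only mirrors its shape so the sketch runs standalone, and the node names follow this post's example.

```python
from dataclasses import dataclass

# Stand-in mirroring langgraph.types.Send so this sketch runs without
# the library; in a real graph, use `from langgraph.types import Send`.
@dataclass
class Send:
    node: str   # name of the target node
    arg: dict   # state payload delivered to that node

def fan_out(state: dict) -> list[Send]:
    """Fire researcher and fact_checker in the same superstep.

    Returning a list of Send objects from a node (or conditional edge)
    tells LangGraph to run every target in parallel, each with its own
    payload, before the next superstep begins.
    """
    task = state["messages"][-1]
    return [
        Send("researcher", {"messages": [task]}),
        Send("fact_checker", {"messages": [task]}),
    ]

sends = fan_out({"messages": ["Report on quantum computing in 2025"]})
print([s.node for s in sends])  # ['researcher', 'fact_checker']
```

Because each `Send` carries its own payload, the two workers receive only the task they need, not each other's intermediate state.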
langgraph-supervisor: The Official Multi-Agent Orchestration Library
LangGraph ships a prebuilt library β langgraph-supervisor β that codifies the supervisor pattern with minimal boilerplate. Instead of writing the routing prompt and Command logic manually, create_supervisor() generates it from your agent list.
```shell
pip install langgraph-supervisor
```

```python
from langgraph_supervisor import create_supervisor
from langgraph.prebuilt import create_react_agent
from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4o")

# Create specialist agents using the prebuilt ReAct loop
# (`web_search` and `fact_check` are the tool functions defined earlier)
researcher_agent = create_react_agent(
    model=model,
    tools=[web_search],
    name="researcher",
    prompt="You are a research specialist. Find accurate, current information.",
)

fact_checker_agent = create_react_agent(
    model=model,
    tools=[fact_check],
    name="fact_checker",
    prompt="You are a fact-checking specialist. Verify all claims carefully.",
)

writer_agent = create_react_agent(
    model=model,
    tools=[],
    name="writer",
    prompt="You are a report writer. Synthesize findings into a clear report.",
)

# Create the supervisor graph
supervisor = create_supervisor(
    agents=[researcher_agent, fact_checker_agent, writer_agent],
    model=model,
    prompt=(
        "You are a research team supervisor. Coordinate the researcher, "
        "fact_checker, and writer agents to produce accurate research reports. "
        "Always verify facts before writing the final report."
    ),
).compile()

# Invoke
result = supervisor.invoke({
    "messages": [HumanMessage(content="Report on quantum computing in 2025")]
})
```
What create_supervisor() does under the hood:
- Builds a routing LLM prompt that includes each agent's name and description.
- Registers each agent as a callable tool in the supervisor's tool schema.
- Generates a `Command`-based routing node from the supervisor's tool call output.
- Wires edges so all workers return to the supervisor.
The prebuilt handles the boilerplate but gives less control over state schema, routing prompt, and completion conditions. For production systems with strict audit requirements or complex state, the manual approach from the previous section gives cleaner control boundaries.
| Feature | create_supervisor() prebuilt | Manual supervisor |
| --- | --- | --- |
| Setup time | Minutes | Hours |
| State control | Opaque (internal message schema) | Full TypedDict ownership |
| Custom routing logic | Limited | Arbitrary Python |
| Audit / observability | LangSmith traces | LangSmith + custom state fields |
| Best for | Prototyping, standard patterns | Production, complex state |
Lessons Learned from Building Multi-Agent Systems
Make routing decisions observable, not opaque. The biggest debugging nightmare in multi-agent systems is a supervisor that routes inexplicably. Write the routing decision into an explicit state field (`next_agent: str`) alongside the reason (`routing_reason: str`). LangGraph's state history then gives you a complete trace: every routing decision, every state mutation, with the LLM response that caused it.
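As a sketch, here is the update a supervisor node might attach to its `Command(goto=..., update=...)`. The field names `next_agent`, `routing_reason`, and `routing_log` are this post's conventions, not LangGraph built-ins.

```python
def record_routing(state: dict, decision: str, reason: str) -> dict:
    """Build the state update a supervisor attaches to its Command,
    so every hop is recorded alongside the reason for it."""
    return {
        "next_agent": decision,
        "routing_reason": reason,
        # Append-only log; replay it to reconstruct the routing trace
        "routing_log": state.get("routing_log", []) + [(decision, reason)],
    }

update = record_routing({}, "fact_checker", "researcher_done is True")
print(update["routing_log"])  # [('fact_checker', 'researcher_done is True')]
```

Combined with a checkpointer, the log makes every routing decision queryable after the fact.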
State flags beat LLM-inferred completion. Letting the supervisor LLM infer whether the researcher "finished" from the message text is fragile. Explicit boolean flags (`researcher_done: bool`) tied to the worker's return value are deterministic, testable, and immune to prompt drift.
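With explicit flags, the next step becomes a pure function of state that you can unit-test without an LLM call. A minimal sketch, using the flag names from this post's schema:

```python
def next_step(state: dict) -> str:
    """Deterministic routing from explicit completion flags."""
    if not state.get("researcher_done", False):
        return "researcher"
    if not state.get("fact_checker_done", False):
        return "fact_checker"
    if not state.get("final_report"):
        return "writer"
    return "__end__"

# Routing is now testable in plain pytest, no model in the loop
assert next_step({}) == "researcher"
assert next_step({"researcher_done": True}) == "fact_checker"
```

In practice you might keep the LLM router for ambiguous cases and use a function like this as a deterministic override.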
Design for partial failure from day one. Workers time out. APIs fail. Build fallback nodes that write a graceful failure message into state and set completion flags, so the supervisor never receives an ambiguous half-done state that causes a spin loop.
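One way to sketch this: wrap each worker so any exception still produces a well-formed state update with the completion flag set. The flag names follow this post's schema; the wrapper itself is an illustrative pattern, not a LangGraph API.

```python
def with_fallback(worker, done_flag: str):
    """Wrap a worker node so failures still set its completion flag."""
    def safe_worker(state: dict) -> dict:
        try:
            return worker(state)
        except Exception as exc:
            # Graceful failure: record the error and mark the step done
            # so the supervisor never sees an ambiguous half-done state.
            return {
                done_flag: True,
                "messages": [f"[{worker.__name__} failed: {exc}]"],
            }
    return safe_worker

def flaky_researcher(state):
    raise TimeoutError("search API timed out")

safe = with_fallback(flaky_researcher, "researcher_done")
result = safe({"messages": []})
print(result["researcher_done"])  # True
```

Register the wrapped function as the node (`builder.add_node("researcher", safe)`) and the supervisor's flag-based routing keeps working even when the worker dies.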
Token cost grows superlinearly with shared history. A three-agent system where every agent sees the full conversation history will spend 3× the tokens of a system where agents only see their relevant portion. Profile your token spend across agents early; add summarization and compression nodes before costs compound.
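A minimal sketch of per-agent scoping: hand each worker a rolling summary plus only the last few messages instead of the full transcript. The cutoff `k` and the summary format are assumptions to tune per agent.

```python
def scoped_history(messages: list, k: int = 6, summary: str = "") -> list:
    """Return the slice of history a single worker actually needs:
    an optional rolling summary plus the k most recent messages."""
    head = [f"[Summary of earlier conversation] {summary}"] if summary else []
    return head + messages[-k:]

full = [f"msg-{i}" for i in range(20)]
scoped = scoped_history(full, k=4, summary="Earlier: 16 research steps")
print(len(scoped))  # 5 items instead of 20
```

The summary itself can be maintained by a cheap model in a dedicated compression node that runs every N turns.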
The `recursion_limit` is a correctness guard, not just a resource guard. A supervisor that loops 50 times before hitting the limit has likely already produced garbage. Set a tight limit (15–25 hops for most tasks) and treat a limit breach as a circuit breaker that escalates to human review, not just a hard error.
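A sketch of that circuit breaker. The real exception is `langgraph.errors.GraphRecursionError`, stubbed here so the snippet runs standalone; the escalation payload is an illustrative convention.

```python
class GraphRecursionError(RuntimeError):
    """Stub for langgraph.errors.GraphRecursionError; import the real
    one in production code."""

def invoke_with_breaker(invoke, state: dict, limit: int = 20) -> dict:
    """Treat a recursion-limit breach as an escalation, not a crash."""
    try:
        return invoke(state, config={"recursion_limit": limit})
    except GraphRecursionError:
        # Route to human review instead of surfacing a raw stack trace
        return {"status": "needs_review",
                "reason": f"supervisor exceeded {limit} hops"}

def looping_graph(state, config):
    raise GraphRecursionError()

out = invoke_with_breaker(looping_graph, {}, limit=15)
print(out["status"])  # needs_review
```

In a real deployment, replace `looping_graph` with `graph.invoke` and wire the `needs_review` result into your escalation queue.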
Never use create_supervisor() for your first production deployment. The prebuilt is excellent for prototyping and validating the architecture pattern. Switch to the manual supervisor before production so you own the state schema, routing conditions, and failure exits.
TLDR: Summary and Key Takeaways
TLDR: Split work across specialist agents β supervisor routing beats one overloaded generalist every time.
- Single agents hit three hard limits: context window ceiling, inability to specialize per task, and serial execution bottleneck. Decompose when any of these binds.
- Three patterns, three use cases: supervisor for dynamic branching, swarm for defined responsibility lanes, pipeline for ordered deterministic stages.
- `Command(goto=...)` is the routing primitive: it expresses the supervisor's routing decision as a typed, auditable instruction that LangGraph resolves before the next node runs.
- `Send` is the parallelism primitive: returning a list of `Send` objects fans out to multiple workers simultaneously; use it when worker tasks are independent.
- Subgraphs provide state isolation: wrap a complex agent as a subgraph to prevent its internal state from leaking into the parent graph's message history.
- The mathematical case for decomposition: compound specialization gain ($\prod S_i$) typically overtakes coordination overhead within 3–5 workflow steps in research-style tasks.
- The `langgraph-supervisor` prebuilt reduces setup to minutes but sacrifices state ownership: prototype with it, then own the supervisor in production.
- Multi-agent failure modes are structural: infinite loops, partial completion, context contamination, and agent disagreement each require explicit design mitigations, not just better prompts.
Practice Quiz
A single research agent has 12 tools and its context window consistently fills after 8 tool calls. What is the primary architectural fix?
- A) Increase the model's context window
- B) Decompose into specialized agents so each has a focused context
- C) Remove tools until only 8 remain
- D) Switch to a cheaper model with faster inference

Correct Answer: B
You build a supervisor + three workers. After the researcher completes, the supervisor routes to the fact-checker, then back to the researcher, then back to the fact-checker, indefinitely. What is the most likely root cause?
- A) The `Send` API was used incorrectly
- B) The researcher and fact-checker nodes are missing return edges to the supervisor
- C) The supervisor LLM lacks explicit completion conditions in its routing prompt
- D) The `recursion_limit` is set too high

Correct Answer: C
You need the researcher and fact-checker to run simultaneously to reduce latency. Which LangGraph primitive enables this fan-out?
- A) `Command(goto=["researcher", "fact_checker"])`
- B) `Send`: return a list of `Send` objects from the supervisor node
- C) Add a direct edge between researcher and fact_checker
- D) Use `asyncio.gather` outside the graph

Correct Answer: B
(Open-ended; no single correct answer) You are deciding whether to use `create_supervisor()` from `langgraph-supervisor` or a manually coded supervisor for a production compliance system that requires full audit logs of every routing decision and custom state fields per agent. How would you approach this choice, and what design decisions in the manual supervisor ensure auditability and correctness at scale?
Related Posts
- AI Architecture Patterns: Routers, Planner-Worker Loops, Memory Layers, and Evaluation Guardrails
- Multistep AI Agents: The Power of Planning
- LLM Skill Registry: Routing and Evaluation for Production Agents

Written by
Abstract Algorithms
@abstractalgorithms