
Multi-Agent Systems in LangGraph: Supervisor Pattern, Handoffs, and Agent Networks

Build multi-agent systems in LangGraph: supervisor routing, worker handoffs, subgraphs, and the Send API for parallel agents.

Abstract Algorithms · 27 min read

TLDR: Split work across specialist agents: supervisor routing beats one overloaded generalist every time.


📖 The Context Ceiling: Why One Agent Can't Do Everything

Your research agent is writing a 20-page report. It has 15 tools. Its context window is full by page 3. The last 17 pages are hallucinated.

This is not a model quality problem. It is a structural problem: one agent trying to do everything hits three hard limits.

1. The context window ceiling. Every tool call, intermediate result, and reasoning trace consumes tokens. A GPT-4o window of 128k tokens sounds large until you add search results (3,000 tokens each), a growing message history, tool schemas (500 tokens each), and five iterations of reflection. A realistic research task exhausts that budget within 10–15 tool calls. Once the window is full, the model truncates early context, silently discarding the facts it needs most.

2. The specialization gap. A single generalist agent uses a single system prompt, a single temperature setting, and a single model. A research subtask demands precision. A copywriting subtask demands creativity. An SQL analysis subtask demands structured output parsing. Optimizing one hurts the others. Specialists tuned per task consistently outperform generalists across domains.

3. The parallelism bottleneck. A single agent is inherently sequential. Three independent subtasks (literature review, competitor analysis, financial summary) could run in parallel, but a single-agent loop serializes them. Wall-clock latency multiplies with each step.

| Single-agent failure mode | Root cause | Effect |
|---|---|---|
| Context overflow | Token budget exhaustion | Hallucination / truncation |
| Quality degradation | No task-specific tuning | Errors at task boundaries |
| Serial latency | No parallelism primitive | 3× slower than necessary |
| Tool sprawl | 15+ tools in one prompt | Incorrect tool selection |

The solution is not a bigger context window. The solution is decomposition: break the work into specialized agents, each with a focused context, purpose-built tools, and an explicit coordination mechanism.


πŸ” Multi-Agent Architectures: Supervisor, Swarm, and Pipeline Compared

LangGraph supports three multi-agent coordination patterns. Each solves a different kind of decomposition problem.

graph TD
    A["Supervisor Pattern (hierarchical)"] --> A1["Router LLM decides which worker to call"]
    A1 --> A2["Worker completes task, returns to supervisor"]
    A2 --> A1
    B["Agent Swarm (peer-to-peer)"] --> B1["Agent A hands off directly to Agent B"]
    B1 --> B2["Agent B hands off to Agent C or back to A"]
    C["Pipeline (sequential)"] --> C1["Agent 1 → Agent 2 → Agent 3 → output"]

Supervisor + Workers (hierarchical) is the most flexible. A routing LLM sits at the top, inspects the current task, and delegates to one of several specialist agents. Each worker returns control to the supervisor after completing its task. The supervisor decides whether to delegate again, to a different worker, or to finalize the output. This pattern handles dynamic, branching workflows where the next step depends on what was just learned.

Agent swarm (peer-to-peer handoffs) is better when agents need to negotiate directly. Agent A does its part and explicitly hands off to Agent B with a Command(goto="agent_b"). No central coordinator exists. The network is a graph where any node can route to any other node. This works well for workflows with well-defined responsibility boundaries: a triage agent hands off to a medical-records agent, which hands off to a billing agent.
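The handoff mechanics can be sketched without the framework: each agent does its work and returns the name of its successor (or None to stop), and a driver loop follows those pointers. This is a minimal, framework-free simulation of the pattern; in LangGraph the equivalent return value would be Command(goto=...), and the agent bodies here are illustrative stubs.

```python
from typing import Callable, Optional

# Framework-free sketch of peer-to-peer handoffs: each agent mutates
# shared state and names its successor; None ends the run.
def triage(state: dict) -> Optional[str]:
    state["log"].append("triage: classified as billing issue")
    return "billing"                      # hand off directly to billing

def billing(state: dict) -> Optional[str]:
    state["log"].append("billing: refund issued")
    return None                           # done; no further handoff

AGENTS: dict[str, Callable[[dict], Optional[str]]] = {
    "triage": triage,
    "billing": billing,
}

def run_swarm(entry: str, state: dict, max_hops: int = 10) -> dict:
    current: Optional[str] = entry
    for _ in range(max_hops):             # hop limit guards against routing cycles
        if current is None:
            break
        current = AGENTS[current](state)
    return state

state = run_swarm("triage", {"log": []})
print(state["log"])
```

The `max_hops` guard plays the same role as LangGraph's `recursion_limit`: without it, two agents that keep naming each other would loop forever.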

Pipeline (sequential delegation) is the simplest pattern. Each agent receives the output of the previous one, does its job, and passes downstream. This is deterministic, easy to trace, and appropriate when every stage must run in order without branching. A data cleaning β†’ enrichment β†’ summarization pipeline is a classic example.

| Pattern | Best for | Control flow | Parallelism |
|---|---|---|---|
| Supervisor | Dynamic, branching tasks | Centralized | Via Send from supervisor |
| Swarm | Defined responsibility lanes | Distributed | If agents do not depend on each other |
| Pipeline | Ordered, predictable stages | Sequential | No native parallelism |

βš™οΈ Building the Supervisor Pattern: Router LLM, Worker Agents, and Handoffs

The supervisor pattern in LangGraph has three components: the shared state, the worker agents as graph nodes, and a supervisor node that routes using an LLM call.

Shared State and Worker Nodes

Every agent in the system reads from and writes to a shared TypedDict state. Workers append their results to the message list. The supervisor reads the full message history and decides what to do next.

from typing import Annotated, Literal
from typing_extensions import TypedDict
from langgraph.graph import StateGraph, END
from langgraph.graph.message import add_messages
from langchain_core.messages import HumanMessage, AIMessage, BaseMessage

class ResearchState(TypedDict):
    messages: Annotated[list[BaseMessage], add_messages]
    next_agent: str          # supervisor writes this to route
    final_report: str        # writer agent writes this

Each worker is a plain LangGraph node: a function that receives state, runs a specialized agent loop, and returns a state patch.

from langchain_openai import ChatOpenAI
from langchain_core.tools import tool

@tool
def web_search(query: str) -> str:
    """Search the web for current information."""
    # real implementation uses Tavily / SerpAPI
    return f"[Search results for: {query}]"

@tool
def fact_check(claim: str) -> str:
    """Verify a factual claim against trusted sources."""
    return f"[Fact check result for: {claim}]"

researcher_llm = ChatOpenAI(model="gpt-4o", temperature=0).bind_tools([web_search])
fact_checker_llm = ChatOpenAI(model="gpt-4o", temperature=0).bind_tools([fact_check])
writer_llm = ChatOpenAI(model="gpt-4o", temperature=0.3)

def researcher_node(state: ResearchState) -> dict:
    """Runs the researcher agent with its tool loop."""
    response = researcher_llm.invoke(state["messages"])
    return {"messages": [response]}

def fact_checker_node(state: ResearchState) -> dict:
    """Runs the fact-checker against claims in the message history."""
    response = fact_checker_llm.invoke(state["messages"])
    return {"messages": [response]}

def writer_node(state: ResearchState) -> dict:
    """Drafts a final report from everything gathered."""
    response = writer_llm.invoke(state["messages"])
    return {"messages": [response], "final_report": response.content}

Supervisor Routing with Command

The supervisor node calls the LLM with a structured prompt that asks it to choose the next agent. In LangGraph, the routing decision is expressed as a Command: an explicit instruction to move to a named node and optionally update state.

from langgraph.types import Command

SUPERVISOR_PROMPT = """You are a research supervisor managing three specialist agents:
- researcher: finds information and sources
- fact_checker: verifies claims and catches errors
- writer: synthesizes findings into a final report

Given the current conversation, decide which agent to invoke next,
or respond with 'FINISH' if the report is complete.

Respond with exactly one word: researcher | fact_checker | writer | FINISH
"""

def supervisor_node(state: ResearchState) -> Command[Literal["researcher", "fact_checker", "writer", "__end__"]]:
    messages = [{"role": "system", "content": SUPERVISOR_PROMPT}] + state["messages"]
    response = ChatOpenAI(model="gpt-4o", temperature=0).invoke(messages)
    decision = response.content.strip().lower()

    if decision == "finish":
        return Command(goto=END)

    return Command(
        goto=decision,
        update={"next_agent": decision}
    )

Command(goto="researcher") tells LangGraph to route to the researcher node. update={...} writes into the shared state before the next node runs. The type annotation Command[Literal["researcher", ...]] gives LangGraph the edge set it needs to build the graph.

Wiring the Graph

builder = StateGraph(ResearchState)
builder.add_node("supervisor", supervisor_node)
builder.add_node("researcher", researcher_node)
builder.add_node("fact_checker", fact_checker_node)
builder.add_node("writer", writer_node)

# All workers return to the supervisor after completing
builder.add_edge("researcher", "supervisor")
builder.add_edge("fact_checker", "supervisor")
builder.add_edge("writer", "supervisor")

builder.set_entry_point("supervisor")
graph = builder.compile()

# Run
result = graph.invoke({
    "messages": [HumanMessage(content="Write a report on the state of quantum computing in 2025.")]
})
print(result["final_report"])

The graph topology is: supervisor routes to a worker → worker completes → supervisor evaluates → routes again or finishes. This loop continues until the supervisor emits Command(goto=END).


🧠 Deep Dive: State Isolation, Subgraphs, and the Send API

The Internals

How subgraph state boundaries work. When you wrap a specialist agent as a subgraph instead of a plain node, it gets its own private TypedDict. The parent graph and the subgraph communicate only at explicit boundary points: the entry edge (parent state → subgraph input mapping) and the exit edge (subgraph output → parent state mapping). This means a subgraph can have its own tool_calls, scratchpad, or intermediate_steps fields that never pollute the parent graph's state.

from langgraph.graph import StateGraph

class ResearcherState(TypedDict):
    messages: Annotated[list[BaseMessage], add_messages]
    search_queries: list[str]       # private to researcher
    sources_found: list[str]        # private to researcher

# Build the researcher as a standalone graph
researcher_builder = StateGraph(ResearcherState)
# ... add nodes, tools, loops ...
researcher_subgraph = researcher_builder.compile()

# Mount the subgraph as a node in the parent
def researcher_node(state: ResearchState) -> dict:
    # Map parent state β†’ subgraph input
    sub_result = researcher_subgraph.invoke({
        "messages": state["messages"],
        "search_queries": [],
        "sources_found": []
    })
    # Map subgraph output β†’ parent state
    return {"messages": sub_result["messages"]}

How Command(goto=) routes between agents. Under the hood, Command is a return value that LangGraph's runtime intercepts before the next edge resolution step. Instead of reading static edge definitions, the runtime reads the goto field from the Command object returned by the current node. The update dict is merged into the channel state before the target node runs, guaranteeing the target receives fresh state. This is what makes supervisor routing dynamic: the edge set is fixed at compile time, but which edge fires is decided at runtime by the LLM.

How the supervisor's tool calls map to agent invocations. An alternative to structured text output is to have the supervisor use tool calling where each worker agent is registered as a tool. The supervisor LLM emits a tool call (e.g., call_researcher(task="find quantum computing breakthroughs")), the runtime interprets that as a Command(goto="researcher"), and the task argument seeds the worker's context. This approach improves reliability because the LLM is constrained to a typed schema rather than free-form text routing.
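A sketch of that mapping, with the router's tool call stubbed out. The call_researcher schema, the stub response, and the mapping table are all illustrative, not a real LangGraph API; the point is that a typed tool call carries both the routing target and the seeded task:

```python
# Sketch: interpret a supervisor tool call as a routing decision.
# The dict below mimics the shape of an LLM tool call; the tool names
# (call_researcher, etc.) are illustrative.
TOOL_TO_AGENT = {
    "call_researcher": "researcher",
    "call_fact_checker": "fact_checker",
    "call_writer": "writer",
}

def route_from_tool_call(tool_call: dict) -> tuple[str, str]:
    """Map a supervisor tool call to (target_agent, seeded_task)."""
    agent = TOOL_TO_AGENT[tool_call["name"]]   # constrained by the typed schema
    task = tool_call["args"]["task"]           # seeds the worker's context
    return agent, task

# Stubbed router output, as if the LLM had emitted this tool call:
stub_call = {"name": "call_researcher",
             "args": {"task": "find quantum computing breakthroughs"}}
print(route_from_tool_call(stub_call))
```

Because the LLM must pick a tool name from a fixed set, an off-schema route fails loudly (a KeyError here) instead of silently routing to a nonexistent node, which is the reliability gain the text describes.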

sequenceDiagram
    participant S as Supervisor
    participant LLM as Router LLM
    participant R as Researcher
    participant FC as Fact Checker
    participant W as Writer

    S->>LLM: Current state + routing prompt
    LLM-->>S: Command(goto="researcher")
    S->>R: invoke(state)
    R-->>S: updated messages (sources found)
    S->>LLM: Updated state + routing prompt
    LLM-->>S: Command(goto="fact_checker")
    S->>FC: invoke(state)
    FC-->>S: updated messages (verified claims)
    S->>LLM: Updated state + routing prompt
    LLM-->>S: Command(goto="writer")
    S->>W: invoke(state)
    W-->>S: final_report
    S->>LLM: Updated state + routing prompt
    LLM-->>S: Command(goto=END)

Performance Analysis

Latency of multi-hop delegation. Each supervisor → worker → supervisor round trip adds two LLM calls: one supervisor invocation plus the worker's own inference. For a three-worker sequential workflow, total latency is approximately:

T_total ≈ T_supervisor × (N_hops + 1) + Σ T_worker_i

With GPT-4o at ~1.5s per call and three sequential workers, a supervisor loop adds ~6s of routing overhead on top of the work itself. This is acceptable for research tasks but too slow for real-time user interactions; prefer single agents or cached routing there.
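Plugging the numbers above into the formula (1.5 s per supervisor call, three sequential workers; the per-worker times below are made up for illustration):

```python
# Routing-overhead arithmetic for the latency model above.
T_SUPERVISOR = 1.5           # seconds per supervisor LLM call (assumed)
N_HOPS = 3                   # three sequential workers
T_WORKERS = [4.0, 3.0, 5.0]  # illustrative per-worker inference times

routing_overhead = T_SUPERVISOR * (N_HOPS + 1)   # supervisor runs N_hops + 1 times
total = routing_overhead + sum(T_WORKERS)

print(f"routing overhead: {routing_overhead:.1f}s, total: {total:.1f}s")
```

The supervisor runs once more than the number of hops because it also makes the final FINISH decision.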

Parallelism with the Send API. The Send API lets the supervisor fan out to multiple workers simultaneously instead of sequentially. This eliminates serial overhead when workers are independent.

from langgraph.types import Send

def supervisor_fanout_node(state: ResearchState) -> list[Send]:
    """Fan out to researcher and fact_checker in parallel."""
    tasks = [
        Send("researcher", {**state, "messages": state["messages"] + [
            HumanMessage(content="Find primary sources on quantum computing 2025")
        ]}),
        Send("fact_checker", {**state, "messages": state["messages"] + [
            HumanMessage(content="Verify: quantum supremacy was claimed in 2024")
        ]}),
    ]
    return tasks

Returning a list of Send objects from a node triggers LangGraph to execute all of them as concurrent branches. Results are collected and merged before the next node runs. A three-parallel-worker setup reduces wall-clock time from Σ T_i to max(T_i), a 3× speedup when tasks take equal time.

Token cost of shared message history. The shared messages list grows with every agent turn. By the third worker invocation, it carries the full conversation, including the researcher's raw search results. A researcher output of 2,000 tokens, fed into the fact-checker's context, costs those tokens at every subsequent call. In long chains, shared message history becomes the dominant token cost driver.

Mitigation: use summarization nodes between workers to compress verbose intermediate outputs before they enter the supervisor's next prompt.
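A summarization node is just an ordinary node that replaces verbose intermediate messages with a compressed digest before routing continues. A sketch with the summarizer stubbed as simple truncation (a real implementation would call a cheap LLM, and the character budget is an assumed value):

```python
# Sketch: compress verbose intermediate outputs before they re-enter
# the supervisor's context. Truncation stands in for a cheap LLM summary.
MAX_CHARS = 200   # illustrative per-message budget

def summarize(text: str) -> str:
    if len(text) <= MAX_CHARS:
        return text
    return text[:MAX_CHARS] + " …[truncated]"

def summarization_node(state: dict) -> dict:
    """Replace long intermediate messages with compressed versions."""
    compressed = [summarize(m) for m in state["messages"]]
    return {"messages": compressed}

state = {"messages": ["short finding", "x" * 5000]}  # 5k-char raw scrape
out = summarization_node(state)
print([len(m) for m in out["messages"]])
```

Wired between a worker and the supervisor, such a node caps the per-turn token growth that otherwise compounds across the chain.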

Mathematical Model

When do N specialized agents outperform one generalist?

Define the following variables for a task decomposed into $N$ subtasks:

  • $C$ = context window capacity (tokens)
  • $T_i$ = tokens consumed by subtask $i$ including tool calls and intermediate reasoning
  • $S_i \geq 1$ = specialization quality multiplier for subtask $i$ (1 = no benefit; 2 = the specialist makes half as many errors as the generalist)
  • $O$ = per-handoff coordination overhead in tokens
  • $Q_{\text{generalist}}$ = product quality score for a single-agent run
  • $Q_{\text{multi}}$ = product quality score for the multi-agent run

Context overflow condition. The generalist fails (truncates context) when:

$$\sum_{i=1}^{N} T_i + N \cdot O > C$$

Each specialized agent only sees its own subtask, so individual context usage is $T_i + O$, which must satisfy $T_i + O < C$, a much weaker constraint.

Specialization gain. Define the quality ratio as:

$$\frac{Q_{\text{multi}}}{Q_{\text{generalist}}} = \prod_{i=1}^{N} S_i$$

If each specialist is 20% better than the generalist on its subtask ($S_i = 1.2$), three specialists yield a compound quality ratio of $1.2^3 \approx 1.73$, a 73% quality improvement.

Net benefit condition. Multi-agent decomposition is beneficial when either (or both) of these hold:

$$\text{(1) Context overflow: } \sum_{i=1}^{N} T_i > C - N \cdot O$$

$$\text{(2) Quality gain: } \prod_{i=1}^{N} S_i > 1 + \frac{N \cdot O \cdot \lambda}{C}$$

where $\lambda$ converts token cost to quality penalty (empirically ~0.0005 per token for GPT-4o class models).

Worked example. Three subtasks: researcher ($T_1 = 40\text{k}$), fact-checker ($T_2 = 20\text{k}$), writer ($T_3 = 30\text{k}$). Context window $C = 128\text{k}$. Coordination overhead $O = 2\text{k}$ per hop.

  • Generalist total: $40 + 20 + 30 + 3 \times 2 = 96\text{k}$, which fits, but barely: adding one more research cycle overflows.
  • Specialization gain (each agent 25% better): $1.25^3 \approx 1.95$, nearly a 2× quality improvement.
  • Conclusion: decompose, even without overflow, because compound specialization beats the coordination cost.
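The worked example's arithmetic, checked in a few lines:

```python
# Verify the worked example: context budget and compound specialization gain.
C = 128_000                       # context window capacity (tokens)
T = [40_000, 20_000, 30_000]      # researcher, fact-checker, writer
O = 2_000                         # per-hop coordination overhead
N = len(T)

generalist_total = sum(T) + N * O  # everything in one window
fits = generalist_total < C        # True, but with only 32k tokens of headroom

quality_ratio = 1.25 ** N          # compound specialization gain

print(generalist_total, fits, round(quality_ratio, 2))
```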

πŸ—οΈ Scaling Multi-Agent Systems: Edge Cases, Optimizations, and What Teams Get Wrong

Scaling to 10+ Agents

Adding more agents is not free. Every new specialist adds a routing option that the supervisor LLM must evaluate. Beyond 7–10 agents, the routing prompt grows long enough to degrade routing quality: the model starts making errors about which agent is appropriate. Mitigations:

  • Hierarchical supervisors: group agents into sub-teams, each managed by a sub-supervisor. The top-level supervisor routes to sub-supervisors, not individual agents. This keeps each routing decision to ≤ 5 options.
  • Semantic routing cache: if the same task type recurs, cache the routing decision by task embedding similarity. Skip the LLM call for known task types.
  • Agent capability registry: store each agent's capabilities as a structured schema, and use a lightweight classifier (not an LLM) for first-pass routing.
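A minimal sketch of the routing cache idea. A real version would key on embedding similarity; here a crude string normalizer stands in for the embedding model, and route_with_llm is a hypothetical stubbed fallback:

```python
# Sketch: skip the router LLM when a task type has been seen before.
# normalize() is a crude stand-in for embedding-similarity lookup.
routing_cache: dict = {}

def normalize(task: str) -> str:
    """Canonicalize a task string (real systems: embed + nearest neighbor)."""
    return " ".join(sorted(set(task.lower().split())))

def route_with_llm(task: str) -> str:
    return "researcher"   # hypothetical LLM routing call, stubbed here

def cached_route(task: str) -> str:
    key = normalize(task)
    if key not in routing_cache:              # cache miss: pay for one LLM call
        routing_cache[key] = route_with_llm(task)
    return routing_cache[key]                 # cache hit: free

cached_route("Find sources on quantum computing")   # miss
cached_route("find sources on Quantum Computing")   # hit (same canonical key)
print(len(routing_cache))
```

Both calls collapse to one cache entry, so the second routing decision costs no LLM call; an embedding-based key generalizes this to paraphrases the normalizer cannot catch.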

Edge Cases: Cycles, Deadlocks, and Starvation

| Edge Case | When It Occurs | Detection | Fix |
|---|---|---|---|
| Routing cycle | Supervisor routes A→B→A indefinitely | Visited-node counter in state | recursion_limit + cycle guard |
| Agent deadlock | Two agents wait for each other's output | No deadlock in single-threaded graph | Only occurs in async multi-process setups; use timeouts |
| Worker starvation | Supervisor always picks the same agent | Routing histogram in traces | Add routing diversity constraint to supervisor prompt |
| Empty state entry | Worker invoked with no relevant messages | Worker receives ambiguous task | Validate preconditions before routing in supervisor node |

Common Misconceptions

"More agents = better quality." False. Each agent boundary introduces a communication loss: the downstream agent only sees what the upstream agent wrote, not what it reasoned. Unnecessary decomposition discards implicit reasoning context. Only decompose when a hard limit (context, specialization, parallelism) requires it.

"The supervisor needs GPT-4 to route reliably." Not always. Routing is a classification task, not a reasoning task. A well-structured routing prompt with enumerated options and explicit state flags often performs reliably with GPT-4o-mini at 10× lower cost. Reserve the large model for the workers that do actual synthesis.

"Subgraphs automatically isolate token cost." Subgraph state isolation is logical, not token-cost isolation. If you pass the full parent messages list into a subgraph, the subgraph still processes all those tokens. True token isolation requires passing only a summary or a scoped slice of the message history.
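To get actual token isolation, map only a compressed slice of the parent history into the subgraph input. A sketch of the boundary mapping (the digest format is illustrative):

```python
# Sketch: pass a scoped summary into a subgraph instead of the full history.
def summarize_history(messages: list, keep_last: int = 2) -> list:
    """Keep a one-line digest of older turns plus the most recent messages."""
    digest = f"[{len(messages) - keep_last} earlier messages summarized]"
    return [digest] + messages[-keep_last:]

parent_messages = [f"turn {i}: long intermediate output" for i in range(10)]

# Boundary mapping: 10 parent messages become 3 subgraph input messages.
subgraph_input = {"messages": summarize_history(parent_messages)}
print(len(subgraph_input["messages"]))
```

The subgraph now pays tokens for three messages instead of ten; the logical isolation of its private state is unchanged.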


📊 The Deep Research System Architecture

The following diagram shows the full deep research system: the supervisor fans out to the researcher and fact-checker in parallel, then sequentially routes to the writer, and finally exits.

graph TD
    User(["🧑 User\nResearch Request"]) --> Supervisor

    Supervisor["🧠 Supervisor\n(Router LLM)"]

    Supervisor -- "Send: parallel fan-out" --> Researcher["🔍 Researcher Agent\n+ web_search tool\n+ arxiv_search tool"]
    Supervisor -- "Send: parallel fan-out" --> FactChecker["✅ Fact-Checker Agent\n+ fact_check tool\n+ citation_lookup tool"]

    Researcher -- "messages: sources + findings" --> Merge["⟳ Merge Node\n(collect parallel results)"]
    FactChecker -- "messages: verified claims" --> Merge

    Merge --> Supervisor

    Supervisor -- "Command(goto='writer')" --> Writer["✍️ Writer Agent\n+ format_report tool"]

    Writer -- "final_report" --> Supervisor

    Supervisor -- "Command(goto=END)" --> Output(["📄 Final Report"])

    style Supervisor fill:#4A90D9,color:#fff
    style Researcher fill:#27AE60,color:#fff
    style FactChecker fill:#E67E22,color:#fff
    style Writer fill:#8E44AD,color:#fff
    style Merge fill:#555,color:#fff

Reading the diagram. The supervisor first fans out using Send: the researcher and fact-checker run in parallel, each with their own tools. Their results merge back into the shared state. The supervisor then evaluates the verified findings and routes sequentially to the writer. The writer produces the final report, control returns to the supervisor once more, and the supervisor exits.


🌍 Real-World Applications: Multi-Agent Systems in Production

Case Study 1: Enterprise Document Intelligence at Scale

A financial services firm built a multi-agent pipeline for regulatory document analysis. The system decomposes each filing into four parallel workstreams: a table extraction agent, a risk clause agent, a cross-reference agent, and a summarization agent. Each agent runs on a focused 20k-token context window rather than feeding the 400-page document to one model.

Input: 400-page SEC 10-K filing (PDF, ~200k tokens extracted).
Architecture: Supervisor routes each document section to the appropriate specialist. The cross-reference agent runs in parallel with risk clause extraction.
Output: Structured JSON with flagged clauses, risk scores, and referenced exhibits.
Scaling note: Parallel fan-out reduced P90 latency from 4 minutes (sequential) to 68 seconds. The supervisor makes ~15 routing decisions per document. Token cost increased by 12% due to coordination overhead but accuracy improved by 31%.

Case Study 2: AI-Powered Customer Support Escalation

A SaaS company built a swarm-pattern multi-agent system where a triage agent classifies the ticket, then hands off directly to a billing agent, technical agent, or escalation agent using Command(goto=). There is no central supervisor; each agent knows its completion condition and routes onward.

Input: Customer support ticket (text + attachment).
Architecture: Triage → (billing | technical | escalation) via direct handoffs.
Output: Resolved ticket with audit trail of which agent handled each step.
Operational lesson: Agent disagreement emerged when the billing agent re-routed to technical because it detected a product bug in the billing data. The system needed explicit cycle-detection: if the same ticket visits the same agent twice, escalate to human review.

Case Study 3: Code Review Multi-Agent Network

A developer tools team runs three agents in a pipeline: a static analysis agent (linting, complexity metrics), a security agent (OWASP checks, dependency scanning), and a review agent (style, architecture suggestions). Each receives the previous agent's annotations as part of its context.

Scaling note: The pipeline pattern here is intentional β€” security review must happen after static analysis flags dead code paths. Adding parallelism would cause the security agent to miss coverage gaps that static analysis identifies.


βš–οΈ Trade-offs and Failure Modes: Coordination Overhead, Agent Disagreement, and Cascade Failures

Performance vs. Coordination Cost

Every inter-agent handoff adds at minimum one LLM call (the supervisor making a routing decision) plus state serialization overhead. For tasks that fit comfortably in a single context window with a coherent tool set, multi-agent decomposition is net-negative: it adds latency and token spend without quality benefit. The threshold is roughly: if your single-agent workflow uses fewer than 8 tools and stays under 50k tokens, stay single-agent.

Agent Disagreement

In a fact-checker + writer pair, the fact-checker may flag a claim as unverified while the writer proceeds to include it. This disagreement is invisible in a system with no arbitration mechanism. Mitigations:

  • Add explicit reconciliation nodes between conflicting agents
  • Use structured output schemas (Pydantic) so agents express confidence levels, not just content
  • Route disagreements back to the supervisor with a conflict: true flag in state
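A reconciliation node can be sketched as a plain function over structured outputs. The field names below (text, asserted, status, confidence) and the 0.7 threshold are illustrative, not a LangGraph schema:

```python
# Sketch: reconcile a writer's asserted claim against the fact-checker's
# verdict, and emit a conflict flag the supervisor can route on.
def reconcile(claim: dict, verdict: dict) -> dict:
    """Flag a conflict when the writer asserts what the checker rejected."""
    conflict = (claim["text"] == verdict["text"]
                and claim["asserted"]
                and verdict["status"] == "unverified"
                and verdict["confidence"] >= 0.7)   # illustrative threshold
    return {"conflict": conflict,
            "action": "route_to_supervisor" if conflict else "proceed"}

claim = {"text": "Quantum supremacy was achieved in 2024", "asserted": True}
verdict = {"text": "Quantum supremacy was achieved in 2024",
           "status": "unverified", "confidence": 0.9}
print(reconcile(claim, verdict))
```

Because agents express confidence numerically rather than in free text, the threshold check is deterministic and the disagreement becomes a routable state flag instead of a silent inconsistency.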

Cascade Failures and Infinite Loops

A supervisor that cannot decide emits the same routing decision repeatedly. Without a loop-detection mechanism, this spins indefinitely. LangGraph provides recursion_limit in the RunnableConfig β€” set it to N_expected_hops Γ— 2 as a safety ceiling.

result = graph.invoke(
    {"messages": [HumanMessage(content="Research quantum computing")]},
    config={"recursion_limit": 25}  # 25 hops max
)

A harder failure mode is partial completion: the researcher agent succeeds, but the fact-checker hits a timeout. The supervisor receives an incomplete state and may not detect the gap. Design workers to write explicit completion markers into state (researcher_done: bool) so the supervisor can validate completeness before routing to the writer.

Context Contamination

Shared message history means every agent sees every other agent's outputs. If the researcher returns verbose raw HTML from a web scrape, that content enters the fact-checker's context and wastes tokens on irrelevant content. Solution: use structured state fields (not just the messages list) to carry only processed data between agents.
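Concretely, that means giving the state dedicated typed fields for processed data so raw output never rides along in messages. A framework-free sketch (field names and the citation are illustrative):

```python
from typing import TypedDict

# Sketch: carry processed data in dedicated state fields instead of
# dumping raw scrape output into the shared messages list.
class ResearchState(TypedDict):
    messages: list        # conversation only
    sources: list         # processed citations, not raw HTML
    raw_scrapes: list     # stored for audit, kept out of every LLM prompt

def researcher_writes(state: ResearchState, raw_html: str) -> ResearchState:
    """Store the raw scrape separately; only a clean citation enters messages."""
    state["raw_scrapes"].append(raw_html)
    state["sources"].append("arxiv.org/abs/2501.00001")  # illustrative citation
    state["messages"].append("Found 1 source; citation recorded.")
    return state

state: ResearchState = {"messages": [], "sources": [], "raw_scrapes": []}
state = researcher_writes(state, "<html>...thousands of tokens of markup...</html>")
print(state["messages"])
```

Downstream agents that read only `messages` and `sources` never pay tokens for the raw HTML.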

| Failure Mode | Trigger | Mitigation |
|---|---|---|
| Infinite supervisor loop | LLM indecision | recursion_limit config |
| Partial completion | Worker timeout / error | Completion flags in state |
| Context contamination | Raw outputs in shared messages | Separate structured state fields |
| Agent disagreement | Conflicting outputs | Reconciliation nodes |
| Cascade timeout | Parallel worker failure | Timeout + fallback nodes |

🧭 Decision Guide: Single Agent vs Supervisor vs Swarm

| Situation | Recommendation |
|---|---|
| Task fits in one context window, fewer than 8 tools | Single agent: decomposition adds overhead without benefit |
| Task requires dynamic routing based on intermediate findings | Supervisor pattern: centralized control handles branching |
| Task has well-defined responsibility lanes with no ambiguity at boundaries | Agent swarm: direct handoffs are simpler than a supervisor |
| Subtasks are independent and can run in parallel | Supervisor + Send fan-out: captures the parallelism benefit |
| Tasks must run in strict order with no branching | Pipeline pattern: sequential, predictable, easy to trace |
| Each subtask needs a different model or temperature | Always decompose: a generalist config cannot simultaneously optimize all subtasks |
| Real-time user interaction (< 1s latency required) | Single agent or cached routing: multi-hop latency is prohibitive |
| Regulatory audit trail required per decision | Supervisor pattern: each routing Command is a traceable event in LangGraph state history |

🧪 Practical Example: Deep Research System with Three Specialist Agents

This example demonstrates a deep research system built on the supervisor pattern described above. The deep-research scenario was chosen because it has three fundamentally different agent roles (researcher, fact-checker, writer) that map cleanly to distinct routing decisions. As you read the supervisor function, watch how it routes on explicit state flags: researcher_done and fact_checker_done gate each handoff, and the supervisor only routes to the writer after both are set. That ordering is the entire coordination contract; a parallel Send-based fan-out variant is discussed at the end.

from typing import Annotated, Literal
from typing_extensions import TypedDict
from langgraph.graph import StateGraph, END
from langgraph.graph.message import add_messages
from langgraph.types import Command, Send
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, BaseMessage, SystemMessage
from langchain_core.tools import tool

# ── State ──────────────────────────────────────────────────────────────

class ResearchState(TypedDict):
    messages: Annotated[list[BaseMessage], add_messages]
    researcher_done: bool
    fact_checker_done: bool
    final_report: str

# ── Tools ──────────────────────────────────────────────────────────────

@tool
def web_search(query: str) -> str:
    """Search the web for up-to-date information on a topic."""
    return f"[Web results for '{query}': Found 5 relevant sources about recent developments.]"

@tool
def fact_check(claim: str) -> str:
    """Verify whether a specific claim is supported by cited sources."""
    return f"[Fact check for '{claim}': Supported by 3 sources, 1 contested.]"

# ── Agent LLMs ─────────────────────────────────────────────────────────

researcher_llm = ChatOpenAI(model="gpt-4o", temperature=0).bind_tools([web_search])
fact_checker_llm = ChatOpenAI(model="gpt-4o", temperature=0).bind_tools([fact_check])
writer_llm = ChatOpenAI(model="gpt-4o", temperature=0.4)
router_llm = ChatOpenAI(model="gpt-4o", temperature=0)

# ── Worker Nodes ───────────────────────────────────────────────────────

def researcher_node(state: ResearchState) -> dict:
    sys_prompt = SystemMessage(content=(
        "You are a research specialist. Use web_search to gather current, "
        "accurate information. Be thorough and cite your sources."
    ))
    response = researcher_llm.invoke([sys_prompt] + state["messages"])
    return {
        "messages": [response],
        "researcher_done": True
    }

def fact_checker_node(state: ResearchState) -> dict:
    sys_prompt = SystemMessage(content=(
        "You are a fact-checking specialist. Review all claims in the conversation "
        "and use fact_check to verify each one. Flag unverified claims explicitly."
    ))
    response = fact_checker_llm.invoke([sys_prompt] + state["messages"])
    return {
        "messages": [response],
        "fact_checker_done": True
    }

def writer_node(state: ResearchState) -> dict:
    sys_prompt = SystemMessage(content=(
        "You are a professional report writer. Using the research findings and "
        "fact-check results in this conversation, write a comprehensive, "
        "well-structured final report. Be clear, concise, and accurate."
    ))
    response = writer_llm.invoke([sys_prompt] + state["messages"])
    return {
        "messages": [response],
        "final_report": response.content
    }

# ── Supervisor Node ────────────────────────────────────────────────────

ROUTING_PROMPT = """You are a research supervisor. Workers available:
- researcher: gathers information (run first)
- fact_checker: verifies claims (run after researcher)
- writer: writes the final report (run after fact_checker)
- FINISH: end the workflow (run after writer has produced a report)

Current state:
- researcher_done: {researcher_done}
- fact_checker_done: {fact_checker_done}
- has_final_report: {has_final_report}

What is the next step? Reply with exactly one word.
"""

def supervisor_node(state: ResearchState) -> Command[
    Literal["researcher", "fact_checker", "writer", "__end__"]
]:
    prompt = ROUTING_PROMPT.format(
        researcher_done=state.get("researcher_done", False),
        fact_checker_done=state.get("fact_checker_done", False),
        has_final_report=bool(state.get("final_report")),
    )
    response = router_llm.invoke(
        [SystemMessage(content=prompt)] + state["messages"]
    )
    decision = response.content.strip().lower()

    if decision == "finish":
        return Command(goto=END)

    valid = {"researcher", "fact_checker", "writer"}
    if decision not in valid:
        # Fallback: if LLM returns unexpected output, force-finish
        return Command(goto=END)

    # Note: this state schema has no next_agent field, so we route without an update
    return Command(goto=decision)

# ── Graph Assembly ─────────────────────────────────────────────────────

builder = StateGraph(ResearchState)
builder.add_node("supervisor", supervisor_node)
builder.add_node("researcher", researcher_node)
builder.add_node("fact_checker", fact_checker_node)
builder.add_node("writer", writer_node)

# Workers return to supervisor after completing
builder.add_edge("researcher", "supervisor")
builder.add_edge("fact_checker", "supervisor")
builder.add_edge("writer", "supervisor")

builder.set_entry_point("supervisor")

graph = builder.compile()

# ── Run ────────────────────────────────────────────────────────────────────────

if __name__ == "__main__":
    initial_state = {
        "messages": [HumanMessage(content=(
            "Write a comprehensive report on the current state of quantum computing "
            "in 2025, focusing on recent hardware breakthroughs and enterprise adoption."
        ))],
        "researcher_done": False,
        "fact_checker_done": False,
        "final_report": "",
    }

    result = graph.invoke(
        initial_state,
        config={"recursion_limit": 20}
    )

    print("=== FINAL REPORT ===")
    print(result["final_report"])

What this demonstrates:

  • The supervisor inspects explicit state flags (researcher_done, fact_checker_done) rather than relying on the LLM to infer completion — this makes routing deterministic and auditable.
  • Each worker has its own system prompt tuned for its role.
  • The recursion_limit guard prevents runaway loops if the router LLM gives unexpected output.
  • The fallback in supervisor_node forces a clean exit rather than raising an unhandled exception.
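That fallback logic is easiest to keep correct when factored into a framework-free helper. The sketch below is a hypothetical refactor of the validation step in supervisor_node, not part of the listing above:

```python
def normalize_decision(raw: str, valid: set[str]) -> str:
    """Map a raw router-LLM reply to a worker name, or '__end__' for
    'finish' and any unexpected output (the clean-exit fallback)."""
    decision = raw.strip().lower()
    if decision in valid:
        return decision
    return "__end__"
```

Pure functions like this can be unit-tested against adversarial router outputs without spinning up the graph.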

To extend this with parallel fan-out, replace the first supervisor call with a Send-based fan-out node that fires the researcher and fact-checker simultaneously, then merge their results before routing to the writer.
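Outside the framework, the fan-out/merge shape that Send expresses looks like the sketch below. The worker bodies are placeholders, and in real LangGraph code you would return a list of Send objects and let the runtime schedule the workers rather than managing a thread pool yourself:

```python
from concurrent.futures import ThreadPoolExecutor

def researcher(task: str) -> dict:
    return {"findings": f"raw findings on {task}"}       # placeholder worker

def fact_checker(task: str) -> dict:
    return {"checks": f"verified claims about {task}"}   # placeholder worker

def fan_out_and_merge(task: str, workers) -> dict:
    # Fire independent workers concurrently, then merge their partial
    # results -- the same shape a Send fan-out plus a state reducer gives you.
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(worker, task) for worker in workers]
        merged: dict = {}
        for future in futures:
            merged.update(future.result())
    return merged
```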


πŸ› οΈ langgraph-supervisor: The Official Multi-Agent Orchestration Library

LangGraph ships a prebuilt library — langgraph-supervisor — that codifies the supervisor pattern with minimal boilerplate. Instead of writing the routing prompt and Command logic manually, create_supervisor() generates it from your agent list.

pip install langgraph-supervisor

from langgraph_supervisor import create_supervisor
from langgraph.prebuilt import create_react_agent
from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4o")

# Create specialist agents using prebuilt ReAct loop
researcher_agent = create_react_agent(
    model=model,
    tools=[web_search],
    name="researcher",
    prompt="You are a research specialist. Find accurate, current information."
)

fact_checker_agent = create_react_agent(
    model=model,
    tools=[fact_check],
    name="fact_checker",
    prompt="You are a fact-checking specialist. Verify all claims carefully."
)

writer_agent = create_react_agent(
    model=model,
    tools=[],
    name="writer",
    prompt="You are a report writer. Synthesize findings into a clear report."
)

# Create the supervisor graph
supervisor = create_supervisor(
    agents=[researcher_agent, fact_checker_agent, writer_agent],
    model=model,
    prompt=(
        "You are a research team supervisor. Coordinate the researcher, "
        "fact_checker, and writer agents to produce accurate research reports. "
        "Always verify facts before writing the final report."
    )
).compile()

# Invoke
result = supervisor.invoke({
    "messages": [HumanMessage(content="Report on quantum computing in 2025")]
})

What create_supervisor() does under the hood:

  1. Builds a routing LLM prompt that includes each agent's name and description.
  2. Registers each agent as a callable tool in the supervisor's tool schema.
  3. Generates a Command-based routing node from the supervisor's tool call output.
  4. Wires edges so all workers return to the supervisor.

The prebuilt handles the boilerplate but gives less control over state schema, routing prompt, and completion conditions. For production systems with strict audit requirements or complex state, the manual approach from the previous section gives cleaner control boundaries.

| Feature | create_supervisor() prebuilt | Manual supervisor |
| --- | --- | --- |
| Setup time | Minutes | Hours |
| State control | Opaque (internal message schema) | Full TypedDict ownership |
| Custom routing logic | Limited | Arbitrary Python |
| Audit / observability | LangSmith traces | LangSmith + custom state fields |
| Best for | Prototyping, standard patterns | Production, complex state |

📚 Lessons Learned from Building Multi-Agent Systems

Make routing decisions observable, not opaque. The biggest debugging nightmare in multi-agent systems is a supervisor that routes inexplicably. Write the routing decision into an explicit state field (next_agent: str) alongside the reason (routing_reason: str). LangGraph's state history then gives you a complete trace: every routing decision, every state mutation, with the LLM response that caused it.
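A minimal sketch of that pattern: the routing node returns the decision and its justification as state updates, so the state history itself becomes the audit log. Here, routing_reason and routing_log are hypothetical fields you would add to the state schema:

```python
def record_routing(state: dict, decision: str, reason: str) -> dict:
    """Return a state update that makes the routing decision auditable."""
    entry = {"next_agent": decision, "routing_reason": reason}
    return {
        "next_agent": decision,
        "routing_reason": reason,
        "routing_log": state.get("routing_log", []) + [entry],  # append-only trail
    }
```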

State flags beat LLM-inferred completion. Letting the supervisor LLM infer whether the researcher "finished" from the message text is fragile. Explicit boolean flags (researcher_done: bool) tied to the worker's return value are deterministic, testable, and immune to prompt drift.
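As a sketch, the flag-driven ordering can live in a pure function that doubles as a deterministic fallback when the router LLM misbehaves. The flag names mirror the supervisor example above:

```python
def next_required(state: dict) -> str:
    # Routing derived purely from completion flags: no LLM inference,
    # so it is trivially testable and immune to prompt drift.
    if not state.get("researcher_done"):
        return "researcher"
    if not state.get("fact_checker_done"):
        return "fact_checker"
    if not state.get("final_report"):
        return "writer"
    return "__end__"
```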

Design for partial failure from day one. Workers time out. APIs fail. Build fallback nodes that write a graceful failure message into state and set completion flags — so the supervisor never receives an ambiguous half-done state that causes a spin loop.

Token cost grows superlinearly with shared history. A three-agent system where every agent sees the full conversation history will spend 3× the tokens of a system where agents only see their relevant portion. Profile your token spend across agents early; add summarization compression nodes before costs compound.
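Back-of-envelope arithmetic makes the compounding visible. This toy model assumes every call re-reads the full accumulated history at a flat tokens-per-turn rate:

```python
def shared_history_cost(agents: int, turns_each: int, tokens_per_turn: int) -> int:
    """Total input tokens when every call re-reads the whole shared history."""
    total, history = 0, 0
    for _ in range(agents * turns_each):
        history += tokens_per_turn
        total += history  # each call pays for everything written so far
    return total

def isolated_cost(agents: int, turns_each: int, tokens_per_turn: int) -> int:
    """Total input tokens when each agent only re-reads its own slice."""
    per_agent = sum(tokens_per_turn * turn for turn in range(1, turns_each + 1))
    return agents * per_agent
```

With 3 agents taking 5 turns each at 1,000 tokens per turn, the shared-history run costs 120,000 input tokens versus 45,000 isolated, and the gap widens roughly quadratically with turn count.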

The recursion_limit is a correctness guard, not just a resource guard. A supervisor that loops 50 times before hitting the limit has likely already produced garbage. Set a tight limit (15–25 hops for most tasks) and treat a limit breach as a circuit breaker that escalates to human review, not just a hard error.
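A framework-free sketch of that circuit-breaker stance: step_fn stands in for one supervisor-to-worker hop, and a breach sets an escalation flag instead of raising. In real LangGraph code the breach surfaces as a recursion-limit error from graph.invoke, which you would catch and translate the same way:

```python
def run_with_breaker(step_fn, state: dict, max_hops: int = 20) -> dict:
    # Treat hitting the hop limit as a signal to escalate, not a mere error.
    for _ in range(max_hops):
        state = step_fn(state)
        if state.get("final_report"):
            return state
    return {**state, "needs_human_review": True}
```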

Never use create_supervisor() for your first production deployment. The prebuilt is excellent for prototyping and validating the architecture pattern. Switch to the manual supervisor before production so you own the state schema, routing conditions, and failure exits.


📌 TLDR: Summary and Key Takeaways

TLDR: Split work across specialist agents — supervisor routing beats one overloaded generalist every time.

  • Single agents hit three hard limits: context window ceiling, inability to specialize per task, and serial execution bottleneck. Decompose when any of these binds.
  • Three patterns, three use cases: supervisor for dynamic branching, swarm for defined responsibility lanes, pipeline for ordered deterministic stages.
  • Command(goto=) is the routing primitive: it expresses the supervisor's routing decision as a typed, auditable instruction that LangGraph resolves before the next node runs.
  • Send is the parallelism primitive: returning a list of Send objects fans out to multiple workers simultaneously — use it when worker tasks are independent.
  • Subgraphs provide state isolation: wrap a complex agent as a subgraph to prevent its internal state from leaking into the parent graph's message history.
  • The mathematical case for decomposition: compound specialization gain ($\prod S_i$) typically exceeds coordination overhead by 3–5 workflow steps in research-style tasks.
  • The langgraph-supervisor prebuilt reduces setup to minutes but sacrifices state ownership — prototype with it, then own the supervisor in production.
  • Multi-agent failure modes are structural: infinite loops, partial completion, context contamination, and agent disagreement each requires explicit design mitigations, not just better prompts.

πŸ“ Practice Quiz

  1. A single research agent has 12 tools and its context window consistently fills after 8 tool calls. What is the primary architectural fix?

    • A) Increase the model's context window
    • B) Decompose into specialized agents so each has a focused context
    • C) Remove tools until only 8 remain
    • D) Switch to a cheaper model with faster inference

    Correct Answer: B
  2. You build a supervisor + three workers. After the researcher completes, the supervisor routes to the fact-checker, then back to the researcher, then back to the fact-checker — indefinitely. What is the most likely root cause?

    • A) The Send API was used incorrectly
    • B) The researcher and fact-checker nodes are missing return edges to the supervisor
    • C) The supervisor LLM lacks explicit completion conditions in its routing prompt
    • D) The recursion_limit is set too high

    Correct Answer: C
  3. You need the researcher and fact-checker to run simultaneously to reduce latency. Which LangGraph primitive enables this fan-out?

    • A) Command(goto=["researcher", "fact_checker"])
    • B) Send — return a list of Send objects from the supervisor node
    • C) Add a direct edge between researcher and fact_checker
    • D) Use asyncio.gather outside the graph

    Correct Answer: B
  4. (Open-ended — no single correct answer) You are deciding whether to use create_supervisor() from langgraph-supervisor or a manually coded supervisor for a production compliance system that requires full audit logs of every routing decision and custom state fields per agent. How would you approach this choice, and what design decisions in the manual supervisor ensure auditability and correctness at scale?



Written by Abstract Algorithms (@abstractalgorithms)