
LangGraph Tool Calling: ToolNode, Parallel Tools, and Custom Tools

Wire real capabilities into LangGraph agents: @tool decorator, ToolNode, bind_tools, parallel execution, and error handling.

Abstract Algorithms · 18 min read

TLDR: Wire @tool, ToolNode, and bind_tools into LangGraph for agents that call APIs at runtime.

📖 The Stale Knowledge Problem: Why LLMs Need Runtime Tools

Your agent confidently tells you the current stock price of NVIDIA. It's from its training data — six months out of date. The model doesn't know what it doesn't know. It has no concept of "today." Every fact it gives you was frozen at training cutoff, and it will recite that frozen snapshot with the same unwavering confidence it uses to state that water is wet.

This is the fundamental limitation of a bare LLM in production: it is a lookup table, not a live system. It can reason, summarize, and plan. What it cannot do on its own is fetch a live API response, run a shell command, query a database, or check whether a flight has been delayed. For agentic applications — systems that must act on real-world state — this is a hard blocker.

Tools are the solution. A tool is simply a Python function with a well-defined signature that the LLM can choose to invoke at runtime. The LLM doesn't execute the function itself; it emits a structured instruction ("call get_stock_price with ticker='NVDA'"), and your graph's execution layer runs the function and feeds the result back. The LLM then reasons over the live result.

LangGraph provides a clean, composable infrastructure for this entire loop: the @tool decorator to define tools, bind_tools() to register them with any LLM, ToolNode to execute them, and tools_condition to route the graph based on whether the model wants to call a tool or is ready to respond. Together, these four pieces turn a static LLM into an agent that interacts with the real world.


πŸ” Tool Fundamentals: @tool, Schemas, and the bind_tools() Pattern

Before wiring tools into a graph, you need to understand what a "tool" is from LangGraph's perspective: a Python callable decorated with @tool that carries enough metadata for an LLM to decide when and how to call it.

Defining a Tool with @tool

The @tool decorator from langchain_core.tools does three things. It wraps your function, uses the docstring as the tool description (what the model reads to decide whether to call it), and derives the JSON schema for the function's arguments from Python type hints.

from langchain_core.tools import tool

@tool
def get_stock_price(ticker: str) -> str:
    """
    Fetch the current stock price for a given ticker symbol.
    Returns a formatted string with the latest price.
    """
    # In production, call a real market data API here
    prices = {"NVDA": "118.42", "AAPL": "213.07", "TSLA": "172.30"}
    price = prices.get(ticker.upper(), "unknown")
    return f"{ticker.upper()} is currently trading at ${price}"

The docstring is not decoration — it's the model's only window into what this tool does. Write it like documentation for a smart but uninformed colleague. Mention what the tool accepts, what it returns, and when it should (or shouldn't) be called.
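To make the derivation concrete, here is a minimal stdlib-only sketch (an illustration, not LangChain's actual implementation) of turning a function's type hints and docstring into a tool description:

```python
import inspect
from typing import get_type_hints

# Hypothetical mapping from Python types to JSON schema type names
TYPE_MAP = {str: "string", int: "integer", float: "number", bool: "boolean"}

def describe_tool(fn):
    """Build a tool-description dict from a function's hints and docstring."""
    hints = get_type_hints(fn)
    hints.pop("return", None)  # only parameters go into the argument schema
    params = {name: {"type": TYPE_MAP.get(tp, "string")} for name, tp in hints.items()}
    required = [
        name for name, p in inspect.signature(fn).parameters.items()
        if p.default is inspect.Parameter.empty
    ]
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn),
        "parameters": {"type": "object", "properties": params, "required": required},
    }

def get_stock_price(ticker: str) -> str:
    """Fetch the current stock price for a given ticker symbol."""
    return f"{ticker.upper()} is currently trading at $118.42"

schema = describe_tool(get_stock_price)
print(schema["parameters"])
# {'type': 'object', 'properties': {'ticker': {'type': 'string'}}, 'required': ['ticker']}
```

The real decorator does considerably more (Pydantic models, nested types, defaults), but the mapping from hints and docstring to a schema the model can read is the core idea.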

Custom Schemas with Pydantic for Complex Inputs

For tools that require structured or multi-field inputs, you can define the input schema explicitly using a Pydantic BaseModel. This gives you validation, default values, and richer descriptions per field.

from pydantic import BaseModel, Field
from langchain_core.tools import tool

class WebSearchInput(BaseModel):
    query: str = Field(description="The search query string")
    max_results: int = Field(default=5, description="Maximum number of results to return")

@tool("web_search", args_schema=WebSearchInput)
def web_search(query: str, max_results: int = 5) -> str:
    """Search the web for current information on a topic."""
    # In production: call Tavily, SerpAPI, or Brave Search here
    return f"Top {max_results} results for '{query}': [result1, result2, ...]"
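To see what an explicit schema buys you, here is a hypothetical stdlib-only validator that applies defaults and rejects wrong types, roughly the checks Pydantic performs before your function ever runs:

```python
# Hypothetical sketch of args_schema-style validation, stdlib only
SCHEMA = {
    "query": {"type": str, "required": True},
    "max_results": {"type": int, "required": False, "default": 5},
}

def validate_args(raw: dict) -> dict:
    """Apply defaults and type-check incoming tool arguments."""
    clean = {}
    for field, spec in SCHEMA.items():
        if field in raw:
            if not isinstance(raw[field], spec["type"]):
                raise TypeError(f"{field} must be {spec['type'].__name__}")
            clean[field] = raw[field]
        elif spec["required"]:
            raise ValueError(f"missing required field: {field}")
        else:
            clean[field] = spec["default"]
    return clean

print(validate_args({"query": "NVDA news"}))
# {'query': 'NVDA news', 'max_results': 5}
```

With Pydantic you get this for free, plus per-field descriptions that are forwarded to the model alongside the docstring.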

LLM-Agnostic Binding with bind_tools()

Once your tools are defined, you attach them to any LangChain-compatible LLM using bind_tools(). This method injects the tool schemas into the model's system context so it knows what tools exist and can emit tool_calls in its response when appropriate.

from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic

tools = [get_stock_price, web_search]

# OpenAI
llm_openai = ChatOpenAI(model="gpt-4o").bind_tools(tools)

# Anthropic — same API, different model
llm_anthropic = ChatAnthropic(model="claude-3-5-sonnet-20241022").bind_tools(tools)

# Groq (fast inference)
from langchain_groq import ChatGroq
llm_groq = ChatGroq(model="llama-3.3-70b-versatile").bind_tools(tools)

The key insight is that bind_tools() is LLM-agnostic — you swap the model class without changing the tool definitions or the graph structure. This is one of LangGraph's most important architectural properties: tool-augmented agents are portable across providers.

| LLM Provider | Class | Notes |
| --- | --- | --- |
| OpenAI | ChatOpenAI | GPT-4o natively excellent at tool selection |
| Anthropic | ChatAnthropic | Claude 3.5 Sonnet reliable for structured calls |
| Groq | ChatGroq | Low-latency; use with Llama/Mistral |
| Ollama (local) | ChatOllama | Works with Llama3-based models |

βš™οΈ Wiring Tools into LangGraph: ToolNode, Conditional Routing, and the Tool Loop

With tools defined and bound to the LLM, the next step is wiring them into the graph. This requires four components: a state, an agent node, a ToolNode, and conditional routing between them.

State and the Agent Node

The graph state holds the conversation as a list of messages. The agent node calls the LLM with the current state and appends the model's response.

from typing import Annotated
from typing_extensions import TypedDict
from langgraph.graph import StateGraph, END
from langgraph.graph.message import add_messages
from langgraph.prebuilt import ToolNode, tools_condition

class AgentState(TypedDict):
    messages: Annotated[list, add_messages]

def agent_node(state: AgentState):
    """Call the LLM with the current message history."""
    response = llm_openai.invoke(state["messages"])
    return {"messages": [response]}

ToolNode: The Prebuilt Executor

ToolNode is a prebuilt LangGraph node that reads the tool_calls field from the last AIMessage in the state, executes each referenced tool, and appends ToolMessage results back to the state. You don't need to write dispatch logic yourself.

tool_node = ToolNode(tools)  # Pass the same list you used in bind_tools()

Conditional Routing with tools_condition

tools_condition is a prebuilt routing function that inspects the last message in state. If it contains tool_calls, it returns "tools" to route to ToolNode. If there are no tool calls, it returns END, signalling the agent is done.

graph_builder = StateGraph(AgentState)

graph_builder.add_node("agent", agent_node)
graph_builder.add_node("tools", tool_node)

graph_builder.set_entry_point("agent")

graph_builder.add_conditional_edges(
    "agent",
    tools_condition,          # Routes to "tools" or END
)
graph_builder.add_edge("tools", "agent")  # After tools run, go back to agent

graph = graph_builder.compile()

The add_edge("tools", "agent") line is what creates the loop: after tools execute and results are added to state, control returns to the agent node for the next reasoning step. The loop exits when the model chooses to respond directly without calling any tools.


🧠 Deep Dive: How LangGraph Executes Tool Calls

The Internals

When the LLM with bound tools generates a response that includes tool calls, it returns an AIMessage with a tool_calls field. Each entry in this list is a dict with three keys: id (a unique call identifier), name (the tool function name), and args (a dict of parsed arguments).

# Example AIMessage.tool_calls structure:
[
    {
        "id": "call_abc123",
        "name": "get_stock_price",
        "args": {"ticker": "NVDA"}
    },
    {
        "id": "call_def456",
        "name": "web_search",
        "args": {"query": "NVIDIA Q4 2025 earnings", "max_results": 3}
    }
]

ToolNode iterates this list and dispatches each call to the matching function by name. It uses the tool registry it was initialized with — the same tools list passed to ToolNode(tools). After execution, it wraps each result in a ToolMessage that carries the matching tool_call_id so the LLM can correlate each result to the call that produced it.

Parallel execution: When a single AIMessage contains multiple tool_calls, ToolNode executes them concurrently using Python's ThreadPoolExecutor. This means if your agent asks for get_stock_price("NVDA") and web_search("NVIDIA news") in the same turn, both calls happen simultaneously. The state receives all ToolMessage results before the agent node is invoked again.
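A simplified model of that dispatch-plus-parallel behavior, using only the stdlib (this illustrates the pattern, not ToolNode's actual code):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def get_stock_price(ticker: str) -> str:
    time.sleep(0.2)  # simulate external API latency
    return f"{ticker} is at $118.42"

def web_search(query: str) -> str:
    time.sleep(0.2)  # simulate external API latency
    return f"results for {query!r}"

# Registry keyed by tool name, like the list passed to ToolNode(tools)
REGISTRY = {"get_stock_price": get_stock_price, "web_search": web_search}

def run_tool_calls(tool_calls):
    """Dispatch each call by name and run them all concurrently."""
    with ThreadPoolExecutor() as pool:
        futures = [
            (call["id"], pool.submit(REGISTRY[call["name"]], **call["args"]))
            for call in tool_calls
        ]
        # ToolMessage-like dicts: each result keeps its correlating tool_call_id
        return [{"tool_call_id": cid, "content": f.result()} for cid, f in futures]

calls = [
    {"id": "call_1", "name": "get_stock_price", "args": {"ticker": "NVDA"}},
    {"id": "call_2", "name": "web_search", "args": {"query": "NVIDIA news"}},
]
start = time.perf_counter()
results = run_tool_calls(calls)
elapsed = time.perf_counter() - start
print(f"{len(results)} results in {elapsed:.2f}s")  # ~0.2s, not ~0.4s
```

Because both mock tools sleep for 0.2 seconds, the wall-clock time is roughly the maximum of the two rather than the sum, which is exactly the win described above.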

State threading: LangGraph's add_messages reducer deduplicates messages by id. Because each ToolMessage carries the original tool_call_id, the message list remains coherent even with multiple parallel results arriving at once.
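In miniature, an id-aware reducer looks like this (a toy sketch of the behavior just described, not the real add_messages implementation):

```python
def merge_messages(existing: list, updates: list) -> list:
    """Append new messages; replace any message whose id already exists."""
    by_id = {m["id"]: i for i, m in enumerate(existing)}
    merged = list(existing)
    for msg in updates:
        if msg["id"] in by_id:
            merged[by_id[msg["id"]]] = msg  # same id: replace in place
        else:
            merged.append(msg)
    return merged

state = [{"id": "m1", "content": "hi"}]
state = merge_messages(state, [{"id": "t1", "content": "tool result A"},
                               {"id": "t2", "content": "tool result B"}])
print([m["id"] for m in state])
# ['m1', 't1', 't2']
```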

Performance Analysis

Latency composition: A single tool-calling round trip adds at least three network hops — the initial LLM call (to decide to use a tool), the tool execution (calling the external API), and a second LLM call (to interpret the result). For a typical setup with OpenAI and an external search API, this means 1–3 seconds per round trip under normal conditions.

Parallel vs. sequential: If the agent calls two tools in a single step (parallel), total latency is max(tool_A_latency, tool_B_latency) plus two LLM calls. If the tools were called in two sequential turns, the latency would be tool_A + tool_B plus three LLM calls. Designing prompts that encourage parallel tool calls for independent sub-tasks is a meaningful optimization.
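Under assumed per-call latencies (illustrative constants, not benchmarks), the arithmetic works out as follows:

```python
LLM_CALL = 0.8           # assumed seconds per LLM round trip
TOOL_A, TOOL_B = 0.6, 0.4  # assumed tool latencies

# Two tools emitted in one step: the tools overlap, two LLM calls bracket them
parallel = 2 * LLM_CALL + max(TOOL_A, TOOL_B)

# Two tools across two turns: the tools serialize, and a third LLM call appears
sequential = 3 * LLM_CALL + TOOL_A + TOOL_B

print(f"parallel: {parallel:.1f}s, sequential: {sequential:.1f}s")
# parallel: 2.2s, sequential: 3.4s
```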

Timeout handling: ToolNode does not enforce timeouts by default. For tools that call external APIs, wrap the underlying call with concurrent.futures.TimeoutError handling or use httpx with explicit timeout= parameters. Unhandled slow tools will block the graph thread indefinitely.
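A stdlib sketch of one way to wrap a slow tool call with a hard deadline (slow_api and the limits are assumptions for illustration):

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

def with_timeout(fn, timeout_s: float, *args, **kwargs) -> str:
    """Run fn in a worker thread; return an error string past the deadline."""
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        return pool.submit(fn, *args, **kwargs).result(timeout=timeout_s)
    except FutureTimeout:
        return f"ERROR: {fn.__name__} timed out after {timeout_s}s"
    finally:
        pool.shutdown(wait=False)  # don't block on the stuck worker

def slow_api(ticker: str) -> str:
    time.sleep(0.5)  # pretend the upstream API is hanging
    return f"{ticker}: $118.42"

print(with_timeout(slow_api, 0.1, "NVDA"))
# ERROR: slow_api timed out after 0.1s
```

Returning an error string rather than raising keeps the failure in-band, so the model can see it and decide what to do next. Using httpx with an explicit timeout= achieves the same at the HTTP layer with less machinery.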

| Scenario | LLM Calls | Tool Calls | Approx. Latency |
| --- | --- | --- | --- |
| Direct answer (no tools) | 1 | 0 | ~0.8s |
| Single tool call | 2 | 1 | ~2.0s |
| Two tools, sequential | 3 | 2 | ~3.5s |
| Two tools, parallel (one step) | 2 | 2 | ~2.2s |

📊 The Tool Calling Loop: Graph Diagram

The core pattern is a tight loop between an agent node and a tool executor, gated by a conditional router:

flowchart TD
    A([User Input]) --> B[Agent Node\nLLM + bind_tools]
    B --> C{tools_condition}
    C -- has tool_calls --> D[ToolNode\nExecute tools]
    C -- no tool_calls --> E([Final Response])
    D --> B
    style A fill:#e8f4f8,stroke:#2196f3
    style E fill:#e8f5e9,stroke:#4caf50
    style C fill:#fff3e0,stroke:#ff9800
    style D fill:#fce4ec,stroke:#e91e63

The agent loop: the LLM reasons → calls tools if needed → receives results → reasons again. The cycle exits when the model produces a direct answer.

Every iteration of the loop passes the full message history to the LLM, so the model always has context on what tools it called and what results they returned. This is how multi-step reasoning works: the agent sees its own prior actions and builds on them.


🌍 Real-World Applications: How Production Agents Use Tool Calling

Financial Research Assistants

A hedge fund assistant needs to answer: "Should we increase our NVIDIA position given today's macro environment?" A bare LLM would hallucinate a confident answer based on stale training data. With tools, the agent calls get_stock_price, get_earnings_data, and web_search in parallel, receives live data, then synthesizes a grounded analysis.

Input: "Evaluate NVIDIA given today's market."
Process: Parallel tool calls → stock price + recent news + analyst ratings fetched live
Output: Structured memo citing today's actual data, not training-cutoff prices.

Customer Support Bots with CRM Access

A support agent needs to check an order's shipping status. Without tools, it tells the customer to "check the website." With a get_order_status(order_id: str) tool bound to the CRM API, the agent can answer "Your order #4829 shipped yesterday and arrives Thursday" — pulling the live record in real time.

Scaling note: In production deployments, tool calls are typically logged as audit events. The tool_call_id in each ToolMessage gives you a correlation key to trace exactly which tool was called, with what arguments, and what it returned — essential for debugging and compliance.

Code Execution Agents

Agents equipped with a python_repl tool can run generated code and feed the result back to the LLM for interpretation. This pattern powers tools like Jupyter AI and OpenAI's Code Interpreter. The critical safety requirement is sandboxing: tools that execute arbitrary code must run in isolated containers with restricted syscalls.


βš–οΈ Trade-offs and Failure Modes: When Tool Calling Breaks

Hallucinated Tool Calls

LLMs occasionally invent tool call arguments that violate the schema, or call a tool with plausible-sounding but wrong input (e.g., passing "NVIDIA" instead of "NVDA" to a ticker lookup). Pydantic args_schema catches schema violations, but semantic errors still slip through. Mitigation: validate arguments inside the tool body and return structured error messages the model can recover from.
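In practice, that means validating inside the tool body and returning a recoverable error string instead of raising. A sketch (the known-tickers set is an assumption for illustration):

```python
KNOWN_TICKERS = {"NVDA", "AAPL", "TSLA"}  # assumed lookup set for illustration

def get_stock_price(ticker: str) -> str:
    """Return a price string, or a structured error the model can act on."""
    symbol = ticker.strip().upper()
    if symbol not in KNOWN_TICKERS:
        # A correctable hint beats an exception: the model can retry with "NVDA"
        return (f"ERROR: '{ticker}' is not a known ticker symbol. "
                f"Did you pass a company name? Expected symbols like NVDA or AAPL.")
    return f"{symbol} is currently trading at $118.42"

print(get_stock_price("NVIDIA"))  # structured error, not a crash
print(get_stock_price("nvda"))    # normalization handles casing
```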

API Failures and Transient Errors

If get_stock_price raises an exception, the default ToolNode wraps it in an error message and feeds it back to the agent. This prevents graph crashes but risks infinite retry loops β€” the model may keep trying the same failing tool. Use handle_tool_errors=True (the ToolNode default) and implement retry limits at the tool level.
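A retry limit at the tool level can be a small wrapper like this (flaky_api and the attempt count are assumptions for illustration):

```python
def call_with_retries(fn, *args, max_attempts: int = 3, **kwargs) -> str:
    """Retry a flaky callable a bounded number of times, then give up loudly."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return fn(*args, **kwargs)
        except Exception as exc:  # in real code, catch the API's specific errors
            last_error = exc
    return f"ERROR: gave up after {max_attempts} attempts ({last_error})"

calls = {"n": 0}
def flaky_api(ticker: str) -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("upstream 503")  # fail the first two attempts
    return f"{ticker}: $118.42"

print(call_with_retries(flaky_api, "NVDA"))
# NVDA: $118.42
```

The bounded loop plus a terminal error string gives the agent a clear signal to stop hammering a dead endpoint instead of looping forever.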

Infinite Tool Loops

An agent can get stuck in a loop: call tool → get result → call tool again in the next step indefinitely. Guard against this with a recursion_limit on the graph:

graph.invoke({"messages": [HumanMessage(content="research NVDA")]},
             config={"recursion_limit": 10})

Latency Creep at Scale

Every tool call adds a network round trip. A five-step agent chain with three tools per step can easily exceed 30 seconds of wall-clock time. Design agents to batch tool calls in a single step (parallel execution) wherever tools are independent, and use streaming (graph.stream()) to provide intermediate feedback to the user.

| Failure Mode | Root Cause | Mitigation |
| --- | --- | --- |
| Hallucinated args | LLM schema misinterpretation | Pydantic args_schema + tool-level validation |
| API error loop | Transient failures not caught | try/except inside tool; explicit error return string |
| Infinite tool loop | No exit condition | recursion_limit in graph config |
| Slow response | Sequential tool calls | Design prompts to encourage parallel calls |
| Stale data edge case | Tool cache not invalidated | TTL-based caching in tool body |
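The TTL-based caching mitigation can be sketched with the stdlib; the injectable clock parameter is an assumption added purely to keep the sketch deterministic to test:

```python
import time

def make_cached_fetcher(fetch, ttl_s: float = 60.0, clock=time.monotonic):
    """Wrap a fetch function with a per-key TTL cache."""
    cache = {}  # key -> (fetched_at, value)
    def cached(key: str) -> str:
        now = clock()
        if key in cache and now - cache[key][0] < ttl_s:
            return cache[key][1]   # fresh enough: serve the cached value
        value = fetch(key)         # stale or missing: refetch and timestamp
        cache[key] = (now, value)
        return value
    return cached

hits = {"n": 0}
def fetch_price(ticker: str) -> str:
    hits["n"] += 1
    return f"{ticker}: $118.42"

get_price = make_cached_fetcher(fetch_price, ttl_s=60.0)
get_price("NVDA")
get_price("NVDA")
print(hits["n"])  # 1 -- the second call was served from cache
```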

🧭 Decision Guide: ToolNode vs Custom Tool Execution vs LangChain AgentExecutor

| Situation | Recommendation |
| --- | --- |
| Use ToolNode when | You want the standard agent loop with automatic parallel execution, error handling, and message threading. It covers 90% of production use cases. |
| Use custom tool execution when | You need fine-grained control: custom retry policies per tool, dynamic tool selection at runtime, or tool output post-processing before it hits the LLM. |
| Use LangChain AgentExecutor when | You're prototyping or working with an existing LangChain-based codebase that predates LangGraph. AgentExecutor is simpler but less composable and harder to debug. |
| Avoid ToolNode when | Your tools are stateful and must execute in a strict sequence with intermediate graph decisions between each call. Use separate nodes per tool with explicit edges instead. |
| Edge case: tool-as-node pattern | For tools that trigger multi-step subgraphs (e.g., a "research" tool that itself spawns a RAG pipeline), implement them as a LangGraph subgraph node rather than a @tool function. |

🧪 Practical Example: Market Research Agent with Parallel Tool Calls

This example demonstrates the full tool-calling loop: tool definition, LLM binding, ToolNode wiring, and parallel execution in a single runnable agent. The market research scenario was chosen because it requires live data from multiple independent sources simultaneously — exactly the case where LangGraph's parallel ToolNode execution pays off over a sequential loop. As you read the code, watch for step 2 in the execution trace: the LLM emits two tool_calls in a single AIMessage, which ToolNode then executes concurrently — that is the parallel tool-calling pattern in action.

Here is the complete market research agent with three tools — web_search, get_stock_price, and summarize_findings — with parallel execution and graceful error handling.

from typing import Annotated
from typing_extensions import TypedDict
from langchain_core.tools import tool
from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, END
from langgraph.graph.message import add_messages
from langgraph.prebuilt import ToolNode, tools_condition

# ── Tool definitions ────────────────────────────────────────────────
@tool
def web_search(query: str) -> str:
    """Search the web for recent news and analysis on a topic."""
    # Replace with Tavily or SerpAPI in production
    return f"[Search results for '{query}']: Market analysts expect strong Q1. Supply constraints easing."

@tool
def get_stock_price(ticker: str) -> str:
    """Get the current stock price for a publicly traded company by ticker symbol."""
    mock_prices = {"NVDA": "118.42", "AAPL": "213.07", "MSFT": "415.30"}
    price = mock_prices.get(ticker.upper(), "Price unavailable")
    return f"{ticker.upper()}: ${price}"

@tool
def summarize_findings(raw_data: str) -> str:
    """Condense multiple data points into a brief investment summary."""
    return f"Summary: Based on provided data — {raw_data[:80]}... — outlook is cautiously positive."

tools = [web_search, get_stock_price, summarize_findings]

# ── LLM with bound tools ─────────────────────────────────────────────
llm = ChatOpenAI(model="gpt-4o", temperature=0).bind_tools(tools)

# ── State ─────────────────────────────────────────────────────────────
class ResearchState(TypedDict):
    messages: Annotated[list, add_messages]

# ── Agent node ────────────────────────────────────────────────────────
def agent_node(state: ResearchState):
    response = llm.invoke(state["messages"])
    return {"messages": [response]}

# ── Graph assembly ────────────────────────────────────────────────────
tool_node = ToolNode(tools)

builder = StateGraph(ResearchState)
builder.add_node("agent", agent_node)
builder.add_node("tools", tool_node)
builder.set_entry_point("agent")
builder.add_conditional_edges("agent", tools_condition)
builder.add_edge("tools", "agent")
graph = builder.compile()

# ── Run ───────────────────────────────────────────────────────────────
result = graph.invoke(
    {"messages": [HumanMessage(
        content="Research NVIDIA (NVDA): get the current price and recent news. Then summarize."
    )]},
    config={"recursion_limit": 10}
)

for msg in result["messages"]:
    print(f"[{msg.__class__.__name__}] {msg.content[:120]}")

What happens at runtime:

  1. The agent calls the LLM with the user's question.
  2. The LLM returns an AIMessage with two simultaneous tool_calls: get_stock_price(ticker="NVDA") and web_search(query="NVIDIA recent news").
  3. ToolNode executes both calls in parallel and adds two ToolMessage results.
  4. The agent calls the LLM again with the full updated state. The LLM now calls summarize_findings with the combined results.
  5. ToolNode runs the summary tool.
  6. The agent calls the LLM one final time. The model produces a direct answer with no more tool calls.
  7. tools_condition routes to END.

The parallel execution in steps 2 and 3 means the stock price and news search happen simultaneously, not one after the other — shaving roughly 1 second off the total latency.


πŸ› οΈ LangChain Tools Hub: Prebuilt Tools You Can Use Today

You don't need to build every tool from scratch. LangChain's langchain_community package ships dozens of production-ready tool integrations:

# Tavily web search (recommended for agents)
from langchain_community.tools.tavily_search import TavilySearchResults
search = TavilySearchResults(max_results=3)

# Python REPL for code execution
from langchain_experimental.tools import PythonREPLTool
repl = PythonREPLTool()

# Wikipedia lookup
from langchain_community.tools import WikipediaQueryRun
from langchain_community.utilities import WikipediaAPIWrapper
wiki = WikipediaQueryRun(api_wrapper=WikipediaAPIWrapper())

# DuckDuckGo search (no API key required)
from langchain_community.tools import DuckDuckGoSearchRun
ddg = DuckDuckGoSearchRun()

# Combine prebuilt and custom tools seamlessly
all_tools = [search, repl, wiki, get_stock_price]
llm_with_tools = ChatOpenAI(model="gpt-4o").bind_tools(all_tools)
tool_node = ToolNode(all_tools)

Prebuilt tools follow the same @tool-like interface — they expose .name, .description, and .args_schema — so they drop into a ToolNode without any adaptation. Mix them freely with your custom @tool functions.

For a full deep-dive on building LangChain applications with memory and chains, see LangChain Development Guide.


📚 Lessons Learned

1. Docstrings are the model's tool selection logic. A vague docstring like "Get price" will lead the model to call the wrong tool or skip it entirely. Write one-sentence-clear descriptions: what it does, what it takes, what it returns, and when not to use it.

2. Don't let exceptions crash your graph silently. The default ToolNode catches exceptions and returns an error string, but you should handle errors inside the tool body too. Return a structured error message (e.g., "ERROR: Ticker NFLXX not found") so the model can decide to retry with a corrected argument rather than hallucinating a response.

3. Parallel tool calls are the cheapest latency optimization. If you find your agent doing two tool calls across two separate turns where the calls are independent, that's a prompt design problem — not a LangGraph limitation. Instruct the model in the system prompt to batch independent lookups into a single step.

4. Don't use AgentExecutor for new projects. LangGraph's graph-based architecture gives you full observability (every state transition is a traceable checkpoint), controllable loops (recursion limit, human-in-the-loop), and composability (subgraphs, parallel branches). AgentExecutor is a black box by comparison.

5. Test tools independently before wiring them into the graph. Call your @tool-decorated function directly in a unit test to verify its output shape before the LLM ever sees it. A tool that returns a dict when the model expects a string is a silent failure source.


📌 TLDR: Summary and Key Takeaways

  • The capability gap is real: LLMs are frozen at training cutoff; tools give them runtime access to live systems.
  • @tool is your entry point: Decorate any Python function; the docstring becomes the model's tool description; type hints become the argument schema.
  • bind_tools() is LLM-agnostic: The same tool list works with OpenAI, Anthropic, Groq, and Ollama β€” swap the model class without touching your tools or graph.
  • ToolNode handles the heavy lifting: It dispatches tool calls, executes them in parallel when the LLM emits multiple calls in a single step, and threads results back into state as ToolMessage objects.
  • tools_condition creates the loop: Route to ToolNode when tool calls are present; route to END when the model is ready to respond directly.
  • Guard against failure modes: Set recursion_limit, handle exceptions inside tool bodies, and validate arguments with Pydantic args_schema.
  • The memorable rule: An LLM without tools is a knowledgeable advisor who has never left the library. With tools, it becomes an agent that picks up the phone.

πŸ“ Practice Quiz

  1. What does the @tool decorator use to generate the tool's JSON argument schema?

    • A) The function's return type annotation
    • B) The function's parameter type hints
    • C) A separate schema= keyword argument
    • D) The tool's docstring Correct Answer: B
  2. Your LangGraph agent calls get_stock_price and web_search in the same AIMessage. How does ToolNode execute them by default?

    • A) Sequentially, in the order they appear in tool_calls
    • B) Randomly, depending on the Python GIL scheduler
    • C) In parallel, using a thread pool
    • D) Only the first call runs; the second is queued for the next turn Correct Answer: C
  3. You set recursion_limit=10 in the graph config, but your agent keeps calling a failing tool. After how many total node invocations will LangGraph stop the graph?

    • A) After 10 tool calls specifically
    • B) After 10 total node invocations across the entire graph
    • C) After 10 round trips between the agent node and ToolNode
    • D) Never β€” recursion_limit only affects subgraphs Correct Answer: B
  4. (Open-ended) You're building an agent that fetches data from five different APIs, but only two of those APIs are relevant for any given user query. How would you design the tool definitions, the system prompt, and the graph routing so the agent consistently picks the right two tools without calling the unnecessary three? Consider the trade-offs between tool description clarity, schema constraints, and graph-level filtering.


Written by Abstract Algorithms (@abstractalgorithms)