
LangGraph Tool Calling: ToolNode, Parallel Tools, and Custom Tools

Wire real capabilities into LangGraph agents: @tool decorator, ToolNode, bind_tools, parallel execution, and error handling.

Abstract Algorithms · 18 min read

TLDR: Wire @tool, ToolNode, and bind_tools into LangGraph for agents that call APIs at runtime.

📖 The Stale Knowledge Problem: Why LLMs Need Runtime Tools

Your agent confidently tells you the current stock price of NVIDIA. It's from its training data — six months out of date. The model doesn't know what it doesn't know. It has no concept of "today." Every fact it gives you was frozen at training cutoff, and it will recite that frozen snapshot with the same unwavering confidence it uses to state that water is wet.

This is the fundamental limitation of a bare LLM in production: it is a lookup table, not a live system. It can reason, summarize, and plan. What it cannot do on its own is fetch a live API response, run a shell command, query a database, or check whether a flight has been delayed. For agentic applications — systems that must act on real-world state — this is a hard blocker.

Tools are the solution. A tool is simply a Python function with a well-defined signature that the LLM can choose to invoke at runtime. The LLM doesn't execute the function itself; it emits a structured instruction ("call get_stock_price with ticker='NVDA'"), and your graph's execution layer runs the function and feeds the result back. The LLM then reasons over the live result.

LangGraph provides a clean, composable infrastructure for this entire loop: the @tool decorator to define tools, bind_tools() to register them with any LLM, ToolNode to execute them, and tools_condition to route the graph based on whether the model wants to call a tool or is ready to respond. Together, these four pieces turn a static LLM into an agent that interacts with the real world.


πŸ” Tool Fundamentals: @tool, Schemas, and the bind_tools() Pattern

Before wiring tools into a graph, you need to understand what a "tool" is from LangGraph's perspective: a Python callable decorated with @tool that carries enough metadata for an LLM to decide when and how to call it.

Defining a Tool with @tool

The @tool decorator from langchain_core.tools does three things. It wraps your function, uses the docstring as the tool description (what the model reads to decide whether to call it), and derives the JSON schema for the function's arguments from Python type hints.

from langchain_core.tools import tool

@tool
def get_stock_price(ticker: str) -> str:
    """
    Fetch the current stock price for a given ticker symbol.
    Returns a formatted string with the latest price.
    """
    # In production, call a real market data API here
    prices = {"NVDA": "118.42", "AAPL": "213.07", "TSLA": "172.30"}
    price = prices.get(ticker.upper(), "unknown")
    return f"{ticker.upper()} is currently trading at ${price}"

The docstring is not decoration — it's the model's only window into what this tool does. Write it like documentation for a smart but uninformed colleague. Mention what the tool accepts, what it returns, and when it should (or shouldn't) be called.
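To make the derivation concrete, here is a minimal stdlib-only sketch (an illustration, not LangChain's actual implementation) of turning a function's type hints and docstring into a tool description:

```python
import inspect
from typing import get_type_hints

# Hypothetical mapping from Python types to JSON schema type names
TYPE_MAP = {str: "string", int: "integer", float: "number", bool: "boolean"}

def describe_tool(fn):
    """Build a tool-description dict from a function's hints and docstring."""
    hints = get_type_hints(fn)
    hints.pop("return", None)  # only parameters go into the argument schema
    params = {name: {"type": TYPE_MAP.get(tp, "string")} for name, tp in hints.items()}
    required = [
        name for name, p in inspect.signature(fn).parameters.items()
        if p.default is inspect.Parameter.empty
    ]
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn),
        "parameters": {"type": "object", "properties": params, "required": required},
    }

def get_stock_price(ticker: str) -> str:
    """Fetch the current stock price for a given ticker symbol."""
    return f"{ticker.upper()} is currently trading at $118.42"

schema = describe_tool(get_stock_price)
print(schema["parameters"])
# {'type': 'object', 'properties': {'ticker': {'type': 'string'}}, 'required': ['ticker']}
```

The real decorator does considerably more (Pydantic models, nested types, defaults), but the mapping from hints and docstring to a schema the model can read is the core idea.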

Custom Schemas with Pydantic for Complex Inputs

For tools that require structured or multi-field inputs, you can define the input schema explicitly using a Pydantic BaseModel. This gives you validation, default values, and richer descriptions per field.

from pydantic import BaseModel, Field
from langchain_core.tools import tool

class WebSearchInput(BaseModel):
    query: str = Field(description="The search query string")
    max_results: int = Field(default=5, description="Maximum number of results to return")

@tool("web_search", args_schema=WebSearchInput)
def web_search(query: str, max_results: int = 5) -> str:
    """Search the web for current information on a topic."""
    # In production: call Tavily, SerpAPI, or Brave Search here
    return f"Top {max_results} results for '{query}': [result1, result2, ...]"
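To see what an explicit schema buys you, here is a hypothetical stdlib-only validator that applies defaults and rejects wrong types, roughly the checks Pydantic performs before your function ever runs:

```python
# Hypothetical sketch of args_schema-style validation, stdlib only
SCHEMA = {
    "query": {"type": str, "required": True},
    "max_results": {"type": int, "required": False, "default": 5},
}

def validate_args(raw: dict) -> dict:
    """Apply defaults and type-check incoming tool arguments."""
    clean = {}
    for field, spec in SCHEMA.items():
        if field in raw:
            if not isinstance(raw[field], spec["type"]):
                raise TypeError(f"{field} must be {spec['type'].__name__}")
            clean[field] = raw[field]
        elif spec["required"]:
            raise ValueError(f"missing required field: {field}")
        else:
            clean[field] = spec["default"]
    return clean

print(validate_args({"query": "NVDA news"}))
# {'query': 'NVDA news', 'max_results': 5}
```

With Pydantic you get this for free, plus per-field descriptions that are forwarded to the model alongside the docstring.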

LLM-Agnostic Binding with bind_tools()

Once your tools are defined, you attach them to any LangChain-compatible LLM using bind_tools(). This method injects the tool schemas into the model's system context so it knows what tools exist and can emit tool_calls in its response when appropriate.

from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic

tools = [get_stock_price, web_search]

# OpenAI
llm_openai = ChatOpenAI(model="gpt-4o").bind_tools(tools)

# Anthropic — same API, different model
llm_anthropic = ChatAnthropic(model="claude-3-5-sonnet-20241022").bind_tools(tools)

# Groq (fast inference)
from langchain_groq import ChatGroq
llm_groq = ChatGroq(model="llama-3.3-70b-versatile").bind_tools(tools)

The key insight is that bind_tools() is LLM-agnostic — you swap the model class without changing the tool definitions or the graph structure. This is one of LangGraph's most important architectural properties: tool-augmented agents are portable across providers.

| LLM Provider | Class | Notes |
| --- | --- | --- |
| OpenAI | ChatOpenAI | GPT-4o natively excellent at tool selection |
| Anthropic | ChatAnthropic | Claude 3.5 Sonnet reliable for structured calls |
| Groq | ChatGroq | Low-latency; use with Llama/Mistral |
| Ollama (local) | ChatOllama | Works with Llama3-based models |

βš™οΈ Wiring Tools into LangGraph: ToolNode, Conditional Routing, and the Tool Loop

With tools defined and bound to the LLM, the next step is wiring them into the graph. This requires four components: a state, an agent node, a ToolNode, and conditional routing between them.

State and the Agent Node

The graph state holds the conversation as a list of messages. The agent node calls the LLM with the current state and appends the model's response.

from typing import Annotated
from typing_extensions import TypedDict
from langgraph.graph import StateGraph, END
from langgraph.graph.message import add_messages
from langgraph.prebuilt import ToolNode, tools_condition

class AgentState(TypedDict):
    messages: Annotated[list, add_messages]

def agent_node(state: AgentState):
    """Call the LLM with the current message history."""
    response = llm_openai.invoke(state["messages"])
    return {"messages": [response]}

ToolNode: The Prebuilt Executor

ToolNode is a prebuilt LangGraph node that reads the tool_calls field from the last AIMessage in the state, executes each referenced tool, and appends ToolMessage results back to the state. You don't need to write dispatch logic yourself.

tool_node = ToolNode(tools)  # Pass the same list you used in bind_tools()

Conditional Routing with tools_condition

tools_condition is a prebuilt routing function that inspects the last message in state. If it contains tool_calls, it returns "tools" to route to ToolNode. If there are no tool calls, it returns END, signalling the agent is done.

graph_builder = StateGraph(AgentState)

graph_builder.add_node("agent", agent_node)
graph_builder.add_node("tools", tool_node)

graph_builder.set_entry_point("agent")

graph_builder.add_conditional_edges(
    "agent",
    tools_condition,          # Routes to "tools" or END
)
graph_builder.add_edge("tools", "agent")  # After tools run, go back to agent

graph = graph_builder.compile()

The add_edge("tools", "agent") line is what creates the loop: after tools execute and results are added to state, control returns to the agent node for the next reasoning step. The loop exits when the model chooses to respond directly without calling any tools.


🧠 Deep Dive: How LangGraph Executes Tool Calls

The Internals

When the LLM with bound tools generates a response that includes tool calls, it returns an AIMessage with a tool_calls field. Each entry in this list is a dict with three keys: id (a unique call identifier), name (the tool function name), and args (a dict of parsed arguments).

# Example AIMessage.tool_calls structure:
[
    {
        "id": "call_abc123",
        "name": "get_stock_price",
        "args": {"ticker": "NVDA"}
    },
    {
        "id": "call_def456",
        "name": "web_search",
        "args": {"query": "NVIDIA Q4 2025 earnings", "max_results": 3}
    }
]

ToolNode iterates this list and dispatches each call to the matching function by name. It uses the tool registry it was initialized with — the same tools list passed to ToolNode(tools). After execution, it wraps each result in a ToolMessage that carries the matching tool_call_id so the LLM can correlate each result to the call that produced it.

Parallel execution: When a single AIMessage contains multiple tool_calls, ToolNode executes them concurrently using Python's ThreadPoolExecutor. This means if your agent asks for get_stock_price("NVDA") and web_search("NVIDIA news") in the same turn, both calls happen simultaneously. The state receives all ToolMessage results before the agent node is invoked again.
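A simplified model of that dispatch-plus-parallel behavior, using only the stdlib (this illustrates the pattern, not ToolNode's actual code):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def get_stock_price(ticker: str) -> str:
    time.sleep(0.2)  # simulate external API latency
    return f"{ticker} is at $118.42"

def web_search(query: str) -> str:
    time.sleep(0.2)  # simulate external API latency
    return f"results for {query!r}"

# Registry keyed by tool name, like the list passed to ToolNode(tools)
REGISTRY = {"get_stock_price": get_stock_price, "web_search": web_search}

def run_tool_calls(tool_calls):
    """Dispatch each call by name and run them all concurrently."""
    with ThreadPoolExecutor() as pool:
        futures = [
            (call["id"], pool.submit(REGISTRY[call["name"]], **call["args"]))
            for call in tool_calls
        ]
        # ToolMessage-like dicts: each result keeps its correlating tool_call_id
        return [{"tool_call_id": cid, "content": f.result()} for cid, f in futures]

calls = [
    {"id": "call_1", "name": "get_stock_price", "args": {"ticker": "NVDA"}},
    {"id": "call_2", "name": "web_search", "args": {"query": "NVIDIA news"}},
]
start = time.perf_counter()
results = run_tool_calls(calls)
elapsed = time.perf_counter() - start
print(f"{len(results)} results in {elapsed:.2f}s")  # ~0.2s, not ~0.4s
```

Because both mock tools sleep for 0.2 seconds, the wall-clock time is roughly the maximum of the two rather than the sum, which is exactly the win described above.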

State threading: LangGraph's add_messages reducer deduplicates messages by id. Because each ToolMessage carries the original tool_call_id, the message list remains coherent even with multiple parallel results arriving at once.
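In miniature, an id-aware reducer looks like this (a toy sketch of the behavior just described, not the real add_messages implementation):

```python
def merge_messages(existing: list, updates: list) -> list:
    """Append new messages; replace any message whose id already exists."""
    by_id = {m["id"]: i for i, m in enumerate(existing)}
    merged = list(existing)
    for msg in updates:
        if msg["id"] in by_id:
            merged[by_id[msg["id"]]] = msg  # same id: replace in place
        else:
            merged.append(msg)
    return merged

state = [{"id": "m1", "content": "hi"}]
state = merge_messages(state, [{"id": "t1", "content": "tool result A"},
                               {"id": "t2", "content": "tool result B"}])
print([m["id"] for m in state])
# ['m1', 't1', 't2']
```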

Performance Analysis

Latency composition: A single tool-calling round trip adds at least three network hops — the initial LLM call (to decide to use a tool), the tool execution (calling the external API), and a second LLM call (to interpret the result). For a typical setup with OpenAI and an external search API, this means 1–3 seconds per round trip under normal conditions.

Parallel vs. sequential: If the agent calls two tools in a single step (parallel), total latency is max(tool_A_latency, tool_B_latency) plus two LLM calls. If the tools were called in two sequential turns, the latency would be tool_A + tool_B plus three LLM calls. Designing prompts that encourage parallel tool calls for independent sub-tasks is a meaningful optimization.
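Under assumed per-call latencies (illustrative constants, not benchmarks), the arithmetic works out as follows:

```python
LLM_CALL = 0.8           # assumed seconds per LLM round trip
TOOL_A, TOOL_B = 0.6, 0.4  # assumed tool latencies

# Two tools emitted in one step: the tools overlap, two LLM calls bracket them
parallel = 2 * LLM_CALL + max(TOOL_A, TOOL_B)

# Two tools across two turns: the tools serialize, and a third LLM call appears
sequential = 3 * LLM_CALL + TOOL_A + TOOL_B

print(f"parallel: {parallel:.1f}s, sequential: {sequential:.1f}s")
# parallel: 2.2s, sequential: 3.4s
```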

Timeout handling: ToolNode does not enforce timeouts by default. For tools that call external APIs, wrap the underlying call with concurrent.futures.TimeoutError handling or use httpx with explicit timeout= parameters. Unhandled slow tools will block the graph thread indefinitely.
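A stdlib sketch of one way to wrap a slow tool call with a hard deadline (slow_api and the limits are assumptions for illustration):

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

def with_timeout(fn, timeout_s: float, *args, **kwargs) -> str:
    """Run fn in a worker thread; return an error string past the deadline."""
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        return pool.submit(fn, *args, **kwargs).result(timeout=timeout_s)
    except FutureTimeout:
        return f"ERROR: {fn.__name__} timed out after {timeout_s}s"
    finally:
        pool.shutdown(wait=False)  # don't block on the stuck worker

def slow_api(ticker: str) -> str:
    time.sleep(0.5)  # pretend the upstream API is hanging
    return f"{ticker}: $118.42"

print(with_timeout(slow_api, 0.1, "NVDA"))
# ERROR: slow_api timed out after 0.1s
```

Returning an error string rather than raising keeps the failure in-band, so the model can see it and decide what to do next. Using httpx with an explicit timeout= achieves the same at the HTTP layer with less machinery.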

| Scenario | LLM Calls | Tool Calls | Approx. Latency |
| --- | --- | --- | --- |
| Direct answer (no tools) | 1 | 0 | ~0.8s |
| Single tool call | 2 | 1 | ~2.0s |
| Two tools, sequential | 3 | 2 | ~3.5s |
| Two tools, parallel (one step) | 2 | 2 | ~2.2s |

📊 The Tool Calling Loop: Graph Diagram

The core pattern is a tight loop between an agent node and a tool executor, gated by a conditional router:

flowchart TD
    A([User Input]) --> B[Agent Node\nLLM + bind_tools]
    B --> C{tools_condition}
    C -- has tool_calls --> D[ToolNode\nExecute tools]
    C -- no tool_calls --> E([Final Response])
    D --> B
    style A fill:#e8f4f8,stroke:#2196f3
    style E fill:#e8f5e9,stroke:#4caf50
    style C fill:#fff3e0,stroke:#ff9800
    style D fill:#fce4ec,stroke:#e91e63

The agent loop: the LLM reasons → calls tools if needed → receives results → reasons again. The cycle exits when the model produces a direct answer.

Every iteration of the loop passes the full message history to the LLM, so the model always has context on what tools it called and what results they returned. This is how multi-step reasoning works: the agent sees its own prior actions and builds on them.


🌍 Real-World Applications: How Production Agents Use Tool Calling

Financial Research Assistants

A hedge fund assistant needs to answer: "Should we increase our NVIDIA position given today's macro environment?" A bare LLM would hallucinate a confident answer based on stale training data. With tools, the agent calls get_stock_price, get_earnings_data, and web_search in parallel, receives live data, then synthesizes a grounded analysis.

Input: "Evaluate NVIDIA given today's market."
Process: Parallel tool calls → stock price + recent news + analyst ratings fetched live
Output: Structured memo citing today's actual data, not training-cutoff prices.

Customer Support Bots with CRM Access

A support agent needs to check an order's shipping status. Without tools, it tells the customer to "check the website." With a get_order_status(order_id: str) tool bound to the CRM API, the agent can answer "Your order #4829 shipped yesterday and arrives Thursday" — pulling the live record in real time.

Scaling note: In production deployments, tool calls are typically logged as audit events. The tool_call_id in each ToolMessage gives you a correlation key to trace exactly which tool was called, with what arguments, and what it returned — essential for debugging and compliance.

Code Execution Agents

Agents equipped with a python_repl tool can run generated code and feed the result back to the LLM for interpretation. This pattern powers tools like Jupyter AI and OpenAI's Code Interpreter. The critical safety requirement is sandboxing: tools that execute arbitrary code must run in isolated containers with restricted syscalls.


βš–οΈ Trade-offs and Failure Modes: When Tool Calling Breaks

Hallucinated Tool Calls

LLMs occasionally invent tool call arguments that violate the schema, or call a tool with plausible-sounding but wrong input (e.g., passing "NVIDIA" instead of "NVDA" to a ticker lookup). Pydantic args_schema catches schema violations, but semantic errors still slip through. Mitigation: validate arguments inside the tool body and return structured error messages the model can recover from.
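In practice, that means validating inside the tool body and returning a recoverable error string instead of raising. A sketch (the known-tickers set is an assumption for illustration):

```python
KNOWN_TICKERS = {"NVDA", "AAPL", "TSLA"}  # assumed lookup set for illustration

def get_stock_price(ticker: str) -> str:
    """Return a price string, or a structured error the model can act on."""
    symbol = ticker.strip().upper()
    if symbol not in KNOWN_TICKERS:
        # A correctable hint beats an exception: the model can retry with "NVDA"
        return (f"ERROR: '{ticker}' is not a known ticker symbol. "
                f"Did you pass a company name? Expected symbols like NVDA or AAPL.")
    return f"{symbol} is currently trading at $118.42"

print(get_stock_price("NVIDIA"))  # structured error, not a crash
print(get_stock_price("nvda"))    # normalization handles casing
```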

API Failures and Transient Errors

If get_stock_price raises an exception, the default ToolNode wraps it in an error message and feeds it back to the agent. This prevents graph crashes but risks infinite retry loops β€” the model may keep trying the same failing tool. Use handle_tool_errors=True (the ToolNode default) and implement retry limits at the tool level.
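A retry limit at the tool level can be a small wrapper like this (flaky_api and the attempt count are assumptions for illustration):

```python
def call_with_retries(fn, *args, max_attempts: int = 3, **kwargs) -> str:
    """Retry a flaky callable a bounded number of times, then give up loudly."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return fn(*args, **kwargs)
        except Exception as exc:  # in real code, catch the API's specific errors
            last_error = exc
    return f"ERROR: gave up after {max_attempts} attempts ({last_error})"

calls = {"n": 0}
def flaky_api(ticker: str) -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("upstream 503")  # fail the first two attempts
    return f"{ticker}: $118.42"

print(call_with_retries(flaky_api, "NVDA"))
# NVDA: $118.42
```

The bounded loop plus a terminal error string gives the agent a clear signal to stop hammering a dead endpoint instead of looping forever.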

Infinite Tool Loops

An agent can get stuck in a loop: call tool → get result → call tool again in the next step indefinitely. Guard against this with a recursion_limit on the graph:

graph.invoke({"messages": [HumanMessage(content="research NVDA")]},
             config={"recursion_limit": 10})

Latency Creep at Scale

Every tool call adds a network round trip. A five-step agent chain with three tools per step can easily exceed 30 seconds of wall-clock time. Design agents to batch tool calls in a single step (parallel execution) wherever tools are independent, and use streaming (graph.stream()) to provide intermediate feedback to the user.

| Failure Mode | Root Cause | Mitigation |
| --- | --- | --- |
| Hallucinated args | LLM schema misinterpretation | Pydantic args_schema + tool-level validation |
| API error loop | Transient failures not caught | try/except inside tool; explicit error return string |
| Infinite tool loop | No exit condition | recursion_limit in graph config |
| Slow response | Sequential tool calls | Design prompts to encourage parallel calls |
| Stale data edge case | Tool cache not invalidated | TTL-based caching in tool body |
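The TTL-based caching mitigation can be sketched with the stdlib; the injectable clock parameter is an assumption added purely to keep the sketch deterministic to test:

```python
import time

def make_cached_fetcher(fetch, ttl_s: float = 60.0, clock=time.monotonic):
    """Wrap a fetch function with a per-key TTL cache."""
    cache = {}  # key -> (fetched_at, value)
    def cached(key: str) -> str:
        now = clock()
        if key in cache and now - cache[key][0] < ttl_s:
            return cache[key][1]   # fresh enough: serve the cached value
        value = fetch(key)         # stale or missing: refetch and timestamp
        cache[key] = (now, value)
        return value
    return cached

hits = {"n": 0}
def fetch_price(ticker: str) -> str:
    hits["n"] += 1
    return f"{ticker}: $118.42"

get_price = make_cached_fetcher(fetch_price, ttl_s=60.0)
get_price("NVDA")
get_price("NVDA")
print(hits["n"])  # 1 -- the second call was served from cache
```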

🧭 Decision Guide: ToolNode vs Custom Tool Execution vs LangChain AgentExecutor

| Situation | Recommendation |
| --- | --- |
| Use ToolNode when | You want the standard agent loop with automatic parallel execution, error handling, and message threading. It covers 90% of production use cases. |
| Use custom tool execution when | You need fine-grained control: custom retry policies per tool, dynamic tool selection at runtime, or tool output post-processing before it hits the LLM. |
| Use LangChain AgentExecutor when | You're prototyping or working with an existing LangChain-based codebase that predates LangGraph. AgentExecutor is simpler but less composable and harder to debug. |
| Avoid ToolNode when | Your tools are stateful and must execute in a strict sequence with intermediate graph decisions between each call. Use separate nodes per tool with explicit edges instead. |
| Edge case: tool-as-node pattern | For tools that trigger multi-step subgraphs (e.g., a "research" tool that itself spawns a RAG pipeline), implement them as a LangGraph subgraph node rather than a @tool function. |

🧪 Practical Example: Market Research Agent with Parallel Tool Calls

This example demonstrates the full tool-calling loop: tool definition, LLM binding, ToolNode wiring, and parallel execution in a single runnable agent. The market research scenario was chosen because it requires live data from multiple independent sources simultaneously — exactly the case where LangGraph's parallel ToolNode execution pays off over a sequential loop. As you read the code, watch for step 2 in the execution trace: the LLM emits two tool_calls in a single AIMessage, which ToolNode then executes concurrently — that is the parallel tool-calling pattern in action.

Here is the complete market research agent with three tools — web_search, get_stock_price, and summarize_findings — with parallel execution and graceful error handling.

from typing import Annotated
from typing_extensions import TypedDict
from langchain_core.tools import tool
from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, END
from langgraph.graph.message import add_messages
from langgraph.prebuilt import ToolNode, tools_condition

# ── Tool definitions ────────────────────────────────────────────────
@tool
def web_search(query: str) -> str:
    """Search the web for recent news and analysis on a topic."""
    # Replace with Tavily or SerpAPI in production
    return f"[Search results for '{query}']: Market analysts expect strong Q1. Supply constraints easing."

@tool
def get_stock_price(ticker: str) -> str:
    """Get the current stock price for a publicly traded company by ticker symbol."""
    mock_prices = {"NVDA": "118.42", "AAPL": "213.07", "MSFT": "415.30"}
    price = mock_prices.get(ticker.upper(), "Price unavailable")
    return f"{ticker.upper()}: ${price}"

@tool
def summarize_findings(raw_data: str) -> str:
    """Condense multiple data points into a brief investment summary."""
    return f"Summary: Based on provided data — {raw_data[:80]}... — outlook is cautiously positive."

tools = [web_search, get_stock_price, summarize_findings]

# ── LLM with bound tools ─────────────────────────────────────────────
llm = ChatOpenAI(model="gpt-4o", temperature=0).bind_tools(tools)

# ── State ─────────────────────────────────────────────────────────────
class ResearchState(TypedDict):
    messages: Annotated[list, add_messages]

# ── Agent node ────────────────────────────────────────────────────────
def agent_node(state: ResearchState):
    response = llm.invoke(state["messages"])
    return {"messages": [response]}

# ── Graph assembly ────────────────────────────────────────────────────
tool_node = ToolNode(tools)

builder = StateGraph(ResearchState)
builder.add_node("agent", agent_node)
builder.add_node("tools", tool_node)
builder.set_entry_point("agent")
builder.add_conditional_edges("agent", tools_condition)
builder.add_edge("tools", "agent")
graph = builder.compile()

# ── Run ───────────────────────────────────────────────────────────────
result = graph.invoke(
    {"messages": [HumanMessage(
        content="Research NVIDIA (NVDA): get the current price and recent news. Then summarize."
    )]},
    config={"recursion_limit": 10}
)

for msg in result["messages"]:
    print(f"[{msg.__class__.__name__}] {msg.content[:120]}")

What happens at runtime:

  1. The agent calls the LLM with the user's question.
  2. The LLM returns an AIMessage with two simultaneous tool_calls: get_stock_price(ticker="NVDA") and web_search(query="NVIDIA recent news").
  3. ToolNode executes both calls in parallel and adds two ToolMessage results.
  4. The agent calls the LLM again with the full updated state. The LLM now calls summarize_findings with the combined results.
  5. ToolNode runs the summary tool.
  6. The agent calls the LLM one final time. The model produces a direct answer with no more tool calls.
  7. tools_condition routes to END.

The parallel execution in steps 2 and 3 means the stock price and news search happen simultaneously, not one after the other — shaving roughly 1 second off the total latency.


πŸ› οΈ LangChain Tools Hub: Prebuilt Tools You Can Use Today

You don't need to build every tool from scratch. LangChain's langchain_community package ships dozens of production-ready tool integrations:

# Tavily web search (recommended for agents)
from langchain_community.tools.tavily_search import TavilySearchResults
search = TavilySearchResults(max_results=3)

# Python REPL for code execution
from langchain_experimental.tools import PythonREPLTool
repl = PythonREPLTool()

# Wikipedia lookup
from langchain_community.tools import WikipediaQueryRun
from langchain_community.utilities import WikipediaAPIWrapper
wiki = WikipediaQueryRun(api_wrapper=WikipediaAPIWrapper())

# DuckDuckGo search (no API key required)
from langchain_community.tools import DuckDuckGoSearchRun
ddg = DuckDuckGoSearchRun()

# Combine prebuilt and custom tools seamlessly
all_tools = [search, repl, wiki, get_stock_price]
llm_with_tools = ChatOpenAI(model="gpt-4o").bind_tools(all_tools)
tool_node = ToolNode(all_tools)

Prebuilt tools follow the same @tool-like interface — they expose .name, .description, and .args_schema — so they drop into a ToolNode without any adaptation. Mix them freely with your custom @tool functions.

For a full deep-dive on building LangChain applications with memory and chains, see LangChain Development Guide.


📚 Lessons Learned

1. Docstrings are the model's tool selection logic. A vague docstring like "Get price" will lead the model to call the wrong tool or skip it entirely. Write one-sentence-clear descriptions: what it does, what it takes, what it returns, and when not to use it.

2. Don't let exceptions crash your graph silently. The default ToolNode catches exceptions and returns an error string, but you should handle errors inside the tool body too. Return a structured error message (e.g., "ERROR: Ticker NFLXX not found") so the model can decide to retry with a corrected argument rather than hallucinating a response.

3. Parallel tool calls are the cheapest latency optimization. If you find your agent doing two tool calls across two separate turns where the calls are independent, that's a prompt design problem — not a LangGraph limitation. Instruct the model in the system prompt to batch independent lookups into a single step.

4. Don't use AgentExecutor for new projects. LangGraph's graph-based architecture gives you full observability (every state transition is a traceable checkpoint), controllable loops (recursion limit, human-in-the-loop), and composability (subgraphs, parallel branches). AgentExecutor is a black box by comparison.

5. Test tools independently before wiring them into the graph. Call your @tool-decorated function directly in a unit test to verify its output shape before the LLM ever sees it. A tool that returns a dict when the model expects a string is a silent failure source.


📌 TLDR: Summary and Key Takeaways

  • The capability gap is real: LLMs are frozen at training cutoff; tools give them runtime access to live systems.
  • @tool is your entry point: Decorate any Python function; the docstring becomes the model's tool description; type hints become the argument schema.
  • bind_tools() is LLM-agnostic: The same tool list works with OpenAI, Anthropic, Groq, and Ollama β€” swap the model class without touching your tools or graph.
  • ToolNode handles the heavy lifting: It dispatches tool calls, executes them in parallel when the LLM emits multiple calls in a single step, and threads results back into state as ToolMessage objects.
  • tools_condition creates the loop: Route to ToolNode when tool calls are present; route to END when the model is ready to respond directly.
  • Guard against failure modes: Set recursion_limit, handle exceptions inside tool bodies, and validate arguments with Pydantic args_schema.
  • The memorable rule: An LLM without tools is a knowledgeable advisor who has never left the library. With tools, it becomes an agent that picks up the phone.

πŸ“ Practice Quiz

  1. What does the @tool decorator use to generate the tool's JSON argument schema?

    • A) The function's return type annotation
    • B) The function's parameter type hints
    • C) A separate schema= keyword argument
    • D) The tool's docstring Correct Answer: B
  2. Your LangGraph agent calls get_stock_price and web_search in the same AIMessage. How does ToolNode execute them by default?

    • A) Sequentially, in the order they appear in tool_calls
    • B) Randomly, depending on the Python GIL scheduler
    • C) In parallel, using a thread pool
    • D) Only the first call runs; the second is queued for the next turn Correct Answer: C
  3. You set recursion_limit=10 in the graph config, but your agent keeps calling a failing tool. After how many total node invocations will LangGraph stop the graph?

    • A) After 10 tool calls specifically
    • B) After 10 total node invocations across the entire graph
    • C) After 10 round trips between the agent node and ToolNode
    • D) Never β€” recursion_limit only affects subgraphs Correct Answer: B
  4. (Open-ended) You're building an agent that fetches data from five different APIs, but only two of those APIs are relevant for any given user query. How would you design the tool definitions, the system prompt, and the graph routing so the agent consistently picks the right two tools without calling the unnecessary three? Consider the trade-offs between tool description clarity, schema constraints, and graph-level filtering.


Written by Abstract Algorithms (@abstractalgorithms)