LangChain Tools and Agents: The Classic Agent Loop
Build tool-using agents with LangChain: @tool decorator, AgentExecutor, ReAct reasoning, and when to reach for LangGraph instead.
Quick TLDR: The Classic Agent Loop
TLDR: LangChain's @tool decorator plus AgentExecutor give you a working tool-calling agent in about 30 lines of Python. The ReAct loop (Thought → Action → Observation) drives every reasoning step. For simple linear tasks this works well; when you need branching, cycles, or a human approval step, graduate to LangGraph.
The Gap Between "Saying" and "Doing": Why LLMs Need Tools
Consider this prompt to a raw language model:
"What's the current weather in Tokyo, and how does today's high compare to yesterday's?"
The model answers confidently. It might even produce a plausible-sounding temperature. But that answer is fabricated: the model's weights were frozen at its training cutoff, and it has no mechanism to observe the world at inference time. It cannot make an HTTP request. It cannot read a database. It cannot execute a Python expression. It can only do one thing: predict the next token.
This is the fundamental constraint that every production agent builder runs into: a language model is a stateless, read-only function over text. You feed it tokens; it emits tokens. Nothing more. When a developer asks "can the model check the current weather?", the technically precise answer is no, but an agent built around the model can.
The insight that unlocks agentic AI is separating the LLM's role (reasoning about what to do) from the execution layer's role (actually doing it). The LLM decides to look up the weather; a Python function does the HTTP call; the result comes back as text; the LLM reads it and continues reasoning. That Python function is a tool.
LangChain was one of the first frameworks to codify this pattern. Its @tool decorator, combined with AgentExecutor and the ReAct prompting strategy, gave developers a working loop in a weekend. Understanding how this loop works, and where its seams are, is the prerequisite for every more sophisticated agent pattern that follows.
What a Tool Actually Is: A Python Function the LLM Can Request
A tool, in LangChain's model, is nothing exotic. It is a regular Python function that:
- Has a name the LLM uses to refer to it.
- Has a description (from the docstring) that the LLM reads to decide when to call it.
- Has a typed signature that defines what arguments the LLM must provide.
- Returns a string (or something coercible to a string) that feeds back into the conversation.
The LLM never executes the function itself. It emits a structured request (a tool call), and the agent framework executes the function, captures the result, and injects it back as an observation in the conversation.
Think of the relationship like a manager and a specialist. The manager (LLM) reads a brief (the docstring), decides whether to bring in the specialist (the tool), and issues a specific request ("run a web search for X"). The specialist returns their findings. The manager reads the findings and decides the next move. The manager never leaves the conference room; the specialist never makes strategy decisions.
This division of labor is what makes tools so powerful: the LLM contributes reasoning and language understanding; the tool contributes real-world access. Neither can replace the other.
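This separation can be sketched in plain Python, independent of any framework. All names here are illustrative, not LangChain APIs: the model side emits a structured request, and the execution layer owns a registry of functions and runs the call.

```python
# Illustrative sketch only, not LangChain API. The "LLM" side emits a
# structured request; the execution layer owns the registry and runs the call.

def get_current_weather(city: str) -> str:
    """Fetch the current weather for a given city."""
    return f"{city.title()}: 18°C, partly cloudy"  # mock data

TOOL_REGISTRY = {"get_current_weather": get_current_weather}

def dispatch(tool_call: dict) -> str:
    """Execute a tool call of the form {'name': ..., 'args': {...}}."""
    fn = TOOL_REGISTRY.get(tool_call["name"])
    if fn is None:
        return f"Error: unknown tool {tool_call['name']!r}"  # string, not exception
    return fn(**tool_call["args"])

# What the model might emit, and what comes back as an observation:
request = {"name": "get_current_weather", "args": {"city": "tokyo"}}
observation = dispatch(request)
print(observation)  # Tokyo: 18°C, partly cloudy
```

Note that the model never touches `TOOL_REGISTRY` directly; it only ever sees the observation string that the dispatcher returns.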
Turning Python Functions into LLM Tools with the @tool Decorator
The Decorator Approach
The @tool decorator from langchain_core.tools is the simplest way to define a tool. Decorate a function, write a meaningful docstring, and annotate the parameters; that's it.
```python
# NOTE: External API calls below are mocked for local development.
# Replace the mock return values with real API calls in production.
from langchain_core.tools import tool

@tool
def get_current_weather(city: str) -> str:
    """
    Fetch the current weather for a given city.
    Returns a plain-text description including temperature and conditions.
    Use this when the user asks about current or today's weather anywhere.
    """
    # Mock: in production, call OpenWeatherMap or similar
    mock_data = {
        "tokyo": "Tokyo: 18°C, partly cloudy, humidity 65%",
        "london": "London: 12°C, overcast, humidity 80%",
        "new york": "New York: 22°C, sunny, humidity 45%",
    }
    return mock_data.get(city.lower(), f"Weather data unavailable for {city}")

@tool
def get_yesterday_weather(city: str) -> str:
    """
    Fetch yesterday's high temperature for a given city.
    Returns a plain-text description of yesterday's weather summary.
    Use this when the user wants to compare today's weather to yesterday's.
    """
    # Mock: in production, query a historical weather API
    mock_data = {
        "tokyo": "Tokyo yesterday: high of 15°C, light rain",
        "london": "London yesterday: high of 10°C, cloudy",
        "new york": "New York yesterday: high of 20°C, partly cloudy",
    }
    return mock_data.get(city.lower(), f"Historical weather unavailable for {city}")
```
The docstring is not documentation for human readers: it is the tool's entire identity to the LLM. The model reads it to decide whether to call the tool at all. A vague docstring like "Gets weather" leaves the model unsure when to use it. A precise docstring tells the model exactly what the tool covers and when to reach for it.
Enforcing Input Structure with Pydantic
For tools that accept multiple fields or need validation, define the input schema explicitly using a Pydantic BaseModel. The LLM uses the field descriptions to construct valid arguments.
```python
from pydantic import BaseModel, Field
from langchain_core.tools import tool

class StockLookupInput(BaseModel):
    ticker: str = Field(description="Stock ticker symbol, e.g. AAPL, TSLA, NVDA")
    currency: str = Field(default="USD", description="Currency for the price output (USD or EUR)")

@tool("get_stock_price", args_schema=StockLookupInput)
def get_stock_price(ticker: str, currency: str = "USD") -> str:
    """
    Return the current stock price for a given ticker symbol.
    Use this when the user asks about stock prices or market data.
    Do NOT use this for cryptocurrency prices.
    """
    # Mock: in production, call Yahoo Finance, Alpha Vantage, or Polygon.io
    mock_prices = {"AAPL": 213.07, "TSLA": 172.30, "NVDA": 118.42, "MSFT": 415.20}
    price = mock_prices.get(ticker.upper())
    if price is None:
        return f"No price data found for {ticker}"
    if currency == "EUR":
        price = round(price * 0.92, 2)
    return f"{ticker.upper()} is trading at {currency} {price:.2f}"
```
Binding Tools to Any LLM
With tools defined, you attach them to the LLM using bind_tools(). This injects tool schemas into the model's context so it knows what capabilities are available.
```python
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic

tools = [get_current_weather, get_yesterday_weather, get_stock_price]

# OpenAI
llm_openai = ChatOpenAI(model="gpt-4o", temperature=0).bind_tools(tools)

# Anthropic: same pattern, different provider
llm_anthropic = ChatAnthropic(model="claude-3-5-sonnet-20241022").bind_tools(tools)
```
The bind_tools() call is LLM-agnostic: swap the class without changing tool definitions or agent logic. This is a key design property, since your tools stay portable across providers.
| LLM Provider | Class | Notes |
| --- | --- | --- |
| OpenAI | ChatOpenAI | GPT-4o: reliable tool selection, supports parallel calls |
| Anthropic | ChatAnthropic | Claude 3.5 Sonnet: strong at multi-step reasoning |
| Groq | ChatGroq | Fast inference; Llama 3 works well for tool calling |
| Ollama (local) | ChatOllama | Llama 3-based models with function-calling support |
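What bind_tools() effectively does is serialize each tool's name, docstring, and typed signature into a schema the model can read alongside the prompt. A rough sketch of that serialization using only the standard library (an illustration of the idea, not LangChain's actual implementation):

```python
import inspect

def tool_schema(fn) -> dict:
    """Build a JSON-schema-like description of a function for an LLM.

    Illustrative sketch: real frameworks emit provider-specific JSON Schema.
    """
    sig = inspect.signature(fn)
    params = {
        name: {"type": getattr(p.annotation, "__name__", "string")}
        for name, p in sig.parameters.items()
    }
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": params,
    }

def get_current_weather(city: str) -> str:
    """Fetch the current weather for a given city."""

schema = tool_schema(get_current_weather)
print(schema["name"])        # get_current_weather
print(schema["parameters"])  # {'city': {'type': 'str'}}
```

Everything the model knows about the tool flows through this schema, which is why the docstring and type annotations matter so much.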
Deep Dive: How AgentExecutor Orchestrates the ReAct Loop
The Internals: AgentExecutor as a Configurable Runnable Loop
AgentExecutor is LangChain's agent runner. Internally, it is a while-loop wrapped around a runnable chain: on each iteration, it calls the LLM, inspects the output, decides whether to invoke a tool, runs the tool if needed, appends the observation to the message history, and loops back. It exits when the LLM produces a final answer (no more tool calls) or when a configured maximum step count is reached.
The loop follows the ReAct (Reasoning + Acting) pattern, published by Yao et al. (2022). Each iteration consists of three phases:
- Thought: the LLM reasons about what to do next (emitted as reasoning text or a tool-call intent).
- Action: the agent framework executes the selected tool with the LLM's arguments.
- Observation: the tool's return value is injected back into the conversation as a ToolMessage.
The LLM never sees raw Python output. It sees a text observation and must then decide: is this enough to answer the question, or do I need another tool call? This continues until the LLM either returns a final answer or the executor hits max_iterations.
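The control flow AgentExecutor implements can be sketched as plain Python. This is a toy harness with a scripted stand-in for the LLM; none of these names are LangChain APIs:

```python
def run_agent(llm_step, tools: dict, user_input: str, max_iterations: int = 6):
    """Toy ReAct-style loop: call the model, run tools, stop on a final answer."""
    history = [("user", user_input)]
    for _ in range(max_iterations):
        decision = llm_step(history)  # the model reasons over the full history
        if decision["type"] == "final":
            return decision["answer"]
        observation = tools[decision["tool"]](**decision["args"])
        history.append(("observation", observation))  # context grows each pass
    return "Stopped: max_iterations reached"  # forced stop, partial answer

# Scripted "LLM": first asks for the weather, then answers with the observation.
def scripted_llm(history):
    if not any(role == "observation" for role, _ in history):
        return {"type": "tool", "tool": "weather", "args": {"city": "tokyo"}}
    return {"type": "final", "answer": history[-1][1]}

tools = {"weather": lambda city: f"{city.title()}: 18°C, partly cloudy"}
print(run_agent(scripted_llm, tools, "Weather in Tokyo?"))
# Tokyo: 18°C, partly cloudy
```

The real executor adds output parsing, error handling, and callbacks, but the shape is the same: a bounded loop that alternates model calls and tool calls.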
Here is the minimal setup to wire AgentExecutor with a ReAct prompt:
```python
from langchain import hub
from langchain.agents import AgentExecutor, create_react_agent
from langchain_openai import ChatOpenAI

# Pull the standard ReAct prompt from LangChain Hub.
# It instructs the LLM to emit Thought/Action/Observation cycles.
react_prompt = hub.pull("hwchase17/react")

llm = ChatOpenAI(model="gpt-4o", temperature=0)
tools = [get_current_weather, get_yesterday_weather, get_stock_price]

agent = create_react_agent(llm=llm, tools=tools, prompt=react_prompt)
executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,                # print the full reasoning trace
    max_iterations=6,            # hard stop to prevent runaway loops
    handle_parsing_errors=True,  # recover from malformed tool calls
)

result = executor.invoke(
    {"input": "What's the current weather in Tokyo and how does it compare to yesterday?"}
)
print(result["output"])
```
When you run this with verbose=True, you see every reasoning step printed to stdout:
```
> Entering new AgentExecutor chain...
Thought: I need to fetch the current weather in Tokyo and yesterday's weather to compare them.
Action: get_current_weather
Action Input: {"city": "tokyo"}
Observation: Tokyo: 18°C, partly cloudy, humidity 65%
Thought: Now I need yesterday's weather to compare.
Action: get_yesterday_weather
Action Input: {"city": "tokyo"}
Observation: Tokyo yesterday: high of 15°C, light rain
Thought: I have both data points. Today is 18°C and partly cloudy; yesterday peaked at 15°C with rain.
Final Answer: Tokyo is currently 18°C and partly cloudy, 3 degrees warmer than yesterday's high of 15°C, and the rain has cleared.
```
Each Thought → Action → Observation triplet is one pass through the ReAct loop. The LLM controls this entirely through its text output; AgentExecutor is just the harness that parses those outputs and dispatches tool calls.
Performance Analysis: Token Budget, Loop Depth, and Latency
Every iteration through the ReAct loop costs tokens and time. Since the full message history is appended on each pass, the context window grows linearly with the number of steps: if each tool call adds ~150 tokens of observation, a 5-step agent uses roughly 750 tokens of observations alone, plus the full prompt, reasoning, and history.
| Metric | Impact | Mitigation |
| --- | --- | --- |
| Loop depth | Each step adds observation tokens to context | Set max_iterations (typically 5–8 for simple agents) |
| Tool latency | Synchronous tool calls block the loop | Use fast APIs; avoid slow DB queries in inner tools |
| LLM calls per query | N tool calls = N+1 LLM calls | Design tools to return dense, pre-formatted summaries |
| Reasoning verbosity | ReAct prompts generate long Thought traces | Use structured tool-calling mode (no free-form Thought text) |
The practical ceiling for AgentExecutor in production is around 5–8 tool calls per request. Beyond that, token costs escalate, latency compounds, and the probability of the LLM losing its reasoning thread increases. If your use case routinely needs more steps, that is a signal to move to a graph-based architecture with bounded subgraphs.
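The linear growth is easy to quantify. Under rough illustrative assumptions (a fixed ~500-token base prompt, ~150 observation tokens and ~80 reasoning tokens per step; all numbers invented for the sketch), the cumulative input tokens billed across a run look like this:

```python
def tokens_billed(steps: int, base: int = 500, obs: int = 150, thought: int = 80) -> int:
    """Total input tokens across all LLM calls in one agent run.

    Call k (0-indexed) re-sends the base prompt plus everything appended so
    far: k observations and k thoughts. N tool calls means N+1 LLM calls.
    Numbers are illustrative, not measured.
    """
    return sum(base + k * (obs + thought) for k in range(steps + 1))

for steps in (1, 3, 5, 8):
    print(steps, "tool calls ->", tokens_billed(steps), "input tokens")
# 1 tool calls -> 1230 input tokens
# 3 tool calls -> 3380 input tokens
# 5 tool calls -> 6450 input tokens
# 8 tool calls -> 12780 input tokens
```

The cost is quadratic in step count when you sum across calls, even though each individual context grows linearly, which is why loop depth dominates the bill.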
Visualizing the ReAct Loop: Thought → Action → Observation → Answer
The flowchart below shows the complete execution path for a single agent invocation:
```mermaid
flowchart TD
    A([User Query]) --> B[Agent: LLM + Prompt]
    B --> C{Tool call in response?}
    C -- Yes --> D[AgentExecutor: Dispatch Tool]
    D --> E[Tool Function Executes]
    E --> F[Observation appended to history]
    F --> G{Max iterations reached?}
    G -- No --> B
    G -- Yes --> H([Force-stop: return partial answer])
    C -- No --> I([Final Answer returned to user])
```
Each loop back from Observation to Agent is one complete ReAct cycle. The LLM decides at every step whether to call another tool or produce a final answer.
The sequence below zooms into the message-passing protocol between components during one tool call cycle:
```mermaid
sequenceDiagram
    participant U as User
    participant AE as AgentExecutor
    participant LLM as LLM (gpt-4o)
    participant T as Tool Function
    U->>AE: invoke({"input": "Weather in Tokyo vs yesterday?"})
    AE->>LLM: [SystemPrompt + UserMessage]
    LLM-->>AE: AIMessage(tool_calls=[get_current_weather])
    AE->>T: get_current_weather(city="tokyo")
    T-->>AE: "Tokyo: 18°C, partly cloudy"
    AE->>LLM: [History + ToolMessage(observation)]
    LLM-->>AE: AIMessage(tool_calls=[get_yesterday_weather])
    AE->>T: get_yesterday_weather(city="tokyo")
    T-->>AE: "Tokyo yesterday: high of 15°C, light rain"
    AE->>LLM: [History + ToolMessage(observation)]
    LLM-->>AE: AIMessage(content="Tokyo is 3°C warmer today...")
    AE-->>U: {"output": "Tokyo is 3°C warmer today..."}
```
Two tool calls, three LLM calls, one final answer. Every message is appended to the conversation before the next LLM call: this is what "context grows with each step" means in practice.
Real-World Applications of Built-in LangChain Tools
LangChain ships several ready-to-use tools that cover the most common "reach the internet" use cases. Knowing when to use each one saves you from writing boilerplate integrations.
DuckDuckGo Search is the no-API-key option for web search. It returns a handful of snippets and URLs. Use it when the agent needs current news, general knowledge lookups, or quick factual checks.
```python
from langchain_community.tools import DuckDuckGoSearchRun

search = DuckDuckGoSearchRun()
print(search.run("LangChain AgentExecutor latest release"))
```
Wikipedia returns the summary section of a Wikipedia article. Use it for well-established topics where a summary paragraph is sufficient โ biographies, historical events, scientific concepts.
```python
from langchain_community.tools import WikipediaQueryRun
from langchain_community.utilities import WikipediaAPIWrapper

wikipedia = WikipediaQueryRun(
    api_wrapper=WikipediaAPIWrapper(top_k_results=1, doc_content_chars_max=1000)
)
print(wikipedia.run("ReAct prompting technique"))
```
Python REPL lets the LLM execute arbitrary Python code. This unlocks calculations, data transformations, and anything that requires real computation. Use it carefully: a production Python REPL tool must run in a sandboxed subprocess.
```python
from langchain_experimental.tools import PythonREPLTool

python_repl = PythonREPLTool()
# The LLM can now ask: "calculate compound interest at 7% over 10 years on $10,000"
# and the tool will run Python to produce the exact answer
```
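One simple way to get minimal isolation (an illustrative sketch, not a complete sandbox; it does not restrict filesystem or network access) is to run the snippet in a separate interpreter process with a hard timeout:

```python
import subprocess
import sys

def run_python_sandboxed(code: str, timeout: float = 5.0) -> str:
    """Execute code in a fresh interpreter process; return stdout or an error string.

    Process isolation plus a timeout contains infinite loops and crashes,
    but a real sandbox also needs filesystem/network restrictions.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return "Error: execution timed out"
    if proc.returncode != 0:
        return f"Error: {proc.stderr.strip()}"  # error string, never an exception
    return proc.stdout.strip()

print(run_python_sandboxed("print(10000 * 1.07 ** 10)"))
```

Errors and timeouts come back as observation strings, so the LLM can reason about a failed calculation rather than crashing the loop.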
| Tool | Best for | API key required? | Production-safe? |
| --- | --- | --- | --- |
| DuckDuckGoSearchRun | Web search, current events | No | Yes (rate limits apply) |
| WikipediaQueryRun | Factual knowledge, summaries | No | Yes |
| PythonREPLTool | Calculations, data transforms | No | Only with sandboxing |
| Custom @tool | Domain-specific APIs, DBs | Depends | Yes, by design |
Worked Example: A Research Assistant with Web Search, Calculator, and Stock Price
This example wires three tools into a single AgentExecutor: a DuckDuckGo search, a custom calculator, and a custom stock price lookup. The goal is a "Research Assistant" agent that can answer compound questions.
```python
# NOTE: Install dependencies:
# pip install langchain langchain-openai langchain-community duckduckgo-search
from langchain_core.tools import tool
from langchain_community.tools import DuckDuckGoSearchRun
from langchain_openai import ChatOpenAI
from langchain import hub
from langchain.agents import AgentExecutor, create_react_agent

# --- Tool 1: Web search (built-in, no API key) ---
web_search = DuckDuckGoSearchRun(name="web_search")

# --- Tool 2: Calculator ---
@tool
def calculator(expression: str) -> str:
    """
    Evaluate a mathematical expression and return the numeric result.
    Input must be a valid Python arithmetic expression, e.g. "1500 * 1.07 ** 10".
    Use this for any calculation that requires precise arithmetic.
    """
    try:
        result = eval(expression, {"__builtins__": {}}, {})  # restricted eval
        return f"Result: {result}"
    except Exception as exc:
        return f"Error evaluating expression: {exc}"

# --- Tool 3: Stock price lookup ---
@tool
def get_stock_price(ticker: str) -> str:
    """
    Return the current stock price for a given ticker symbol (e.g. AAPL, MSFT, NVDA).
    Use this when the user asks about stock prices or market valuations.
    NOTE: Returns mock data; replace with a real market data API in production.
    """
    mock_prices = {
        "AAPL": 213.07, "TSLA": 172.30, "NVDA": 118.42,
        "MSFT": 415.20, "GOOGL": 175.50, "AMZN": 198.90,
    }
    price = mock_prices.get(ticker.upper())
    if price is None:
        return f"No price data found for ticker: {ticker}"
    return f"{ticker.upper()} current price: ${price:.2f} USD"

# --- Assemble the agent ---
tools = [web_search, calculator, get_stock_price]
llm = ChatOpenAI(model="gpt-4o", temperature=0)
react_prompt = hub.pull("hwchase17/react")
agent = create_react_agent(llm=llm, tools=tools, prompt=react_prompt)
executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,
    max_iterations=8,
    handle_parsing_errors=True,
)

# --- Run a compound research query ---
response = executor.invoke({
    "input": (
        "What is NVIDIA's current stock price? "
        "If I invested $5,000 at that price, how many shares would I own? "
        "Also search for the latest news about NVIDIA's AI chips."
    )
})
print("\n=== FINAL ANSWER ===")
print(response["output"])
```
Execution trace (abbreviated):
```
> Entering new AgentExecutor chain...
Thought: I need three pieces of information: NVIDIA stock price, share count calculation, and news.
Action: get_stock_price
Action Input: {"ticker": "NVDA"}
Observation: NVDA current price: $118.42 USD
Thought: Now I'll calculate shares.
Action: calculator
Action Input: {"expression": "5000 / 118.42"}
Observation: Result: 42.22...
Thought: Now I'll search for NVIDIA AI chip news.
Action: web_search
Action Input: NVIDIA AI chips latest news 2026
Observation: [DuckDuckGo snippets about Blackwell GPU architecture, data center demand...]
Thought: I have all three pieces of data. I can compose the final answer.
Final Answer: NVIDIA is currently trading at $118.42. With $5,000 you would own approximately
42.2 shares. Recent news highlights strong demand for Blackwell GPUs in AI data centers...
```
The agent correctly sequences three distinct tool types (a mock API, a calculation, and a live web search) and synthesizes them into a coherent answer. No user code controlled the sequencing; the LLM decided the order based on its reasoning.
Switching LLM Providers
The same agent works identically with Anthropic Claude:
```python
from langchain_anthropic import ChatAnthropic

llm_anthropic = ChatAnthropic(model="claude-3-5-sonnet-20241022")
agent_anthropic = create_react_agent(llm=llm_anthropic, tools=tools, prompt=react_prompt)
executor_anthropic = AgentExecutor(agent=agent_anthropic, tools=tools, verbose=True)
result = executor_anthropic.invoke({"input": "What is APPLE stock price right now?"})
```

Swap ChatOpenAI for ChatAnthropic, ChatGroq, or ChatOllama: the tool definitions, agent logic, and executor configuration are unchanged.
AgentExecutor Under the Hood: Runnable Sequences and the Tool-Calling vs. ReAct Distinction
AgentExecutor is built on LangChain's LCEL (LangChain Expression Language) runnable interface. Under the hood, create_react_agent assembles a runnable chain:
```
prompt | llm | ReActOutputParser
```
The ReActOutputParser inspects the LLM's text output for the Action: / Action Input: pattern (ReAct) or for structured tool_calls in the message (tool-calling mode). AgentExecutor wraps this chain in a Python while-loop, feeding the result of each tool back into the chain as an observation.
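The parsing step itself is mundane: a regex over the model's text. A simplified sketch of what a ReAct-style parser looks for (LangChain's real parser handles many more edge cases; the names here are illustrative):

```python
import json
import re

ACTION_RE = re.compile(
    r"Action:\s*(?P<tool>[\w-]+)\s*\nAction Input:\s*(?P<args>.+)", re.DOTALL
)

def parse_react(text: str):
    """Return ('final', answer) or ('tool', name, args_dict) from model text."""
    if "Final Answer:" in text:
        return ("final", text.split("Final Answer:", 1)[1].strip())
    m = ACTION_RE.search(text)
    if m is None:
        # This is the failure mode handle_parsing_errors=True recovers from
        raise ValueError("Could not parse LLM output")
    return ("tool", m.group("tool"), json.loads(m.group("args")))

out = parse_react(
    'Thought: I need the weather.\nAction: get_current_weather\nAction Input: {"city": "tokyo"}'
)
print(out)  # ('tool', 'get_current_weather', {'city': 'tokyo'})
```

When the model emits malformed JSON after Action Input, json.loads raises, which is exactly why parsing-error recovery matters in the ReAct mode.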
There are two distinct agent modes in LangChain:
| Mode | How the LLM signals tool use | Prompt required | Best for |
| --- | --- | --- | --- |
| ReAct | Free text: "Action: tool_name" followed by "Action Input: {...}" | ReAct system prompt | Older models, transparency |
| Tool-calling | Structured JSON in AIMessage.tool_calls | Standard chat | GPT-4, Claude 3.5, Llama 3 |
For modern models (GPT-4o, Claude 3.5 Sonnet, Llama 3.1), the tool-calling mode is preferred. Create a tool-calling agent with create_tool_calling_agent instead of create_react_agent:
```python
from langchain.agents import create_tool_calling_agent
from langchain_core.prompts import ChatPromptTemplate

# Tool-calling agents do not use the ReAct text format; they take a standard
# chat prompt with an agent_scratchpad slot for intermediate tool messages.
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful research assistant."),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
])
agent_tc = create_tool_calling_agent(llm=llm, tools=tools, prompt=prompt)
executor_tc = AgentExecutor(agent=agent_tc, tools=tools, verbose=True)
```
The external behavior is identical (Thought → Action → Observation), but the LLM communicates via structured JSON rather than free-form text, which is more reliable and cheaper to parse.
For a full deep-dive on LangGraph's ToolNode and parallel tool execution, see LangGraph Tool Calling: ToolNode, Parallel Tools, and Custom Tools.
AgentExecutor Trade-offs and Failure Modes
AgentExecutor is a pragmatic tool with well-understood limitations. Being explicit about them up front saves a lot of debugging time in production.
Linear execution only. The ReAct loop is a single thread: one tool call at a time, no parallelism. If an agent needs to call three independent APIs simultaneously, it makes three sequential round trips, each costing LLM time and token budget. LangGraph supports parallel tool calls; AgentExecutor does not.
No branching or conditional paths. The agent cannot execute different sub-sequences based on intermediate results. If tool A returns "user is not found," you cannot route to a "create user" flow; the agent can only ask the LLM to reason about it and call another tool. Complex conditional logic embedded in LLM reasoning is fragile and hard to test.
No persistent state across sessions. AgentExecutor holds no memory between invocations. Each call to executor.invoke() starts with a blank slate. Conversation history must be managed externally and injected as chat_history into the prompt on every call.
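Managing that external history can be sketched as a plain dict keyed by session. This is an illustrative stand-in; in real LangChain code you would pass the accumulated turns as a chat_history prompt variable on each invoke():

```python
# Illustrative session store; not a LangChain API.
SESSIONS: dict[str, list[tuple[str, str]]] = {}

def invoke_with_memory(session_id: str, user_input: str, agent_fn) -> str:
    """Inject prior turns, run the agent, and persist the new exchange."""
    history = SESSIONS.setdefault(session_id, [])
    output = agent_fn(user_input, history)  # agent sees every prior turn
    history.append(("human", user_input))
    history.append(("ai", output))
    return output

# Toy agent: just reports how many prior messages it was shown.
echo_agent = lambda text, hist: f"({len(hist)} prior messages) you said: {text}"

print(invoke_with_memory("u1", "hello", echo_agent))  # (0 prior messages) you said: hello
print(invoke_with_memory("u1", "again", echo_agent))  # (2 prior messages) you said: again
```

The executor itself stays stateless; all continuity lives in the store you own, which also makes it your job to trim or summarize old turns before the context overflows.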
No human-in-the-loop. There is no native mechanism to pause execution, request human approval for a high-stakes tool call, and then resume. Everything runs to completion automatically.
Runaway loop risk. A poorly written docstring can cause the LLM to call a tool repeatedly with slightly different inputs, consuming iterations and budget. Always set max_iterations (typically 5–8) and handle_parsing_errors=True.
| Failure Mode | Symptom | Mitigation |
| --- | --- | --- |
| Tool selected for wrong query | Wrong tool called, poor answer | Write precise, scoped docstrings |
| Argument hallucination | Tool called with invalid args | Use Pydantic schemas with field descriptions |
| Infinite loop | Run stops at the max_iterations limit | Set max_iterations, add guard clauses in tools |
| Context overflow | LLM refuses to respond | Reduce max_iterations; summarize observations |
| Silent tool failure | Exception swallowed | Return error strings from tools, never raise raw exceptions |
Decision Guide: Classic AgentExecutor or LangGraph?
The right tool depends on the complexity of the agent's decision-making surface.
| Situation | Recommendation |
| --- | --- |
| Use AgentExecutor when | The task is linear (call 1–5 tools, get an answer); no branching; no human approval; prototype or internal tooling. |
| Avoid AgentExecutor when | You need conditional routing, parallel tool calls, looping with memory, or human-in-the-loop checkpoints. |
| Upgrade to LangGraph when | Your graph needs explicit branching, subgraphs, state persistence across sessions, or the ability to pause and resume. |
| Edge case: hybrid | Use AgentExecutor for a simple subtask node inside a larger LangGraph; the two compose cleanly. |
For simple linear tool chains, AgentExecutor works well and is fast to ship. When your agent needs to loop, branch based on tool results, or resume after human review, you have hit its ceiling; that is where LangGraph begins.
The transition is not a rewrite. LangGraph nodes are functions; AgentExecutor can be a node. Start with the simpler tool, validate the agent's capabilities, and graduate to the graph when the feature requirements demand it.
Lessons Learned from Building Tool-Using LangChain Agents
Write docstrings for the model, not for humans. Every word in a tool docstring is LLM prompt text. The model reads it to decide (a) whether to call the tool at all, and (b) what arguments to supply. Vague docstrings cause wrong tool selection. Precise docstrings, including what the tool does NOT cover, dramatically improve selection accuracy.
Return strings, not exceptions. If a tool hits an error (API timeout, unknown ticker, invalid city), return a descriptive error string: "Error: No weather data for city 'xyz'". Do not raise a Python exception: the agent framework will catch it, but the LLM's next step will be unpredictable. A string observation lets the LLM gracefully reason: "the city name may be wrong; let me ask the user for clarification."
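A sketch of that pattern: wrap the risky tool body in a decorator that converts any exception into a descriptive observation string (an illustrative helper, not a LangChain API):

```python
import functools

def observation_safe(fn):
    """Decorator: convert exceptions into error strings the LLM can reason about."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except Exception as exc:
            return f"Error in {fn.__name__}: {exc}"
    return wrapper

@observation_safe
def get_order_total(order_id: str) -> str:
    """Look up an order total (mock data; a KeyError stands in for an API failure)."""
    orders = {"A-100": 49.99}
    return f"Order {order_id} total: ${orders[order_id]:.2f}"

print(get_order_total("A-100"))  # Order A-100 total: $49.99
print(get_order_total("B-999"))  # Error in get_order_total: 'B-999'
```

Stacking this under the @tool decorator means every tool in the agent degrades to a readable observation instead of an unpredictable exception path.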
Mock external APIs during development. Tools that call real APIs add latency, cost, and flakiness to every test iteration. Build tools with a DEV_MODE flag that returns deterministic mock data. Switch to live APIs only when validating the full end-to-end flow.
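A sketch of the flag, assuming an environment-variable toggle (the variable name and defaults are illustrative):

```python
import os

# Default to mocks locally; set AGENT_DEV_MODE=0 to hit live APIs.
DEV_MODE = os.getenv("AGENT_DEV_MODE", "1") == "1"

def get_current_weather(city: str) -> str:
    """Fetch the current weather for a given city."""
    if DEV_MODE:
        return f"{city.title()}: 18°C, partly cloudy (mock)"  # deterministic for tests
    # Real API call goes here in production, e.g.:
    # return call_weather_api(city)
    raise NotImplementedError("Live API path not configured")

print(get_current_weather("tokyo"))  # Tokyo: 18°C, partly cloudy (mock)
```

Deterministic mock output makes agent traces reproducible, which is exactly what you want while iterating on prompts and docstrings.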
Always set max_iterations. In production, without a hard ceiling, a confused LLM can spin for 20+ iterations before a timeout. Set max_iterations=6 for most agents; increase for complex multi-step research tasks; never leave it at the default 15 unless you have very robust error handling.
Use handle_parsing_errors=True from day one. ReAct output parsing fails intermittently when the LLM produces malformed Action Input JSON. This flag tells the executor to inject the parse error as an observation and let the LLM retry, rather than crashing the entire request.
Test with verbose=True, deploy with verbose=False. The reasoning trace is invaluable during development: it tells you exactly which tool was called, with which arguments, and what came back. In production, log tool calls to an observability system instead of stdout.
Summary & Key Takeaways
- LLMs only predict tokens: they cannot fetch data, run calculations, or execute code. Tools are the mechanism that bridges language to action.
- The @tool decorator transforms a Python function into an LLM-callable capability. The docstring is the model's only guide to when and how to use it, so write it with care.
- The ReAct loop (Thought → Action → Observation) drives AgentExecutor. Each iteration appends to the message history, so costs grow linearly with loop depth.
- AgentExecutor is a while-loop around a runnable chain. It is simple, fast to set up, and correct for linear single-threaded tool chains.
- Two agent modes exist: ReAct (free-text) and tool-calling (structured JSON). For GPT-4o and Claude 3.5+, prefer the tool-calling mode: it is more reliable and cheaper to parse.
- Built-in tools (DuckDuckGo, Wikipedia, Python REPL) cover the most common integration patterns out of the box.
- Hard limits: no branching, no parallelism, no state persistence, no human-in-the-loop. When you need any of these, LangGraph is the right upgrade path.
- One-liner to remember: Build the agent you can ship today; know the ceiling so you can plan the upgrade.
Practice Quiz
In the ReAct loop, what happens immediately after the LLM emits an Action directive?
- A) The LLM generates a Final Answer
- B) AgentExecutor dispatches the named tool and appends the result as an Observation
- C) The user is prompted for confirmation before the tool runs
- D) The conversation history is cleared to save token budget

Correct Answer: B
A developer writes the following tool docstring: "Gets data." What is the most likely consequence?
- A) The tool will never be called because the docstring is too short
- B) The LLM may call the wrong tool or hallucinate arguments because it lacks guidance
- C) LangChain raises a ValueError at tool registration time
- D) The tool runs correctly but the output is ignored

Correct Answer: B
You need an agent that calls Tool A, then based on A's output decides whether to call Tool B or Tool C. Which statement best describes AgentExecutor's capability here?
- A) AgentExecutor supports conditional routing natively via its router parameter
- B) AgentExecutor can branch only if the LLM decides in its Thought, but this is fragile and untestable
- C) AgentExecutor handles branching identically to LangGraph
- D) Branching requires writing a custom OutputParser

Correct Answer: B
(Open-ended challenge, no single correct answer) You are building a customer support agent that must check an order status, optionally escalate to a human agent for high-value orders, and log every tool call to an audit database. Sketch which parts you would implement as @tool functions, which AgentExecutor parameters you would configure, and at what point (if any) you would migrate to LangGraph. Justify your choices.
Written by
Abstract Algorithms
@abstractalgorithms