AI Agents Explained: When LLMs Start Using Tools
An LLM can talk, but an AI Agent can *act*. We explain how Agents use the ReAct framework to browse the web, execute code, and call APIs.
Abstract Algorithms
AI-assisted content. This post may have been written or enhanced with AI tools. Please verify critical information independently.
TLDR: A standard LLM is a brain in a jar — it can reason but cannot act. An AI Agent connects that brain to tools (web search, code execution, APIs). Instead of just answering a question, an agent executes a loop of Thought → Action → Observation until the goal is reached.
📖 Brain in a Jar vs Brain with Arms
A plain LLM generates text. Give it "What is the weather in Tokyo today?" and it will:
- Answer from training data (which is months or years old).
- Confidently hallucinate a plausible-sounding answer.
An AI agent would:
- Recognize it needs current weather data.
- Call a weather API tool.
- Return the real, live answer.
The difference: the agent can act on the world, not just describe it.
🔍 The Basics: What Is an AI Agent
An AI agent is a program that uses a large language model as its reasoning engine and connects it to external tools so it can take real actions in the world. Where a plain LLM is purely generative — text in, text out — an agent augments the LLM with capabilities like web search, code execution, file reading, database queries, and API calls.
Three components make up almost every agent:
- The LLM — the reasoning core that decides what to do next.
- Tools — callable functions the LLM is allowed to invoke (e.g., search_web, run_python).
- The loop — the agent keeps taking steps (Thought → Action → Observation) until it decides the goal is complete.
The key insight is that the LLM is never given raw access to tools. Instead, the tool descriptions (names and docstrings) are injected into the system prompt. The model reads those descriptions and decides — at each loop step — which tool to call, with what arguments, and why.
This architecture is called ReAct (Reasoning + Acting), introduced in a 2022 paper by Yao et al. ReAct showed that interleaving reasoning traces with tool calls substantially outperforms reasoning-only prompting and action-only baselines.
How is an agent different from a simple API call? A single API call is one-shot: input → output. An agent is iterative: it observes the result of each action and can decide to take another action, fix an error, try a different tool, or produce a final answer — all without human intervention in the loop.
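The iterative loop just described can be sketched in a few lines of Python. Here `call_llm` and `run_tool` are hypothetical stand-ins for a real LLM client and a tool dispatcher, and the step format is purely illustrative:

```python
def agent_loop(goal, call_llm, run_tool, max_iterations=10):
    """Minimal ReAct-style loop: Thought -> Action -> Observation until done."""
    history = [f"Goal: {goal}"]
    for _ in range(max_iterations):
        step = call_llm("\n".join(history))   # hypothetical LLM client
        history.append(f"Thought: {step['thought']}")
        if "final_answer" in step:            # the model decides it is done
            return step["final_answer"]
        observation = run_tool(step["tool"], step["args"])  # hypothetical dispatcher
        history.append(f"Action: {step['tool']}({step['args']!r})")
        history.append(f"Observation: {observation}")
    raise RuntimeError("max_iterations reached without a final answer")
```

Everything that follows in this post is a refinement of this loop: better tool schemas, better stopping criteria, and guardrails around each step.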
⚙️ The ReAct Loop: Thought → Action → Observation
The dominant pattern for agents is ReAct (Reasoning + Acting). The model cycles through three steps until the task is complete:
| Step | Type | Content |
| --- | --- | --- |
| 1 | Thought | I need to find out when the movie Titanic was released. |
| 2 | Action | search("Titanic movie release date") |
| 3 | Observation | "Titanic was released in December 1997." |
| 4 | Thought | Now I need to find who was US president in December 1997. |
| 5 | Action | search("US President December 1997") |
| 6 | Observation | "Bill Clinton was US President in December 1997." |
| 7 | Thought | I have all the information. I can answer. |
| 8 | Final Answer | Bill Clinton was president when Titanic was released. |
```mermaid
flowchart TD
    Start([User Goal]) --> T[Thought: What do I need?]
    T --> A[Action: Call a Tool]
    A --> O[Observation: Tool Result]
    O --> D{Goal reached?}
    D -- No --> T
    D -- Yes --> Answer([Return Final Answer])
```
This loop continues until the model decides it has enough information to answer.
📊 ReAct Loop Sequence
```mermaid
sequenceDiagram
    participant U as User
    participant L as LLM
    participant T as Tool
    U->>L: User request
    L-->>L: Thought: what to do
    L->>T: Action: call tool
    T-->>L: Observation: result
    L-->>L: Next thought
    L-->>U: Final Answer
```
📊 How an Agent Processes a Request
When a user sends a question to an agent-powered system, several layers of logic activate before any answer is returned. Understanding this flow helps you reason about latency, costs, and failure points.
```mermaid
flowchart TD
    U([User Request]) --> SYS[Inject Tool Schemas into LLM Context]
    SYS --> LLM1[LLM Reasoning: What do I need?]
    LLM1 --> TC[Tool Selection: Which tool and args?]
    TC --> TE[Tool Execution: Call external function]
    TE --> OBS[Observation: Capture tool output]
    OBS --> CHK{Goal reached?}
    CHK -- No --> LLM1
    CHK -- Yes --> ANS([Return Final Answer to User])
```
What happens at each node:
- Inject Tool Schemas — before the first LLM call, all tool names, descriptions, and input types are serialised into the system prompt so the model knows what it can use.
- LLM Reasoning — the model outputs a structured "Thought" explaining its next move. This trace is not shown to the user but is the core of the agent's transparency.
- Tool Selection — based on the Thought, the model emits a structured tool call (name + JSON arguments). Modern LLMs use function-calling APIs to enforce valid JSON here.
- Tool Execution — your application code runs the actual function, calling the real API or executing code in a sandbox.
- Observation — the result is appended to the conversation history so the model can read it on the next iteration.
- Goal check — the model itself decides when it has enough information. If it emits a "Final Answer", the loop exits.
🔢 Tool Definitions: How an Agent Knows What It Can Do
A tool is a function the model can call. In LangChain you define tools with a name, description, and input schema:
```python
from langchain.tools import tool

@tool
def search_web(query: str) -> str:
    """Search the web for current information. Use this for recent events or facts."""
    return web_search_api(query)

@tool
def run_python(code: str) -> str:
    """Execute Python code and return the output. Use this for calculations."""
    return exec_sandbox(code)
```
The model receives the tool descriptions in its system prompt and decides which to call (and with what arguments) based on the task. It never sees the implementation — only the name and docstring.
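To make that concrete, here is a rough sketch (plain Python, not LangChain's actual serialisation format) of how names, docstrings, and parameter types can end up in the system prompt:

```python
import inspect

def search_web(query: str) -> str:
    """Search the web for current information. Use this for recent events or facts."""

def run_python(code: str) -> str:
    """Execute Python code and return the output. Use this for calculations."""

def render_tool_prompt(tools):
    # Build the tool section of the system prompt from signatures + docstrings.
    lines = ["You have access to the following tools:"]
    for fn in tools:
        sig = inspect.signature(fn)
        args = ", ".join(
            f"{p.name}: {p.annotation.__name__}" for p in sig.parameters.values()
        )
        lines.append(f"- {fn.__name__}({args}): {fn.__doc__}")
    return "\n".join(lines)

print(render_tool_prompt([search_web, run_python]))
```

The model sees only this rendered text; the function bodies never enter the context window.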
📊 Agent Tool Decision Flow
```mermaid
flowchart TD
    I[Input from User] --> LM[LLM Reasoning]
    LM --> D{Need a tool?}
    D -- No --> R[Return Answer]
    D -- Yes --> TS[Select Tool]
    TS --> EX[Execute Tool]
    EX --> OB[Observe Output]
    OB --> LM
```
🧠 Deep Dive: Building a Simple Agent with LangChain
```python
from langchain.agents import create_react_agent, AgentExecutor
from langchain_openai import ChatOpenAI
from langchain import hub

llm = ChatOpenAI(model="gpt-4o-mini")
tools = [search_web, run_python]
prompt = hub.pull("hwchase17/react")  # standard ReAct prompt

agent = create_react_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

result = executor.invoke({"input": "Who was US president when Titanic was released?"})
print(result["output"])
```
The verbose=True flag shows you the full Thought/Action/Observation chain — invaluable for debugging.
🌍 Real-World Agent Use Cases
| Use case | Tools used |
| --- | --- |
| Customer support triage | CRM lookup, ticket creation, knowledge base search |
| Data analyst bot | SQL runner, Python executor, chart renderer |
| Code reviewer agent | GitHub file reader, linter, test runner |
| Travel booking | Flight search API, hotel API, calendar API |
| Research assistant | Web search, PDF reader, citation manager |
🧪 Practical: Debugging Agent Behavior
Agents fail in opaque ways that are hard to catch without systematic debugging. Here is a practical workflow for diagnosing misbehaving agents.
Step 1 — Enable verbose output. In LangChain, set verbose=True on the AgentExecutor. This prints the full Thought/Action/Observation chain to stdout, letting you see exactly what the model decided at each step and what each tool returned.
```python
executor = AgentExecutor(agent=agent, tools=tools, verbose=True, max_iterations=10)
```
Step 2 — Inspect the tool call arguments. The most common failure is the model calling the right tool with the wrong arguments. Look at the Action Input in the verbose log. If the model is passing None, an empty string, or a hallucinated value, the docstring on your tool is probably too vague.
Step 3 — Check for loops. If the agent keeps calling the same tool with the same arguments and getting no progress, it is stuck. Add max_iterations=10 (or lower) to break the loop and inspect the last observation. Usually the tool is returning an error the model does not know how to handle.
Step 4 — Shrink the context. If the agent starts forgetting earlier observations or repeating itself, the context window may be filling up. Consider summarising older observations or limiting how many tool calls are preserved in history.
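A minimal sketch of that pruning step, assuming the history is a list of strings and `summarise` would normally be a cheap LLM call (here it is left pluggable, with a plain-text fallback):

```python
def prune_history(history, keep_last=4, summarise=None):
    """Keep the most recent steps verbatim; collapse everything older.

    `summarise` is an optional callable (e.g. a cheap LLM summariser);
    without one, older steps are replaced by a simple count marker.
    """
    if len(history) <= keep_last:
        return history
    old, recent = history[:-keep_last], history[-keep_last:]
    summary = summarise(old) if summarise else f"[{len(old)} earlier steps omitted]"
    return [summary] + recent
```

Running this before each LLM call keeps the context bounded regardless of how many iterations the agent takes.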
Common failure patterns at a glance:
| Symptom | Probable Cause | Fix |
| --- | --- | --- |
| Tool called with empty args | Vague tool description | Improve docstring with examples |
| Infinite loop | Tool returns silent error | Return readable error message |
| Wrong tool chosen | Tool names too similar | Rename for clarity |
| Context window exceeded | Too many iterations | Summarise older observations |
| Hallucinated answer | No relevant tool available | Add the needed tool or constrain scope |
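One pattern that addresses the "infinite loop / silent error" row: wrap every tool so exceptions come back as readable text instead of raising. This is a generic sketch, not a specific framework's API:

```python
import functools

def safe_tool(fn):
    """Return errors as readable text so the agent can see what went
    wrong and adjust on the next iteration, instead of stalling."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except Exception as exc:
            return (f"TOOL ERROR ({fn.__name__}): {type(exc).__name__}: {exc}. "
                    "Adjust the arguments or try a different tool.")
    return wrapper
```

Because the error message names the tool and the exception, the model gets a concrete observation to reason about rather than an empty result.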
⚖️ When Agents Fail: Hallucinations, Loops, and Cost Blowouts
Agents introduce new failure modes beyond plain LLMs:
- Hallucinated tool calls — the model invents arguments or calls a non-existent tool. Fix: validate tool schemas strictly; use structured outputs.
- Infinite loops — the agent gets stuck in Thought→Action→Observation cycles with no progress. Fix: set a hard max_iterations limit.
- Cost explosion — each loop iteration is an API call + tool call. A task that needs 15 iterations with GPT-4 can cost $1 per query. Fix: use cheaper models for planning steps; cache repeated tool results.
- Context overflow — long observation histories can push earlier context out of the window. Fix: summarize or prune old observations periodically.
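The first fix, strict validation of tool calls before anything executes, can be as simple as the sketch below (the tool registry and error messages are illustrative):

```python
import json

TOOLS = {"search_web", "run_python"}  # registry of tools the agent may call

def parse_tool_call(raw: str):
    """Reject hallucinated or malformed tool calls before execution."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"malformed tool call JSON: {exc}")
    if call.get("tool") not in TOOLS:
        raise ValueError(f"unknown tool: {call.get('tool')!r}")
    if not isinstance(call.get("args"), dict):
        raise ValueError("args must be a JSON object")
    return call["tool"], call["args"]
```

Function-calling APIs enforce most of this for you, but a defensive check like this is still worthwhile when parsing free-form ReAct output.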
🧭 Decision Guide: When to Use an AI Agent
Use an agent when the task requires multiple steps, tool use, or dynamic decision-making that can't be scripted upfront. Prefer a simple LLM call when a single prompt is sufficient — agents add latency and cost. Use agents for research, code-generation loops, or multi-tool workflows; avoid them for single-turn Q&A or classification tasks.
🛠️ LangChain AgentExecutor: Wiring the ReAct Loop in Five Lines of Python
LangChain is an open-source Python framework for composing LLM-powered applications. Its agent module provides create_react_agent, AgentExecutor, and the @tool decorator to implement the Thought → Action → Observation ReAct loop with minimal boilerplate.
create_react_agent wires an LLM, a list of tools, and a ReAct prompt template into an executable agent. AgentExecutor runs the loop: it calls the LLM, parses tool invocations, executes the tools, appends observations to context, and stops when the model emits a final answer or max_iterations is reached.
```python
from langchain.agents import create_react_agent, AgentExecutor
from langchain.tools import tool
from langchain_openai import ChatOpenAI
from langchain import hub

# 1. Define tools — the LLM only ever sees the name and docstring
@tool
def search_web(query: str) -> str:
    """Search the web for current facts, news, or recent events. Returns a text summary."""
    # In production: call Tavily, SerpAPI, or a custom search client
    return f"[Search result for '{query}': Titanic was released December 19, 1997]"

@tool
def run_python(code: str) -> str:
    """Execute Python code in a sandboxed environment. Returns stdout. Use for math and data."""
    import io, contextlib
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, {"__builtins__": __builtins__})
    return buf.getvalue() or "(no output)"

# 2. Wire the agent — LLM + tools + standard ReAct prompt from LangChain Hub
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
tools = [search_web, run_python]
prompt = hub.pull("hwchase17/react")  # standard ReAct template

agent = create_react_agent(llm, tools, prompt)
executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,                # prints full Thought/Action/Observation chain
    max_iterations=10,           # hard cap — prevents runaway cost loops
    handle_parsing_errors=True,  # recover gracefully from malformed tool calls
)

# 3. Run — the executor manages the full ReAct loop automatically
result = executor.invoke({
    "input": "What is the square root of the year Titanic was released?"
})
print(result["output"])

# Thought: I need the release year of Titanic.
# Action: search_web("Titanic movie release year")
# Observation: Released December 19, 1997
# Thought: Now compute sqrt(1997).
# Action: run_python("import math; print(math.sqrt(1997))")
# Observation: 44.688...
# Final Answer: approximately 44.69
```
verbose=True is the most important debugging flag — it exposes every Thought/Action/Observation step, making it straightforward to diagnose incorrect tool selection or malformed arguments. Set max_iterations before going to production: without it, a confused agent loops until the token budget is exhausted.
For a full deep-dive on LangChain multi-tool agents, memory integration, and production observability with LangSmith tracing, a dedicated follow-up post is planned.
📚 Key Lessons About AI Agents
Five concrete lessons from building and deploying agents in production:
Tool descriptions are your most important prompt. The model never sees tool code — only the name and docstring. Write docstrings like user-facing documentation: what the tool does, when to use it, what to pass in, and what it returns. One ambiguous tool description causes more failures than any model limitation.
Always set max_iterations. Without a hard cap, a confused agent will keep looping until your API budget is exhausted. In production, treat hitting the iteration limit as an error to alert on, not a silent fallback.
Structured outputs improve reliability dramatically. Agents that return tool call arguments as free-form text hallucinate more often than agents using JSON-schema-validated function calling. Use OpenAI function calling or LangChain structured output tools wherever possible.
Cost scales with loop depth. Every Thought → Action → Observation iteration is one full LLM inference plus one tool call. A task requiring 10 iterations with a frontier model can cost $0.50–$1.00 per query. Profile your agents before going to production and set budgets per session.
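A back-of-envelope cost model makes this concrete. The per-token prices below are assumptions for illustration, not any provider's actual rates; the key point is that each iteration re-sends the whole growing context:

```python
# Illustrative per-token prices (assumptions, not real quotes).
PRICE_IN = 2.50 / 1_000_000    # dollars per input token
PRICE_OUT = 10.00 / 1_000_000  # dollars per output token

def query_cost(iterations, base_prompt=3000, step_tokens=300):
    """Rough cost of one agent query: every iteration re-sends the
    (growing) context and emits one Thought/Action step."""
    total = 0.0
    for i in range(iterations):
        context = base_prompt + i * 2 * step_tokens  # history grows each loop
        total += context * PRICE_IN + step_tokens * PRICE_OUT
    return total

print(f"1 iteration:   ${query_cost(1):.4f}")
print(f"10 iterations: ${query_cost(10):.4f}")
```

Because the context grows every loop, cost rises faster than linearly with iteration count, which is why deep agent runs get expensive quickly.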
Agents are hard to test deterministically. Unlike a pure function with fixed input/output, agents make probabilistic decisions at each step. Write integration tests using recorded tool responses (fixtures) rather than live tool calls, and explicitly test common failure paths such as tool errors and empty results.
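The fixture approach can be sketched as below. The recording format and the `replay_tool` helper are illustrative, not a specific framework's API:

```python
# Recorded tool responses (fixtures) make agent tests deterministic.
RECORDED = {
    ("search_web", "Titanic release year"): "Released December 19, 1997",
    ("run_python", "print(2 + 2)"): "4",
}

def replay_tool(name, arg):
    """Stand-in for live tool execution: replay a recorded response,
    and fail loudly on any call the recording does not cover."""
    key = (name, arg)
    if key not in RECORDED:
        raise AssertionError(f"unexpected tool call during test: {key}")
    return RECORDED[key]
```

In a test, you pass `replay_tool` to the agent in place of the live dispatcher: any drift in the agent's tool-calling behaviour then surfaces as an unexpected-call failure instead of a silent live API hit.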
📌 TLDR: Summary & Key Takeaways
- A plain LLM generates text; an agent generates text and calls tools to act.
- The dominant loop is ReAct: Thought → Action → Observation, repeated until the task is complete.
- Tools are functions with a name and description; the LLM decides when and how to call them.
- Key failure modes: hallucinated tool calls, infinite loops, cost explosion, and context overflow.
- Always set max_iterations and monitor tool call costs in production.
🧩 Test Your Understanding
- What is the difference between an LLM and an AI agent?
- In the ReAct pattern, what triggers the agent to stop looping?
- Why are tool descriptions (docstrings) so important for agent reliability?
- Name two ways to prevent an agent from running up an unexpectedly large API bill.