AI Agents Explained: When LLMs Start Using Tools
An LLM can talk, but an AI Agent can *act*. We explain how Agents use the ReAct framework to brow...
Abstract Algorithms
TLDR: A standard LLM is a brain in a jar — it can reason but cannot act. An AI Agent connects that brain to tools (web search, code execution, APIs). Instead of just answering a question, an agent executes a loop of Thought → Action → Observation until the goal is reached.
📖 Brain in a Jar vs Brain with Arms
A plain LLM generates text. Give it "What is the weather in Tokyo today?" and it will:
- Answer from training data (which is months or years old).
- Confidently hallucinate a plausible-sounding answer.
An AI agent would:
- Recognize it needs current weather data.
- Call a weather API tool.
- Return the real, live answer.
The difference: the agent can act on the world, not just describe it.
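The contrast can be sketched as a tiny routing function. Everything here (needs_live_data, get_weather, the keyword heuristic) is an illustrative stub, not a real API; a production agent would let the model itself make this decision.

```python
# Sketch: the agent path vs the plain-LLM path for the weather question.
# needs_live_data and get_weather are illustrative stand-ins, not real APIs.
def needs_live_data(question: str) -> bool:
    # Crude heuristic: questions about "today"/"current" need fresh data.
    return any(w in question.lower() for w in ("today", "current", "now", "latest"))

def get_weather(city: str) -> str:
    return f"22°C and clear in {city}"  # stub for a real weather API call

def answer(question: str) -> str:
    if needs_live_data(question):
        return get_weather("Tokyo")       # agent path: act, then answer
    return "answered from training data"  # plain-LLM path

print(answer("What is the weather in Tokyo today?"))
```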
⚙️ The ReAct Loop: Thought → Action → Observation
The dominant pattern for agents is ReAct (Reasoning + Acting). The model cycles through three steps until the task is complete:
| Step | Type | Content |
|---|---|---|
| 1 | Thought | I need to find out when the movie Titanic was released. |
| 2 | Action | search("Titanic movie release date") |
| 3 | Observation | "Titanic was released in December 1997." |
| 4 | Thought | Now I need to find who was US president in December 1997. |
| 5 | Action | search("US President December 1997") |
| 6 | Observation | "Bill Clinton was US President in December 1997." |
| 7 | Thought | I have all the information. I can answer. |
| 8 | Final Answer | Bill Clinton was president when Titanic was released. |
```mermaid
flowchart TD
    Start([User Goal]) --> T[Thought: What do I need?]
    T --> A[Action: Call a Tool]
    A --> O[Observation: Tool Result]
    O --> D{Goal reached?}
    D -- No --> T
    D -- Yes --> Answer([Return Final Answer])
```
This loop continues until the model decides it has enough information to answer.
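The loop above can be hand-rolled in a few lines. This sketch replaces the LLM with a fixed script and the search tool with a lookup table, purely to make the control flow concrete; the names SCRIPT, TOOLS, and run_agent are invented for illustration.

```python
# Minimal ReAct loop with a scripted "model" and a stub search tool.
TOOLS = {
    "search": lambda q: {
        "Titanic movie release date": "December 1997",
        "US President December 1997": "Bill Clinton",
    }.get(q, "no result"),
}

# Stand-in for the LLM: a fixed sequence of steps instead of generated text.
SCRIPT = [
    ("Thought", "I need the Titanic release date."),
    ("Action", ("search", "Titanic movie release date")),
    ("Thought", "Now I need the president at that time."),
    ("Action", ("search", "US President December 1997")),
    ("Final Answer", "Bill Clinton"),
]

def run_agent(max_iterations=10):
    history = []
    for step_type, content in SCRIPT[:max_iterations]:
        history.append((step_type, content))
        if step_type == "Action":
            tool_name, arg = content
            history.append(("Observation", TOOLS[tool_name](arg)))
        elif step_type == "Final Answer":
            return content, history
    return None, history  # hit the iteration cap without an answer

answer, trace = run_agent()
print(answer)  # Bill Clinton
```

In a real agent, the next SCRIPT entry would instead be generated by the model after seeing the full history so far, which is exactly why the loop can run away without an iteration cap.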
🔢 Tool Definitions: How an Agent Knows What It Can Do
A tool is a function the model can call. In LangChain you define tools with a name, description, and input schema:
```python
from langchain.tools import tool

@tool
def search_web(query: str) -> str:
    """Search the web for current information. Use this for recent events or facts."""
    return web_search_api(query)  # your search backend, e.g. a SERP API wrapper

@tool
def run_python(code: str) -> str:
    """Execute Python code and return the output. Use this for calculations."""
    return exec_sandbox(code)  # your sandboxed interpreter
```
The model receives the tool descriptions in its system prompt and decides which to call (and with what arguments) based on the task. It never sees the implementation — only the name and docstring.
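To make "only the name and docstring" concrete, here is a rough sketch of how tool signatures could be rendered into a prompt. render_tools is a hypothetical helper written for this post, not a LangChain API, and the two functions are stubs.

```python
# Sketch: what the model actually sees from a tool definition.
# render_tools is a hypothetical helper, not part of LangChain.
import inspect

def search_web(query: str) -> str:
    """Search the web for current information."""
    ...

def run_python(code: str) -> str:
    """Execute Python code and return the output."""
    ...

def render_tools(tools):
    lines = []
    for fn in tools:
        sig = inspect.signature(fn)  # e.g. (query: str) -> str
        lines.append(f"{fn.__name__}{sig}: {inspect.getdoc(fn)}")
    return "\n".join(lines)

prompt_tools = render_tools([search_web, run_python])
print(prompt_tools)
```

Because this text is all the model gets, a vague or misleading docstring directly degrades tool selection.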
🧠 Building a Simple Agent with LangChain
```python
from langchain.agents import create_react_agent, AgentExecutor
from langchain_openai import ChatOpenAI
from langchain import hub

llm = ChatOpenAI(model="gpt-4o-mini")
tools = [search_web, run_python]
prompt = hub.pull("hwchase17/react")  # standard ReAct prompt

agent = create_react_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

result = executor.invoke({"input": "Who was US president when Titanic was released?"})
print(result["output"])
```
The verbose=True flag shows you the full Thought/Action/Observation chain — invaluable for debugging.
🌍 Real-World Agent Use Cases
| Use case | Tools used |
|---|---|
| Customer support triage | CRM lookup, ticket creation, knowledge base search |
| Data analyst bot | SQL runner, Python executor, chart renderer |
| Code reviewer agent | GitHub file reader, linter, test runner |
| Travel booking | Flight search API, hotel API, calendar API |
| Research assistant | Web search, PDF reader, citation manager |
⚖️ When Agents Fail: Hallucinations, Loops, and Cost Blowouts
Agents introduce new failure modes beyond plain LLMs:
- **Hallucinated tool calls:** the model invents arguments or calls a non-existent tool. Fix: validate tool schemas strictly; use structured outputs.
- **Infinite loops:** the agent gets stuck in Thought→Action→Observation cycles with no progress. Fix: set a hard max_iterations limit.
- **Cost explosion:** each loop iteration is an API call plus a tool call. A task that needs 15 iterations with GPT-4 can cost $1 per query. Fix: use cheaper models for planning steps; cache repeated tool results.
- **Context overflow:** long observation histories can push earlier context out of the window. Fix: summarize or prune old observations periodically.
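Two of these fixes, the iteration cap and tool-result caching, can be shown in a self-contained sketch. The agent loop and search stub below are illustrative, not a real library; in LangChain the equivalent knob is the max_iterations argument to AgentExecutor.

```python
# Sketch of two cost guards: a hard iteration cap and a memoized tool.
from functools import lru_cache

calls = {"n": 0}  # counts real (cache-missing) tool invocations

@lru_cache(maxsize=256)
def search(query: str) -> str:
    calls["n"] += 1
    return f"result for {query!r}"  # stub for an expensive API call

def run_agent(steps, max_iterations=5):
    observation = None
    for i, query in enumerate(steps):
        if i >= max_iterations:
            return "stopped: iteration limit reached"
        observation = search(query)  # repeated queries hit the cache
    return observation

# A looping agent that keeps re-asking the same two queries:
out = run_agent(["a", "b", "a", "b", "a", "b", "c"], max_iterations=5)
print(out, calls["n"])  # stopped: iteration limit reached 2
```

Five iterations ran but only two real tool calls were paid for; the cap then cut the loop off before it could reach the seventh step.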
📌 Key Takeaways
- A plain LLM generates text; an agent generates text and calls tools to act.
- The dominant loop is ReAct: Thought → Action → Observation, repeated until the task is complete.
- Tools are functions with a name and description; the LLM decides when and how to call them.
- Key failure modes: hallucinated tool calls, infinite loops, cost explosion, and context overflow.
- Always set max_iterations and monitor tool call costs in production.
🧩 Test Your Understanding
- What is the difference between an LLM and an AI agent?
- In the ReAct pattern, what triggers the agent to stop looping?
- Why are tool descriptions (docstrings) so important for agent reliability?
- Name two ways to prevent an agent from running up an unexpectedly large API bill.