
The ReAct Agent Pattern in LangGraph: Think, Act, Observe, Repeat

Build ReAct agents in LangGraph: prebuilt create_react_agent vs custom StateGraph loop, with a coding assistant example.

Abstract Algorithms · 23 min read

TLDR: ReAct = Think + Act + Observe, looped as a LangGraph graph, prebuilt or custom.


📖 The Single-Shot Failure: Why One LLM Call Isn't Enough for Complex Tasks

Your agent is supposed to write a function, run the tests, fix the failures, and repeat until green. With a single LLM call, it writes the function and stops. It has no way to see the test output and try again.

This is the single-shot ceiling: the fundamental limit of one-call agents. The LLM generates a response and the conversation ends. No feedback loop, no correction, no ability to observe what actually happened when the code ran.

Single-shot works fine for tasks where the correct answer is entirely within the model's training distribution: "Summarize this paragraph." "Translate this sentence." "What is the capital of France?" These have deterministic, self-contained answers.

The moment a task requires external feedback (run this code and tell me if it passed, search the web for current prices, read the file that was just created), single-shot collapses. The LLM cannot observe the world outside its context window. It can only generate text based on what it was given at call time.

| Task Type | Single-Shot | Looping Agent |
| --- | --- | --- |
| Summarize a document | ✅ Works | Overkill |
| Answer from training data | ✅ Works | Overkill |
| Write + run + fix code | ❌ Fails | ✅ Required |
| Multi-step research with live data | ❌ Fails | ✅ Required |
| Iterative data transformation | ❌ Fails | ✅ Required |

The solution is an agent that can act, observe the result, and decide what to do next, repeating as many times as the task demands. That is exactly what the ReAct pattern provides.


πŸ” The ReAct Pattern: Think, Act, Observe, and Why It Works

ReAct (Reasoning + Acting) was introduced in a 2022 paper by Yao et al. The core insight is deceptively simple: instead of generating one final answer, make the LLM generate a reasoning trace interleaved with tool calls, each followed by an observation of the tool's output.

The loop looks like this:

  1. Think: the LLM reasons about the current state and decides what action to take next.
  2. Act: it calls a tool (a function, an API, a shell command).
  3. Observe: the tool's return value is fed back into the LLM's context as a ToolMessage.
  4. Repeat: the LLM decides whether to call another tool or produce a final answer.
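Stripped of any framework, the four steps above reduce to a short loop. The sketch below uses a scripted stand-in for the model; `fake_llm` and the `TOOLS` registry are hypothetical illustrations, not LangGraph APIs:

```python
# Framework-free sketch of the ReAct loop. `fake_llm` is a scripted
# stand-in for a real model; a real agent would call an LLM API here.

def fake_llm(history):
    # Pretend model: issues one tool call, then gives a final answer.
    if not any(m["role"] == "tool" for m in history):
        return {"role": "ai", "tool_call": ("add", (2, 3))}
    return {"role": "ai", "content": "The sum is 5."}

TOOLS = {"add": lambda a, b: a + b}

def react_loop(user_msg, max_turns=5):
    history = [{"role": "user", "content": user_msg}]
    for _ in range(max_turns):
        msg = fake_llm(history)                 # Think
        history.append(msg)
        if "tool_call" not in msg:              # no tool call -> final answer
            return msg["content"], history
        name, args = msg["tool_call"]           # Act
        result = TOOLS[name](*args)
        history.append({"role": "tool", "content": str(result)})  # Observe
    return "Gave up.", history                  # Repeat until capped

answer, history = react_loop("What is 2 + 3?")
```

Everything LangGraph adds (typed messages, checkpointing, streaming, conditional routing) is machinery around this same loop.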

The paper demonstrated that ReAct agents significantly outperform two alternatives:

  • Pure reasoning (chain-of-thought, no tools): the LLM reasons but cannot verify facts or act on the world.
  • Pure acting (tools without reasoning): the agent calls tools mechanically, without the planning quality that reasoning provides.

The combination is what matters. Reasoning improves tool selection. Tool results ground the reasoning and prevent hallucination. Each iteration builds on the previous observations, so the agent gets progressively closer to the goal.

Turn 1 - Think: "I need to write a fibonacci function first."
Turn 1 - Act:   write_file("fib.py", "def fib(n): ...")
Turn 1 - Observe: "File written successfully."

Turn 2 - Think: "I should run the tests to verify correctness."
Turn 2 - Act:   run_tests("test_fib.py")
Turn 2 - Observe: "FAILED: test_fib.py::test_edge_cases - AssertionError: fib(0) returned 1, expected 0"

Turn 3 - Think: "The base case for 0 is wrong. I need to fix the function."
Turn 3 - Act:   write_file("fib.py", "def fib(n): if n == 0: return 0 ...")
Turn 3 - Observe: "File written."

Turn 4 - Think: "Let me re-run the tests."
Turn 4 - Act:   run_tests("test_fib.py")
Turn 4 - Observe: "PASSED: 4 tests in 0.1s"

Turn 5 - Think: "All tests pass. I'm done."
Turn 5 - Final Answer: "The fibonacci function is complete and all tests pass."

The critical structural requirement is that all prior observations stay in the context window. The LLM "remembers" every step because every step is a message in the conversation history. This is exactly what LangGraph's MessagesState provides.


βš™οΈ Two Implementations: create_react_agent vs Building the Loop Manually

LangGraph gives you two paths to a working ReAct agent. The right choice depends on how much you need to customize the loop.

Path 1: The Prebuilt Shortcut with create_react_agent

from langgraph.prebuilt import create_react_agent
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool

@tool
def run_tests(test_file: str) -> str:
    """Run pytest on the given test file and return output."""
    import subprocess
    result = subprocess.run(
        ["pytest", test_file, "-v", "--tb=short"],
        capture_output=True, text=True
    )
    return result.stdout + result.stderr

@tool
def write_file(path: str, content: str) -> str:
    """Write content to a file and return a confirmation."""
    with open(path, "w") as f:
        f.write(content)
    return f"File '{path}' written ({len(content)} chars)."

llm = ChatOpenAI(model="gpt-4o")
tools = [run_tests, write_file]

# One line: the entire ReAct loop is wired for you
agent = create_react_agent(llm, tools)

result = agent.invoke({
    "messages": [("user", "Write a fibonacci function in fib.py and make all tests in test_fib.py pass.")]
})
print(result["messages"][-1].content)

create_react_agent handles everything: it binds your tools to the LLM, creates the agent node, creates the ToolNode, wires the conditional edge, and compiles the graph. It accepts an optional state_modifier argument for injecting a system prompt without polluting message history.

When it's enough: prototyping, simple single-agent flows, standard MessagesState, no custom inter-step logic.

Where it falls short: you cannot add custom nodes between the agent and tool steps, you cannot use a non-standard state schema, and you cannot insert side effects (logging, guardrails, memory writes) mid-loop.

Path 2: Building the Loop Manually as a StateGraph

from langgraph.graph import StateGraph, MessagesState, END
from langgraph.prebuilt import ToolNode
from langchain_core.messages import SystemMessage, AIMessage
from langchain_openai import ChatOpenAI

SYSTEM_PROMPT = SystemMessage(content="""You are a Python coding assistant.
When asked to implement code, write it, run the tests, fix any failures,
and iterate until ALL tests pass. Use your tools to verify every change.""")

def build_react_agent(llm, tools):
    llm_with_tools = llm.bind_tools(tools)

    def agent_node(state: MessagesState):
        # Prepend the system prompt on every call; it is NOT stored in state
        messages = [SYSTEM_PROMPT] + state["messages"]
        response = llm_with_tools.invoke(messages)
        return {"messages": [response]}

    def should_continue(state: MessagesState) -> str:
        last_msg = state["messages"][-1]
        # If the LLM issued tool calls, route to the tools node
        if hasattr(last_msg, "tool_calls") and last_msg.tool_calls:
            return "tools"
        # Otherwise the LLM produced a final answer, so stop
        return END

    tool_node = ToolNode(tools)

    workflow = StateGraph(MessagesState)
    workflow.add_node("agent", agent_node)
    workflow.add_node("tools", tool_node)
    workflow.set_entry_point("agent")
    workflow.add_conditional_edges("agent", should_continue)
    workflow.add_edge("tools", "agent")   # always loop back after tool execution

    return workflow.compile()

agent = build_react_agent(ChatOpenAI(model="gpt-4o"), tools)

The key difference from create_react_agent is the should_continue function: you own it. You can add early-exit conditions, custom logging, or guardrail checks right there.
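As a sketch of what that ownership buys, the variant below layers a turn cap and a hypothetical argument guardrail onto the standard tool-call check. Plain dicts stand in for LangGraph's message objects, and END is a stand-in for langgraph.graph.END:

```python
END = "__end__"  # stand-in for langgraph.graph.END

def should_continue(state, max_turns=8, banned=("rm -rf",)):
    """Custom routing: stop on turn cap or dangerous tool arguments,
    otherwise route to tools whenever the last message carries calls."""
    last = state["messages"][-1]
    # Hard stop: too many LLM turns already taken.
    ai_turns = sum(1 for m in state["messages"] if m.get("role") == "ai")
    if ai_turns >= max_turns:
        return END
    # Guardrail (illustrative): refuse obviously dangerous arguments.
    for call in last.get("tool_calls", []):
        if any(b in str(call.get("args", "")) for b in banned):
            return END
    return "tools" if last.get("tool_calls") else END
```

Because this function is ordinary Python, logging, metrics, or a call out to a moderation service can all live here without touching the rest of the graph.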


🧠 Deep Dive: What Happens Inside Each Loop Iteration

The Internals

MessagesState is the agent's working memory. Every message appended to the messages list persists across loop iterations. After turn 3, the LLM's context window contains the original user request, two AIMessage responses (each with their tool_calls), and two ToolMessage observations. This is how the agent "knows" what it tried before: there is no separate memory store; the conversation history IS the reasoning trace.

The data flow for one iteration looks like this:

| Step | Object Created | Type |
| --- | --- | --- |
| User sends request | HumanMessage("Write a fib function...") | HumanMessage |
| LLM decides to call a tool | AIMessage(tool_calls=[ToolCall(...)]) | AIMessage |
| ToolNode executes the tool | ToolMessage(content="FAILED: ...", tool_call_id=...) | ToolMessage |
| LLM sees all three messages, decides next action | AIMessage(tool_calls=[...]) | AIMessage |

ToolNode handles the execution machinery. You define tools with the @tool decorator; ToolNode receives the AIMessage, extracts the tool_calls list, dispatches each call to the matching function in your tools list, and wraps the return values in ToolMessage objects with the correct tool_call_id. This ID links each observation back to the specific tool call that generated it, which is critical for models that can emit multiple parallel tool calls in a single AIMessage.
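Conceptually, the dispatch step ToolNode performs looks like this framework-free sketch, with plain dicts standing in for AIMessage and ToolMessage:

```python
def dispatch_tool_calls(ai_message, tools):
    """Mimic ToolNode: run each requested tool and wrap every result
    as a tool message tagged with the id of the call that produced it."""
    registry = {fn.__name__: fn for fn in tools}
    observations = []
    for call in ai_message["tool_calls"]:
        fn = registry[call["name"]]
        result = fn(**call["args"])
        observations.append({
            "role": "tool",
            "content": str(result),
            "tool_call_id": call["id"],  # links observation back to its call
        })
    return observations

def add(a: int, b: int) -> int:
    return a + b

msgs = dispatch_tool_calls(
    {"tool_calls": [{"name": "add", "args": {"a": 2, "b": 3}, "id": "call_1"}]},
    [add],
)
```

With parallel tool calls, the loop simply produces one tagged observation per call, which is why the id plumbing matters.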

bind_tools is the bridge between the LLM and your functions. When you call llm.bind_tools(tools), LangChain serializes your tool signatures and docstrings into the format that the model's API expects (OpenAI's tools parameter, Anthropic's tools block, etc.). The LLM never calls your Python functions directly; it emits a structured JSON tool call in its response, which ToolNode then dispatches.
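To make the serialization concrete, here is a rough hand-rolled version of what bind_tools produces for one tool. The real LangChain converter handles far more cases (defaults, nested types, per-argument docstring descriptions), so treat this as an illustration only:

```python
import inspect

# Minimal Python-type -> JSON-schema-type map; real converters cover more.
PY_TO_JSON = {int: "integer", str: "string", float: "number", bool: "boolean"}

def to_openai_tool(fn):
    """Rough sketch of serializing a Python function into an
    OpenAI-style tool schema, as bind_tools does under the hood."""
    sig = inspect.signature(fn)
    props = {
        name: {"type": PY_TO_JSON.get(p.annotation, "string")}
        for name, p in sig.parameters.items()
    }
    return {
        "type": "function",
        "function": {
            "name": fn.__name__,
            "description": (fn.__doc__ or "").strip(),
            "parameters": {
                "type": "object",
                "properties": props,
                "required": list(props),
            },
        },
    }

def run_tests(test_file: str) -> str:
    """Run pytest on the given test file and return output."""
    ...

schema = to_openai_tool(run_tests)
```

This is the JSON the model actually sees; the quality of your docstrings directly shapes how well the model picks tools.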

Performance Analysis

Token cost scales with loop depth. Each iteration adds messages to the context. By turn 5, the prompt contains the full conversation so far. For a GPT-4o run with 5 iterations at ~500 tokens per turn, you are paying for roughly 2,500 input tokens on turn 5 alone, compared to 500 on turn 1. This is the context accumulation cost, and it is the primary cost driver for deep loops.
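The arithmetic is worth making explicit: if every turn adds roughly t tokens and each call re-sends the whole history, total input across N turns is the triangular sum t·N(N+1)/2:

```python
def cumulative_input_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens paid across a loop where turn n re-sends
    the n turns of history accumulated so far."""
    return sum(n * tokens_per_turn for n in range(1, turns + 1))

# 5 turns at ~500 tokens/turn: turn 5 alone sends ~2,500 input tokens,
# and the whole loop pays 500 + 1000 + 1500 + 2000 + 2500 = 7,500.
total = cumulative_input_tokens(5, 500)
```

So the loop as a whole costs triangular, not linear, multiples of a single-shot call, which is the budgeting rule to keep in mind.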

Latency is additive. A 3-turn loop where each LLM call takes 2 seconds adds 6 seconds of LLM latency before the first final answer arrives. Tool execution latency (test runs, API calls, file I/O) stacks on top of that.

Infinite loop risk is real. The should_continue function is your only safety valve. If the LLM consistently hallucinates tool results or gets into a correction→failure→correction cycle, the loop will run until it exhausts your token budget. The standard mitigation is an iteration counter in custom state:

from typing import Annotated
from typing_extensions import TypedDict
from langgraph.graph.message import add_messages

class BoundedAgentState(TypedDict):
    messages: Annotated[list, add_messages]
    iterations: int

def agent_node_with_limit(state: BoundedAgentState):
    if state.get("iterations", 0) >= 8:
        return {
            "messages": [AIMessage(content="Iteration limit reached. Stopping.")],
            "iterations": state["iterations"],
        }
    messages = [SYSTEM_PROMPT] + state["messages"]
    response = llm_with_tools.invoke(messages)
    return {"messages": [response], "iterations": state.get("iterations", 0) + 1}

Using a custom TypedDict instead of MessagesState gives you this kind of extra state field. The trade-off is that you must declare the messages field's merge reducer yourself (hence Annotated[list, add_messages]).
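Conceptually, the reducer attached via Annotated tells the graph how to merge each node's partial return into the running state. This simplified stand-in appends for reduced keys and overwrites everything else; the real add_messages also deduplicates messages by ID:

```python
def merge_state(state: dict, update: dict, reducers: dict) -> dict:
    """Simplified sketch of how LangGraph merges a node's partial
    return into state: keys with a reducer merge, plain keys overwrite."""
    merged = dict(state)
    for key, value in update.items():
        if key in reducers:
            merged[key] = reducers[key](state.get(key, []), value)
        else:
            merged[key] = value
    return merged

# Append-style reducer, standing in for langgraph's add_messages.
append_messages = lambda left, right: left + right

state = {"messages": [{"role": "user", "content": "hi"}], "iterations": 0}
update = {"messages": [{"role": "ai", "content": "hello"}], "iterations": 1}
state = merge_state(state, update, {"messages": append_messages})
```

This is why a node can return only the new messages and a new counter value: the reducer grows the history while the plain key is simply replaced.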


📊 The ReAct Loop as a LangGraph Graph

The entire ReAct pattern maps cleanly onto four LangGraph primitives: an entry point, two nodes, a conditional edge, and an unconditional back-edge.

graph TD
    A([▶ START]) --> B["🤖 Agent Node\n(LLM call with bind_tools)"]
    B --> C{should_continue}
    C -- "tool_calls present\nin last AIMessage" --> D["🔧 ToolNode\n(execute tool functions)"]
    D -- "append ToolMessages\nto state" --> B
    C -- "no tool_calls\n(final answer)" --> E([⏹ END])

    style A fill:#4CAF50,color:#fff,stroke:none
    style E fill:#F44336,color:#fff,stroke:none
    style B fill:#2196F3,color:#fff,stroke:#1565C0
    style D fill:#FF9800,color:#fff,stroke:#E65100
    style C fill:#9C27B0,color:#fff,stroke:#6A1B9A

The loop: every AIMessage with tool calls routes right; every AIMessage without tool calls routes down to END. ToolNode always routes back up to the agent.

What makes this a loop and not a chain is the back-edge from tools → agent. LangGraph's graph model explicitly supports cycles; a plain DAG-based workflow library (like early Airflow or Prefect) would require unrolling this into N duplicated steps. With LangGraph, the cycle is a first-class construct.

The should_continue function is the only decision point. Everything else (state merging, tool dispatch, error wrapping) is handled by LangGraph internals.


🌍 Real-World Applications: Where ReAct Agents Are Deployed Today

ReAct is not a research curiosity. It is the backbone of a wide class of production deployments:

Coding assistants with test harnesses (GitHub Copilot Workspace, Devin-style agents): Write code → run linter → fix → run tests → fix → commit. The loop terminates when all checks pass or an iteration limit is reached. This is exactly the pattern this post's coding assistant example implements.

Automated data analysis pipelines: A data analyst agent receives a natural-language question ("Why did revenue drop 12% in Q3?"), queries a SQL database, inspects the result, runs a Python aggregation, charts the output, and iterates until the answer is coherent. Each query result is an observation that shapes the next query.

Customer support with live system lookup: An agent receives a ticket, looks up the customer's subscription status, checks recent invoices, queries an incident log, and synthesizes a reply. Without the loop, it cannot follow the chain of lookups that depends on what it found in the previous step.

Research and summarization over live corpora: Search → read → decide if more sources are needed → search again → synthesize. The agent terminates when it has enough evidence, not after a fixed number of searches.

In all cases, the same structural requirement applies: the task cannot be completed in one call because the correct next step depends on what the previous step returned.


βš–οΈ Trade-offs and Failure Modes: Infinite Loops, Token Explosion, and Hallucinated Tool Calls

Performance vs. Cost

ReAct loops are significantly more expensive than single-shot calls. A 5-turn loop on GPT-4o can consume 10-20× the tokens of a direct answer, because the growing conversation history is re-sent to the model on every iteration. For high-throughput systems, this cost is non-trivial.

Mitigation: Use a cheaper model (GPT-4o-mini, Haiku) for tool routing turns and reserve the expensive model for the final synthesis turn. LangGraph makes this easy: you can use different LLMs in different nodes.

Infinite Loop Failure Mode

The most dangerous failure mode is an agent that loops without making progress. This happens when:

  • The LLM repeatedly calls the same tool with the same arguments (stuck in a retry loop).
  • Tool errors are ambiguous and the LLM cannot determine whether to retry or give up.
  • The LLM hallucinates a successful tool result and then cannot reconcile it with subsequent failures.

Mitigation: Always implement an iteration cap in custom state (shown in the Deep Dive). Additionally, log every (tool_name, arguments) pair and detect duplicate calls within the same session.
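A minimal duplicate-call detector along those lines might look like this (a hypothetical helper, not a LangGraph API):

```python
import json

def make_duplicate_detector():
    """Track (tool_name, arguments) pairs seen in a session and flag
    exact repeats, a cheap signal that the agent is stuck retrying."""
    seen = set()
    def is_duplicate(tool_name: str, args: dict) -> bool:
        # Canonicalize args so key order does not defeat the check.
        key = (tool_name, json.dumps(args, sort_keys=True))
        if key in seen:
            return True
        seen.add(key)
        return False
    return is_duplicate

is_dup = make_duplicate_detector()
first = is_dup("run_pytest", {"test_file": "test_fib.py"})
second = is_dup("run_pytest", {"test_file": "test_fib.py"})
```

Wired into should_continue or the agent node, a True result can route the loop to END or inject a corrective message instead of burning more turns.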

Hallucinated Tool Calls

The LLM may emit a tool call that references a function that does not exist in your tools list, or passes arguments with the wrong type. LangGraph's ToolNode will raise a ValueError in this case. If unhandled, this crashes the graph.

Mitigation: Enable ToolNode's built-in error handling so the error string is returned to the agent as a ToolMessage and the LLM can recover:

from langgraph.prebuilt import ToolNode

# ToolNode has built-in error handling via handle_tool_errors
tool_node = ToolNode(tools, handle_tool_errors=True)

Setting handle_tool_errors=True (available in LangGraph ≥ 0.1.17) catches exceptions from tools and returns the traceback as a ToolMessage, keeping the loop alive so the LLM can attempt a correction.

Token Explosion in Long Conversations

Deep loops on complex tasks can exhaust the context window. A 20-turn loop with verbose tool outputs can easily hit 128K tokens on GPT-4o.

Mitigation: Summarize old tool results before they accumulate. After every N turns, invoke a summarization step that compresses all prior ToolMessage content into a single condensed HumanMessage, then trim the history. This is a custom node you would add between tools and agent in a manual build, which is another reason to build the loop yourself when operating at production scale.
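A sketch of such a compression step, over plain dicts standing in for messages; a production version would have an LLM write the summary rather than truncating:

```python
def compress_history(messages: list, keep_last: int = 4, max_chars: int = 200):
    """Collapse old tool observations into one condensed message and keep
    only the most recent turns verbatim. A real node would call a cheap
    summarizer model; truncation here is purely for illustration."""
    if len(messages) <= keep_last:
        return messages
    old, recent = messages[:-keep_last], messages[-keep_last:]
    tool_text = " | ".join(
        m["content"] for m in old if m.get("role") == "tool"
    )[:max_chars]
    summary = {"role": "user", "content": f"[Earlier tool results: {tool_text}]"}
    return [summary] + recent
```

Running this between tools and agent bounds context growth at roughly keep_last turns plus one summary message, whatever the loop depth.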


🧭 Decision Guide: create_react_agent vs Custom Loop vs Plan-and-Execute

| Situation | Recommendation |
| --- | --- |
| Prototype or demo: standard tools, no custom nodes | Use create_react_agent; ship in minutes |
| Production agent with logging, guardrails, or memory writes between steps | Build the loop manually with StateGraph; add nodes between tools and agent |
| Task with a knowable, fixed structure (research 5 papers → synthesize) | Use Plan-and-Execute instead; see Multistep AI Agents: The Power of Planning |
| Multi-agent system where this agent is one node | Build manually; create_react_agent compiles to a CompiledGraph that can be embedded as a node in a parent graph |
| Non-standard state (need to track iteration count, custom fields) | Build manually with a custom TypedDict state |
| LLM is Anthropic, Groq, or Ollama (not OpenAI) | Either approach; bind_tools is provider-agnostic |

🧪 Practical Example: Coding Assistant That Iterates Until Tests Pass

This example implements the opening scenario end-to-end: a coding assistant that writes a Python function, runs the test suite, inspects failures, and iterates until all tests are green, or gives up after 8 attempts.

The coding-assistant scenario was chosen because it is the clearest possible demonstration of the ReAct loop's core value: the agent cannot solve the task in one shot because it needs to observe real test output before deciding what to fix. As you read through the execution trace, watch the should_continue edge. Every iteration where the result is "tools" is one full Think→Act→Observe cycle, and the loop only exits to END when the LLM decides there is nothing more to fix.

import os
import subprocess
from typing import Annotated
from typing_extensions import TypedDict

from langchain_core.messages import SystemMessage, AIMessage
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, END
from langgraph.graph.message import add_messages
from langgraph.prebuilt import ToolNode

# ── State ──────────────────────────────────────────────────────────────────
class CodingAgentState(TypedDict):
    messages: Annotated[list, add_messages]
    iterations: int

# ── Tools ──────────────────────────────────────────────────────────────────
@tool
def write_python_file(filename: str, code: str) -> str:
    """Write Python source code to a file. Returns confirmation."""
    with open(filename, "w") as f:
        f.write(code)
    return f"Written {len(code)} chars to '{filename}'."

@tool
def run_pytest(test_file: str) -> str:
    """Run pytest on the given test file. Returns stdout + stderr."""
    result = subprocess.run(
        ["pytest", test_file, "-v", "--tb=short", "--no-header"],
        capture_output=True, text=True, timeout=30
    )
    output = result.stdout + result.stderr
    return output[:4000]  # Guard against extremely verbose output

tools = [write_python_file, run_pytest]

# ── Nodes ──────────────────────────────────────────────────────────────────
llm = ChatOpenAI(model="gpt-4o", temperature=0)
llm_with_tools = llm.bind_tools(tools)

SYSTEM = SystemMessage(content="""You are an expert Python engineer.
When given a coding task:
1. Write the implementation file using write_python_file.
2. Run the test file using run_pytest.
3. If any tests fail, read the error output carefully, fix the code, and run tests again.
4. Repeat until ALL tests pass, then report "All tests pass." as your final answer.
Do not stop until tests pass or you are truly stuck.""")

MAX_ITERATIONS = 8

def agent_node(state: CodingAgentState):
    if state.get("iterations", 0) >= MAX_ITERATIONS:
        return {
            "messages": [AIMessage(content=f"Stopped after {MAX_ITERATIONS} iterations.")],
            "iterations": state.get("iterations", 0),
        }
    messages = [SYSTEM] + state["messages"]
    response = llm_with_tools.invoke(messages)
    return {
        "messages": [response],
        "iterations": state.get("iterations", 0) + 1,
    }

def should_continue(state: CodingAgentState) -> str:
    last = state["messages"][-1]
    if state.get("iterations", 0) >= MAX_ITERATIONS:
        return END
    if hasattr(last, "tool_calls") and last.tool_calls:
        return "tools"
    return END

tool_node = ToolNode(tools, handle_tool_errors=True)

# ── Graph ──────────────────────────────────────────────────────────────────
workflow = StateGraph(CodingAgentState)
workflow.add_node("agent", agent_node)
workflow.add_node("tools", tool_node)
workflow.set_entry_point("agent")
workflow.add_conditional_edges("agent", should_continue)
workflow.add_edge("tools", "agent")

coding_agent = workflow.compile()

# ── Run ────────────────────────────────────────────────────────────────────
if __name__ == "__main__":
    result = coding_agent.invoke({
        "messages": [(
            "user",
            "Implement a fibonacci function in fib.py. "
            "Tests are in test_fib.py. Make all tests pass."
        )],
        "iterations": 0,
    })

    for msg in result["messages"]:
        role = type(msg).__name__
        content = getattr(msg, "content", "") or ""
        if content:
            print(f"[{role}] {content[:200]}")
    print(f"\nCompleted in {result['iterations']} iteration(s).")

What the loop looks like across 3+ turns:

[Turn 1] Agent thinks: "I'll write the fibonacci implementation first."
         Calls: write_python_file("fib.py", "def fib(n):\n    if n <= 1: return n\n    ...")
[Tool]   "Written 89 chars to 'fib.py'."

[Turn 2] Agent thinks: "Now run the tests."
         Calls: run_pytest("test_fib.py")
[Tool]   "FAILED test_fib.py::test_large_input - RecursionError: maximum recursion depth exceeded"

[Turn 3] Agent thinks: "I need an iterative implementation for large inputs."
         Calls: write_python_file("fib.py", "def fib(n):\n    a, b = 0, 1\n    for _ in range(n): a, b = b, a+b\n    return a")
[Tool]   "Written 74 chars to 'fib.py'."

[Turn 4] Agent thinks: "Retry the tests."
         Calls: run_pytest("test_fib.py")
[Tool]   "PASSED: 5 tests in 0.04s"

[Turn 5] Agent produces final answer: "All tests pass. The fibonacci function uses an iterative
         approach and handles inputs up to fib(1000) without recursion errors."

The agent used 5 turns, observed real test output, diagnosed a recursion failure, and applied a targeted fix, none of which is possible in a single shot.


πŸ› οΈ LangGraph Prebuilt Agents: create_react_agent, create_tool_calling_executor

LangGraph ships two prebuilt agent constructors. Understanding both helps you choose the right level of abstraction.

create_react_agent (current standard)

from langgraph.prebuilt import create_react_agent
from langchain_anthropic import ChatAnthropic  # swap provider freely

# Works identically with ChatOpenAI, ChatAnthropic, ChatOllama, ChatGroq
agent = create_react_agent(
    model=ChatAnthropic(model="claude-3-5-sonnet-20241022"),
    tools=tools,
    # System prompt injected before messages on every call, not stored in state
    state_modifier=SYSTEM,
)

create_react_agent is LLM-agnostic: any model that supports tool calling via .bind_tools() works identically. This includes ChatOpenAI, ChatAnthropic, ChatOllama, and ChatGroq. The graph topology is fixed: agent → conditional → tools → agent.

For inspection, you can render the compiled graph:

from IPython.display import Image
Image(agent.get_graph().draw_mermaid_png())

create_tool_calling_executor (legacy)

create_tool_calling_executor was an earlier LangGraph function (pre-0.1.x) that wired a similar loop but with a less flexible state schema and no state_modifier support. It is now considered deprecated in favour of create_react_agent. If you encounter it in older codebases, replace it with create_react_agent; the interface is nearly identical.

Scratchpad vs MessagesState

The original ReAct paper used a scratchpad approach: all thoughts, actions, and observations were concatenated into a single string that grew with each step. LangGraph uses MessagesState instead: a typed list of BaseMessage objects (Human, AI, Tool). The MessagesState approach is strictly superior for LangGraph use:

| | Scratchpad (string) | MessagesState (typed list) |
| --- | --- | --- |
| Parsing tool calls | Manual regex | Structured tool_calls attribute |
| Multi-turn history | One long string | Ordered list, each message typed |
| Provider compatibility | Prompt-engineering dependent | Natively supported by all major LLM APIs |
| Streaming | Difficult | Native via .astream() |
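The first table row is easy to demonstrate: pulling a tool call out of a scratchpad string takes a brittle regex, while a typed message carries the call as structured data (plain dicts stand in for AIMessage here):

```python
import re

# Scratchpad style: the action is buried in free text and must be parsed.
scratchpad = (
    "Thought: run the tests\n"
    "Action: run_tests[test_fib.py]\n"
    "Observation: ..."
)
match = re.search(r"Action: (\w+)\[([^\]]*)\]", scratchpad)
name_from_string, arg_from_string = match.group(1), match.group(2)

# Typed-message style: the tool call is already structured data.
ai_message = {
    "content": "",
    "tool_calls": [{"name": "run_tests", "args": {"test_file": "test_fib.py"}}],
}
name_from_message = ai_message["tool_calls"][0]["name"]
```

The regex breaks the moment the model rephrases its output; the structured attribute is guaranteed by the provider's tool-calling API.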

📚 Lessons Learned

1. The system prompt does not belong in state. A common mistake is appending the system prompt as the first message in MessagesState. It then persists in the conversation history and is re-sent as ordinary history on every call, inflating token counts and sometimes confusing tool routing. Always inject it in the agent_node function, not in the initial invoke payload.

2. handle_tool_errors=True is non-negotiable in production. Without it, a single malformed tool call crashes the graph with an unhandled exception. With it, the LLM sees the error as a ToolMessage and can self-correct. The cost is negligible; the resilience benefit is significant.

3. Build manually when you need anything non-standard. The moment you need to log tool call latency, write an observation to a vector store, run a guardrail check, or enforce a different system prompt per iteration, create_react_agent is not flexible enough. Start with the manual build from day one if any of those requirements exist.

4. Deterministic iteration limits prevent runaway costs. Always bound the loop. A hard cap of 8-12 iterations is reasonable for most coding or research tasks. Log when the cap is hit; those cases are diagnostically valuable for identifying prompts or tools that cause the agent to spin.

5. Token cost grows quadratically, not linearly. Turn N re-sends all N-1 prior messages plus the new input. Budget for this: for GPT-4o at $5/M input tokens, a 10-turn loop adding ~800 tokens per turn pays for the triangular sum 800 × (1 + 2 + ... + 10) ≈ 44,000 input tokens, roughly 55× the equivalent single-shot call, not 10×, because every turn re-sends the entire history.


📌 TLDR: Summary and Key Takeaways

TLDR: ReAct = Think + Act + Observe, looped as a LangGraph graph, prebuilt or custom.

  • Single-shot LLM calls fail at any task where the correct next step depends on real-world feedback (test output, API responses, file contents).
  • ReAct solves this by embedding a reasoning trace interleaved with tool calls in a loop: each tool result becomes an observation that informs the next thought.
  • LangGraph implements ReAct as a graph: agent node → conditional edge → ToolNode → back to agent node. The back-edge is what makes it a loop, not a chain.
  • create_react_agent gives you a working loop in two lines. Use it for prototyping and simple single-agent flows.
  • The manual StateGraph build gives you full control: custom state fields, extra nodes between steps, per-node model selection, and custom should_continue logic.
  • Iteration limits are mandatory in production. Context accumulates with every turn; without a cap, runaway loops are expensive and hard to debug.
  • bind_tools is provider-agnostic. The same ReAct graph works with OpenAI, Anthropic, Ollama, and Groq without changing the graph topology.
  • The memorable rule: if a task requires "act, see what happened, then decide," it is a ReAct task. Wire it as a loop, not a chain.

πŸ“ Practice Quiz

  1. What is the primary reason a single LLM call cannot complete a "write code → run tests → fix → repeat" task?

    • A) LLMs cannot generate Python code
    • B) A single call has no mechanism to observe the test output and act on it
    • C) Tool calling requires multiple API keys
    • D) LangGraph does not support code generation

    Correct Answer: B
  2. In a LangGraph ReAct agent, what determines whether the loop continues or terminates?

    • A) The number of tokens generated by the LLM
    • B) A hardcoded turn counter inside ToolNode
    • C) The should_continue conditional edge function, which checks whether the last AIMessage contains tool_calls
    • D) The MessagesState schema automatically stops after 5 messages

    Correct Answer: C
  3. You need to log every tool call's latency to a database and run a safety guardrail check before the result is returned to the agent. Which approach is correct?

    • A) Use create_react_agent with state_modifier
    • B) Build the loop manually with StateGraph and add a custom node between tools and agent
    • C) Subclass ToolNode and override its invoke method via create_react_agent
    • D) This is not possible in LangGraph

    Correct Answer: B
  4. Open-ended: A ReAct agent for customer support is completing tasks in 3-5 turns on average, but occasionally spirals into 20+ turn loops when a backend API returns ambiguous error codes. Describe two distinct strategies (one at the graph level and one at the tool/prompt level) to detect and break out of these stuck loops without simply cutting the agent off mid-task.


Written by Abstract Algorithms (@abstractalgorithms)