Multistep AI Agents: The Power of Planning
Simple AI agents react one step at a time. Multistep agents are different: they create a full plan up front, then execute it step by step.
Abstract Algorithms
TLDR: A simple ReAct agent reacts one tool call at a time. A multistep agent plans a complete task decomposition upfront, then executes each step sequentially, handling complex goals that require 5-10 interdependent actions without re-prompting the LLM for each step.
Line Cook vs. Head Chef
A line cook (simple ReAct agent): receives one ticket, cooks one dish, hands it over. Then the next ticket.
A head chef (multistep agent): receives the full dinner party menu, plans the entire 5-course sequence, coordinates prep timing for all dishes, anticipates which items can be done in parallel, and manages the full execution before the first guest is seated.
The difference: planning before acting. Complex goals require plans, not just reactions.
What Makes a Multistep Agent Different
A simple ReAct agent operates with a single-step planning horizon: it looks at the current state, picks the next action, observes the result, and repeats. This works well for short, open-ended tasks where the required steps are unknown ahead of time. But when a task involves 5–10 interdependent actions, this approach becomes inefficient: the LLM must rediscover context at every step, decisions made in step 6 can conflict with assumptions from step 2, and there is no mechanism to detect that a task is structurally impossible before spending tokens on execution.
A multistep agent solves this by introducing a distinct planning phase before any tool is called. During planning, the LLM receives the full goal and produces a structured decomposition, typically a JSON array of steps, each with an action name and arguments. This plan encodes the task dependency graph: which steps can run in parallel, which must complete before others begin, and which downstream steps each step's output feeds into.
Key concepts in multistep agent design:
- Planning horizon: how many steps ahead the agent reasons before committing to an action
- Task decomposition: breaking a complex goal into discrete, executable sub-tasks with clear inputs and outputs
- Step dependencies: the directed acyclic graph (DAG) of which steps must complete before others can begin
- Re-planning: when a step fails, generating a revised plan for the remaining steps based on current state
- Context management: summarizing intermediate results to prevent context window overflow across many steps
Understanding these concepts is essential before choosing whether to use a multistep agent or a simpler reactive loop for a given problem.
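To make task decomposition and step dependencies concrete, a plan can be held as a list of step records whose `depends_on` fields form the DAG; a topological sort then yields a valid execution order. The `Step` shape and field names below are illustrative assumptions, not any specific framework's API:

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    id: int
    action: str                                       # tool name to invoke
    args: list                                        # arguments for the tool
    depends_on: list = field(default_factory=list)    # ids of prerequisite steps

def execution_order(steps):
    """Topologically sort steps so every step runs after its dependencies."""
    done, order = set(), []
    remaining = {s.id: s for s in steps}
    while remaining:
        ready = [s for s in remaining.values() if all(d in done for d in s.depends_on)]
        if not ready:
            raise ValueError("Cycle in step dependencies: plan is not a DAG")
        for s in ready:
            order.append(s)
            done.add(s.id)
            del remaining[s.id]
    return order

plan = [
    Step(1, "search", ["top AI papers"]),
    Step(2, "fetch_abstract", ["paper_id_1"], depends_on=[1]),
    Step(3, "summarize", ["abstract_1"], depends_on=[2]),
]
print([s.action for s in execution_order(plan)])  # ['search', 'fetch_abstract', 'summarize']
```

Steps that become `ready` in the same pass have no dependency between them, so they are exactly the ones a multistep agent could run in parallel.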
Simple ReAct vs. Plan-and-Execute: Core Difference
| Dimension | ReAct (Single-Step Loop) | Plan-and-Execute (Multistep) |
| --- | --- | --- |
| Planning | None; the LLM decides the next action after each observation | LLM creates a full plan upfront (JSON array of steps) |
| LLM calls | One per action (tight feedback loop) | One for planning; one per step for execution |
| Best for | Short, open-ended tasks with unknown required steps | Long tasks with a knowable step structure |
| Failure handling | Adapts after each observation | Re-plan on step failure |
| Token cost | Lower per step | Higher plan call; lower execution calls |
The Plan-and-Execute Architecture
Goal: "Research the top 3 AI papers from last month, summarize each, and draft a blog post."
Phase 1: Plan Call (one LLM call):
[
{ "step": 1, "action": "search", "args": ["top AI papers July 2025"] },
{ "step": 2, "action": "fetch_abstract", "args": ["paper_id_1"] },
{ "step": 3, "action": "summarize", "args": ["abstract_1"] },
{ "step": 4, "action": "fetch_abstract", "args": ["paper_id_2"] },
{ "step": 5, "action": "summarize", "args": ["abstract_2"] },
{ "step": 6, "action": "fetch_abstract", "args": ["paper_id_3"] },
{ "step": 7, "action": "summarize", "args": ["abstract_3"] },
{ "step": 8, "action": "write_post", "args": ["[summary_1, summary_2, summary_3]"] }
]
Phase 2: Execution Loop (the LLM is only called when a tool's output needs reasoning):
flowchart TD
Goal[Complex Goal] --> Planner["LLM Planner (one call, JSON plan)"]
Planner --> Loop[Executor Loop]
Loop --> Step["Execute Next Step (tool call or sub-LLM call)"]
Step --> Check{Last step?}
Check -->|No| Loop
Check -->|Yes| Result[Final Result]
Step -->|Failure| Replan[Re-plan remaining steps]
Replan --> Loop
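The executor loop in the diagram can be sketched in a few lines of Python. `TOOLS`, the step dict shape, and the `replan` callback below are hypothetical stand-ins for a real tool registry and planner call, not any framework's API:

```python
# Minimal executor loop sketch; TOOLS and the replan() contract are assumptions.
TOOLS = {
    "search": lambda q: f"results for {q}",
    "summarize": lambda t: f"summary of {t[:30]}",
}

def execute_plan(plan, replan):
    """Run each step in order; on failure, splice in a revised remaining plan."""
    results = {}
    i = 0
    while i < len(plan):
        step = plan[i]
        try:
            results[step["step"]] = TOOLS[step["action"]](*step["args"])
            i += 1
        except Exception as exc:
            # Preserve completed results; only the remaining steps are re-planned
            plan = plan[:i] + replan(plan[i:], results, exc)
    return results

# A replan stub that fixes a hallucinated tool name in the failed step
fix = lambda remaining, results, exc: [{**remaining[0], "action": "summarize"}] + remaining[1:]
plan = [
    {"step": 1, "action": "search", "args": ["top AI papers"]},
    {"step": 2, "action": "sumarize", "args": ["abstract_1"]},  # typo: triggers re-plan
]
print(execute_plan(plan, fix))
# {1: 'results for top AI papers', 2: 'summary of abstract_1'}
```

A production loop would also cap re-plan attempts so a repeatedly failing step cannot spin forever.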
Multistep Agent Execution Flow
Understanding how a multistep agent transitions between planning and execution is critical for debugging and designing reliable pipelines. The flow is not a simple linear sequence: it includes conditional branches for parallel execution and failure recovery. The diagram below captures the full lifecycle from receiving a goal to delivering a final aggregated result.
flowchart TD
Goal[User Goal] --> Planner["LLM Planner (produce JSON step list)"]
Planner --> Validate["Validate Plan (tool names, schema)"]
Validate -->|Valid| Decompose[Decompose into Parallel + Sequential Steps]
Validate -->|Invalid| Planner
Decompose --> Parallel["Run Parallel Steps (independent actions)"]
Decompose --> Sequential["Run Sequential Steps (dependent actions)"]
Parallel --> Aggregate[Aggregate Intermediate Results]
Sequential --> Aggregate
Aggregate --> Check{All steps complete?}
Check -->|Yes| Final[Final Result Delivered]
Check -->|Step Failed| Replan[Re-plan Remaining Steps from Current State]
Replan --> Decompose
This flow highlights two key design decisions: plan validation happens before any tool call (catching hallucinated tool names early), and aggregation combines outputs from parallel and sequential branches before the next dependent step begins. The re-plan path preserves completed step results: there is no need to restart from the beginning when a single step fails mid-execution.
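A minimal version of that validation pass checks each step against the registered tool list and an expected argument count (a simplified stand-in for a full argument schema; the `registry` shape and step fields here are assumptions):

```python
def validate_plan(plan, tool_registry):
    """Reject plans that reference unknown tools or are missing required fields.

    tool_registry maps tool names to their expected argument count, a simplified
    stand-in for a full argument schema.
    """
    errors = []
    for step in plan:
        if not {"step", "action", "args"} <= step.keys():
            errors.append(f"step {step.get('step', '?')}: missing required fields")
        elif step["action"] not in tool_registry:
            errors.append(f"step {step['step']}: unknown tool '{step['action']}'")
        elif len(step["args"]) != tool_registry[step["action"]]:
            errors.append(f"step {step['step']}: wrong number of arguments")
    return errors  # an empty list means the plan is safe to execute

registry = {"search": 1, "summarize": 1}
plan = [
    {"step": 1, "action": "search", "args": ["top AI papers"]},
    {"step": 2, "action": "fetch_abstract", "args": ["paper_id_1"]},  # hallucinated tool
]
print(validate_plan(plan, registry))
# ["step 2: unknown tool 'fetch_abstract'"]
```

Running this one pass before execution costs nothing compared to discovering the bad tool name after several paid LLM calls.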
Real-World Use Cases
Multistep agents are not theoretical constructs; they power several categories of real production systems today.
Automated Research Pipelines A research agent might receive a goal like "produce a competitive analysis of the top 5 CRM tools." The plan includes: search for the top 5 tools (step 1), fetch the pricing page for each tool (steps 2–6 in parallel), extract key features from each page (steps 7–11), compare them across a structured rubric (step 12), and produce a formatted markdown report (step 13). No human re-prompts the system between steps; the plan drives the full execution.
Coding Agents Agents like GitHub Copilot Workspace and Devin use multistep planning to handle tasks like "add OAuth login to this codebase." The plan might include: reading existing auth files, identifying integration points, writing new code files, updating configuration, running tests, and fixing any test failures. Each step has dependencies on prior outputs, making reactive single-step agents impractical.
Business Process Automation Invoice processing pipelines, HR onboarding workflows, and compliance audits can all be modeled as multistep plans. An invoice agent might: extract line items (step 1), validate against purchase orders (step 2), route exceptions for human review (step 3), and post approved invoices to the accounting system (step 4). The structured plan makes auditing and failure recovery straightforward.
Data Analysis Workflows Data science agents receive goals like "analyze Q3 sales data and identify the top 3 regional trends." The plan includes: querying the data warehouse, cleaning the dataset, computing regional aggregates, ranking by growth rate, and generating a narrative summary. Each step produces a structured artifact consumed by the next step โ a pattern that maps directly to the plan-and-execute model.
In every case, the shared characteristic is a knowable step structure: the agent can enumerate the required actions before observing the results of any single action. This is the fundamental criterion for choosing a multistep agent over a reactive loop.
Multi-Step Planning Sequence
sequenceDiagram
participant U as User
participant P as LLM Planner
participant E as Executor
participant T as Tools
U->>P: Submit goal
P->>P: Decompose into step list
P-->>E: Plan [step1, step2, step3...]
E->>T: Execute step 1
T-->>E: Result 1
E->>P: Observe result 1
P->>P: Update plan if needed
E->>T: Execute step 2
T-->>E: Result 2
E->>T: Execute step 3
T-->>E: Result 3
E-->>U: Aggregated final answer
This sequence diagram traces the communication flow between the four key participants in a multistep agent run: the user who submits the goal, the LLM Planner that decomposes it into a numbered step list, the Executor that drives each tool call, and the Tools that perform the actual operations. Notice that the Planner is consulted again after Step 1 completes; this models the optional re-planning trigger, where intermediate results can cause the agent to revise remaining steps before continuing. This feedback loop is what distinguishes plan-and-execute agents from a static batch of sequential tool calls.
Planning Loop Decision Flow
flowchart TD
Goal[Receive Goal]
Decompose[LLM: Decompose into step list]
Queue["Queue Steps (DAG order)"]
Execute[Execute Next Step with Tools]
UpdateState[Update Shared State with Result]
Done{All Steps Complete?}
Replan{Step Failed?}
Final[Return Final Result]
ReplanStep[Re-plan Remaining Steps from State]
Goal --> Decompose --> Queue --> Execute --> UpdateState
UpdateState --> Done
Done -->|Yes| Final
Done -->|No| Replan
Replan -->|Yes| ReplanStep --> Queue
Replan -->|No| Execute
This flowchart details the executor's decision logic during the step-by-step execution phase. After each step's result is written to shared state, the executor checks two conditions: whether all steps are complete, and whether the last step failed. A successful completion routes to the final result. A step failure triggers the re-planner, which generates a revised plan for the remaining steps using current accumulated state, avoiding a full restart. If neither condition is met, the executor simply advances to the next queued step, forming the core execution loop of any plan-and-execute agent.
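The re-plan step itself can be sketched as a single planner call that receives the accumulated state. The `llm` callable and the prompt shape below are assumptions for illustration, not a specific library interface:

```python
import json

def replan_remaining(llm, goal, completed, remaining, failure):
    """Ask the planner for a revised plan covering only the remaining steps.

    `llm` is assumed to be a callable that takes a prompt string and returns a
    JSON array of steps -- a hypothetical interface, not a real library's API.
    """
    prompt = (
        f"Goal: {goal}\n"
        f"Completed step results: {json.dumps(completed)}\n"
        f"Failed step: {json.dumps(failure)}\n"
        f"Remaining steps before failure: {json.dumps(remaining)}\n"
        "Produce a revised JSON array of remaining steps that completes the "
        "goal, reusing the completed results. Do not repeat completed steps."
    )
    return json.loads(llm(prompt))

# Stubbed planner response for illustration
fake_llm = lambda prompt: '[{"step": 4, "action": "summarize", "args": ["abstract_2"]}]'
revised = replan_remaining(fake_llm, "draft a post", {1: "ok", 2: "ok", 3: "ok"},
                           [{"step": 4}], {"step": 4, "error": "timeout"})
print(revised)
# [{'step': 4, 'action': 'summarize', 'args': ['abstract_2']}]
```

Passing the completed results in the prompt is what lets the planner avoid re-issuing steps 1-3, which is exactly the "re-plan from current state" path in the diagram.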
Practical: Building a Simple Plan-and-Execute Agent
The fastest way to understand multistep agents is to trace how the planner and executor components interact in code. Below is a minimal Python example using LangChain's PlanAndExecute abstraction with custom tools.
Step 1: Define tools and the LLM:
from langchain_openai import ChatOpenAI
from langchain.tools import tool
@tool
def search_web(query: str) -> str:
"""Search the web for the given query."""
return f"Results for: {query}"
@tool
def summarize_text(text: str) -> str:
"""Summarize the provided text."""
return f"Summary of: {text[:50]}..."
tools = [search_web, summarize_text]
llm = ChatOpenAI(model="gpt-4o", temperature=0)
Step 2: Wire up planner and executor:
from langchain_experimental.plan_and_execute import (
PlanAndExecute, load_agent_executor, load_chat_planner
)
planner = load_chat_planner(llm) # single LLM call that generates the JSON step list
executor = load_agent_executor(llm, tools, verbose=True) # runs each step; verbose=True prints tool call traces
agent = PlanAndExecute(planner=planner, executor=executor, verbose=True)
Step 3: Invoke with a multi-step goal:
result = agent.invoke({
"input": "Find the top 2 benefits of multistep AI agents and summarize them."
})
print(result["output"])
When you run this with verbose=True, you will see the planner emit a JSON step list, then the executor call each tool in sequence. This trace is invaluable for debugging: if a step produces an unexpected output, you can inspect exactly where the plan diverged from reality and adjust either the planner prompt or the tool implementation accordingly.
LangGraph and AutoGen: How OSS Frameworks Implement Multistep Agent Orchestration
Two open-source frameworks have emerged as the leading ways to build multistep agents with explicit state management, and they take very different architectural approaches.
LangGraph is a graph-based agent orchestration library from LangChain that models an agent's execution as a directed graph of nodes (LLM calls, tool calls, or logic) and edges (conditional transitions). It gives you full control over state, supports cycles (loops), and is designed for durable, resumable workflows, making it the production-ready choice for complex multistep pipelines.
from langgraph.graph import StateGraph, END
from typing import TypedDict
# Define the shared state for the agent
class AgentState(TypedDict):
    task: str
    plan: list
    results: list
    current_step: int
# Node functions read the state and return the fields they update.
# The bodies below are stubs: a real agent would call an LLM in run_planner
# and dispatch tools in run_executor.
def run_planner(state: AgentState) -> dict:
    return {"plan": ["search", "summarize"], "current_step": 0}
def run_executor(state: AgentState) -> dict:
    step = state["plan"][state["current_step"]]
    return {"results": state["results"] + [f"ran {step}"],
            "current_step": state["current_step"] + 1}
def check_completion(state: AgentState) -> dict:
    return {}  # routing happens on the conditional edge below
# Build the graph
graph = StateGraph(AgentState)
graph.add_node("planner", run_planner)     # LLM call -> produces plan list
graph.add_node("executor", run_executor)   # Tool call -> appends to results
graph.add_node("checker", check_completion)
# Wire the execution flow
graph.set_entry_point("planner")
graph.add_edge("planner", "executor")
graph.add_conditional_edges(
    "executor",
    lambda state: "checker" if state["current_step"] < len(state["plan"]) else END,
)
graph.add_edge("checker", "executor")
agent = graph.compile()
result = agent.invoke({"task": "Research top 3 AI papers and draft a summary",
                       "plan": [], "results": [], "current_step": 0})
AutoGen (Microsoft) takes a conversation-centric approach: agents are modeled as participants in a multi-agent dialogue, where each agent has a role (AssistantAgent, UserProxyAgent) and can call tools or other agents as part of the conversation flow. It excels at collaborative multi-agent tasks where two or more LLMs reason together.
from autogen import AssistantAgent, UserProxyAgent
planner = AssistantAgent(
name="Planner",
system_message="You decompose goals into numbered steps and produce a JSON plan.",
llm_config={"model": "gpt-4o"},
)
executor = UserProxyAgent(
name="Executor",
human_input_mode="NEVER",
code_execution_config={"work_dir": "./workspace"},
)
# The executor drives the conversation; the planner produces the plan
executor.initiate_chat(
planner,
message="Research the top 3 AI papers from last month and summarize each.",
max_turns=10,
)
| Framework | Model | Best for |
| --- | --- | --- |
| LangGraph | Graph of nodes + state | Durable pipelines, resumable workflows, conditional branching |
| AutoGen | Multi-agent conversation | Collaborative reasoning, code generation, peer-review loops |
For a full deep-dive on LangGraph and AutoGen, dedicated follow-up posts are planned.
Key Lessons
Plan validation is non-negotiable. Always validate the generated plan against the registered tool list and expected argument schema before execution begins. Catching hallucinated tool names at plan time costs one validation pass; catching them at execution time costs the tokens from all preceding steps plus a re-plan.
Re-plan from failure state, not from scratch. When step 4 of a 10-step plan fails, the results of steps 1–3 are valid and should be preserved. A well-designed executor passes the current completed-steps state back to the planner, which then generates a revised plan for steps 4–10 only. Restarting from step 1 wastes tokens and time.
Summarize intermediate results aggressively. Long intermediate outputs from tool calls accumulate in the context window. For plans with more than 5–6 steps, summarize each step's output before passing it to the next step. This prevents context overflow and keeps the LLM focused on the current step rather than retrieving irrelevant prior results.
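One hedged sketch of this compaction step, with plain truncation as a fallback when no summarizer LLM is available (`compact` and its signature are illustrative, not a library function):

```python
def compact(result, limit=500, summarizer=None):
    """Compress a step's raw output before it enters the shared context.

    `summarizer` is an optional LLM call (a hypothetical callable); without
    one, fall back to plain truncation.
    """
    if len(result) <= limit:
        return result
    if summarizer is not None:
        return summarizer(f"Summarize in under {limit} characters:\n{result}")
    return result[:limit] + " [truncated]"

print(compact("short output"))          # short output
print(compact("x" * 2000)[-11:])        # [truncated]
```

Truncation is lossy, so an LLM summarizer is preferable for outputs whose tail matters (logs, stack traces); the fallback only guards against context overflow.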
Prefer parallel execution for independent steps. If steps 2, 3, and 4 each depend only on step 1's output (and not on each other), they can run in parallel. This reduces total wall-clock time significantly for I/O-bound steps like web fetches or database queries. The plan structure should encode this parallelism explicitly.
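For I/O-bound steps this parallelism maps naturally onto `asyncio.gather`. The sketch below simulates tool latency with `asyncio.sleep` rather than calling real tools:

```python
import asyncio

async def run_step(step):
    """Stand-in for an I/O-bound tool call such as a web fetch (simulated)."""
    await asyncio.sleep(0.01)  # simulated network latency
    return f"result of {step['action']} on {step['args'][0]}"

async def run_parallel(steps):
    """Execute independent steps concurrently; results keep the input order."""
    return await asyncio.gather(*(run_step(s) for s in steps))

# Steps that depend only on an earlier search result can run together
independent = [{"action": "fetch_abstract", "args": [f"paper_{i}"]} for i in (1, 2, 3)]
print(asyncio.run(run_parallel(independent)))
```

With real network latency, three concurrent fetches take roughly as long as the slowest single fetch instead of the sum of all three.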
Choose multistep agents only for knowable-structure tasks. If the required steps cannot be enumerated without observing intermediate results, a ReAct loop is more appropriate. Multistep planning adds overhead: one extra LLM call for planning and the cognitive cost of maintaining a step list. Use it when the structure is known; use ReAct when it is not.
TLDR: Summary & Key Takeaways
- Multistep agents plan the full task structure upfront, then execute step by step with minimal LLM calls during execution.
- Plan-and-Execute = one Planner LLM call → JSON step list → Executor loop using tools.
- Best for tasks with a knowable structure (reports, research pipelines, automated workflows).
- Failure handling: re-plan from failed step, not from scratch.
- LangChain's PlanAndExecute wraps this pattern in a few lines of Python.
Deep Dive: LangChain Plan-and-Execute Agent
from langchain_experimental.plan_and_execute import (
PlanAndExecute,
load_agent_executor,
load_chat_planner,
)
from langchain_openai import ChatOpenAI
from langchain_community.tools import WikipediaQueryRun, DuckDuckGoSearchRun
from langchain_community.utilities import WikipediaAPIWrapper
llm = ChatOpenAI(model="gpt-4o", temperature=0)  # temperature=0: deterministic planning produces more consistent JSON step lists
tools = [WikipediaQueryRun(api_wrapper=WikipediaAPIWrapper()), DuckDuckGoSearchRun()]
planner = load_chat_planner(llm) # generates the structured JSON plan from the goal
executor = load_agent_executor(llm, tools, verbose=True)
agent = PlanAndExecute(planner=planner, executor=executor)
# planner produces the step list once; executor calls tools for each step without re-running the planner
agent.invoke({
"input": "Research the top 3 AI papers from last month, summarize each, and draft a blog post."
})
The planner produces a step list; the executor runs each step with access to tools.
Internals
Multi-step agents implement a Thought-Action-Observation loop: the LLM emits a reasoning trace, selects a tool call with arguments, receives the result, and appends it to context before the next step. The agent loop runs until a terminal condition (task complete, max steps, or error). Tool dispatch is typically handled by structured output parsing (JSON function calling) rather than free-text extraction, reducing parse failures.
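The structured-output dispatch described above can be sketched as a defensive JSON parse. The `{"tool": ..., "arguments": ...}` schema below is an assumed shape; real function-calling APIs return a similar but provider-specific structure:

```python
import json

def parse_tool_call(raw):
    """Parse the model's structured tool-call output, rejecting malformed calls.

    Assumes a {"tool": ..., "arguments": {...}} schema -- an illustrative shape,
    not any specific provider's wire format.
    """
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return None  # signal the agent loop to re-prompt instead of crashing
    if not isinstance(call, dict) or "tool" not in call:
        return None
    return call["tool"], call.get("arguments", {})

print(parse_tool_call('{"tool": "search", "arguments": {"query": "AI papers"}}'))
# ('search', {'query': 'AI papers'})
print(parse_tool_call("not json"))
# None
```

Returning `None` instead of raising is what turns a parse failure into a recoverable re-prompt inside the agent loop.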
Performance Analysis
A 5-step agent with GPT-4 averages 15–25 seconds end-to-end due to sequential LLM calls (~3s each) plus tool latency. Parallelizing independent tool calls (e.g., concurrent web searches) cuts wall time by 40–60%. Smaller orchestrator models (GPT-3.5 or 7B local) reduce per-step cost by 10–50× at the expense of ~15% more planning errors on complex tasks.
Trade-offs & Failure Modes: When to Use Multistep Agents vs Simple Agents
| Use Case | Simple ReAct | Multistep Plan-Execute |
| --- | --- | --- |
| Q&A with a single tool lookup | Good fit | Overkill |
| Writing a report with 8 research steps | Poor fit | Good fit |
| Interactive conversation with user feedback | Good fit | Awkward |
| Automated pipeline with known step structure | Poor fit | Good fit |
| Debugging code with back-and-forth tool calls | Good fit | Poor fit |
Critical failure modes for multistep agents:
- Stale plan: If step 3 fails, steps 4-8 may be based on incorrect assumptions. Solution: re-plan from the failure point.
- Context window overflow: 10-step plans with long intermediate outputs can exceed context length. Solution: summarize intermediate results before passing to the next step.
- Hallucinated tool calls: LLM may plan to call a tool that doesn't exist. Solution: validate the plan against available tools before execution begins.
Decision Guide: ReAct vs. Plan-and-Execute
| Situation | Use |
| --- | --- |
| Steps can be enumerated before starting | Plan-and-Execute |
| Each next step depends on observing prior results | ReAct |
| Task has 5+ interdependent actions | Plan-and-Execute |
| Interactive conversation with user feedback | ReAct |
| Automated pipeline with predictable structure | Plan-and-Execute |
