How to Develop Apps Using LangChain and LLMs
LangChain is the glue that connects LLMs to your data. We explain Chains, Prompts, and Agents, and how to build your first app.
TLDR: LangChain is a framework that simplifies building LLM applications. It provides abstractions for Chains (linking steps), Memory (remembering chat history), and Agents (using tools), turning raw API calls into composable building blocks, from retrieval-augmented Q&A to multi-step autonomous agents.
📖 Lego Bricks for LLM Apps
Before we explain how LangChain works, here is what it looks like in practice. This five-line chain translates text to French — prompt template, LLM call, and output parsing wired together:
```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

chain = ChatPromptTemplate.from_template("Translate to French: {text}") | ChatOpenAI(model="gpt-4o") | StrOutputParser()
print(chain.invoke({"text": "Hello, how are you?"}))
# → "Bonjour, comment allez-vous ?"
```
That | pipe — connecting a prompt template, an LLM, and an output parser — is LangChain's core abstraction. You will understand every part of that line by the end of this guide.
Building with the raw OpenAI API means writing the same boilerplate endlessly: formatting prompts, managing conversation history, parsing outputs, calling tools when needed.
LangChain is the Lego set — pre-assembled pieces (prompt templates, memory stores, output parsers, tool wrappers) that snap together so you can focus on logic rather than plumbing.
| Raw API | LangChain |
| --- | --- |
| Manual string formatting | `ChatPromptTemplate` |
| Manual history appending | `ConversationBufferMemory` |
| Manual tool calling logic | `AgentExecutor` |
| Manual output parsing | `StrOutputParser`, `JsonOutputParser` |
🔍 Core Concepts: What Makes LangChain Different
Raw LLM APIs hand you a hammer and leave you to build the house. Every call is stateless — the model forgets everything the moment you hang up. You must manually format prompt strings, append conversation history to each request, parse the model's text output into structured data, and wire up tool calls yourself. For a one-off script that's fine; for a production chatbot or document Q&A system it becomes hundreds of lines of brittle glue.
LangChain solves this through three architectural layers:
| Layer | What it does |
| --- | --- |
| Core | Abstract base classes: `Runnable`, `BasePromptTemplate`, `BaseChatMemory`, `BaseTool` |
| Community | 100+ pre-built integrations: OpenAI, Anthropic, Chroma, FAISS, Wikipedia, SQL, and more |
| LangSmith | Hosted tracing and evaluation — records every prompt, response, tool call, and token cost |
The glue holding Core together is LCEL (LangChain Expression Language). The | pipe operator creates a lazy, inspectable pipeline:
```python
chain = prompt | model | parser  # nothing runs yet
chain.invoke({"text": "hello"})  # pipeline executes here
```
Every component — prompt template, chat model, output parser, retriever — implements the same Runnable protocol: .invoke() for a single call, .stream() for token-by-token output, and .batch() for parallel requests. This uniform interface means you can swap any piece without rewriting the pipeline.
🔢 The Three Core Abstractions
A. Chains — Linking Steps
A Chain connects: User Input → Prompt Template → LLM → Output Parser.
The | operator in LCEL (LangChain Expression Language) pipes the output of one step into the next:
```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

model = ChatOpenAI(model="gpt-4o")
prompt = ChatPromptTemplate.from_template("Translate to French: {text}")
chain = prompt | model | StrOutputParser()
result = chain.invoke({"text": "Hello, how are you?"})
# "Bonjour, comment allez-vous ?"
```
Chains are composable — the output of chain can be piped into another chain.
📊 LangChain Chain Flow
```mermaid
sequenceDiagram
    participant U as User
    participant PT as PromptTemplate
    participant L as LLM
    participant OP as OutputParser
    U->>PT: Input variables
    PT->>L: Formatted prompt
    L->>OP: Raw LLM output
    OP-->>U: Parsed response
```
The sequence diagram shows the four-step data flow inside a basic LCEL chain. User-provided variables flow into the PromptTemplate, which formats them into a complete prompt and forwards it to the LLM; the LLM's raw text output then passes to the OutputParser, which transforms it into the final structured response. The key takeaway: every component exposes the same Runnable interface, so swapping any piece — for example, replacing StrOutputParser with JsonOutputParser — requires only a one-word change without touching the rest of the pipeline.
B. Memory — State Across Turns
LLMs are stateless: each API call starts fresh. LangChain's Memory objects inject conversation history into the next prompt automatically.
```python
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationChain

memory = ConversationBufferMemory()
conversation = ConversationChain(llm=model, memory=memory)
conversation.predict(input="My name is Alice.")
conversation.predict(input="What is my name?")
# "Your name is Alice."
```
| Memory Type | Keeps | Best For |
| --- | --- | --- |
| `ConversationBufferMemory` | Full history | Short sessions |
| `ConversationSummaryMemory` | LLM-generated summary | Long sessions |
| `ConversationBufferWindowMemory` | Last N turns | Chatbots with context limits |
C. Agents — LLMs That Use Tools
An Agent is an LLM that can decide which tools to call based on the user's question.
```python
from langchain import hub
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain_community.tools import WikipediaQueryRun
from langchain_community.utilities import WikipediaAPIWrapper

tools = [WikipediaQueryRun(api_wrapper=WikipediaAPIWrapper())]
prompt = hub.pull("hwchase17/openai-tools-agent")  # standard agent prompt with the required agent_scratchpad placeholder
agent = create_openai_tools_agent(model, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools)
executor.invoke({"input": "What is the boiling point of mercury?"})
# Agent calls Wikipedia → reads result → returns answer
```
The Agent loop:
```mermaid
flowchart TD
    Q[User Question] --> LLM["LLM: Choose Action"]
    LLM -->|calls tool| Tool["Tool (Wikipedia, Calculator, DB)"]
    Tool --> Observation["Observation (result)"]
    Observation --> LLM
    LLM -->|has enough info| Answer[Final Answer]
```
⚙️ Building a RAG Pipeline with LangChain
Retrieval-Augmented Generation (RAG) is the most common real-world LangChain pattern: load documents → embed them → retrieve relevant chunks → answer with context.
```python
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.chains import RetrievalQA

# 1. Load and split documents
loader = TextLoader("my_docs.txt")
docs = loader.load()
splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,    # ~500 characters: focused enough for precise retrieval, large enough to preserve sentence context
    chunk_overlap=50,  # 10% overlap so sentences split across boundaries appear in both adjacent chunks
)
chunks = splitter.split_documents(docs)

# 2. Embed and store
vectorstore = Chroma.from_documents(chunks, OpenAIEmbeddings())

# 3. Build the QA chain
qa = RetrievalQA.from_chain_type(
    llm=model,
    retriever=vectorstore.as_retriever(search_kwargs={"k": 4}),  # k=4: retrieve 4 chunks — balances context richness vs token budget
)
qa.invoke({"query": "What is the refund policy?"})
```
```mermaid
flowchart LR
    Q[User Question]
    Embed[Embed Question]
    VDB["Vector Store (Chroma/FAISS)"]
    Chunks[Top-K Chunks]
    LLM[LLM + Context]
    A[Answer]
    Q --> Embed --> VDB --> Chunks --> LLM --> A
```
The diagram shows the five-step runtime flow of a RAG query. The user's question is first embedded into a vector, which is compared against the vector store (Chroma or FAISS) to retrieve the top-K semantically similar document chunks; those chunks are then injected as context into the LLM call, which produces the final grounded answer. The takeaway: RAG lets the LLM answer questions about your private documents without any fine-tuning — the retrieval step scopes the model's attention to the relevant content at query time.
📊 LangChain RAG Chain
```mermaid
flowchart TD
    Q[User Query] --> E[Embed Query]
    E --> VDB[Vector Store]
    VDB --> K[Top-K Docs]
    K --> PT[PromptTemplate]
    PT --> LLM[LLM Call]
    LLM --> ANS[Answer]
```
This diagram maps the same RAG flow onto named LangChain components. The query is embedded, passed to the Vector Store for nearest-neighbour lookup, and the top-K documents feed a PromptTemplate that structures the context around the question before the LLM call produces the answer. Comparing this diagram with the code above shows exactly how each LangChain class maps to one node in the pipeline — making it straightforward to swap the vector store or retriever without touching the LLM or prompt logic.
🧠 Deep Dive: LangSmith Observability for LLM Chains
In production, you need to debug why a chain produced a wrong answer. LangSmith (LangChain's tracing backend) records every step:
- Which prompt was sent.
- What the LLM returned.
- Which tool was called and with what arguments.
- Total latency and token cost per step.
Enable tracing:
```python
import os
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "your_key"
```
All chain invocations are now automatically traced.
🔬 Internals
LangChain chains implement the Runnable protocol: each component exposes .invoke(), .stream(), and .batch() interfaces, enabling composition via the | pipe operator (LCEL). Memory backends (ConversationBufferMemory, VectorStoreRetrieverMemory) serialize conversation history into the prompt context at each turn. Callbacks propagate through the chain hierarchy, allowing logging, tracing, and token counting without modifying chain logic.
⚡ Performance Analysis
A simple LLM chain (prompt + model) adds ~5–10ms overhead over direct API calls due to LCEL composition. RAG chains with ChromaDB retrieval typically add 20–80ms for vector search on <1M documents. Streaming with .stream() cuts time-to-first-token from 2–3s to <200ms for long outputs, dramatically improving perceived responsiveness.
⚖️ Trade-offs & Failure Modes: LangChain
| Benefit | Risk |
| --- | --- |
| Rapid prototyping with composable building blocks | Adds abstraction layers that can obscure errors |
| Built-in integrations (100+ LLMs, vector stores, tools) | Version churn — API changes frequently |
| Memory management out of the box | Token cost grows if memory strategy is not tuned |
| Tracing via LangSmith | Production overhead if not carefully sampled |
When to skip LangChain: If your use case is a single LLM call with a fixed prompt, the raw API (OpenAI SDK) is simpler and more debuggable. LangChain pays off when you have multi-step chains, conditional tool use, or complex memory strategies.
📊 Putting It Together: LangChain Application Architecture
A multi-turn agent application wires together every abstraction from the sections above. User input arrives, Memory retrieves prior conversation turns and injects them into the ChatPromptTemplate, the filled prompt is sent to the LLM, and the LLM either calls a Tool or produces a final answer that flows through an Output Parser.
```mermaid
flowchart TD
    Input[User Input]
    Memory["Memory (ConversationBufferMemory)"]
    Template["ChatPromptTemplate (system + history + user)"]
    LLM["LLM (ChatOpenAI)"]
    Parser["Output Parser (StrOutputParser / JsonOutputParser)"]
    Tools["Tools (search, calculator, DB)"]
    Agent{"Agent Decision: use tool or answer?"}
    Answer[Final Answer]
    Input --> Template
    Memory --> Template
    Template --> LLM
    LLM --> Agent
    Agent -->|needs tool| Tools
    Tools --> LLM
    Agent -->|has answer| Parser
    Parser --> Answer
```
The loop between LLM → Agent → Tools → LLM may iterate several times before the agent decides it has enough information to produce a final answer. AgentExecutor enforces a max_iterations limit to prevent runaway loops, and handle_parsing_errors=True lets the agent recover from malformed tool-call outputs without crashing the entire pipeline.
🧭 Decision Guide: When to Use Which LangChain Pattern
Use this reference to select the right LangChain pattern for your use case.
| Pattern | Use When | Avoid When |
| --- | --- | --- |
| Simple LLM chain | Single-turn, low complexity | Multi-step reasoning required |
| RAG chain | Knowledge grounding needed | Small context, no external docs |
| Agent | Multi-step, tool use required | Latency-sensitive hot paths |
| LangGraph | Complex state, conditional branching | Simple linear pipelines |
When in doubt, start with the simplest chain that meets your requirements and evolve toward agents only when your task genuinely requires multi-step planning with external tools.
🌍 Real-World Applications of LangChain
LangChain's composable architecture maps cleanly onto a wide range of production use cases. The table below shows which components carry the load in each scenario and what to watch for in production:
| Application Type | LangChain Components Used | Production Consideration |
| --- | --- | --- |
| Chat with documents (RAG) | `TextLoader`, `RecursiveCharacterTextSplitter`, `OpenAIEmbeddings`, `Chroma`, `RetrievalQA` | Chunk size and overlap tuning — too large wastes tokens; too small loses context |
| Customer service bot | `ConversationChain`, `ConversationBufferWindowMemory`, `AgentExecutor` | Memory window size vs. token budget; escalation path when agent confidence is low |
| Code generation assistant | `ChatPromptTemplate` (system: "You are an expert Python developer"), `StrOutputParser` | Output validation — pipe results through a linter or test runner before showing to user |
| SQL generator | `SQLDatabaseChain`, `SQLDatabase`, custom prompt with schema | Always run queries in read-only mode; validate SQL before execution |
| Research assistant agent | `AgentExecutor`, `WikipediaQueryRun`, `ArxivQueryRun`, `ConversationSummaryMemory` | Long sessions accumulate cost — use `ConversationSummaryMemory` to compress history |
| Content moderation pipeline | Sequential LCEL chain: classifier → reviewer → decision parser | Add confidence threshold check; route low-confidence results to human review queue |
When NOT to use LangChain: If your entire application is a single, fixed-prompt LLM call with no memory and no tool use, the raw OpenAI (or Anthropic) SDK is simpler, more transparent, and easier to debug. LangChain's abstractions earn their overhead only when you have multi-step pipelines, state management, or conditional tool use.
🧪 Practical Exercises
Work through these three exercises in order — each one builds on the previous, adding a new LangChain capability. The sequence mirrors the three core abstractions covered in this post: LCEL chain composition (Exercise 1), stateful memory (Exercise 2), and autonomous tool use (Exercise 3). Focus on the | pipe composition in Exercise 1, on how Memory automatically injects history into prompts in Exercise 2, and on which tool the agent selects and why in Exercise 3.
Exercise 1 — Build an LCEL Translation Pipeline
Create a chain that translates text to a target language, then scale it with .batch():
```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

model = ChatOpenAI(model="gpt-4o")
prompt = ChatPromptTemplate.from_template(
    "Translate the following to {language}: {text}"
)
chain = prompt | model | StrOutputParser()

# Single call
print(chain.invoke({"language": "French", "text": "Good morning!"}))

# Batch — 5 sentences in parallel
sentences = [{"language": "Spanish", "text": s} for s in
             ["Hello", "Thank you", "Goodbye", "How are you?", "See you later"]]
results = chain.batch(sentences)
print(results)
```
Exercise 2 — Add Memory to a Conversation
Wrap a model in ConversationChain with ConversationBufferMemory and verify it remembers a name across three turns:
```python
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationChain

memory = ConversationBufferMemory()
conv = ConversationChain(llm=model, memory=memory)
conv.predict(input="My name is Alice.")
conv.predict(input="I work at a robotics startup.")
response = conv.predict(input="What is my name and where do I work?")
print(response)  # Should mention both Alice and the robotics startup
```
Exercise 3 — Build a Tool-Using Agent
Give an agent a Calculator and a Wikipedia tool, then observe which tool it selects for different question types:
```python
from langchain import hub
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain_community.tools import WikipediaQueryRun
from langchain_community.utilities import WikipediaAPIWrapper
from langchain_core.tools import tool

@tool
def calculator(expression: str) -> str:
    """Evaluate a math expression."""
    return str(eval(expression))  # demo only — never eval untrusted input in production

tools = [calculator, WikipediaQueryRun(api_wrapper=WikipediaAPIWrapper())]
prompt = hub.pull("hwchase17/openai-tools-agent")  # agent prompt with agent_scratchpad
agent = create_openai_tools_agent(model, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools, max_iterations=5)
executor.invoke({"input": "What is 847 * 293?"})           # uses calculator
executor.invoke({"input": "Who invented the telephone?"})  # uses Wikipedia
```
🎯 What to Study Next
- RAG with LangChain and ChromaDB
- AI Agents Explained: When LLMs Start Using Tools
- Prompt Engineering Guide: Zero-Shot to Chain-of-Thought
🛠️ LangChain and LangGraph: From LCEL Chains to Stateful Multi-Step Agents
LangChain (the framework introduced throughout this post) provides the LCEL | pipe syntax, ChatPromptTemplate, RunnablePassthrough, and built-in memory/retrieval primitives. LangGraph is LangChain's extension for stateful, cyclical agent graphs — it models agent loops as explicit nodes and edges, replacing the opaque AgentExecutor with a transparent state machine you can inspect and debug.
How they solve the problem in this post: The snippet below shows three patterns: (1) an LCEL chain with RunnablePassthrough passing context alongside transformed values, (2) a legacy LLMChain for comparison, and (3) a minimal LangGraph agent that loops a tool call until the LLM decides it is done.
```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# ─── Pattern 1: LCEL chain with RunnablePassthrough ──────────────────────────
# RunnablePassthrough forwards the original input alongside a transformed field
prompt = ChatPromptTemplate.from_template(
    "Summarise this in one sentence: {text}\nThen answer: {question}"
)
chain = (
    {"text": RunnablePassthrough(), "question": lambda _: "What is the main topic?"}
    | prompt
    | llm
    | StrOutputParser()
)
result = chain.invoke("The Transformer architecture replaced RNNs for NLP tasks in 2017.")
print(result)
# e.g. "The Transformer architecture revolutionised NLP by replacing RNNs. Main topic: Transformers."

# ─── Pattern 2: Legacy LLMChain (still supported, but LCEL preferred) ────────
from langchain.chains import LLMChain
from langchain_core.prompts import PromptTemplate

legacy_chain = LLMChain(
    llm=llm,
    prompt=PromptTemplate.from_template("Translate to Spanish: {text}")
)
print(legacy_chain.run("Hello, world!"))  # e.g. "¡Hola, mundo!"
```
```python
# ─── Pattern 3: Minimal LangGraph stateful agent with a tool loop ─────────────
# pip install langgraph
from typing import TypedDict, Annotated
import operator

from langgraph.graph import StateGraph, END
from langchain_core.tools import tool
from langchain_core.messages import HumanMessage, ToolMessage

@tool
def word_count(text: str) -> str:
    """Count words in the provided text."""
    return str(len(text.split()))

# Agent state: accumulates messages across turns
class AgentState(TypedDict):
    messages: Annotated[list, operator.add]

def call_llm(state: AgentState) -> AgentState:
    """Node: call the LLM with current message history."""
    llm_with_tools = llm.bind_tools([word_count])
    response = llm_with_tools.invoke(state["messages"])
    return {"messages": [response]}

def should_continue(state: AgentState) -> str:
    """Edge: if the LLM called a tool, route to the tool node; else finish."""
    last = state["messages"][-1]
    return "tools" if last.tool_calls else END

def call_tools(state: AgentState) -> AgentState:
    """Node: execute any tool calls the LLM requested."""
    last = state["messages"][-1]
    results = []
    for call in last.tool_calls:
        output = word_count.invoke(call["args"])
        results.append(ToolMessage(content=output, tool_call_id=call["id"]))
    return {"messages": results}

# Build the graph
graph = StateGraph(AgentState)
graph.add_node("llm", call_llm)
graph.add_node("tools", call_tools)
graph.set_entry_point("llm")
graph.add_conditional_edges("llm", should_continue)
graph.add_edge("tools", "llm")  # loop back after tool execution
app = graph.compile()

# Run the agent
output = app.invoke({"messages": [HumanMessage("How many words in: 'The quick brown fox'?")]})
print(output["messages"][-1].content)
# e.g. "The phrase 'The quick brown fox' contains 4 words."
```
RunnablePassthrough is the key LCEL primitive for injecting context that bypasses transformation — essential for RAG pipelines where you want both the retrieved context AND the original query flowing forward simultaneously. LangGraph's explicit node/edge model gives you full observability over each loop iteration — something AgentExecutor hid entirely.
For a full deep-dive on LangGraph stateful agents and multi-tool orchestration, a dedicated follow-up post is planned.
📚 Key Lessons from Building with LangChain
1. **LCEL chains are lazy — design for it.** The `|` expression only builds the pipeline; nothing executes until `.invoke()`, `.stream()`, or `.batch()` is called. This enables streaming tokens to a UI and parallel batch processing without any code changes.
2. **Choose memory type by session length.** `ConversationBufferMemory` is simple and accurate for short sessions (< ~10 turns). For long conversations, switch to `ConversationSummaryMemory` — it compresses history with an LLM call, keeping token usage bounded at the cost of some fidelity.
3. **LangSmith is non-negotiable in production.** When a chain produces a wrong answer, you can't debug it from the final output alone. LangSmith records every intermediate prompt, LLM response, and tool call — without it you're flying blind.
4. **Always set `AgentExecutor` safety limits.** Unconstrained agents can loop indefinitely on ambiguous inputs, burning tokens and money. Always set `max_iterations` (e.g., 10) and `handle_parsing_errors=True` to recover from malformed tool outputs gracefully.
5. **Prefer LCEL over legacy chain classes for new code.** `LLMChain` and `ConversationChain` are in maintenance mode. LCEL chains (built with `|`) are the future-proof API — they support streaming, batching, async, and composition natively, and they integrate directly with LangSmith tracing.
📌 TLDR: Summary & Key Takeaways
- Chains: Compose prompt → LLM → output parser pipelines with the LCEL | operator; every component shares the same Runnable interface.
- Memory: Inject conversation history automatically. Choose the right memory type for session length.
- Agents: LLMs that call tools in a loop until they have enough information to answer.
- RAG: Load → chunk → embed → retrieve → answer. The most common production pattern.
- LangSmith: Trace every chain step for debugging and cost analysis.
🔗 Related Posts
- RAG with LangChain and ChromaDB
- Mastering Prompt Templates with LangChain
- AI Agents Explained: When LLMs Start Using Tools
- Prompt Engineering Guide: Zero-Shot to Chain-of-Thought

Written by
Abstract Algorithms
@abstractalgorithms
