How to Develop Apps Using LangChain and LLMs
LangChain is the glue that connects LLMs to your data. We explain Chains, Prompts, and Agents, and how to build your first app.
TLDR: LangChain is a framework that simplifies building LLM applications. It provides abstractions for Chains (linking steps), Memory (remembering chat history), and Agents (using tools), turning raw API calls into composable building blocks, from retrieval-augmented Q&A to multi-step autonomous agents.
📖 Lego Bricks for LLM Apps
Before we explain how LangChain works, here is what it looks like in practice. This five-line chain translates text to French — prompt template, LLM call, and output parsing wired together:
```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

chain = ChatPromptTemplate.from_template("Translate to French: {text}") | ChatOpenAI(model="gpt-4o") | StrOutputParser()
print(chain.invoke({"text": "Hello, how are you?"}))
# → "Bonjour, comment allez-vous ?"
```
That | pipe — connecting a prompt template, an LLM, and an output parser — is LangChain's core abstraction. You will understand every part of that line by the end of this guide.
Building with the raw OpenAI API means writing the same boilerplate endlessly: formatting prompts, managing conversation history, parsing outputs, calling tools when needed.
LangChain is the Lego set — pre-assembled pieces (prompt templates, memory stores, output parsers, tool wrappers) that snap together so you can focus on logic rather than plumbing.
| Raw API | LangChain |
| --- | --- |
| Manual string formatting | `ChatPromptTemplate` |
| Manual history appending | `ConversationBufferMemory` |
| Manual tool calling logic | `AgentExecutor` |
| Manual output parsing | `StrOutputParser`, `JsonOutputParser` |
🔍 Core Concepts: What Makes LangChain Different
Raw LLM APIs hand you a hammer and leave you to build the house. Every call is stateless — the model forgets everything the moment you hang up. You must manually format prompt strings, append conversation history to each request, parse the model's text output into structured data, and wire up tool calls yourself. For a one-off script that's fine; for a production chatbot or document Q&A system it becomes hundreds of lines of brittle glue.
LangChain solves this through three architectural layers:
| Layer | What it does |
| --- | --- |
| Core | Abstract base classes: `Runnable`, `BasePromptTemplate`, `BaseChatMemory`, `BaseTool` |
| Community | 100+ pre-built integrations: OpenAI, Anthropic, Chroma, FAISS, Wikipedia, SQL, and more |
| LangSmith | Hosted tracing and evaluation — records every prompt, response, tool call, and token cost |
The glue holding Core together is LCEL (LangChain Expression Language). The | pipe operator creates a lazy, inspectable pipeline:
```python
chain = prompt | model | parser  # nothing runs yet
chain.invoke({"text": "hello"})  # pipeline executes here
```
Every component — prompt template, chat model, output parser, retriever — implements the same Runnable protocol: .invoke() for a single call, .stream() for token-by-token output, and .batch() for parallel requests. This uniform interface means you can swap any piece without rewriting the pipeline.
🔢 The Three Core Abstractions
A. Chains — Linking Steps
A Chain connects: User Input → Prompt Template → LLM → Output Parser.
The | operator in LCEL (LangChain Expression Language) pipes the output of one step into the next:
```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

model = ChatOpenAI(model="gpt-4o")
prompt = ChatPromptTemplate.from_template("Translate to French: {text}")
chain = prompt | model | StrOutputParser()
result = chain.invoke({"text": "Hello, how are you?"})
# "Bonjour, comment allez-vous ?"
```
Chains are composable — the output of chain can be piped into another chain.
📊 LangChain Chain Flow
```mermaid
sequenceDiagram
    participant U as User
    participant PT as PromptTemplate
    participant L as LLM
    participant OP as OutputParser
    U->>PT: Input variables
    PT->>L: Formatted prompt
    L->>OP: Raw LLM output
    OP-->>U: Parsed response
```
The sequence diagram shows the four-step data flow inside a basic LCEL chain. User-provided variables flow into the PromptTemplate, which formats them into a complete prompt and forwards it to the LLM; the LLM's raw text output then passes to the OutputParser, which transforms it into the final structured response. The key takeaway: every component exposes the same Runnable interface, so swapping any piece — for example, replacing StrOutputParser with JsonOutputParser — requires only a one-word change without touching the rest of the pipeline.
B. Memory — State Across Turns
LLMs are stateless: each API call starts fresh. LangChain's Memory objects inject conversation history into the next prompt automatically.
```python
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationChain

memory = ConversationBufferMemory()
conversation = ConversationChain(llm=model, memory=memory)
conversation.predict(input="My name is Alice.")
conversation.predict(input="What is my name?")
# "Your name is Alice."
```
| Memory Type | Keeps | Best For |
| --- | --- | --- |
| `ConversationBufferMemory` | Full history | Short sessions |
| `ConversationSummaryMemory` | LLM-generated summary | Long sessions |
| `ConversationBufferWindowMemory` | Last N turns | Chatbots with context limits |
C. Agents — LLMs That Use Tools
An Agent is an LLM that can decide which tools to call based on the user's question.
```python
from langchain import hub
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain_community.tools import WikipediaQueryRun
from langchain_community.utilities import WikipediaAPIWrapper

tools = [WikipediaQueryRun(api_wrapper=WikipediaAPIWrapper())]
prompt = hub.pull("hwchase17/openai-tools-agent")  # standard agent prompt with the required agent_scratchpad placeholder
agent = create_openai_tools_agent(model, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools)
executor.invoke({"input": "What is the boiling point of mercury?"})
# Agent calls Wikipedia → reads result → returns answer
```
The Agent loop:
```mermaid
flowchart TD
    Q[User Question] --> LLM["LLM: Choose Action"]
    LLM -->|calls tool| Tool["Tool (Wikipedia, Calculator, DB)"]
    Tool --> Observation["Observation (result)"]
    Observation --> LLM
    LLM -->|has enough info| Answer[Final Answer]
```
⚙️ Building a RAG Pipeline with LangChain
Retrieval-Augmented Generation (RAG) is the most common real-world LangChain pattern: load documents → embed them → retrieve relevant chunks → answer with context.
```python
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.chains import RetrievalQA

# 1. Load and split documents
loader = TextLoader("my_docs.txt")
docs = loader.load()
splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,    # ~500 characters: focused enough for precise retrieval, large enough to preserve sentence context
    chunk_overlap=50,  # 10% overlap so sentences split across boundaries appear in both adjacent chunks
)
chunks = splitter.split_documents(docs)

# 2. Embed and store
vectorstore = Chroma.from_documents(chunks, OpenAIEmbeddings())

# 3. Build the QA chain
qa = RetrievalQA.from_chain_type(
    llm=model,
    retriever=vectorstore.as_retriever(search_kwargs={"k": 4}),  # k=4: retrieve 4 chunks — balances context richness vs token budget
)
qa.invoke({"query": "What is the refund policy?"})
```
```mermaid
flowchart LR
    Q[User Question]
    Embed[Embed Question]
    VDB["Vector Store (Chroma/FAISS)"]
    Chunks[Top-K Chunks]
    LLM[LLM + Context]
    A[Answer]
    Q --> Embed --> VDB --> Chunks --> LLM --> A
```
The diagram shows the five-step runtime flow of a RAG query. The user's question is first embedded into a vector, which is compared against the vector store (Chroma or FAISS) to retrieve the top-K semantically similar document chunks; those chunks are then injected as context into the LLM call, which produces the final grounded answer. The takeaway: RAG lets the LLM answer questions about your private documents without any fine-tuning — the retrieval step scopes the model's attention to the relevant content at query time.
📊 LangChain RAG Chain
```mermaid
flowchart TD
    Q[User Query] --> E[Embed Query]
    E --> VDB[Vector Store]
    VDB --> K[Top-K Docs]
    K --> PT[PromptTemplate]
    PT --> LLM[LLM Call]
    LLM --> ANS[Answer]
```
This diagram maps the same RAG flow onto named LangChain components. The query is embedded, passed to the Vector Store for nearest-neighbour lookup, and the top-K documents feed a PromptTemplate that structures the context around the question before the LLM call produces the answer. Comparing this diagram with the code above shows exactly how each LangChain class maps to one node in the pipeline — making it straightforward to swap the vector store or retriever without touching the LLM or prompt logic.
🧠 Deep Dive: LangSmith Observability for LLM Chains
In production, you need to debug why a chain produced a wrong answer. LangSmith (LangChain's tracing backend) records every step:
- Which prompt was sent.
- What the LLM returned.
- Which tool was called and with what arguments.
- Total latency and token cost per step.
Enable tracing:
```python
import os
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "your_key"
```
All chain invocations are now automatically traced.
🔬 Internals
LangChain chains implement the Runnable protocol: each component exposes .invoke(), .stream(), and .batch() interfaces, enabling composition via the | pipe operator (LCEL). Memory backends (ConversationBufferMemory, VectorStoreRetrieverMemory) serialize conversation history into the prompt context at each turn. Callbacks propagate through the chain hierarchy, allowing logging, tracing, and token counting without modifying chain logic.
⚡ Performance Analysis
A simple LLM chain (prompt + model) adds ~5–10ms overhead over direct API calls due to LCEL composition. RAG chains with ChromaDB retrieval typically add 20–80ms for vector search on <1M documents. Streaming with .stream() cuts time-to-first-token from 2–3s to <200ms for long outputs, dramatically improving perceived responsiveness.
⚖️ Trade-offs & Failure Modes: LangChain
| Benefit | Risk |
| --- | --- |
| Rapid prototyping with composable building blocks | Adds abstraction layers that can obscure errors |
| Built-in integrations (100+ LLMs, vector stores, tools) | Version churn — API changes frequently |
| Memory management out of the box | Token cost grows if memory strategy is not tuned |
| Tracing via LangSmith | Production overhead if not carefully sampled |
When to skip LangChain: If your use case is a single LLM call with a fixed prompt, the raw API (OpenAI SDK) is simpler and more debuggable. LangChain pays off when you have multi-step chains, conditional tool use, or complex memory strategies.
📊 Putting It Together: LangChain Application Architecture
A multi-turn agent application wires together every abstraction from the sections above. User input arrives, Memory retrieves prior conversation turns and injects them into the ChatPromptTemplate, the filled prompt is sent to the LLM, and the LLM either calls a Tool or produces a final answer that flows through an Output Parser.
```mermaid
flowchart TD
    Input[User Input]
    Memory["Memory (ConversationBufferMemory)"]
    Template["ChatPromptTemplate (system + history + user)"]
    LLM["LLM (ChatOpenAI)"]
    Parser["Output Parser (StrOutputParser / JsonOutputParser)"]
    Tools["Tools (search, calculator, DB)"]
    Agent{"Agent Decision: use tool or answer?"}
    Answer[Final Answer]
    Input --> Template
    Memory --> Template
    Template --> LLM
    LLM --> Agent
    Agent -->|needs tool| Tools
    Tools --> LLM
    Agent -->|has answer| Parser
    Parser --> Answer
```
The loop between LLM → Agent → Tools → LLM may iterate several times before the agent decides it has enough information to produce a final answer. AgentExecutor enforces a max_iterations limit to prevent runaway loops, and handle_parsing_errors=True lets the agent recover from malformed tool-call outputs without crashing the entire pipeline.
🧭 Decision Guide: When to Use Which LangChain Pattern
Use this reference to select the right LangChain pattern for your use case.
| Pattern | Use When | Avoid When |
| --- | --- | --- |
| Simple LLM chain | Single-turn, low complexity | Multi-step reasoning required |
| RAG chain | Knowledge grounding needed | Small context, no external docs |
| Agent | Multi-step, tool use required | Latency-sensitive hot paths |
| LangGraph | Complex state, conditional branching | Simple linear pipelines |
When in doubt, start with the simplest chain that meets your requirements and evolve toward agents only when your task genuinely requires multi-step planning with external tools.
🌍 Real-World Applications of LangChain
LangChain's composable architecture maps cleanly onto a wide range of production use cases. The table below shows which components carry the load in each scenario and what to watch for in production:
| Application Type | LangChain Components Used | Production Consideration |
| --- | --- | --- |
| Chat with documents (RAG) | `TextLoader`, `RecursiveCharacterTextSplitter`, `OpenAIEmbeddings`, `Chroma`, `RetrievalQA` | Chunk size and overlap tuning — too large wastes tokens; too small loses context |
| Customer service bot | `ConversationChain`, `ConversationBufferWindowMemory`, `AgentExecutor` | Memory window size vs. token budget; escalation path when agent confidence is low |
| Code generation assistant | `ChatPromptTemplate` (system: "You are an expert Python developer"), `StrOutputParser` | Output validation — pipe results through a linter or test runner before showing to user |
| SQL generator | `SQLDatabaseChain`, `SQLDatabase`, custom prompt with schema | Always run queries in read-only mode; validate SQL before execution |
| Research assistant agent | `AgentExecutor`, `WikipediaQueryRun`, `ArxivQueryRun`, `ConversationSummaryMemory` | Long sessions accumulate cost — use `ConversationSummaryMemory` to compress history |
| Content moderation pipeline | Sequential LCEL chain: classifier → reviewer → decision parser | Add confidence threshold check; route low-confidence results to human review queue |
When NOT to use LangChain: If your entire application is a single, fixed-prompt LLM call with no memory and no tool use, the raw OpenAI (or Anthropic) SDK is simpler, more transparent, and easier to debug. LangChain's abstractions earn their overhead only when you have multi-step pipelines, state management, or conditional tool use.
🧪 Practical Exercises
Work through these three exercises in order — each one builds on the previous, adding a new LangChain capability. The sequence mirrors the three core abstractions covered in this post: LCEL chain composition (Exercise 1), stateful memory (Exercise 2), and autonomous tool use (Exercise 3). Focus on the | pipe composition in Exercise 1, on how Memory automatically injects history into prompts in Exercise 2, and on which tool the agent selects and why in Exercise 3.
Exercise 1 — Build an LCEL Translation Pipeline
Create a chain that translates text to a target language, then scale it with .batch():
```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

model = ChatOpenAI(model="gpt-4o")
prompt = ChatPromptTemplate.from_template(
    "Translate the following to {language}: {text}"
)
chain = prompt | model | StrOutputParser()

# Single call
print(chain.invoke({"language": "French", "text": "Good morning!"}))

# Batch — 5 sentences in parallel
sentences = [{"language": "Spanish", "text": s} for s in
             ["Hello", "Thank you", "Goodbye", "How are you?", "See you later"]]
results = chain.batch(sentences)
print(results)
```
Exercise 2 — Add Memory to a Conversation
Wrap a model in ConversationChain with ConversationBufferMemory and verify it remembers a name across three turns:
```python
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationChain

memory = ConversationBufferMemory()
conv = ConversationChain(llm=model, memory=memory)
conv.predict(input="My name is Alice.")
conv.predict(input="I work at a robotics startup.")
response = conv.predict(input="What is my name and where do I work?")
print(response)  # Should mention both Alice and the robotics startup
```
Exercise 3 — Build a Tool-Using Agent
Give an agent a Calculator and a Wikipedia tool, then observe which tool it selects for different question types:
```python
from langchain import hub
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain_community.tools import WikipediaQueryRun
from langchain_community.utilities import WikipediaAPIWrapper
from langchain_core.tools import tool

@tool
def calculator(expression: str) -> str:
    """Evaluate a math expression."""
    return str(eval(expression))  # demo only — never eval untrusted input in production

tools = [calculator, WikipediaQueryRun(api_wrapper=WikipediaAPIWrapper())]
prompt = hub.pull("hwchase17/openai-tools-agent")  # agent prompt with agent_scratchpad
agent = create_openai_tools_agent(model, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools, max_iterations=5)
executor.invoke({"input": "What is 847 * 293?"})           # uses calculator
executor.invoke({"input": "Who invented the telephone?"})  # uses Wikipedia
```
🎯 What to Study Next
- RAG with LangChain and ChromaDB
- AI Agents Explained: When LLMs Start Using Tools
- Prompt Engineering Guide: Zero-Shot to Chain-of-Thought
🛠️ LangChain and LangGraph: From LCEL Chains to Stateful Multi-Step Agents
LangChain (the framework introduced throughout this post) provides the LCEL | pipe syntax, ChatPromptTemplate, RunnablePassthrough, and built-in memory/retrieval primitives. LangGraph is LangChain's extension for stateful, cyclical agent graphs — it models agent loops as explicit nodes and edges, replacing the opaque AgentExecutor with a transparent state machine you can inspect and debug.
How they solve the problem in this post: The snippet below shows three patterns: (1) an LCEL chain with RunnablePassthrough passing context alongside transformed values, (2) a legacy LLMChain for comparison, and (3) a minimal LangGraph agent that loops a tool call until the LLM decides it is done.
```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# ─── Pattern 1: LCEL chain with RunnablePassthrough ──────────────────────────
# RunnablePassthrough forwards the original input alongside a transformed field
prompt = ChatPromptTemplate.from_template(
    "Summarise this in one sentence: {text}\nThen answer: {question}"
)
chain = (
    {"text": RunnablePassthrough(), "question": lambda _: "What is the main topic?"}
    | prompt
    | llm
    | StrOutputParser()
)
result = chain.invoke("The Transformer architecture replaced RNNs for NLP tasks in 2017.")
print(result)
# e.g. "The Transformer architecture revolutionised NLP by replacing RNNs. Main topic: Transformers."

# ─── Pattern 2: Legacy LLMChain (still supported, but LCEL preferred) ────────
from langchain.chains import LLMChain
from langchain_core.prompts import PromptTemplate

legacy_chain = LLMChain(
    llm=llm,
    prompt=PromptTemplate.from_template("Translate to Spanish: {text}")
)
print(legacy_chain.run("Hello, world!"))  # e.g. "¡Hola, mundo!"
```
```python
# ─── Pattern 3: Minimal LangGraph stateful agent with a tool loop ─────────────
# pip install langgraph
from typing import TypedDict, Annotated
import operator

from langgraph.graph import StateGraph, END
from langchain_core.tools import tool
from langchain_core.messages import HumanMessage, ToolMessage

@tool
def word_count(text: str) -> str:
    """Count words in the provided text."""
    return str(len(text.split()))

# Agent state: accumulates messages across turns
class AgentState(TypedDict):
    messages: Annotated[list, operator.add]

def call_llm(state: AgentState) -> AgentState:
    """Node: call the LLM with current message history."""
    llm_with_tools = llm.bind_tools([word_count])
    response = llm_with_tools.invoke(state["messages"])
    return {"messages": [response]}

def should_continue(state: AgentState) -> str:
    """Edge: if the LLM called a tool, route to the tool node; else finish."""
    last = state["messages"][-1]
    return "tools" if last.tool_calls else END

def call_tools(state: AgentState) -> AgentState:
    """Node: execute any tool calls the LLM requested."""
    last = state["messages"][-1]
    results = []
    for call in last.tool_calls:
        output = word_count.invoke(call["args"])
        results.append(ToolMessage(content=output, tool_call_id=call["id"]))
    return {"messages": results}

# Build the graph
graph = StateGraph(AgentState)
graph.add_node("llm", call_llm)
graph.add_node("tools", call_tools)
graph.set_entry_point("llm")
graph.add_conditional_edges("llm", should_continue)
graph.add_edge("tools", "llm")  # loop back after tool execution
app = graph.compile()

# Run the agent
output = app.invoke({"messages": [HumanMessage("How many words in: 'The quick brown fox'?")]})
print(output["messages"][-1].content)
# e.g. "The phrase 'The quick brown fox' contains 4 words."
```
RunnablePassthrough is the key LCEL primitive for injecting context that bypasses transformation — essential for RAG pipelines where you want both the retrieved context AND the original query flowing forward simultaneously. LangGraph's explicit node/edge model gives you full observability over each loop iteration — something AgentExecutor hid entirely.
For a full deep-dive on LangGraph stateful agents and multi-tool orchestration, a dedicated follow-up post is planned.
📚 Key Lessons from Building with LangChain
1. **LCEL chains are lazy — design for it.** The `|` expression only builds the pipeline; nothing executes until `.invoke()`, `.stream()`, or `.batch()` is called. This enables streaming tokens to a UI and parallel batch processing without any code changes.
2. **Choose memory type by session length.** `ConversationBufferMemory` is simple and accurate for short sessions (< ~10 turns). For long conversations, switch to `ConversationSummaryMemory` — it compresses history with an LLM call, keeping token usage bounded at the cost of some fidelity.
3. **LangSmith is non-negotiable in production.** When a chain produces a wrong answer, you can't debug it from the final output alone. LangSmith records every intermediate prompt, LLM response, and tool call — without it you're flying blind.
4. **Always set `AgentExecutor` safety limits.** Unconstrained agents can loop indefinitely on ambiguous inputs, burning tokens and money. Always set `max_iterations` (e.g., 10) and `handle_parsing_errors=True` to recover from malformed tool outputs gracefully.
5. **Prefer LCEL over legacy chain classes for new code.** `LLMChain` and `ConversationChain` are in maintenance mode. LCEL chains (built with `|`) are the future-proof API — they support streaming, batching, async, and composition natively, and they integrate directly with LangSmith tracing.
📌 TLDR: Summary & Key Takeaways
- Chains: Compose prompt → LLM → output parser pipelines with the LCEL | operator; every component shares the same Runnable interface.
- Memory: Inject conversation history automatically. Choose the right memory type for session length.
- Agents: LLMs that call tools in a loop until they have enough information to answer.
- RAG: Load → chunk → embed → retrieve → answer. The most common production pattern.
- LangSmith: Trace every chain step for debugging and cost analysis.
🔗 Related Posts
- RAG with LangChain and ChromaDB
- Mastering Prompt Templates with LangChain
- AI Agents Explained: When LLMs Start Using Tools
- Prompt Engineering Guide: Zero-Shot to Chain-of-Thought

Written by
Abstract Algorithms
@abstractalgorithms
