LangGraph Memory and State Persistence: Checkpointers, Threads, and Cross-Session Memory
Give LangGraph agents persistent memory: checkpointers, thread IDs, cross-session memory store, and context overflow strategies.
TLDR: Checkpointers + thread IDs give LangGraph agents persistent memory across turns and sessions.
🧠 The Amnesia Problem: Why Stateless Agents Frustrate Users
Your customer support agent is on its third message with a user. The user says: "As I mentioned before, my order number is 9847." The agent replies: "Could you please provide your order number so I can look that up?"
The agent hasn't forgotten. It never knew. Every call to graph.invoke() without a checkpointer starts with a completely empty state: zero messages, zero context, no trace of anything said before. For a single-turn Q&A bot this is fine. For a support agent, a coding assistant, or anything expected to hold a coherent conversation, it is a product-killing bug.
The fix is one line at compile time:
from langgraph.checkpoint.memory import InMemorySaver
from langchain_core.messages import HumanMessage

checkpointer = InMemorySaver()
graph = builder.compile(checkpointer=checkpointer)

# Now invoke with a thread ID: state persists across calls
config = {"configurable": {"thread_id": "support-session-9847"}}
graph.invoke({"messages": [HumanMessage(content="My order is 9847")]}, config=config)
graph.invoke({"messages": [HumanMessage(content="Where is it?")]}, config=config)
# Second invoke sees the full conversation history; order 9847 is already in state
The thread_id acts as a conversation namespace. The checkpointer snapshots the graph's state after every node execution. When you invoke again with the same thread ID, LangGraph loads the last snapshot and resumes exactly where it left off. The rest of this post explains how that works, which checkpointer to choose, and how to extend memory beyond a single conversation.
📚 Memory Fundamentals: Short-Term State vs Long-Term Store vs External Memory
LangGraph offers three distinct memory layers, and conflating them is the most common source of confusion when building persistent agents.
| Layer | Mechanism | Scope | Typical Use |
| --- | --- | --- | --- |
| Short-term (in-state) | `operator.add` accumulates messages in graph state | Current conversation thread | Multi-turn chat, tool call history |
| Long-term (Memory Store) | `InMemoryStore` / `AsyncPostgresStore` | Across all conversations | User preferences, past issues, facts |
| External memory | Your own DB queried by a tool node | Any | Full CRM history, documents, files |
Short-term memory is built into how LangGraph state works. When you annotate a field with operator.add, new values are appended rather than overwritten. Your message list grows turn by turn, and the checkpointer persists that growing list between invoke() calls.
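The reducer semantics can be sketched without LangGraph at all. This is a minimal illustration, using plain strings in place of message objects to stay dependency-free; the `ChatState` name and the manual merge are illustrative, not LangGraph internals:

```python
import operator
from typing import Annotated, TypedDict

class ChatState(TypedDict):
    # operator.add is the reducer: LangGraph appends new values to this field
    messages: Annotated[list[str], operator.add]
    user_id: str  # no reducer annotation: each write overwrites the old value

# Conceptually, each state update LangGraph applies looks like this:
old = {"messages": ["My order is 9847"], "user_id": "alice"}
update = {"messages": ["Where is it?"]}
merged = {**old, "messages": operator.add(old["messages"], update["messages"])}
print(merged["messages"])  # both turns survive in state
```

Fields without a reducer annotation (like `user_id`) are simply replaced on each update; only annotated fields accumulate.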
Long-term memory uses LangGraph's BaseStore interface. You explicitly store.put() facts you want to survive beyond one conversation, and store.search() them in future sessions. This is where you record "Alice prefers email updates" or "Bob's account was credited $20 last month."
External memory is anything you connect via tool nodes: a database query, a vector search, an API call. This is the most flexible but requires you to write and maintain the retrieval logic yourself.
The rest of this post focuses on layers one and two: how checkpointers power short-term persistence, and how the Memory Store handles long-term facts.
⚙️ Checkpointers: InMemorySaver, SqliteSaver, and PostgresSaver
A checkpointer is an object you attach at compile() time. After every node runs, LangGraph calls the checkpointer to write the current state to storage. On the next invoke() with the same thread ID, LangGraph calls the checkpointer to read the latest snapshot before the graph starts running.
LangGraph ships three checkpointers out of the box:
InMemorySaver: Development and Unit Tests
from langgraph.checkpoint.memory import InMemorySaver
checkpointer = InMemorySaver()
graph = builder.compile(checkpointer=checkpointer)
State lives in a Python dictionary in RAM. It is wiped when the process ends. Use this for local development, CI tests, and notebooks. Never use it in production: a server restart loses every conversation.
SqliteSaver: Single-Process Deployments
from langgraph.checkpoint.sqlite import SqliteSaver
# Context-manager handles connection lifecycle
with SqliteSaver.from_conn_string("./agent_memory.db") as checkpointer:
    graph = builder.compile(checkpointer=checkpointer)
    result = graph.invoke({"messages": [...]}, config=config)
State is persisted to a local SQLite file and survives restarts. Works for prototypes, scripts, and single-worker web services. The catch: SQLite uses file-level locking, so multiple processes (or gunicorn workers) will contend for the lock and hit "database is locked" errors under concurrent load.
PostgresSaver: Production Multi-Instance Deployments
from langgraph.checkpoint.postgres import PostgresSaver
DB_URI = "postgresql://agent_user:secret@db-host:5432/agent_db"
with PostgresSaver.from_conn_string(DB_URI) as checkpointer:
    # Run migrations on first deploy (creates checkpoints table)
    checkpointer.setup()
    graph = builder.compile(checkpointer=checkpointer)
State is persisted to a PostgreSQL table. Safe for concurrent workers, horizontally scalable, and compatible with connection poolers like PgBouncer. For async applications, use AsyncPostgresSaver with a psycopg async connection pool (the Postgres checkpointer is built on psycopg, not asyncpg):
from langgraph.checkpoint.postgres.aio import AsyncPostgresSaver
from psycopg_pool import AsyncConnectionPool

async def build_graph():
    pool = AsyncConnectionPool(DB_URI, min_size=2, max_size=10)
    checkpointer = AsyncPostgresSaver(pool)
    await checkpointer.setup()
    return builder.compile(checkpointer=checkpointer)
🔧 Deep Dive: How LangGraph Checkpointing Works Under the Hood
The Internals
When LangGraph compiles a graph with a checkpointer, it wraps every node execution in a checkpoint lifecycle:
- Before the graph runs: LangGraph calls `checkpointer.get_tuple(config)`. If a checkpoint exists for the given `thread_id`, it deserializes the state and uses it as the starting values. If not, the graph starts fresh.
- After each node completes: LangGraph computes a state diff (only the fields that changed) and calls `checkpointer.put(config, checkpoint, metadata, new_versions)`. This write completes before the next node starts, guaranteeing that a crash between nodes is recoverable.
- Checkpoint schema: each checkpoint record stores `thread_id`, `checkpoint_id` (a UUID per step), `parent_checkpoint_id`, the serialized state blob, and a metadata dict (node name, timestamp, custom tags). This forms a linked list of snapshots: the full audit trail of every state the conversation passed through.
The thread_id is the primary namespace key. Every checkpoint row is keyed on (thread_id, checkpoint_id). Two conversations with different thread_id values are completely isolated even if they run on the same graph instance or the same database.
# The config dict is the entire addressing scheme for LangGraph persistence
config = {
    "configurable": {
        "thread_id": "user-alice-session-42",  # conversation namespace
        "checkpoint_id": None,  # None = latest; set a UUID to time-travel
    }
}
Passing an explicit checkpoint_id lets you time-travel: replay the graph from any historical snapshot. This is how human-in-the-loop approval flows work: pause the graph, let a human approve, resume from the exact checkpoint before the approval gate.
Performance Analysis
| Checkpointer | Write latency per step | Read on resume | Concurrent workers |
| --- | --- | --- | --- |
| InMemorySaver | ~0.01 ms (dict write) | ~0.01 ms | ❌ Single process only |
| SqliteSaver | ~1–5 ms (local file I/O) | ~1–5 ms | ⚠️ File-level lock |
| PostgresSaver (direct) | ~5–15 ms (TCP round-trip) | ~5–10 ms | ✅ Row-level lock |
| AsyncPostgresSaver + pool | ~3–8 ms (pooled conn) | ~3–8 ms | ✅ Best for async web |
Storage growth rate is the hidden cost. Each node execution writes one checkpoint row. A 10-node graph running 1000 conversations per day generates ~10,000 checkpoint rows daily. For long-lived agents with dozens of turns, rows accumulate quickly. Mitigation strategies include:
- TTL pruning: a nightly job that deletes checkpoints older than N days via `DELETE FROM checkpoints WHERE created_at < NOW() - INTERVAL '30 days'`.
- Checkpoint compaction: keep only the latest N checkpoints per thread (`get_state_history` + custom deletion logic).
- Selective checkpointing: tag certain node outputs as ephemeral and skip the write using LangGraph's `checkpoint_during` config flag (available in LangGraph >= 0.2).
Connection pooling with AsyncPostgresSaver is critical at scale. Without a pool, each graph invocation opens a new TCP connection to Postgres. At 100 concurrent conversations, that is 100+ open connections, well beyond the default Postgres max_connections of 100. A pool of 10 connections serving 100 concurrent invocations via async multiplexing is the standard production pattern.
🧵 Thread Isolation and State Flow
The diagram below shows two users, Alice and Bob, both interacting with the same compiled graph. Their state never touches because thread_id keeps every checkpoint in its own namespace within the same checkpointer backend.
graph TD
subgraph Alice ["Thread: user-alice"]
A1([invoke turn 1]) --> A2[classify_intent node]
A2 --> A3[fetch_order node]
A3 --> A4[respond node]
A4 --> A5([invoke turn 2\nresumes from A4 snapshot])
end
subgraph Bob ["Thread: user-bob"]
B1([invoke turn 1]) --> B2[classify_intent node]
B2 --> B3[respond node]
B3 --> B4([invoke turn 2\nresumes from B3 snapshot])
end
CP[(Checkpointer\nPostgresSaver)]
A4 -- "writes checkpoint\nthread=user-alice" --> CP
B3 -- "writes checkpoint\nthread=user-bob" --> CP
A5 -- "reads checkpoint\nthread=user-alice" --> CP
B4 -- "reads checkpoint\nthread=user-bob" --> CP
Two conversations share one graph instance and one checkpointer backend, but their state is completely isolated by thread_id. Each arrow to the checkpointer is a separate row in the checkpoints table.
You can inspect the live state of any thread at any time without invoking the graph:
# Snapshot of Alice's current conversation state
state = graph.get_state({"configurable": {"thread_id": "user-alice"}})
print(state.values["messages"]) # full message list
print(state.next) # next node(s) that would run
# Full audit trail: every checkpoint in reverse-chronological order
history = list(graph.get_state_history({"configurable": {"thread_id": "user-alice"}}))
for snapshot in history:
    print(f"Step {snapshot.config['configurable']['checkpoint_id']}: "
          f"next={snapshot.next}, messages={len(snapshot.values['messages'])}")
This get_state_history call is invaluable for debugging: you can see exactly what state the graph held at every step, spot where a node introduced bad data, and replay from any prior checkpoint to test a fix.
🌍 Real-World Applications: How Production Agents Use Persistent Memory
Case Study 1: E-Commerce Support Agent
An online retailer runs a LangGraph support agent with PostgresSaver. When a customer opens a chat:
- Input: `thread_id = f"customer-{customer_id}-{ticket_id}"`
- Process: The graph loads prior messages for this ticket. If the customer said "I already tried resetting my password" three turns ago, the agent routes to escalation rather than suggesting the same fix again.
- Output: First-contact resolution rate improved ~18% in one deployment because agents stopped repeating questions.
The ticket thread ID encodes both customer and ticket, so a customer's second ticket starts fresh while their current ticket is fully resumable across browser refreshes and agent handoffs.
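That encoding is worth pinning down in a helper so the convention can't drift across the codebase. A minimal sketch (the function name is hypothetical, the format string is the one from the case study):

```python
def ticket_thread_id(customer_id: str, ticket_id: str) -> str:
    # Scoped per ticket: a new ticket gets a fresh thread, the same ticket resumes
    return f"customer-{customer_id}-{ticket_id}"

# Same customer, two tickets: two isolated conversation namespaces
t1 = ticket_thread_id("4412", "T-1001")
t2 = ticket_thread_id("4412", "T-1002")
```

Because the thread ID is the entire isolation boundary, centralizing its construction is the cheapest way to guarantee two tickets never bleed state into each other.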
Case Study 2: Async Human-in-the-Loop Approval
A finance team uses a LangGraph workflow that drafts expense reports and waits for a manager approval node before submitting:
from langgraph.types import Command

# Agent drafts report, hits "awaiting_approval" interrupt node, and pauses
result = graph.invoke({"messages": [...]}, config=config)
# graph.get_state(config).next == ("awaiting_approval",) while the graph is suspended

# Hours later, the manager approves via dashboard; the graph resumes from the exact checkpoint
graph.invoke(
    Command(resume={"approved": True, "approver": "manager@co.com"}),
    config=config
)
Without checkpointing, a multi-hour pause would require storing state externally and rebuilding it manually. With PostgresSaver, the graph's entire state, including all tool call results already computed, is available the moment the approval arrives.
⚖️ Trade-offs and Failure Modes: Storage Costs, Context Overflow, and Stale Memory
Storage Costs Compound Quickly
Every node in every turn writes a checkpoint row. A graph with 8 nodes handling 500 daily active users averaging 10 turns each generates 8 × 500 × 10 = 40,000 checkpoint writes per day. At ~2 KB per row (a typical message-heavy state), that is ~80 MB/day. Manageable, but 2.4 GB/month before any cleanup. PostgreSQL JSONB compression helps, but plan for a pruning strategy from day one.
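The back-of-envelope math above can be packaged as a quick capacity estimator. The helper names and the 2 KB/row figure are assumptions carried over from the paragraph, not measured values:

```python
def daily_checkpoint_rows(nodes: int, daily_users: int, avg_turns: int) -> int:
    # One checkpoint row per node execution, per turn, per user
    return nodes * daily_users * avg_turns

def daily_storage_mb(rows: int, kb_per_row: float = 2.0) -> float:
    # Convert row count to megabytes at an assumed average row size
    return rows * kb_per_row / 1024

rows = daily_checkpoint_rows(nodes=8, daily_users=500, avg_turns=10)
mb = daily_storage_mb(rows)  # roughly 78 MB/day at 2 KB per row
```

Run this with your own node count and traffic numbers before picking a retention window; the answer usually decides whether a nightly TTL job is enough or whether you need per-thread compaction too.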
Context Window Overflow
The more messages you accumulate in state, the closer you creep to the LLM's context limit. A gpt-4o context window of 128k tokens sounds enormous, but a 50-turn support conversation with tool results can hit 40–60k tokens easily. Strategies:
- `trim_messages`: keep only the last N tokens of history (see the Memory Store section below).
- Summarization node: replace old messages with a running summary, then continue with only the summary + recent messages.
- Selective retention: store only human/assistant turns in state; rebuild tool call history from the Memory Store on demand.
Stale Memory as a Liability
Long-term memory that is never updated becomes misleading. If Alice's preferred shipping address is stored in the Memory Store from 18 months ago, and she moved last week, the agent will confidently ship to the wrong address. Mitigation: add a last_updated field to every stored fact, and include a "verify preferences" flow for sessions where stored facts are more than N days old.
Failure Mode: Partial Checkpoint Corruption
If a node crashes after writing to an external API but before the checkpoint write completes, the graph will re-run that node on resume β potentially double-submitting an order. Mitigation: use idempotency keys in your external API calls (most payment and shipping APIs support this natively), so a re-run has no side effects.
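The idempotency-key pattern in miniature. This is a toy stand-in for an external API, not a real payments client; it shows why a re-run node with the same key produces no second side effect:

```python
class ToyPaymentsAPI:
    """Toy idempotent API: replaying a request with the same key is a no-op."""
    def __init__(self):
        self.processed = {}  # idempotency_key -> first result

    def charge(self, amount: int, idempotency_key: str) -> dict:
        if idempotency_key in self.processed:
            # Replay after a crash/retry: return the cached result, charge nothing
            return self.processed[idempotency_key]
        result = {"charged": amount, "txn_id": len(self.processed) + 1}
        self.processed[idempotency_key] = result
        return result

api = ToyPaymentsAPI()
first = api.charge(20, idempotency_key="thread-abc-node-submit-1")
retry = api.charge(20, idempotency_key="thread-abc-node-submit-1")  # node re-ran
```

A good key for a LangGraph node is derived from the thread ID plus the step, so the re-run after resume sends exactly the same key as the crashed attempt.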
🧭 Decision Guide: Which Checkpointer for Which Deployment
| Situation | Recommendation |
| --- | --- |
| Use InMemorySaver when | Running tests, notebooks, demos, or any single-turn evaluation harness where state persistence across runs is unwanted noise |
| Use SqliteSaver when | Building a personal tool, local CLI agent, or a prototype with a single worker process and low concurrency (< 5 simultaneous conversations) |
| Use PostgresSaver when | Deploying to production with multiple workers, a web framework (FastAPI, Django), or any scenario requiring cross-process state sharing and concurrent users |
| Use AsyncPostgresSaver when | Your app uses async Python (asyncio, FastAPI async routes); it avoids blocking the event loop on checkpoint reads/writes |
| Edge case: serverless | Each Lambda/Cloud Run invocation is a fresh process, so InMemorySaver loses state. You must use PostgresSaver or an equivalent remote store. DynamoDB-backed custom checkpointers exist in the community for AWS deployments. |
🧪 Practical Example: Customer Support Agent With Cross-Session Memory
This example builds a customer support agent that uses both memory layers at the same time: a SqliteSaver checkpointer for short-term in-thread state (swap in PostgresSaver for production), and a Memory Store for user facts that persist across completely separate sessions. The support scenario fits here because it generates exactly the two events the post's memory model is built around: a returning user whose preferences are already stored, and a resolved issue that must be written back for next time. As you read through the nodes, watch how load_user_profile reads from the Store before the first message is even processed, and how save_resolved_issue writes back to the Store at the end. Those two calls are the cross-session memory pattern in its simplest production form.
import operator
from typing import Annotated, TypedDict
from langchain_core.messages import BaseMessage, HumanMessage, SystemMessage
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, END
from langgraph.checkpoint.sqlite import SqliteSaver
from langgraph.store.memory import InMemoryStore

# ── State schema ─────────────────────────────────────────────────────────────
class SupportState(TypedDict):
    messages: Annotated[list[BaseMessage], operator.add]  # accumulates across turns
    user_id: str
    user_profile: dict  # loaded once per session from Memory Store

# ── Shared long-term store ───────────────────────────────────────────────────
store = InMemoryStore()  # swap for AsyncPostgresStore in production

# ── Nodes ────────────────────────────────────────────────────────────────────
llm = ChatOpenAI(model="gpt-4o-mini")

def load_user_profile(state: SupportState) -> dict:
    """Pull cross-session facts from the Memory Store on first turn."""
    user_id = state["user_id"]
    results = store.search(("user_profiles", user_id))
    profile = results[0].value if results else {"name": "there", "past_issues": []}
    return {"user_profile": profile}

def respond(state: SupportState) -> dict:
    """Generate a reply using full conversation history + profile context."""
    profile = state.get("user_profile", {})
    system = SystemMessage(content=(
        f"You are a helpful support agent. The user's name is {profile.get('name', 'there')}. "
        f"Past issues they reported: {profile.get('past_issues', [])}. "
        "Be concise and helpful. If the issue is resolved, say 'Issue resolved.'"
    ))
    reply = llm.invoke([system] + state["messages"])
    return {"messages": [reply]}

def save_resolved_issue(state: SupportState) -> dict:
    """If resolved, write the issue summary to the Memory Store for future sessions."""
    last_msg = state["messages"][-1].content
    if "issue resolved" in last_msg.lower():
        user_id = state["user_id"]
        existing = store.search(("user_profiles", user_id))
        profile = existing[0].value if existing else {"name": state["user_profile"].get("name", ""), "past_issues": []}
        # Summarise the issue from the first human message
        first_human = next((m.content for m in state["messages"] if isinstance(m, HumanMessage)), "")
        profile["past_issues"].append(first_human[:120])
        store.put(("user_profiles", user_id), "preferences", profile)
    return {}

# ── Graph assembly ───────────────────────────────────────────────────────────
builder = StateGraph(SupportState)
builder.add_node("load_profile", load_user_profile)
builder.add_node("respond", respond)
builder.add_node("save_resolved", save_resolved_issue)
builder.set_entry_point("load_profile")
builder.add_edge("load_profile", "respond")
builder.add_edge("respond", "save_resolved")
builder.add_edge("save_resolved", END)

# ── Compile with checkpointer ────────────────────────────────────────────────
with SqliteSaver.from_conn_string("./support_memory.db") as checkpointer:
    graph = builder.compile(checkpointer=checkpointer)
    user_config = {"configurable": {"thread_id": "alice-ticket-001"}}

    # Turn 1: new user
    graph.invoke(
        {"messages": [HumanMessage(content="Hi, I'm Alice. My order #9847 hasn't arrived.")],
         "user_id": "alice", "user_profile": {}},
        config=user_config
    )

    # Turn 2: same thread, state resumes automatically; no need to pass user_id again
    result = graph.invoke(
        {"messages": [HumanMessage(content="It was supposed to arrive 3 days ago.")]},
        config=user_config
    )
    print(result["messages"][-1].content)
What happens on Turn 2: LangGraph loads the checkpoint from Turn 1 (which already holds Alice's first message and the assistant's reply). The new HumanMessage is appended via operator.add. The load_profile node runs again (each invoke re-enters at the entry point) and loads whatever profile exists in the store. The LLM sees the full 3-message history and can reference the order number Alice mentioned in Turn 1: no amnesia.
What happens in a new session (e.g., Alice returns next week): A new thread_id starts fresh state, but load_profile finds Alice's stored past_issues in the Memory Store and surfaces them in the system prompt, giving the agent cross-session context without carrying old message history into the new context window.
🛠️ LangGraph Memory Store: Long-Term Semantic Memory Beyond the Conversation
The BaseStore interface is LangGraph's answer to memory that outlives any single thread. It is a key-value store organized into namespaces (tuples of strings) with keys (string identifiers) and values (arbitrary dicts).
from langgraph.store.memory import InMemoryStore

store = InMemoryStore()

# Write a user preference (namespace, key, value)
store.put(("user_profiles", "alice"), "preferences", {
    "language": "en",
    "contact_channel": "email",
    "last_order": "9847"
})

# Read it back in any future session
results = store.search(("user_profiles", "alice"))
if results:
    prefs = results[0].value
    print(prefs["contact_channel"])  # "email"

# Update: put() is an upsert, so the same namespace+key overwrites
store.put(("user_profiles", "alice"), "preferences", {
    **prefs,
    "last_order": "10234"
})
For production, replace InMemoryStore with AsyncPostgresStore:
from langgraph.store.postgres.aio import AsyncPostgresStore

async with AsyncPostgresStore.from_conn_string(DB_URI) as store:
    await store.setup()  # creates the store tables on first run
Message Trimming to Prevent Context Overflow
As conversations grow, trim old messages before passing them to the LLM. LangChain's trim_messages utility handles this cleanly:
from langchain_core.messages import trim_messages

def respond_with_trim(state: SupportState) -> dict:
    """Trim message history to the last 4000 tokens before invoking the LLM."""
    trimmed = trim_messages(
        state["messages"],
        max_tokens=4000,
        strategy="last",       # keep the most recent messages
        token_counter=llm,     # uses the model's tokenizer for accurate counts
        include_system=True,   # always keep the system message
        allow_partial=False,   # never cut a message mid-sentence
    )
    reply = llm.invoke(trimmed)
    return {"messages": [reply]}
For very long-running agents, combine trimming with a summarization node that periodically condenses old messages into a single summary AIMessage, reducing history length while preserving context continuity.
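The "last" strategy that trim_messages applies can be sketched in plain Python. This is a toy over strings with a crude ~4-characters-per-token counter (both assumptions), shown only to make the budget-walk logic concrete:

```python
def keep_recent_under_budget(messages: list[str], max_tokens: int, count_tokens) -> list[str]:
    """Walk backwards from the newest message, keeping messages until the budget is spent."""
    kept, total = [], 0
    for msg in reversed(messages):
        cost = count_tokens(msg)
        if total + cost > max_tokens:
            break  # everything older than this point is dropped
        kept.append(msg)
        total += cost
    return list(reversed(kept))  # restore chronological order

history = ["turn-1 " * 10, "turn-2 " * 10, "turn-3 " * 10]
trimmed = keep_recent_under_budget(history, max_tokens=40,
                                   count_tokens=lambda m: len(m) // 4)
```

The real utility adds message-boundary rules (never orphan a tool result from its call, optionally pin the system message), which is why you should prefer trim_messages over rolling your own.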
💡 Lessons Learned
1. Pick your checkpointer before you write any agent code. Switching checkpointers later means migrating stored state: InMemorySaver checkpoints cannot be imported into SqliteSaver. Decide on PostgresSaver for anything that will eventually run in production, even during early development with a local Postgres container.
2. Thread IDs are your API. Your thread_id naming convention determines how conversations are isolated. "user-{id}" collapses all of a user's conversations into one thread (continuous memory but no fresh starts). "user-{id}-session-{ts}" creates a new context per session. "ticket-{id}" isolates by support ticket. Choose based on the UX you want; it is hard to change later.
3. Don't store everything in the Memory Store. It sounds tempting to persist every message, but the Memory Store is not a chat history database. It is for facts: things a human would write in a notepad about another person. Use a purpose-built conversation database (or just the checkpoint history) for raw message logs.
4. Always implement a pruning strategy before you go to production. Checkpoint tables grow fast. A scheduled TTL delete job is a single cron entry that saves you from an unexpectedly full disk at 3 AM.
5. Test failure recovery explicitly. Simulate a node crash mid-graph and verify that a fresh invoke() with the same thread_id resumes correctly without side effects. LangGraph's resume semantics are reliable, but your external API calls must be idempotent for this to be safe end-to-end.
📌 TLDR: Summary and Key Takeaways
- The amnesia problem is real: without a checkpointer, every `graph.invoke()` starts with empty state and no memory of prior turns.
- `operator.add` on the messages field accumulates conversation history within a thread; this is the foundation of multi-turn agents.
- Choose your checkpointer by deployment target: `InMemorySaver` for tests, `SqliteSaver` for single-process prototypes, `PostgresSaver`/`AsyncPostgresSaver` for production multi-worker services.
- Thread IDs are conversation namespaces. The same compiled graph serves thousands of isolated conversations simultaneously; `config = {"configurable": {"thread_id": "..."}}` is the entire routing key.
- `get_state` and `get_state_history` give you full visibility into any conversation's past, which is essential for debugging and for human-in-the-loop approval flows.
- The Memory Store (`InMemoryStore`/`AsyncPostgresStore`) bridges conversations: persist cross-session facts with `store.put()` and retrieve them with `store.search()`.
- Context overflow is inevitable in long-running agents; plan for `trim_messages` or a summarization node from the start, not as an afterthought.
The one-liner to remember: a checkpointer turns a graph into a conversation; a Memory Store turns a conversation into a relationship.
📝 Practice Quiz
What happens when you call `graph.invoke()` twice with the same `thread_id` and an `InMemorySaver` checkpointer attached?
- A) The second call starts with empty state because `invoke()` always resets the graph.
- B) The second call resumes from the state saved after the last node of the first call.
- C) An error is raised because the same thread ID cannot be reused.
- D) The two invocations run in parallel and merge their state.

Correct Answer: B

Your LangGraph service is deployed with three gunicorn worker processes sharing one machine. User Alice's conversation starts on Worker 1, then her next request is routed to Worker 3. Which checkpointer correctly serves her full message history on Worker 3?
- A) `InMemorySaver`, because it is the fastest option.
- B) `SqliteSaver`, because SQLite files are shared on disk.
- C) `PostgresSaver`, because it stores checkpoints in a shared remote database accessible by all workers.
- D) No checkpointer can handle this: LangGraph requires sticky sessions.

Correct Answer: C

You are building a customer support agent that must remember a user's shipping address across separate support tickets (different thread IDs). Which LangGraph feature is the correct tool for this?
- A) Passing the address in the `thread_id` string.
- B) The `BaseStore` / Memory Store (`InMemoryStore` or `AsyncPostgresStore`).
- C) Storing it as a checkpoint field using `InMemorySaver`.
- D) Injecting it via the system prompt hard-coded at compile time.

Correct Answer: B
Open-ended: A LangGraph agent handles 200 concurrent users, each averaging 15 turns per conversation, with a graph containing 6 nodes. Estimate the daily checkpoint write volume and describe two strategies you would implement at the storage layer to manage growth over a 90-day period. Consider the trade-offs between storage cost, debuggability, and the ability to resume conversations after a service outage.