LangGraph Memory and State Persistence: Checkpointers, Threads, and Cross-Session Memory
Give LangGraph agents persistent memory: checkpointers, thread IDs, cross-session memory store, and context overflow strategies.
TLDR: Checkpointers + thread IDs give LangGraph agents persistent memory across turns and sessions.
🧠 The Amnesia Problem: Why Stateless Agents Frustrate Users
Your customer support agent is on its third message with a user. The user says: "As I mentioned before, my order number is 9847." The agent replies: "Could you please provide your order number so I can look that up?"
The agent hasn't forgotten. It never knew. Every call to graph.invoke() without a checkpointer starts with a completely empty state: zero messages, zero context, no trace of anything said before. For a single-turn Q&A bot this is fine. For a support agent, a coding assistant, or anything expected to hold a coherent conversation, it is a product-killing bug.
The fix is one line at compile time:
from langgraph.checkpoint.memory import InMemorySaver
from langchain_core.messages import HumanMessage

checkpointer = InMemorySaver()
graph = builder.compile(checkpointer=checkpointer)

# Now invoke with a thread ID: state persists across calls
config = {"configurable": {"thread_id": "support-session-9847"}}
graph.invoke({"messages": [HumanMessage(content="My order is 9847")]}, config=config)
graph.invoke({"messages": [HumanMessage(content="Where is it?")]}, config=config)
# Second invoke sees the full conversation history; order 9847 is already in state
The thread_id acts as a conversation namespace. The checkpointer snapshots the graph's state after every node execution. When you invoke again with the same thread ID, LangGraph loads the last snapshot and resumes exactly where it left off. The rest of this post explains how that works, which checkpointer to choose, and how to extend memory beyond a single conversation.
📚 Memory Fundamentals: Short-Term State vs Long-Term Store vs External Memory
LangGraph offers three distinct memory layers, and conflating them is the most common source of confusion when building persistent agents.
| Layer | Mechanism | Scope | Typical Use |
| --- | --- | --- | --- |
| Short-term (in-state) | `operator.add` accumulates messages in graph state | Current conversation thread | Multi-turn chat, tool call history |
| Long-term (Memory Store) | `InMemoryStore` / `AsyncPostgresStore` | Across all conversations | User preferences, past issues, facts |
| External memory | Your own DB queried by a tool node | Any | Full CRM history, documents, files |
Short-term memory is built into how LangGraph state works. When you annotate a field with operator.add, new values are appended rather than overwritten. Your message list grows turn by turn, and the checkpointer persists that growing list between invoke() calls.
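The reducer semantics can be sketched without LangGraph at all. This is a minimal illustration, using plain strings in place of message objects to stay dependency-free; the `ChatState` name and the manual merge are illustrative, not LangGraph internals:

```python
import operator
from typing import Annotated, TypedDict

class ChatState(TypedDict):
    # operator.add is the reducer: LangGraph appends new values to this field
    messages: Annotated[list[str], operator.add]
    user_id: str  # no reducer annotation: each write overwrites the old value

# Conceptually, each state update LangGraph applies looks like this:
old = {"messages": ["My order is 9847"], "user_id": "alice"}
update = {"messages": ["Where is it?"]}
merged = {**old, "messages": operator.add(old["messages"], update["messages"])}
print(merged["messages"])  # both turns survive in state
```

Fields without a reducer annotation (like `user_id`) are simply replaced on each update; only annotated fields accumulate.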
Long-term memory uses LangGraph's BaseStore interface. You explicitly store.put() facts you want to survive beyond one conversation, and store.search() them in future sessions. This is where you record "Alice prefers email updates" or "Bob's account was credited $20 last month."
External memory is anything you connect via tool nodes: a database query, a vector search, an API call. This is the most flexible but requires you to write and maintain the retrieval logic yourself.
The rest of this post focuses on layers one and two: how checkpointers power short-term persistence, and how the Memory Store handles long-term facts.
⚙️ Checkpointers: InMemorySaver, SqliteSaver, and PostgresSaver
A checkpointer is an object you attach at compile() time. After every node runs, LangGraph calls the checkpointer to write the current state to storage. On the next invoke() with the same thread ID, LangGraph calls the checkpointer to read the latest snapshot before the graph starts running.
LangGraph ships three checkpointers out of the box:
InMemorySaver: Development and Unit Tests
from langgraph.checkpoint.memory import InMemorySaver
checkpointer = InMemorySaver()
graph = builder.compile(checkpointer=checkpointer)
State lives in a Python dictionary in RAM. It is wiped when the process ends. Use this for local development, CI tests, and notebooks. Never use it in production: a server restart loses every conversation.
SqliteSaver: Single-Process Deployments
from langgraph.checkpoint.sqlite import SqliteSaver
# Context-manager handles connection lifecycle
with SqliteSaver.from_conn_string("./agent_memory.db") as checkpointer:
    graph = builder.compile(checkpointer=checkpointer)
    result = graph.invoke({"messages": [...]}, config=config)
State is persisted to a local SQLite file and survives restarts. Works for prototypes, scripts, and single-worker web services. The catch: SQLite uses file-level locking, so multiple processes (or gunicorn workers) will contend for the lock and hit "database is locked" errors under concurrent load.
PostgresSaver: Production Multi-Instance Deployments
from langgraph.checkpoint.postgres import PostgresSaver
DB_URI = "postgresql://agent_user:secret@db-host:5432/agent_db"
with PostgresSaver.from_conn_string(DB_URI) as checkpointer:
    # Run migrations on first deploy (creates checkpoints table)
    checkpointer.setup()
    graph = builder.compile(checkpointer=checkpointer)
State is persisted to a PostgreSQL table. Safe for concurrent workers, horizontally scalable, and compatible with connection poolers like PgBouncer. For async applications, use AsyncPostgresSaver with a psycopg async connection pool (the Postgres checkpointer is built on psycopg, not asyncpg):
from langgraph.checkpoint.postgres.aio import AsyncPostgresSaver
from psycopg_pool import AsyncConnectionPool

async def build_graph():
    pool = AsyncConnectionPool(DB_URI, min_size=2, max_size=10)
    checkpointer = AsyncPostgresSaver(pool)
    await checkpointer.setup()
    return builder.compile(checkpointer=checkpointer)
🔧 Deep Dive: How LangGraph Checkpointing Works Under the Hood
The Internals
When LangGraph compiles a graph with a checkpointer, it wraps every node execution in a checkpoint lifecycle:
- Before the graph runs: LangGraph calls `checkpointer.get_tuple(config)`. If a checkpoint exists for the given `thread_id`, it deserializes the state and uses it as the starting values. If not, the graph starts fresh.
- After each node completes: LangGraph computes a state diff (only the fields that changed) and calls `checkpointer.put(config, checkpoint, metadata, new_versions)`. This write completes before the next node starts, guaranteeing that a crash between nodes is recoverable.
- Checkpoint schema: each checkpoint record stores `thread_id`, `checkpoint_id` (a UUID per step), `parent_checkpoint_id`, the serialized state blob, and a metadata dict (node name, timestamp, custom tags). This forms a linked list of snapshots: the full audit trail of every state the conversation passed through.
The thread_id is the primary namespace key. Every checkpoint row is keyed on (thread_id, checkpoint_id). Two conversations with different thread_id values are completely isolated even if they run on the same graph instance or the same database.
# The config dict is the entire addressing scheme for LangGraph persistence
config = {
    "configurable": {
        "thread_id": "user-alice-session-42",  # conversation namespace
        "checkpoint_id": None,  # None = latest; set a UUID to time-travel
    }
}
Passing an explicit checkpoint_id lets you time-travel: replay the graph from any historical snapshot. This is how human-in-the-loop approval flows work: pause the graph, let a human approve, resume from the exact checkpoint before the approval gate.
Performance Analysis
| Checkpointer | Write latency per step | Read on resume | Concurrent workers |
| --- | --- | --- | --- |
| InMemorySaver | ~0.01 ms (dict write) | ~0.01 ms | ❌ Single process only |
| SqliteSaver | ~1–5 ms (local file I/O) | ~1–5 ms | ⚠️ File-level lock |
| PostgresSaver (direct) | ~5–15 ms (TCP round-trip) | ~5–10 ms | ✅ Row-level lock |
| AsyncPostgresSaver + pool | ~3–8 ms (pooled conn) | ~3–8 ms | ✅ Best for async web |
Storage growth rate is the hidden cost. Each node execution writes one checkpoint row. A 10-node graph running 1000 conversations per day generates ~10,000 checkpoint rows daily. For long-lived agents with dozens of turns, rows accumulate quickly. Mitigation strategies include:
- TTL pruning: a nightly job that deletes checkpoints older than N days via `DELETE FROM checkpoints WHERE created_at < NOW() - INTERVAL '30 days'`.
- Checkpoint compaction: keep only the latest N checkpoints per thread (`get_state_history` + custom deletion logic).
- Selective checkpointing: tag certain node outputs as ephemeral and skip the write using LangGraph's `checkpoint_during` config flag (available in LangGraph >= 0.2).
Connection pooling with AsyncPostgresSaver is critical at scale. Without a pool, each graph invocation opens a new TCP connection to Postgres. At 100 concurrent conversations, that is 100+ open connections, well beyond the default Postgres max_connections of 100. A pool of 10 connections serving 100 concurrent invocations via async multiplexing is the standard production pattern.
🧵 Thread Isolation and State Flow
The diagram below shows two users, Alice and Bob, both interacting with the same compiled graph. Their state never touches because thread_id keeps every checkpoint in its own namespace within the same checkpointer backend.
graph TD
subgraph Alice ["Thread: user-alice"]
A1([invoke turn 1]) --> A2[classify_intent node]
A2 --> A3[fetch_order node]
A3 --> A4[respond node]
A4 --> A5([invoke turn 2\nresumes from A4 snapshot])
end
subgraph Bob ["Thread: user-bob"]
B1([invoke turn 1]) --> B2[classify_intent node]
B2 --> B3[respond node]
B3 --> B4([invoke turn 2\nresumes from B3 snapshot])
end
CP[(Checkpointer\nPostgresSaver)]
A4 -- "writes checkpoint\nthread=user-alice" --> CP
B3 -- "writes checkpoint\nthread=user-bob" --> CP
A5 -- "reads checkpoint\nthread=user-alice" --> CP
B4 -- "reads checkpoint\nthread=user-bob" --> CP
Two conversations share one graph instance and one checkpointer backend, but their state is completely isolated by thread_id. Each arrow to the checkpointer is a separate row in the checkpoints table.
You can inspect the live state of any thread at any time without invoking the graph:
# Snapshot of Alice's current conversation state
state = graph.get_state({"configurable": {"thread_id": "user-alice"}})
print(state.values["messages"]) # full message list
print(state.next) # next node(s) that would run
# Full audit trail: every checkpoint in reverse-chronological order
history = list(graph.get_state_history({"configurable": {"thread_id": "user-alice"}}))
for snapshot in history:
    print(f"Step {snapshot.config['configurable']['checkpoint_id']}: "
          f"next={snapshot.next}, messages={len(snapshot.values['messages'])}")
This get_state_history call is invaluable for debugging: you can see exactly what state the graph held at every step, spot where a node introduced bad data, and replay from any prior checkpoint to test a fix.
🌍 Real-World Applications: How Production Agents Use Persistent Memory
Case Study 1: E-Commerce Support Agent
An online retailer runs a LangGraph support agent with PostgresSaver. When a customer opens a chat:
- Input: `thread_id = f"customer-{customer_id}-{ticket_id}"`
- Process: The graph loads prior messages for this ticket. If the customer said "I already tried resetting my password" three turns ago, the agent routes to escalation rather than suggesting the same fix again.
- Output: First-contact resolution rate improved ~18% in one deployment because agents stopped repeating questions.
The ticket thread ID encodes both customer and ticket, so a customer's second ticket starts fresh while their current ticket is fully resumable across browser refreshes and agent handoffs.
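That encoding is worth pinning down in a helper so the convention can't drift across the codebase. A minimal sketch (the function name is hypothetical, the format string is the one from the case study):

```python
def ticket_thread_id(customer_id: str, ticket_id: str) -> str:
    # Scoped per ticket: a new ticket gets a fresh thread, the same ticket resumes
    return f"customer-{customer_id}-{ticket_id}"

# Same customer, two tickets: two isolated conversation namespaces
t1 = ticket_thread_id("4412", "T-1001")
t2 = ticket_thread_id("4412", "T-1002")
```

Because the thread ID is the entire isolation boundary, centralizing its construction is the cheapest way to guarantee two tickets never bleed state into each other.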
Case Study 2: Async Human-in-the-Loop Approval
A finance team uses a LangGraph workflow that drafts expense reports and waits for a manager approval node before submitting:
from langgraph.types import Command

# Agent drafts report, hits "awaiting_approval" interrupt node, and pauses
result = graph.invoke({"messages": [...]}, config=config)
# graph.get_state(config).next == ("awaiting_approval",) while the graph is suspended

# Hours later, the manager approves via dashboard; the graph resumes from the exact checkpoint
graph.invoke(
    Command(resume={"approved": True, "approver": "manager@co.com"}),
    config=config
)
Without checkpointing, a multi-hour pause would require storing state externally and rebuilding it manually. With PostgresSaver, the graph's entire state, including all tool call results already computed, is available the moment the approval arrives.
⚖️ Trade-offs and Failure Modes: Storage Costs, Context Overflow, and Stale Memory
Storage Costs Compound Quickly
Every node in every turn writes a checkpoint row. A graph with 8 nodes handling 500 daily active users averaging 10 turns each generates 8 × 500 × 10 = 40,000 checkpoint writes per day. At ~2 KB per row (a typical message-heavy state), that is ~80 MB/day. Manageable, but 2.4 GB/month before any cleanup. PostgreSQL JSONB compression helps, but plan for a pruning strategy from day one.
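The back-of-envelope math above can be packaged as a quick capacity estimator. The helper names and the 2 KB/row figure are assumptions carried over from the paragraph, not measured values:

```python
def daily_checkpoint_rows(nodes: int, daily_users: int, avg_turns: int) -> int:
    # One checkpoint row per node execution, per turn, per user
    return nodes * daily_users * avg_turns

def daily_storage_mb(rows: int, kb_per_row: float = 2.0) -> float:
    # Convert row count to megabytes at an assumed average row size
    return rows * kb_per_row / 1024

rows = daily_checkpoint_rows(nodes=8, daily_users=500, avg_turns=10)
mb = daily_storage_mb(rows)  # roughly 78 MB/day at 2 KB per row
```

Run this with your own node count and traffic numbers before picking a retention window; the answer usually decides whether a nightly TTL job is enough or whether you need per-thread compaction too.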
Context Window Overflow
The more messages you accumulate in state, the closer you creep to the LLM's context limit. A gpt-4o context window of 128k tokens sounds enormous, but a 50-turn support conversation with tool results can hit 40–60k tokens easily. Strategies:
- `trim_messages`: keep only the last N tokens of history (see the Memory Store section below).
- Summarization node: replace old messages with a running summary, then continue with only the summary + recent messages.
- Selective retention: store only human/assistant turns in state; rebuild tool call history from the Memory Store on demand.
Stale Memory as a Liability
Long-term memory that is never updated becomes misleading. If Alice's preferred shipping address is stored in the Memory Store from 18 months ago, and she moved last week, the agent will confidently ship to the wrong address. Mitigation: add a last_updated field to every stored fact, and include a "verify preferences" flow for sessions where stored facts are more than N days old.
Failure Mode: Partial Checkpoint Corruption
If a node crashes after writing to an external API but before the checkpoint write completes, the graph will re-run that node on resume β potentially double-submitting an order. Mitigation: use idempotency keys in your external API calls (most payment and shipping APIs support this natively), so a re-run has no side effects.
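The idempotency-key pattern in miniature. This is a toy stand-in for an external API, not a real payments client; it shows why a re-run node with the same key produces no second side effect:

```python
class ToyPaymentsAPI:
    """Toy idempotent API: replaying a request with the same key is a no-op."""
    def __init__(self):
        self.processed = {}  # idempotency_key -> first result

    def charge(self, amount: int, idempotency_key: str) -> dict:
        if idempotency_key in self.processed:
            # Replay after a crash/retry: return the cached result, charge nothing
            return self.processed[idempotency_key]
        result = {"charged": amount, "txn_id": len(self.processed) + 1}
        self.processed[idempotency_key] = result
        return result

api = ToyPaymentsAPI()
first = api.charge(20, idempotency_key="thread-abc-node-submit-1")
retry = api.charge(20, idempotency_key="thread-abc-node-submit-1")  # node re-ran
```

A good key for a LangGraph node is derived from the thread ID plus the step, so the re-run after resume sends exactly the same key as the crashed attempt.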
🧭 Decision Guide: Which Checkpointer for Which Deployment
| Situation | Recommendation |
| --- | --- |
| Use InMemorySaver when | Running tests, notebooks, demos, or any single-turn evaluation harness where state persistence across runs is unwanted noise |
| Use SqliteSaver when | Building a personal tool, local CLI agent, or a prototype with a single worker process and low concurrency (< 5 simultaneous conversations) |
| Use PostgresSaver when | Deploying to production with multiple workers, a web framework (FastAPI, Django), or any scenario requiring cross-process state sharing and concurrent users |
| Use AsyncPostgresSaver when | Your app uses async Python (asyncio, FastAPI async routes); it avoids blocking the event loop on checkpoint reads/writes |
| Edge case: serverless | Each Lambda/Cloud Run invocation is a fresh process, so InMemorySaver loses state. You must use PostgresSaver or an equivalent remote store. DynamoDB-backed custom checkpointers exist in the community for AWS deployments. |
🧪 Practical Example: Customer Support Agent With Cross-Session Memory
This example builds a customer support agent that uses both memory layers at the same time: a SqliteSaver checkpointer for short-term in-thread state (swap in PostgresSaver for production), and a Memory Store for user facts that persist across completely separate sessions. The support scenario fits here because it generates exactly the two events the post's memory model is built around: a returning user whose preferences are already stored, and a resolved issue that must be written back for next time. As you read through the nodes, watch how load_user_profile reads from the Store before the first message is even processed, and how save_resolved_issue writes back to the Store at the end. Those two calls are the cross-session memory pattern in its simplest production form.
import operator
from typing import Annotated, TypedDict
from langchain_core.messages import BaseMessage, HumanMessage, SystemMessage
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, END
from langgraph.checkpoint.sqlite import SqliteSaver
from langgraph.store.memory import InMemoryStore

# ── State schema ─────────────────────────────────────────────────────────────
class SupportState(TypedDict):
    messages: Annotated[list[BaseMessage], operator.add]  # accumulates across turns
    user_id: str
    user_profile: dict  # loaded once per session from Memory Store

# ── Shared long-term store ───────────────────────────────────────────────────
store = InMemoryStore()  # swap for AsyncPostgresStore in production

# ── Nodes ────────────────────────────────────────────────────────────────────
llm = ChatOpenAI(model="gpt-4o-mini")

def load_user_profile(state: SupportState) -> dict:
    """Pull cross-session facts from the Memory Store on first turn."""
    user_id = state["user_id"]
    results = store.search(("user_profiles", user_id))
    profile = results[0].value if results else {"name": "there", "past_issues": []}
    return {"user_profile": profile}

def respond(state: SupportState) -> dict:
    """Generate a reply using full conversation history + profile context."""
    profile = state.get("user_profile", {})
    system = SystemMessage(content=(
        f"You are a helpful support agent. The user's name is {profile.get('name', 'there')}. "
        f"Past issues they reported: {profile.get('past_issues', [])}. "
        "Be concise and helpful. If the issue is resolved, say 'Issue resolved.'"
    ))
    reply = llm.invoke([system] + state["messages"])
    return {"messages": [reply]}

def save_resolved_issue(state: SupportState) -> dict:
    """If resolved, write the issue summary to the Memory Store for future sessions."""
    last_msg = state["messages"][-1].content
    if "issue resolved" in last_msg.lower():
        user_id = state["user_id"]
        existing = store.search(("user_profiles", user_id))
        profile = existing[0].value if existing else {"name": state["user_profile"].get("name", ""), "past_issues": []}
        # Summarise the issue from the first human message
        first_human = next((m.content for m in state["messages"] if isinstance(m, HumanMessage)), "")
        profile["past_issues"].append(first_human[:120])
        store.put(("user_profiles", user_id), "preferences", profile)
    return {}

# ── Graph assembly ───────────────────────────────────────────────────────────
builder = StateGraph(SupportState)
builder.add_node("load_profile", load_user_profile)
builder.add_node("respond", respond)
builder.add_node("save_resolved", save_resolved_issue)
builder.set_entry_point("load_profile")
builder.add_edge("load_profile", "respond")
builder.add_edge("respond", "save_resolved")
builder.add_edge("save_resolved", END)

# ── Compile with checkpointer ────────────────────────────────────────────────
with SqliteSaver.from_conn_string("./support_memory.db") as checkpointer:
    graph = builder.compile(checkpointer=checkpointer)
    user_config = {"configurable": {"thread_id": "alice-ticket-001"}}

    # Turn 1: new user
    graph.invoke(
        {"messages": [HumanMessage(content="Hi, I'm Alice. My order #9847 hasn't arrived.")],
         "user_id": "alice", "user_profile": {}},
        config=user_config
    )

    # Turn 2: same thread, state resumes automatically; no need to pass user_id again
    result = graph.invoke(
        {"messages": [HumanMessage(content="It was supposed to arrive 3 days ago.")]},
        config=user_config
    )
    print(result["messages"][-1].content)
What happens on Turn 2: LangGraph loads the checkpoint from Turn 1 (which already holds Alice's first message and the assistant's reply). The new HumanMessage is appended via operator.add. The load_profile node runs again (each invoke re-enters at the entry point) and loads whatever profile exists in the store. The LLM sees the full 3-message history and can reference the order number Alice mentioned in Turn 1: no amnesia.
What happens in a new session (e.g., Alice returns next week): A new thread_id starts fresh state, but load_profile finds Alice's stored past_issues in the Memory Store and surfaces them in the system prompt, giving the agent cross-session context without carrying old message history into the new context window.
🛠️ LangGraph Memory Store: Long-Term Semantic Memory Beyond the Conversation
The BaseStore interface is LangGraph's answer to memory that outlives any single thread. It is a key-value store organized into namespaces (tuples of strings) with keys (string identifiers) and values (arbitrary dicts).
from langgraph.store.memory import InMemoryStore

store = InMemoryStore()

# Write a user preference (namespace, key, value)
store.put(("user_profiles", "alice"), "preferences", {
    "language": "en",
    "contact_channel": "email",
    "last_order": "9847"
})

# Read it back in any future session
results = store.search(("user_profiles", "alice"))
if results:
    prefs = results[0].value
    print(prefs["contact_channel"])  # "email"

# Update: put() is an upsert, so the same namespace+key overwrites
store.put(("user_profiles", "alice"), "preferences", {
    **prefs,
    "last_order": "10234"
})
For production, replace InMemoryStore with AsyncPostgresStore:
from langgraph.store.postgres.aio import AsyncPostgresStore

async with AsyncPostgresStore.from_conn_string(DB_URI) as store:
    await store.setup()  # creates the store tables on first run
Message Trimming to Prevent Context Overflow
As conversations grow, trim old messages before passing them to the LLM. LangChain's trim_messages utility handles this cleanly:
from langchain_core.messages import trim_messages

def respond_with_trim(state: SupportState) -> dict:
    """Trim message history to the last 4000 tokens before invoking the LLM."""
    trimmed = trim_messages(
        state["messages"],
        max_tokens=4000,
        strategy="last",       # keep the most recent messages
        token_counter=llm,     # uses the model's tokenizer for accurate counts
        include_system=True,   # always keep the system message
        allow_partial=False,   # never cut a message mid-sentence
    )
    reply = llm.invoke(trimmed)
    return {"messages": [reply]}
For very long-running agents, combine trimming with a summarization node that periodically condenses old messages into a single summary AIMessage, reducing history length while preserving context continuity.
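The "last" strategy that trim_messages applies can be sketched in plain Python. This is a toy over strings with a crude ~4-characters-per-token counter (both assumptions), shown only to make the budget-walk logic concrete:

```python
def keep_recent_under_budget(messages: list[str], max_tokens: int, count_tokens) -> list[str]:
    """Walk backwards from the newest message, keeping messages until the budget is spent."""
    kept, total = [], 0
    for msg in reversed(messages):
        cost = count_tokens(msg)
        if total + cost > max_tokens:
            break  # everything older than this point is dropped
        kept.append(msg)
        total += cost
    return list(reversed(kept))  # restore chronological order

history = ["turn-1 " * 10, "turn-2 " * 10, "turn-3 " * 10]
trimmed = keep_recent_under_budget(history, max_tokens=40,
                                   count_tokens=lambda m: len(m) // 4)
```

The real utility adds message-boundary rules (never orphan a tool result from its call, optionally pin the system message), which is why you should prefer trim_messages over rolling your own.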
💡 Lessons Learned
1. Pick your checkpointer before you write any agent code. Switching checkpointers later means migrating stored state: InMemorySaver checkpoints cannot be imported into SqliteSaver. Decide on PostgresSaver for anything that will eventually run in production, even during early development with a local Postgres container.
2. Thread IDs are your API. Your thread_id naming convention determines how conversations are isolated. "user-{id}" collapses all of a user's conversations into one thread (continuous memory but no fresh starts). "user-{id}-session-{ts}" creates a new context per session. "ticket-{id}" isolates by support ticket. Choose based on the UX you want; it is hard to change later.
3. Don't store everything in the Memory Store. It sounds tempting to persist every message, but the Memory Store is not a chat history database. It is for facts: things a human would write in a notepad about another person. Use a purpose-built conversation database (or just the checkpoint history) for raw message logs.
4. Always implement a pruning strategy before you go to production. Checkpoint tables grow fast. A scheduled TTL delete job is a single cron entry that saves you from an unexpectedly full disk at 3 AM.
5. Test failure recovery explicitly. Simulate a node crash mid-graph and verify that a fresh invoke() with the same thread_id resumes correctly without side effects. LangGraph's resume semantics are reliable, but your external API calls must be idempotent for this to be safe end-to-end.
📌 TLDR: Summary and Key Takeaways
- The amnesia problem is real: without a checkpointer, every `graph.invoke()` starts with empty state and no memory of prior turns.
- `operator.add` on the messages field accumulates conversation history within a thread; this is the foundation of multi-turn agents.
- Choose your checkpointer by deployment target: `InMemorySaver` for tests, `SqliteSaver` for single-process prototypes, `PostgresSaver`/`AsyncPostgresSaver` for production multi-worker services.
- Thread IDs are conversation namespaces. The same compiled graph serves thousands of isolated conversations simultaneously; `config = {"configurable": {"thread_id": "..."}}` is the entire routing key.
- `get_state` and `get_state_history` give you full visibility into any conversation's past, which is essential for debugging and for human-in-the-loop approval flows.
- The Memory Store (`InMemoryStore`/`AsyncPostgresStore`) bridges conversations: persist cross-session facts with `store.put()` and retrieve them with `store.search()`.
- Context overflow is inevitable in long-running agents; plan for `trim_messages` or a summarization node from the start, not as an afterthought.
The one-liner to remember: a checkpointer turns a graph into a conversation; a Memory Store turns a conversation into a relationship.
📝 Practice Quiz
What happens when you call `graph.invoke()` twice with the same `thread_id` and an `InMemorySaver` checkpointer attached?
- A) The second call starts with empty state because `invoke()` always resets the graph.
- B) The second call resumes from the state saved after the last node of the first call.
- C) An error is raised because the same thread ID cannot be reused.
- D) The two invocations run in parallel and merge their state.

Correct Answer: B

Your LangGraph service is deployed with three gunicorn worker processes sharing one machine. User Alice's conversation starts on Worker 1, then her next request is routed to Worker 3. Which checkpointer correctly serves her full message history on Worker 3?
- A) `InMemorySaver`, because it is the fastest option.
- B) `SqliteSaver`, because SQLite files are shared on disk.
- C) `PostgresSaver`, because it stores checkpoints in a shared remote database accessible by all workers.
- D) No checkpointer can handle this: LangGraph requires sticky sessions.

Correct Answer: C

You are building a customer support agent that must remember a user's shipping address across separate support tickets (different thread IDs). Which LangGraph feature is the correct tool for this?
- A) Passing the address in the `thread_id` string.
- B) The `BaseStore` / Memory Store (`InMemoryStore` or `AsyncPostgresStore`).
- C) Storing it as a checkpoint field using `InMemorySaver`.
- D) Injecting it via the system prompt hard-coded at compile time.

Correct Answer: B
Open-ended: A LangGraph agent handles 200 concurrent users, each averaging 15 turns per conversation, with a graph containing 6 nodes. Estimate the daily checkpoint write volume and describe two strategies you would implement at the storage layer to manage growth over a 90-day period. Consider the trade-offs between storage cost, debuggability, and the ability to resume conversations after a service outage.