Topic

LLM

54 articles across 11 sub-topics

Sub-topic

AI

29 articles

Chain of Thought Prompting: Teaching LLMs to Think Step by Step

TLDR: Chain of Thought (CoT) prompting tells a language model to reason out loud before answering. By generating intermediate steps, the model steers itself toward correct conclusions — turning guesswork into structured reasoning. It's the difference...

25 min read

LLM Hallucinations: Causes, Detection, and Mitigation Strategies

TLDR: LLMs hallucinate because they are trained to predict the next plausible token — not the next true token. Understanding the three hallucination types (factual, faithfulness, open-domain) plus the five root causes lets you choose the right mitiga...

28 min read

How AI Coding Agents Work: Models, Context, Sessions, and Memory

TLDR: An AI coding agent is an LLM stapled to a tool registry, wrapped in an orchestration loop that painstakingly rebuilds state on every single API call — because the model itself is completely stateless. Understanding the context window, the ReAct...

31 min read

Types of LLM Quantization: By Timing, Scope, and Mapping

TLDR: There is no single "best" LLM quantization. You classify and choose quantization along three axes: when you quantize (timing), what you quantize (scope), and how values are encoded (mapping). In practice, most teams start with weight quantizati...

16 min read

Practical LLM Quantization in Colab: A Hugging Face Walkthrough

TLDR: This is a practical, notebook-style quantization guide for Google Colab and Hugging Face. You will quantize real models, run inference, compare memory/latency, and learn when to use 4-bit NF4 vs safer INT8 paths. 📖 What You Will Build in Thi...

15 min read

GPTQ vs AWQ vs NF4: Choosing the Right LLM Quantization Pipeline

TLDR: GPTQ, AWQ, and NF4 all shrink LLMs, but they optimize different constraints. GPTQ focuses on post-training reconstruction error, AWQ protects salient weights for better quality at low bits, and NF4 offers practical 4-bit compression through bit...

14 min read

Sub-topic

AI Agents

13 articles

RAG vs Fine-Tuning: When to Use Each (and When to Combine Them)

TLDR: RAG gives LLMs access to current knowledge at inference time; fine-tuning changes how they reason and write. Use RAG when your data changes. Use fine-tuning when you need consistent style, tone, or domain reasoning. Use both for production assi...

27 min read

Build vs Buy: Deploying Your Own LLM vs Using ChatGPT, Gemini, and Claude APIs

TLDR: Use the API until you hit $10K/month or a hard data privacy requirement. Then add a semantic cache. Then evaluate hybrid routing. Self-hosting full model serving is only cost-effective at > 50M tokens/day with a dedicated MLOps team. The build ...

30 min read

LangChain Tools and Agents: The Classic Agent Loop

TLDR: LangChain's @tool decorator plus AgentExecutor give you a working tool-calling agent in about 30 lines of Python. The ReAct loop (Thought → Action → Observation) drives every reasoning step. For simple l...

20 min read

LangChain 101: Chains, Prompts, and LLM Integration

TLDR: LangChain's LCEL pipe operator (|) wires prompts, models, and output parsers into composable chains — swap OpenAI for Anthropic or Ollama by changing one line without touching the rest of your code. 📖 One LLM API Today, Rewrite Tomorrow: The...

19 min read

LangGraph Tool Calling: ToolNode, Parallel Tools, and Custom Tools

TLDR: Wire @tool, ToolNode, and bind_tools into LangGraph for agents that call APIs at runtime. 📖 The Stale Knowledge Problem: Why LLMs Need Runtime Tools Your agent confidently tells you the current stock price of NVIDIA. It's from its training d...

17 min read

Streaming Agent Responses in LangGraph: Tokens, Events, and Real-Time UI Integration

TLDR: Stream agents token by token with astream_events; wire to FastAPI SSE for zero-spinner UX. 📖 The 25-Second Spinner: Why Streaming Is a UX Requirement, Not a Nice-to-Have Your agent takes 25 seconds to respond. Users abandon after 8 seconds....

19 min read