Start here
LLM
Learn LLMs as a connected topic across articles, concepts, simulations, and interview reasoning.
Begin with
Fine gives you the cleanest entry point before branching into constraints, failures, and related systems.
12 Articles
10 Concepts
System behavior
LLM Inference Pipeline
A request is transformed as it passes through the prompt, retrieval, generation, and guardrail stages.
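The four stages named above can be sketched as a chain of small functions. This is a minimal illustration only, not the pipeline the articles describe: the `Request` fields, the toy corpus, the banned-term list, and the stubbed `generate` step are all hypothetical stand-ins, and a real system would call an actual retriever and model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Request:
    """State carried through the pipeline (hypothetical shape)."""
    user_input: str
    prompt: str = ""
    context: List[str] = field(default_factory=list)
    output: str = ""
    blocked: bool = False


# Toy in-memory corpus standing in for a real document store (assumption).
TOY_CORPUS = {
    "lora": "LoRA trains small low-rank matrices on top of a frozen base model.",
    "moe": "A mixture-of-experts layer routes each token to a few expert FFNs.",
}

# Illustrative guardrail rule (assumption): withhold outputs containing these terms.
BANNED_TERMS = ("password",)


def build_prompt(req: Request) -> Request:
    # Prompt stage: wrap the raw input in an instruction template.
    req.prompt = f"Answer concisely: {req.user_input.strip()}"
    return req


def retrieve(req: Request) -> Request:
    # Retrieval stage: naive keyword match against the toy corpus.
    text = req.user_input.lower()
    req.context = [doc for key, doc in TOY_CORPUS.items() if key in text]
    return req


def generate(req: Request) -> Request:
    # Generation stage: a stub in place of a real model call.
    context = " ".join(req.context) or "no context found"
    req.output = f"[stub answer] {req.prompt} | context: {context}"
    return req


def apply_guardrails(req: Request) -> Request:
    # Guardrail stage: replace the output if it contains a banned term.
    if any(term in req.output.lower() for term in BANNED_TERMS):
        req.blocked = True
        req.output = "Response withheld by guardrail."
    return req


def run_pipeline(user_input: str) -> Request:
    req = Request(user_input=user_input)
    for stage in (build_prompt, retrieve, generate, apply_guardrails):
        req = stage(req)
    return req


if __name__ == "__main__":
    print(run_pipeline("What is LoRA?").output)
```

Keeping each stage as a function over one shared state object makes it easy to insert, reorder, or trace stages individually.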
Read in sequence
1. Fine-Tuning LLMs with LoRA and QLoRA: A Practical Deep-Dive (31 min)
   TLDR: LoRA freezes the base model and trains two tiny matrices per layer: 0.1% of parameters, 70% less GPU memory, near-identical quality. QLoRA adds 4-bit NF4 quantization of the frozen base, enab…

2. Build vs Buy: Deploying Your Own LLM vs Using ChatGPT, Gemini, and Claude APIs (31 min)
   TLDR: Use the API until you hit $10K/month or a hard data privacy requirement. Then add a semantic cache. Then evaluate hybrid routing. Self-hosting full model serving is only cost-effective at > 50M…

3. Fine-Tuning LLMs: The Complete Engineer's Guide to SFT, LoRA, and RLHF (30 min)
   TLDR: A pretrained LLM is a generalist. Fine-tuning makes it a specialist. Supervised Fine-Tuning (SFT) teaches it your domain's language through labeled examples. LoRA does the same with 99% fewer tr…

4. Chain of Thought Prompting: Teaching LLMs to Think Step by Step (27 min)
   TLDR: Chain of Thought (CoT) prompting tells a language model to reason out loud before answering. By generating intermediate steps, the model steers itself toward correct conclusions, turning guessw…

5. LLM Hallucinations: Causes, Detection, and Mitigation Strategies (30 min)
   TLDR: LLMs hallucinate because they are trained to predict the next plausible token, not the next true token. Understanding the three hallucination types (factual, faithfulness, open-domain) plus the…

6. Sparse Mixture of Experts: How MoE LLMs Do More With Less Compute (27 min)
   TLDR: Mixture of Experts (MoE) replaces the single dense Feed-Forward Network (FFN) layer in each Transformer block with N independent expert FFNs plus a learned router. Only the top-K experts activat…

7. Dense LLM Architecture: How Every Parameter Works on Every Token (24 min)
   TLDR: In a dense LLM every single parameter is active for every token in every forward pass: no routing, no selection. A transformer block runs multi-head self-attention (Q, K, V) followed by a feed-…

8. LLM Software Development Pitfalls: What to Avoid and When to Simplify (20 min)
   TLDR: Most bad LLM products do not fail because the model is weak. They fail because teams wrap a maybe-useful model in too much architecture: prompt spaghetti, no eval harness, weak tool schemas, hug…

9. LLM Observability: Tracing, Logging, and Debugging Production AI Systems (19 min)
   TLDR: 🔍 LLM observability is radically different from traditional APM: non-deterministic outputs, variable token costs, and multi-step reasoning chains require specialized tracing. LangSmith provides…

10. LLM Evaluation Frameworks: How to Measure Model Quality (RAGAS, DeepEval, TruLens) (16 min)
    TLDR: 📏 Traditional ML metrics (accuracy, F1) fail for LLMs because there's no single "correct" answer. RAGAS measures RAG pipeline quality with faithfulness, answer relevance, and context precision.

11. LangChain 101: Chains, Prompts, and LLM Integration (19 min)
    TLDR: LangChain's LCEL pipe operator (|) wires prompts, models, and output parsers into composable chains; swap OpenAI for Anthropic or Ollama by changing one line without touching the rest of your c…

12. Types of LLM Quantization: By Timing, Scope, and Mapping (17 min)
    TLDR: There is no single "best" LLM quantization. You classify and choose quantization along three axes: when you quantize (timing), what you quantize (scope), and how values are encoded (mapping). In…

