Abstract Algorithms



System behavior

LLM Inference Pipeline

Request transforms through prompt, retrieval, generation, and guardrails.
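The four stages named above can be sketched as a chain of small functions. This is a toy illustration only: every function name, the word-overlap retrieval, the stubbed model call, and the blocked-term list are assumptions made for the example, not part of any real serving stack.

```python
def prepare_prompt(request):
    """Prompt stage (hypothetical): normalize the raw user request."""
    return " ".join(request.split())


def retrieve(prompt, corpus, k=2):
    """Retrieval stage (toy): rank passages by words shared with the prompt."""
    q_words = set(prompt.lower().split())
    ranked = sorted(
        corpus,
        key=lambda passage: len(q_words & set(passage.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def generate(prompt, passages):
    """Generation stage (stub): a real system would call a model here."""
    context = "; ".join(passages)
    return f"Answer to '{prompt}' using context: {context}"


def apply_guardrails(text, blocked=("password",)):
    """Guardrail stage (toy): withhold output containing a blocked term."""
    if any(term in text.lower() for term in blocked):
        return "[response withheld by guardrail]"
    return text


def run_pipeline(request, corpus):
    """Pass one request through all four stages in order."""
    prompt = prepare_prompt(request)
    passages = retrieve(prompt, corpus)
    draft = generate(prompt, passages)
    return apply_guardrails(draft)
```

Keeping each stage a separate function makes it possible to test or replace one stage (for example, swapping the toy retrieval for a vector search) without touching the others.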


Read in sequence

1. Fine-Tuning LLMs with LoRA and QLoRA: A Practical Deep-Dive (31 min)
   TLDR: LoRA freezes the base model and trains two tiny matrices per layer: 0.1% of parameters, 70% less GPU memory, near-identical quality. QLoRA adds 4-bit NF4 quantization of the frozen base, enab…

2. Build vs Buy: Deploying Your Own LLM vs Using ChatGPT, Gemini, and Claude APIs (31 min)
   TLDR: Use the API until you hit $10K/month or a hard data privacy requirement. Then add a semantic cache. Then evaluate hybrid routing. Self-hosting full model serving is only cost-effective at > 50M …

3. Fine-Tuning LLMs: The Complete Engineer's Guide to SFT, LoRA, and RLHF (30 min)
   TLDR: A pretrained LLM is a generalist. Fine-tuning makes it a specialist. Supervised Fine-Tuning (SFT) teaches it your domain's language through labeled examples. LoRA does the same with 99% fewer tr…

4. Chain of Thought Prompting: Teaching LLMs to Think Step by Step (27 min)
   TLDR: Chain of Thought (CoT) prompting tells a language model to reason out loud before answering. By generating intermediate steps, the model steers itself toward correct conclusions, turning guessw…

5. LLM Hallucinations: Causes, Detection, and Mitigation Strategies (30 min)
   TLDR: LLMs hallucinate because they are trained to predict the next plausible token, not the next true token. Understanding the three hallucination types (factual, faithfulness, open-domain) plus the…

6. Sparse Mixture of Experts: How MoE LLMs Do More With Less Compute (27 min)
   TLDR: Mixture of Experts (MoE) replaces the single dense Feed-Forward Network (FFN) layer in each Transformer block with N independent expert FFNs plus a learned router. Only the top-K experts activat…

7. Dense LLM Architecture: How Every Parameter Works on Every Token (24 min)
   TLDR: In a dense LLM every single parameter is active for every token in every forward pass: no routing, no selection. A transformer block runs multi-head self-attention (Q, K, V) followed by a feed-…

8. LLM Software Development Pitfalls: What to Avoid and When to Simplify (20 min)
   TLDR: Most bad LLM products do not fail because the model is weak. They fail because teams wrap a maybe-useful model in too much architecture: prompt spaghetti, no eval harness, weak tool schemas, hug…

9. LLM Observability: Tracing, Logging, and Debugging Production AI Systems (19 min)
   TLDR: 🔍 LLM observability is radically different from traditional APM: non-deterministic outputs, variable token costs, and multi-step reasoning chains require specialized tracing. LangSmith provides …

10. LLM Evaluation Frameworks: How to Measure Model Quality (RAGAS, DeepEval, TruLens) (16 min)
    TLDR: 📏 Traditional ML metrics (accuracy, F1) fail for LLMs because there's no single "correct" answer. RAGAS measures RAG pipeline quality with faithfulness, answer relevance, and context precision. …

11. LangChain 101: Chains, Prompts, and LLM Integration (19 min)
    TLDR: LangChain's LCEL pipe operator (|) wires prompts, models, and output parsers into composable chains: swap OpenAI for Anthropic or Ollama by changing one line without touching the rest of your c…

12. Types of LLM Quantization: By Timing, Scope, and Mapping (17 min)
    TLDR: There is no single "best" LLM quantization. You classify and choose quantization along three axes: when you quantize (timing), what you quantize (scope), and how values are encoded (mapping). In…
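The LoRA entries in the list above describe freezing the base weights and training two small matrices per layer. A minimal pure-Python sketch of that idea follows; the class name `LoRALinear`, the helper names, the initialization scales, and the `alpha / rank` scaling are illustrative assumptions rather than code from the linked articles.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def lora_param_fraction(d_in, d_out, rank):
    """Trainable LoRA parameters relative to the frozen weight matrix."""
    return rank * (d_in + d_out) / (d_in * d_out)


class LoRALinear:
    """Frozen weight W (d_out x d_in) plus a trainable low-rank update B @ A."""

    def __init__(self, d_in, d_out, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        # Frozen base weight: not updated during fine-tuning.
        self.W = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(d_out)]
        # Trainable factors. B starts at zero, so the adapted layer
        # initially produces exactly the base layer's output.
        self.A = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.W, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]


# A 4096 x 4096 matrix adapted at rank 8 trains 1/256 of that matrix's size.
fraction = lora_param_fraction(4096, 4096, 8)
```

The fraction here is per adapted matrix; the share of the whole model that is trainable depends on which layers receive adapters and on the chosen rank, so it will not necessarily match the figures quoted in the summaries.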
