Topic

LLM Engineering

Learn LLM Engineering as a connected topic across chapters, concepts, simulations, and interview reasoning.

10 Concepts | 49 Articles | 14h 7m

How this topic helps

LLM
AI
Machine Learning
AI Agents

Learning Path in this Topic

Series that contain articles from LLM Engineering.

Articles

49 matched articles

Article 1: Managed API LLMs vs Self-Hosted Models: When to Switch and When Not To
TLDR: Most teams should start with managed LLM APIs because they buy speed, reliability, model quality, and low operational burden. Move to self-hosted or open-weight models only when you have stable…
17 min

Article 2: LLM Model Selection Guide: GPT-4o vs Claude vs Llama vs Mistral: When to Use Which
TLDR: 🧠 Choosing the right LLM can save you 80% on costs while maintaining quality. This guide provides a decision framework, cost comparison, and practical examples to help engineering teams select…
23 min

Article 3: Context Window Management: Strategies for Long Documents and Extended Conversations
TLDR: 🧠 Context windows are LLM memory limits. When conversations grow past 4K-128K tokens, you need strategies: sliding windows (cheap, lossy), summarization (balanced), RAG (selective), map-reduce…
20 min

Article 4: ANN Index Types Explained: When to Choose Flat, HNSW, IVF, or IVF-PQ
TLDR: If your dataset is small and correctness is critical, use Flat. If you need high recall with low latency and enough RAM, use HNSW. If your corpus is huge and memory is your bottleneck, use IVF-P…
14 min

Article 5: RAG vs Fine-Tuning: When to Use Each (and When to Combine Them)
📌 TL;DR Summary: Use RAG when facts change frequently and answers must be source-grounded. Use fine-tuning when you need stable behavior: tone, format, and domain-specific reasoning. Use RAG + fine-t…
31 min

Article 6: Fine-Tuning LLMs with LoRA and QLoRA: A Practical Deep-Dive
TLDR: LoRA freezes the base model and trains two tiny matrices per layer: 0.1% of parameters, 70% less GPU memory, near-identical quality. QLoRA adds 4-bit NF4 quantization of the frozen base, enab…
31 min