Category
fine tuning
7 articles across 4 sub-topics
SFT for LLMs: A Practical Guide to Supervised Fine-Tuning
TLDR: Supervised fine-tuning (SFT) is the stage where a pretrained model learns task-specific response behavior from curated input-output examples. It is usually the first alignment step after pretraining and often the foundation for later RLHF. Good...
PEFT, LoRA, and QLoRA: A Practical Guide to Efficient LLM Fine-Tuning
TLDR: Full fine-tuning updates every model weight, which is expensive in memory, compute, and storage. PEFT methods update only a small trainable slice. LoRA learns low-rank adapters on top of frozen base weights. QLoRA pushes efficiency further by q...

LoRA Explained: How to Fine-Tune LLMs on a Budget
TLDR: Fine-tuning a 7B-parameter LLM updates billions of weights and requires expensive GPUs. LoRA (Low-Rank Adaptation) freezes the original weights and trains only tiny adapter matrices that are added on top. 90%+ memory reduction; zero inference l...
Fine-Tuning LLMs with LoRA and QLoRA: A Practical Deep-Dive
TLDR: LoRA freezes the base model and trains two tiny matrices per layer — 0.1 % of parameters, 70 % less GPU memory, near-identical quality. QLoRA adds 4-bit NF4 quantization of the frozen base, enabling 70B fine-tuning on 2× A100 80 GB instead of 8...

Transfer Learning Explained: Standing on the Shoulders of Pretrained Models
TLDR: You don't need millions of labeled images or months of GPU time to build a great model. Transfer learning lets you borrow a pretrained network's hard-won feature detectors, plug in a new output head, and fine-tune on your small dataset — often ...
RAG vs Fine-Tuning: When to Use Each (and When to Combine Them)
TLDR: RAG gives LLMs access to current knowledge at inference time; fine-tuning changes how they reason and write. Use RAG when your data changes. Use fine-tuning when you need consistent style, tone, or domain reasoning. Use both for production assi...

Fine-Tuning LLMs: The Complete Engineer's Guide to SFT, LoRA, and RLHF
TLDR: A pretrained LLM is a generalist. Fine-tuning makes it a specialist. Supervised Fine-Tuning (SFT) teaches it your domain's language through labeled examples. LoRA does the same with 99% fewer trainable parameters. RLHF shapes its behavior using...
