Category: alignment
2 articles in this category
RLHF in Practice: From Human Preferences to Better LLM Policies
TLDR: Reinforcement Learning from Human Feedback (RLHF) helps align language models with human preferences after pretraining and SFT. The typical pipeline is: collect preference comparisons, train a reward model, then optimize a policy (often with KL...
• 8 min read
RLHF Explained: How We Teach AI to Be Nice
TLDR: A raw LLM is a super-smart parrot that read the entire internet — including its worst parts. RLHF (Reinforcement Learning from Human Feedback) is the training pipeline that transforms it from a pattern-matching engine into an assistant that is ...
• 5 min read
