Category
reinforcement learning
3 articles

Reinforcement Learning: Agents, Environments, and Rewards in Practice
TLDR: Reinforcement Learning trains agents to make sequences of decisions by learning from rewards and penalties. Unlike supervised learning, RL learns through trial and error rather than labeled examples. Use it for sequential decision problems wher...
RLHF in Practice: From Human Preferences to Better LLM Policies
TLDR: Reinforcement Learning from Human Feedback (RLHF) helps align language models with human preferences after pretraining and supervised fine-tuning (SFT). The typical pipeline is: collect preference comparisons, train a reward model, then optimize a policy (often with KL...
RLHF Explained: How We Teach AI to Be Nice
TLDR: A raw LLM is a super-smart parrot that has read the entire internet, including its worst parts. RLHF (Reinforcement Learning from Human Feedback) is the training pipeline that transforms it from a pattern-matching engine into an assistant that is ...
