Abstract Algorithms

reinforcement learning

3 articles

Reinforcement Learning: Agents, Environments, and Rewards in Practice

TLDR: Reinforcement Learning trains agents to make sequences of decisions by learning from rewards and penalties. Unlike supervised learning, RL learns through trial and error rather than labeled examples. Use it for sequential decision problems wher...

Mar 29, 2026 • 15 min read

RLHF in Practice: From Human Preferences to Better LLM Policies

TLDR: Reinforcement Learning from Human Feedback (RLHF) helps align language models with human preferences after pretraining and SFT. The typical pipeline is: collect preference comparisons, train a reward model, then optimize a policy (often with KL...

Mar 9, 2026 • 11 min read

RLHF Explained: How We Teach AI to Be Nice

TLDR: A raw LLM is a super-smart parrot that read the entire internet — including its worst parts. RLHF (Reinforcement Learning from Human Feedback) is the training pipeline that transforms it from a pattern-matching engine into an assistant that is ...

Mar 9, 2026 • 13 min read