Abstract Algorithms

Category

rlhf

3 articles across 2 sub-topics

Ai(2)

RLHF in Practice: From Human Preferences to Better LLM Policies

TLDR: Reinforcement Learning from Human Feedback (RLHF) helps align language models with human preferences after pretraining and SFT. The typical pipeline is: collect preference comparisons, train a reward model, then optimize a policy (often with KL...

Mar 9, 2026 • 11 min read

RLHF Explained: How We Teach AI to Be Nice

TLDR: A raw LLM is a super-smart parrot that read the entire internet — including its worst parts. RLHF (Reinforcement Learning from Human Feedback) is the training pipeline that transforms it from a pattern-matching engine into an assistant that is ...

Mar 9, 2026 • 13 min read

Fine Tuning(1)

Fine-Tuning LLMs: The Complete Engineer's Guide to SFT, LoRA, and RLHF

TLDR: A pretrained LLM is a generalist. Fine-tuning makes it a specialist. Supervised Fine-Tuning (SFT) teaches it your domain's language through labeled examples. LoRA does the same with 99% fewer trainable parameters. RLHF shapes its behavior using...

Apr 18, 2026 • 30 min read

Abstract Algorithms

Exploring the fascinating world of algorithms, data structures, and software engineering through clear explanations and practical examples.

Author

Abstract Algorithms

@abstractalgorithms

© 2026 Abstract Algorithms. All rights reserved.
