LLM Hyperparameters Guide: Temperature, Top-P, and Top-K Explained

TLDR: Hyperparameters are the knobs you turn before generating text. Temperature controls randomness (Creativity vs. Focus). Top-P controls the vocabulary pool (Diversity). Frequency Penalty stops the model from repeating itself. Knowing how to tune these is the difference between a hallucinating bot and a reliable coding assistant.
1. The "Knobs" of Generation (The No-Jargon Explanation)
Imagine an LLM is a Jazz Musician.
- Temperature: How much improvisation is allowed?
  - Low (0.1): Play the sheet music exactly as written. (Boring, precise.)
  - High (0.9): Go wild! Play random notes! (Creative, risky.)
- Top-P (Nucleus Sampling): How many different notes can you choose from?
  - Low (0.1): Only pick from the top 3 safest notes.
  - High (0.9): You can pick from almost any note in the scale.
2. Deep Dive: The Core Parameters
A. Temperature (0.0 to 2.0)
Controls the randomness of the next token prediction.
- Math: It divides the logits by $T$ before the Softmax function:
$$ P_i = \frac{\exp(z_i / T)}{\sum_j \exp(z_j / T)} $$
- Effect:
  - Low ($T < 0.5$): The distribution sharpens; the model picks the most likely word almost every time.
  - High ($T > 1.0$): The distribution flattens; rare words become more likely.
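To make the formula concrete, here is a minimal, dependency-free sketch of softmax with temperature (the logit values are made up purely for illustration):

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Divide the logits by T, then apply a numerically stable softmax."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max before exponentiating, for stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 2.0, 1.0]  # toy scores for three candidate tokens
print(softmax_with_temperature(logits, temperature=0.2))  # sharp: the top token takes ~100%
print(softmax_with_temperature(logits, temperature=1.5))  # flatter: rarer tokens gain probability
```

As $T \to 0$ the distribution collapses onto the single highest logit, which is why very low temperatures behave almost like greedy decoding.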
B. Top-P (0.0 to 1.0)
Also known as Nucleus Sampling. Instead of sampling from all words, the model only considers the smallest set of top words whose cumulative probability reaches $P$.
- Effect:
  - Low ($P = 0.1$): Only the top one or two words survive. Very deterministic.
  - High ($P = 0.9$): A wide range of words stays in the pool. More diverse vocabulary.
Pro Tip: Generally, change either Temperature or Top-P, but not both. They both reshape the same sampling distribution, so tuning them together makes the output hard to reason about.
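Here is a rough sketch of the mechanism (not any particular library's implementation): keep the most likely tokens until their cumulative probability reaches $P$, then sample only from that pool.

```python
def top_p_filter(probs, p=0.9):
    """Return indices of the smallest set of tokens whose cumulative probability >= p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for idx in ranked:
        kept.append(idx)
        cumulative += probs[idx]
        if cumulative >= p:
            break
    return kept

probs = [0.55, 0.25, 0.10, 0.06, 0.04]  # toy distribution over five tokens
print(top_p_filter(probs, p=0.1))  # [0]       -> effectively greedy
print(top_p_filter(probs, p=0.9))  # [0, 1, 2] -> a wider pool
```

In practice the surviving probabilities are renormalized and the next token is sampled from them.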
C. Top-K (Integer, e.g., 40)
Limits the model to pick from the top $K$ most likely words.
- Effect: Hard cutoff. Even if the $(K+1)$th word is perfectly valid, it is ignored. This prevents the model from going completely off the rails into nonsense.
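A toy version of that cutoff, to contrast with the Top-P sketch above:

```python
def top_k_filter(probs, k=40):
    """Return the indices of the k most probable tokens; everything else is discarded."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return ranked[:k]

probs = [0.30, 0.28, 0.22, 0.15, 0.05]
print(top_k_filter(probs, k=2))  # [0, 1] -> the third token is dropped even at 22% probability
```

Many stacks let you set Top-K and Top-P together, in which case both filters are applied before sampling.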
D. Frequency/Presence Penalty (-2.0 to 2.0)
- Frequency Penalty: Penalizes words based on how many times they have already appeared in the text. (Stops: "and and and and").
- Presence Penalty: Penalizes words if they have appeared at least once. (Encourages new topics).
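The exact formula varies by provider, but a common formulation (described, for example, in OpenAI's API documentation) subtracts a count-scaled frequency penalty and a flat presence penalty from the logits of tokens that have already appeared. A sketch:

```python
from collections import Counter

def apply_penalties(logits, generated_ids, frequency_penalty=0.0, presence_penalty=0.0):
    """Lower the logits of tokens that already appear in the generated output."""
    counts = Counter(generated_ids)
    adjusted = list(logits)
    for token_id, count in counts.items():
        adjusted[token_id] -= count * frequency_penalty  # grows with every repetition
        adjusted[token_id] -= presence_penalty           # flat hit once a token has appeared
    return adjusted

logits = [2.0, 1.5, 1.0, 0.5]  # toy logits for four tokens
history = [0, 0, 2]            # token 0 appeared twice, token 2 once
print(apply_penalties(logits, history, frequency_penalty=0.5, presence_penalty=0.3))
# -> [0.7, 1.5, 0.2, 0.5]
```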
3. Control Parameters: Length & Sampling
These parameters control how, and how much, the model generates. A combined code example appears at the end of this section.
A. do_sample (True/False)
- True: The model picks the next word randomly based on probabilities (using Temperature/Top-P).
- False: The model always picks the single most likely word (Greedy Search).
- Usage: Set to False for math/coding where there is only one right answer.
B. max_new_tokens (Integer)
- Definition: The maximum number of new tokens the model is allowed to generate.
- Usage: Prevents the model from rambling on forever and eating up your API budget.
- Note: This is different from context_length (which includes the input prompt).
C. min_length (Integer)
- Definition: Forces the model to generate at least $N$ tokens.
- Usage: Useful for summarization tasks where you don't want a one-word answer like "Good."
D. repetition_penalty (Float, usually 1.0 to 1.2)
- Definition: A multiplicative penalty applied to the logits of tokens that have already been generated.
- Math: If a token has appeared, its logit is divided by the penalty (positive logits) or multiplied by it (negative logits), so repeats always become less likely.
- Usage: Stronger than Frequency Penalty. Use it if the model gets stuck in a loop like "I went to the the the the...".
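The parameter names in this section match Hugging Face transformers' generate() API, so a combined example looks like the sketch below (the gpt2 checkpoint is only a placeholder; any causal LM works the same way):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Q: What is 2 + 2?\nA:", return_tensors="pt")
outputs = model.generate(
    **inputs,
    do_sample=False,         # greedy search: always take the single most likely token
    max_new_tokens=32,       # cap on newly generated tokens (the prompt is not counted)
    min_length=10,           # floor on the total sequence length (prompt + new tokens)
    repetition_penalty=1.1,  # logits of already generated tokens are penalized
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Hosted APIs expose the same ideas under slightly different names (for example, max_tokens and frequency_penalty on OpenAI-style endpoints).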
4. The Cheat Sheet: Scenarios & Values
Use this table as a starting point for your applications (a code version of these presets follows below).

| Scenario | Temperature | Top-P | do_sample | Repetition Penalty | Why? |
| --- | --- | --- | --- | --- | --- |
| Code Generation | 0.0 - 0.2 | 0.1 | False | 1.0 | Code must be precise. Syntax errors are fatal. We want the most likely (correct) token every time. |
| Data Extraction | 0.0 | 0.0 | False | 1.0 | We want consistent JSON/CSV output. No creativity allowed. |
| Chatbot (Support) | 0.5 - 0.7 | 0.8 | True | 1.05 | Friendly but accurate. Needs to vary phrasing slightly but stay on topic. |
| Creative Writing | 0.8 - 1.0 | 0.9 | True | 1.1 - 1.2 | Needs to be surprising and avoid repetition. High temp allows "interesting" word choices. |
| Brainstorming | 1.0+ | 1.0 | True | 1.0 | We want wild ideas. Hallucination is actually a feature here. |
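The same table expressed as Python presets. The dictionary below is purely illustrative; copy it and tweak the values for your own use case.

```python
# Illustrative presets mirroring the cheat sheet above.
GENERATION_PRESETS = {
    "code_generation":  {"temperature": 0.2, "top_p": 0.1, "do_sample": False, "repetition_penalty": 1.0},
    "data_extraction":  {"temperature": 0.0, "top_p": 0.0, "do_sample": False, "repetition_penalty": 1.0},
    "support_chatbot":  {"temperature": 0.6, "top_p": 0.8, "do_sample": True,  "repetition_penalty": 1.05},
    "creative_writing": {"temperature": 0.9, "top_p": 0.9, "do_sample": True,  "repetition_penalty": 1.15},
    "brainstorming":    {"temperature": 1.1, "top_p": 1.0, "do_sample": True,  "repetition_penalty": 1.0},
}

def preset(name):
    """Return a copy so callers can adjust values without mutating the shared table."""
    return dict(GENERATION_PRESETS[name])

print(preset("code_generation"))
```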
5. Real-World Application: Tuning a Summarizer
- Goal: Summarize a news article.
- Attempt 1 (Temp 1.0): The model adds its own opinions and uses flowery language. (Bad).
- Attempt 2 (Temp 0.0): The model copies sentences verbatim from the text. (Boring, but accurate).
- Optimal (Temp 0.3): The model rephrases sentences slightly but sticks strictly to the facts.
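A sketch of the "optimal" attempt using the Hugging Face summarization pipeline. The BART checkpoint is only an example; the key point is do_sample=True combined with a low temperature.

```python
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")  # example model

article = "..."  # paste the news article text here
summary = summarizer(
    article,
    do_sample=True,
    temperature=0.3,  # low enough to stay factual, high enough to rephrase
    max_length=130,   # keep the summary short
    min_length=30,    # but longer than a one-word answer
)
print(summary[0]["summary_text"])
```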
Summary & Key Takeaways
- Temperature: Controls "Risk". Low = Safe/Repetitive. High = Creative/Random.
- Top-P: Controls "Vocabulary Size". Low = Limited. High = Diverse.
- do_sample: Set to False for deterministic tasks (Math/Code).
- Repetition Penalty: Use this if the model gets stuck in loops.
Practice Quiz: Test Your Intuition
1. Scenario: You are building a SQL query generator. The user asks "Show me all users." You want the model to output SELECT * FROM users; every single time. What settings do you use?
- A) Temp = 0.0, do_sample = False
- B) Temp = 1.0, do_sample = True
- C) Top-P = 0.9
2. Scenario: Your chatbot keeps repeating the same phrase "I apologize for the inconvenience" three times in one paragraph. Which parameter should you increase?
- A) Temperature
- B) Repetition Penalty
- C) Top-K
3. Scenario: Why is High Temperature bad for math problems?
- A) It makes the model slower.
- B) Math has only one correct answer. High temp makes the model pick "less likely" (wrong) numbers.
- C) It uses more tokens.
(Answers: 1-A, 2-B, 3-B)

Written by Abstract Algorithms (@abstractalgorithms)
