From Boring to Brilliant: A Guide to LLM Sampling Techniques

Posted Nov 24, 2025 Updated Mar 12, 2026

By Sayan Biswas

When you ask an LLM a question, it doesn’t just “know” the answer. It predicts the next token (roughly, a word or word fragment), one at a time. But how it chooses that next token makes the difference between a robotic, repetitive answer and a creative, human-sounding response.

That choice is governed by the sampling technique. Here is a breakdown of the most common methods, how they work, and when to use them.

1. The Deterministic Route: Searching for the “Best” Answer

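These methods try to find a high-probability sequence of text, although neither is guaranteed to find the single most probable one. There is no randomness in the algorithm itself: given the same model and the same input, it makes the same choices every time.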

Greedy Search

This is the simplest approach. At every step, the model looks at the probability distribution and strictly picks the #1 most probable token (argmax).

Pros: It is fast, computationally cheap, and produces very coherent, on-topic text.
Cons: It kills creativity. It is prone to getting stuck in repetitive loops (e.g., “I am I am I am…”) and is “brittle”: one bad word choice early on can derail the whole sentence, because the model never goes back to reconsider it.
Best For: Strictly factual tasks like arithmetic, translation, or Q&A where you want the single “correct” answer.
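
As a rough illustration, here is a minimal sketch of greedy decoding over a toy model. `TOY_MODEL` and `greedy_decode` are hypothetical names, and the tokens and probabilities are invented so that the repetitive loop described above shows up. A real LLM conditions on the whole context, not just the previous token.

```python
# Toy "model": maps the previous token to a next-token distribution.
# All tokens and probabilities here are invented for illustration.
TOY_MODEL = {
    "<s>": {"I": 0.6, "You": 0.4},
    "I": {"am": 0.7, "was": 0.3},
    "am": {"I": 0.5, "happy": 0.3, "here": 0.2},
}


def greedy_decode(model, start="<s>", max_steps=6):
    """At every step, pick the single most probable next token (argmax)."""
    tokens = []
    prev = start
    for _ in range(max_steps):
        dist = model.get(prev)
        if not dist:  # no known continuation: stop early
            break
        prev = max(dist, key=dist.get)  # argmax over the distribution
        tokens.append(prev)
    return tokens


print(" ".join(greedy_decode(TOY_MODEL)))  # I am I am I am
```

Because “I” is always the top choice after “am”, and “am” after “I”, the argmax rule has no way to escape the loop.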

Beam Search

Think of this as a “smarter” Greedy Search. Instead of only keeping the single best word at each step, it keeps track of the $k$ most probable sequences (called “beams”) simultaneously.

Pros: It offers high fluency and is better at maintaining overall sentence logic compared to Greedy.
Cons: It is computationally expensive (roughly $k$ times the work of Greedy Search) and can still result in generic, safe text.
Best For: Machine Translation and Summarization, where the overall quality of the full sequence is the top priority.
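
The idea can be sketched in a few lines. This is a simplified illustration, not a production decoder: `TOY_MODEL` and `beam_search` are hypothetical, the probabilities are invented, and the toy model looks only at the previous token. Scores are summed log-probabilities, which is a common convention.

```python
import math

# Toy model: next-token probabilities given only the previous token.
# Tokens and numbers are invented for illustration.
TOY_MODEL = {
    "<s>": {"nice": 0.5, "dog": 0.4, "car": 0.1},
    "nice": {"woman": 0.4, "house": 0.3, "guy": 0.3},
    "dog": {"has": 0.9, "runs": 0.05, "and": 0.05},
    "car": {"is": 0.6, "drives": 0.4},
}


def beam_search(model, k, steps, start="<s>"):
    """Keep the k highest-scoring partial sequences at every step."""
    beams = [(0.0, [start])]  # (cumulative log-probability, tokens)
    for _ in range(steps):
        candidates = []
        for score, tokens in beams:
            dist = model.get(tokens[-1], {})
            if not dist:  # dead end: carry the beam forward unchanged
                candidates.append((score, tokens))
                continue
            for token, p in dist.items():
                candidates.append((score + math.log(p), tokens + [token]))
        # Prune: only the k best candidates survive to the next step.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:k]
    return beams


for width in (1, 2):
    score, tokens = beam_search(TOY_MODEL, k=width, steps=2)[0]
    print(width, tokens[1:], round(math.exp(score), 2))
# 1 ['nice', 'woman'] 0.2
# 2 ['dog', 'has'] 0.36
```

With a beam width of 1 the search behaves like Greedy Search and commits to “nice”. With a width of 2 it keeps “dog” alive long enough to discover that “dog has” is the more probable sequence overall.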

2. The Stochastic Route: Adding Creativity

To make text feel human, we need randomness (stochasticity). These methods sample from the probability distribution rather than just maximizing it.

Top-K Sampling

This method truncates the “tail” of the probability distribution. It filters the options to the Top $k$ most probable tokens (e.g., the top 50), and then samples randomly from that specific group.

Pros: It balances creativity with coherence by filtering out “absurd” low-probability words.
Cons: It isn’t adaptive. The cutoff $k$ is fixed. If only 3 words are genuinely good but $k$ is 50, the pool still contains 47 poor candidates, so you might sample a bad word. If 200 words are good, you unfairly cut 150 of them.
Best For: Creative tasks like story writing, chatbots, and brainstorming.
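
A minimal sketch of Top-K sampling, assuming the next-token distribution is available as a plain dictionary. The function name `top_k_sample` and the example tokens and probabilities are made up for illustration.

```python
import random


def top_k_sample(probs, k, rng=random):
    """Keep the k most probable tokens, then sample one in proportion to its probability."""
    top = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    tokens = [token for token, _ in top]
    weights = [p for _, p in top]
    # random.choices treats weights as relative, so the kept
    # probabilities do not need to be renormalized by hand.
    return rng.choices(tokens, weights=weights, k=1)[0]


# Invented distribution for the next word of a fantasy story.
story_probs = {
    "dragon": 0.35,
    "knight": 0.25,
    "castle": 0.20,
    "spreadsheet": 0.12,
    "carburetor": 0.08,
}

rng = random.Random(0)  # seeded only so the demo is repeatable
samples = [top_k_sample(story_probs, k=3, rng=rng) for _ in range(1000)]
print(sorted(set(samples)))
```

However many times you sample, only the three most probable tokens can appear; the two “absurd” tail options are never in the pool.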

Top-P (Nucleus) Sampling

Top-P is often regarded as the “gold standard” for open-ended generation. Instead of keeping a fixed number of words, it keeps the smallest set of top-ranked words whose cumulative probability is at least $p$ (e.g., $p=0.92$), then samples from that set.

Pros: It is highly adaptive.
- If the model is certain (one word has 95% probability), the sample pool shrinks to just that word (acting like Greedy).
- If the model is uncertain, the pool expands to include many options (acting like Top-K).
Best For: Open-ended generation such as chatbots, coding assistants, and creative writing. Many hosted LLM APIs expose it as a `top_p` parameter.
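
To make the adaptive behavior concrete, here is a small sketch, assuming the distribution is a plain dictionary. `nucleus` and `top_p_sample` are hypothetical helper names, and both example distributions are invented.

```python
import random


def nucleus(probs, p):
    """Return the smallest set of top-ranked tokens whose cumulative probability is >= p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    return kept


def top_p_sample(probs, p, rng=random):
    """Sample one token from the nucleus, weighted by probability."""
    kept = nucleus(probs, p)
    tokens = [token for token, _ in kept]
    weights = [prob for _, prob in kept]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Invented distributions: one where the model is certain, one where it is not.
confident = {"Paris": 0.95, "Lyon": 0.03, "Nice": 0.02}
uncertain = {"mat": 0.25, "rug": 0.20, "floor": 0.20, "sofa": 0.15,
             "bed": 0.10, "roof": 0.06, "moon": 0.04}

print(len(nucleus(confident, 0.92)))  # 1
print(len(nucleus(uncertain, 0.92)))  # 6
```

The same $p=0.92$ yields a pool of one token in the confident case and six in the uncertain case, which is exactly the adaptivity a fixed $k$ cannot provide.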

Deep Dive: Beam Search vs. Top-K Sampling

A common point of confusion is that both Beam Search and Top-K use a number “$k$.” However, they are fundamentally different tools for different jobs.

| Feature | Beam Search ($k=3$) | Top-K Sampling ($k=3$) |
| --- | --- | --- |
| Type | Search Algorithm (Deterministic) | Sampling Method (Stochastic) |
| Goal | Find the single, most probable complete sentence | Pick one random, “good” next word |
| What is $k$? | Beam Width: Number of parallel sequences to track | Filter Size: Number of top tokens to choose from |

The “Cat” Example

Imagine the model is completing the sentence: “The cat sat on the…”

1. How Beam Search handles it ($k=2$)

Beam search looks at the “Total Score” of the entire sentence so far.

It scores every token in the vocabulary as a possible continuation (often 50,000+ candidates) but only keeps the top 2 paths: “The cat sat on the mat” and “The cat sat on the rug.” It permanently discards lower-scoring continuations such as “floor” and “dog”.
It then explores the next word for those two specific paths to see which yields the highest total probability for the full sentence.
Result: No randomness. It optimizes for the “best” path.
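
One common way to define the “Total Score” (an assumption here, since the exact scoring rule depends on the implementation) is the sum of the log-probabilities of every token in the sequence, where $y_t$ is the token chosen at step $t$:

$$
\text{score}(y_1, \dots, y_T) = \sum_{t=1}^{T} \log P(y_t \mid y_1, \dots, y_{t-1})
$$

As a worked illustration with made-up numbers, suppose “mat” has probability 0.4 and “rug” 0.3, and that the best follow-up word has probability 0.5 after “mat” but 0.9 after “rug”. The shared prefix contributes the same amount to both paths, so only the last two steps matter:

$$
\begin{aligned}
\text{mat path:} &\quad \log 0.4 + \log 0.5 = \log 0.20 \approx -1.61 \\
\text{rug path:} &\quad \log 0.3 + \log 0.9 = \log 0.27 \approx -1.31
\end{aligned}
$$

Under these assumed numbers the “rug” path ends up with the higher total score, even though “mat” was ahead after the first step. That is the situation keeping a second beam is meant to catch.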

2. How Top-K handles it ($k=3$)

Top-K makes a decision for right now, with no memory of other paths.

It takes the top 3 words: mat (0.4), rug (0.3), floor (0.15).
It renormalizes their probabilities so they sum to 100% (e.g., mat becomes ~47%, rug ~35%, floor ~18%).
It rolls the dice. There is a real chance it picks “rug” or “floor,” even if “mat” was the most likely.
Result: Randomness is introduced. The model creates something new.
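
The renormalization step is just a division by the probability mass that survives the cutoff. A quick check of the numbers above (the variable names are arbitrary):

```python
# Top-3 probabilities taken from the example above.
top3 = {"mat": 0.40, "rug": 0.30, "floor": 0.15}

total = sum(top3.values())  # about 0.85: the mass that survives the cutoff
renormalized = {token: p / total for token, p in top3.items()}

for token, p in renormalized.items():
    print(f"{token}: {p:.1%}")
# mat: 47.1%
# rug: 35.3%
# floor: 17.6%
```

The ranking is unchanged; each surviving word simply absorbs a proportional share of the 15% that belonged to the discarded tail.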

Conclusion

Choosing the right sampling technique depends on your goal. If you need a faithful, high-probability translation, Beam Search is a solid choice. If you want a chatbot to write a poem or brainstorm ideas, Top-P (Nucleus) Sampling is your best friend.

This post is licensed under CC BY 4.0 by the author.