Top K
Top-K sampling selects the next token from only the k most probable candidates. A higher k allows more tokens into consideration and therefore increases randomness in the output.
How it Works
To illustrate Top-K sampling, consider generating the next word after "the quick brown fox" using a model trained on extensive English text. Suppose the model's predicted probabilities for the next word are as follows:
jumps: 0.4
runs: 0.3
walks: 0.2
eats: 0.05
sleeps: 0.05
Applying Top-K sampling with k=3 limits our choices to jumps, runs, and walks.
The probabilities of these three words are then renormalized so they sum to 1 (jumps: 0.4/0.9 ≈ 0.44, runs: ≈ 0.33, walks: ≈ 0.22). This makes jumps slightly more likely than before, while runs or walks can still be chosen by the random draw; eats and sleeps can never be chosen.
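To make the procedure concrete, here is a minimal sketch of Top-K sampling in plain Python. It reuses the toy probabilities from the example above; the function name top_k_sample is only for illustration and is not part of any particular library, which would normally operate on model logits rather than a hand-written dictionary.

```python
import random

def top_k_sample(probs: dict[str, float], k: int) -> str:
    """Sample one token from the k most probable entries of `probs`."""
    # 1. Keep only the k highest-probability tokens.
    top_k = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]

    # 2. Renormalize so the remaining probabilities sum to 1.
    total = sum(p for _, p in top_k)
    tokens = [t for t, _ in top_k]
    weights = [p / total for _, p in top_k]

    # 3. Draw one token according to the renormalized weights.
    return random.choices(tokens, weights=weights, k=1)[0]

# Toy distribution from the walkthrough above.
next_word_probs = {
    "jumps": 0.4,
    "runs": 0.3,
    "walks": 0.2,
    "eats": 0.05,
    "sleeps": 0.05,
}

print(top_k_sample(next_word_probs, k=3))
```

Running this repeatedly with k=3 should return jumps most often (about 44% of draws), runs and walks less often, and eats or sleeps never, matching the worked example.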