Temperature: how random is random?

When the model produces a probability distribution over the vocabulary for the next token, it doesn't always pick the most probable one. A dial called temperature controls how much randomness goes into that choice.
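For readers who want the mechanics, here is one common way the dial is defined. The text above doesn't spell out a formula, so treat this as the standard textbook formulation rather than a description of any particular model. The symbols are introduced here for illustration: $z_i$ is the raw score (logit) the model assigns to token $i$, $T > 0$ is the temperature, and $p_i$ is the resulting probability of picking token $i$.

```latex
p_i = \frac{\exp(z_i / T)}{\sum_{j} \exp(z_j / T)}
```

Dividing by a small $T$ stretches the gaps between scores, so the top token dominates. Dividing by a large $T$ shrinks the gaps, so the probabilities move closer together.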

At low temperature, the model almost always picks the highest-probability token. The output is predictable, focused, and somewhat repetitive. Good for factual questions where there's one right answer.

At high temperature, the model samples more freely from lower-probability tokens. The output is more varied, more surprising, and sometimes creative, but also more likely to go off the rails.
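To make the low-versus-high contrast concrete, here is a minimal sketch of temperature-scaled softmax. The function name `softmax_with_temperature` and the three logit values are made up for illustration. Real models work over vocabularies of many thousands of tokens, and their internals may differ.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, scaled by a temperature > 0."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate tokens (illustrative only)
logits = [2.0, 1.0, 0.0]

for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: {[round(p, 3) for p in probs]}")
```

Running this shows the top token taking nearly all of the probability at T=0.2, while at T=2.0 the three options sit much closer together.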

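At zero temperature, the model always picks the single most probable token. In principle this makes it deterministic: given the same prompt, it gives the same answer every time. In practice, deployed systems can still show small variations, for example from floating-point differences across hardware or batching.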
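The sketch below contrasts zero temperature with ordinary sampling. It assumes the common convention of treating temperature 0 as a plain "pick the highest score" rule, since dividing by zero is undefined. The helper `sample_token` and the logit values are hypothetical, not taken from any specific library.

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Pick a token index. Temperature 0 is handled as greedy argmax."""
    if temperature == 0:
        # Greedy choice: index of the highest score, no randomness involved
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    # random.choices normalizes the weights internally
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

# Hypothetical scores for three candidate tokens (illustrative only)
logits = [2.0, 1.0, 0.0]
rng = random.Random(0)  # seeded so the run is reproducible

greedy = [sample_token(logits, 0, rng) for _ in range(10)]
sampled = [sample_token(logits, 1.5, rng) for _ in range(10)]

print(greedy)   # always index 0, the highest-scoring token
print(sampled)  # a mix of indices, drawn at random
```

The greedy list never varies, while the sampled list depends on the random draws. That is the same effect a user sees when asking one question twice and getting two different answers.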

This dial explains some behavior that confuses users. The same question asked twice might get different answers, not because the model changed, but because it's sampling from a probability distribution. It also explains why creative writing prompts can feel "alive" at higher temperatures, while factual queries are better served by lower temperatures for reliability.

<!-- TODO: a simple dial visual, or a side-by-side showing the same prompt at different temperatures, would make this feel intuitive -->