One token at a time
Here is the complete mechanical description of how a language model generates text:
1. It takes the entire context as input — everything that came before.
2. It produces a probability distribution over the entire vocabulary: a probability for every possible next token.
3. It picks one token from that distribution, typically by sampling, so likelier tokens are chosen more often.
4. That token gets appended to the context.
5. Repeat from step 1.
That's it. There's no planning step. No internal draft. No preview of the final sentence before it starts writing. The model commits to one token at a time, and each token immediately becomes part of the context that informs the next.
This is why language models sometimes start a sentence and paint themselves into a corner. The early words they choose constrain what can follow, and there is no going back.
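The five steps above can be sketched in a few lines of Python. This is a minimal illustration, not how any real model is implemented: `TOY_MODEL`, `next_token_distribution`, and `generate` are hypothetical names invented for this example, and the toy "model" is a hand-written lookup table that only looks at the last token, whereas a real model computes its distribution from the entire context.

```python
import random

# Hypothetical stand-in for a language model: a hand-written table mapping
# the most recent token to a probability distribution over a tiny vocabulary.
# Assumption for illustration only; a real model conditions on the whole context.
TOY_MODEL = {
    "<start>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.5},
    "a": {"cat": 0.5, "dog": 0.5},
    "cat": {"sat": 0.7, "<end>": 0.3},
    "dog": {"sat": 0.7, "<end>": 0.3},
    "sat": {"<end>": 1.0},
}


def next_token_distribution(context):
    """Steps 1-2: take the context, return a probability for each next token."""
    return TOY_MODEL[context[-1]]


def generate(context, rng, max_tokens=10):
    """Run the generation loop until "<end>" is chosen or max_tokens is reached."""
    context = list(context)  # copy so the caller's list is not modified
    for _ in range(max_tokens):
        dist = next_token_distribution(context)
        tokens = list(dist.keys())
        weights = list(dist.values())
        # Step 3: pick one token, weighted by its probability.
        token = rng.choices(tokens, weights=weights, k=1)[0]
        # Step 4: append it. From here on it is part of the input and is
        # never revised.
        context.append(token)
        if token == "<end>":
            break
        # Step 5: loop back to step 1 with the longer context.
    return context


print(generate(["<start>"], random.Random(0)))
```

Nothing in the loop ever removes or rewrites a token. Once "the" has been appended, every later choice has to follow from it, which is the corner-painting effect in miniature.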
<!-- TODO: an animation of a context growing token by token as the model generates would make this tangible and is probably unique to Untangled's visual style -->