No inner monologue

People often imagine a language model thinking before it speaks: holding a complete thought internally, then expressing it.

That's not what happens. The model doesn't hold a finished sentence somewhere private and then read it out. At each step it has the context so far, and it picks what comes next. The words are the thinking.
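The loop described here can be sketched in a few lines. This is a minimal toy, not a real model: `generate` and `toy_next_token` are hypothetical names, and the stand-in "model" just counts upward. The point is the shape of the loop, where the only thing carried from one step to the next is the growing context.

```python
from typing import Callable, List


def generate(
    next_token: Callable[[List[str]], str],
    prompt: List[str],
    max_new_tokens: int,
) -> List[str]:
    """Toy autoregressive loop: the only state between steps is `context`."""
    context = list(prompt)
    for _ in range(max_new_tokens):
        token = next_token(context)  # depends on the context so far, nothing else
        if token == "<eos>":
            break
        context.append(token)  # what was just written becomes input to the next step
    return context


def toy_next_token(context: List[str]) -> str:
    """Hypothetical stand-in for a model: continue counting, stop after 5."""
    last = int(context[-1])
    return str(last + 1) if last < 5 else "<eos>"


result = generate(toy_next_token, ["1"], max_new_tokens=10)
print(result)  # ['1', '2', '3', '4', '5']
```

Nothing in `generate` stores a plan for the whole output. Each call to `next_token` starts from the context and nothing else.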

This has a counterintuitive implication: asking a model to "think step by step" often improves its accuracy on reasoning tasks. The gain comes not from giving the model time to think privately, but from the steps themselves becoming part of the context. The model can then attend to those intermediate steps when generating subsequent tokens.

The reasoning happens in the output, not before it.

This is different from how humans work. When you solve a math problem in your head, you hold partial results somewhere private. A language model has no such private space. Its "working memory" is the context window. This is why chain-of-thought prompting works: written-out intermediate steps are the only place partial results can live.
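A deliberately crude toy can make the working-memory point concrete. It assumes a pretend "model" (`one_step`, a hypothetical function) that can perform exactly one addition per step. That is a caricature, not a description of how a real transformer computes. Forced to answer immediately, it gets a three-number sum wrong. Allowed to write running totals into a scratchpad it can read back, it gets there.

```python
from typing import List


def one_step(numbers: List[int], scratchpad: List[int]) -> int:
    """Hypothetical toy 'model': one addition per step, nothing remembered privately.

    It sees the inputs plus whatever has already been written to the scratchpad,
    and emits one new running total. Assumes at least two input numbers.
    """
    k = len(scratchpad)  # number of partial results already written
    if k == 0:
        return numbers[0] + numbers[1]
    return scratchpad[-1] + numbers[k + 1]


def answer_directly(numbers: List[int]) -> int:
    """No room to write steps: a single addition has to serve as the answer."""
    return one_step(numbers, [])


def answer_with_steps(numbers: List[int]) -> List[int]:
    """Each partial result is written out, then read back on the next step."""
    scratchpad: List[int] = []
    for _ in range(len(numbers) - 1):
        scratchpad.append(one_step(numbers, scratchpad))
    return scratchpad


print(answer_directly([3, 4, 5]))    # 7 (wrong: the true sum is 12)
print(answer_with_steps([3, 4, 5]))  # [7, 12] (final entry is the correct sum)
```

The per-step capability is identical in both cases. The only difference is whether earlier results exist somewhere the next step can read them.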

<!-- TODO: a contrast between "thinking then speaking" vs "speaking is thinking" would be a nice visual anchor here -->