Module VI
The Meaning of Words
Chapter II
Tokens
A language model never reads the way you do.
Before any processing begins, text has to be broken into pieces. Not words, exactly. Something smaller and more practical: fragments that can be combined to cover almost any text efficiently, without requiring a vocabulary so large it becomes unworkable.
The pieces are called tokens. The way text gets split into them shapes everything that follows. It determines what the model finds easy to handle and what it finds surprisingly difficult. It creates blind spots in places you might not expect. It means the model's relationship to language is subtly different from yours in ways that matter.
It is easy to overlook because it happens before anything visible occurs. But it quietly decides what the model is allowed to notice.