Numbers don't carry meaning on their own. Lists of zeros don't either. What a language model needs is a representation where the positions in space encode something about how words relate to each other — and the only way to build that is to learn it from how language is actually used.

Before any of that learning can happen, though, the text has to be broken into pieces the model can process. Those pieces are called tokens. That's where we're going next.