The model doesn't read words. It reads tokens — subword fragments, each with a numeric ID. Every limit, every cost, every measure of context is in tokens, not words.

Now those IDs need to become something the network can reason with. Each one gets looked up in a table and swapped for a vector — a point in the embedding space. That's the chapter ahead.