What to carry out of this chapter

Subword fragments, byte pair encoding, the missing r's in "strawberry." If the mechanics have started to blur, that's fine. You don't need to keep them.

Here is the one thing to carry out of this chapter.

A model never sees your words. Before it reads anything, the text is chopped into small pieces called tokens, and each piece is swapped for an ID number. A token is often a chunk smaller than a word: a short word like "dog" might be a single token, while "unbelievable" splits into a few. The model's world is a stream of these numbered chunks, never the letters or words you typed.
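
If it helps to see that in miniature, here is a toy sketch in Python. Everything in it is an assumption made for illustration: the six-piece vocabulary, the ID numbers, and the splitting rule (grab the longest known piece, left to right). Real tokenizers learn tens of thousands of pieces from data and split by their own rules, so treat this as a picture of the idea, not of any actual model.

```python
# Toy vocabulary: the pieces and their ID numbers are invented for this sketch.
TOY_VOCAB = {
    "un": 0,
    "believ": 1,
    "able": 2,
    "dog": 3,
    "straw": 4,
    "berry": 5,
}


def tokenize(text, vocab=TOY_VOCAB):
    """Chop text into known pieces, always taking the longest match first."""
    pieces = []
    i = 0
    while i < len(text):
        # Try the longest candidate starting at i, then shrink it.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                pieces.append(text[i:j])
                i = j
                break
        else:
            # Nothing in the toy vocabulary fits here.
            raise ValueError(f"no known piece at position {i}")
    return pieces


def encode(text, vocab=TOY_VOCAB):
    """Swap each piece for its ID number. This list is all the model gets."""
    return [vocab[piece] for piece in tokenize(text, vocab)]


print(tokenize("dog"))           # ['dog']
print(tokenize("unbelievable"))  # ['un', 'believ', 'able']
print(encode("strawberry"))      # [4, 5]
```

Look at what comes out for "strawberry": two numbers, 4 and 5. There is not a single letter left in that list to count.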

Everything else followed from that one fact. The pieces are fragments, not whole words, because whole words would mean an endless list and single letters would mean nothing, so the answer had to sit in between. The model can't count the r's in "strawberry" because it never saw the letters, only the chunks they were sealed inside. And every limit you'll ever hit (how much it can hold, what it costs, how fast it runs) is measured in tokens, not words.

But notice where this leaves us. A token's ID is still just a name tag, the exact problem we opened this module with: a number that says nothing about what the piece means. So the next step is to give each ID a richer set of numbers, a position in a space of meaning, placed so that pieces used alike sit near each other. Building that space, and the surprising arithmetic it allows, is the chapter ahead.
