Context windows are measured in tokens

Tokenization matters outside the model too, because almost everything about using a language model is counted in tokens.

The context window, the amount of text a model can hold in mind at once, is a token limit, not a word limit. A large one can hold many pages, sometimes a short book's worth, give or take depending on how cleanly the text tokenizes. We will return to the context window in its own chapter; for now, just notice that its size is set in tokens.

Cost and speed are per-token as well. When you use a commercial model, you pay for tokens going in and tokens coming out, and because the model produces its reply one token at a time, a longer reply takes longer to arrive. Long conversations and big documents get expensive partly because they are simply long, measured in tokens.
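To make the per-token arithmetic concrete, here is a minimal sketch of a cost estimate. The two prices are hypothetical, picked only so the numbers are easy to follow; real rates differ by provider and model, and the function name `estimate_cost` is likewise just for illustration.

```python
# Hypothetical prices, chosen for illustration only (not any real provider's rates).
PRICE_PER_MILLION_INPUT = 3.00    # dollars per 1,000,000 tokens going in
PRICE_PER_MILLION_OUTPUT = 15.00  # dollars per 1,000,000 tokens coming out


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in dollars for one request."""
    total = (
        input_tokens * PRICE_PER_MILLION_INPUT
        + output_tokens * PRICE_PER_MILLION_OUTPUT
    )
    return total / 1_000_000


# A 2,000-token prompt with a 500-token reply, under the assumed prices.
cost = estimate_cost(input_tokens=2_000, output_tokens=500)
print(f"${cost:.4f}")
```

Notice that the word count never appears in the calculation. Two documents with the same number of words can cost different amounts if one of them tokenizes less cleanly.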

And this sets up exactly what comes next. The model now holds a row of token IDs. But an ID is still just a name tag, the very problem we opened this module with. To do anything useful, each ID gets looked up and swapped for an embedding: a position in the space of meaning. Mapping tokens into that space is the final chapter of this module.
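As a small preview of that lookup, the sketch below swaps each token ID for a row in a table. Everything in it is a toy assumption: the table has four entries of three numbers each, and the values are made up. A trained model learns its table and uses far more dimensions.

```python
# Toy embedding table: one row of numbers per token ID.
# The values are invented for illustration; a real model learns them in training.
embedding_table = [
    [0.1, -0.3, 0.7],   # token ID 0
    [0.5, 0.2, -0.1],   # token ID 1
    [-0.4, 0.9, 0.0],   # token ID 2
    [0.3, 0.3, 0.3],    # token ID 3
]


def embed(token_ids):
    """Look up each token ID and return its row from the table."""
    return [embedding_table[token_id] for token_id in token_ids]


token_ids = [2, 0, 2, 3]      # a row of IDs, as a tokenizer might produce
vectors = embed(token_ids)

for token_id, vector in zip(token_ids, vectors):
    print(token_id, vector)
```

The same ID always maps to the same row, so the two occurrences of ID 2 come back as identical vectors. The ID itself carries no meaning; it only says which row to fetch.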
