What to carry out of this chapter

That was a lot. House-number streets, loops that won't lie flat, ten thousand light switches. If the details have already started to blur, that's fine. You are not meant to keep all of them.

Here is the one thing to carry out of this chapter.

A machine has no idea what a word means. All it ever sees is numbers. So the whole game is to hand it numbers where closeness tells the truth: words that mean similar things sit near each other, and words that mean different things sit far apart. Get that right and the machine can finally treat "dog" and "puppy" as relatives instead of strangers.

Everything else this chapter did was chasing that one property and watching simpler tries fail. A single number per word is too cramped to hold meaning, which pulls in many directions at once. Giving every word its own switch is the opposite mistake: it keeps every word the same maximum distance from every other word, which is no map at all. What you actually want sits between those two: a word as a point in a space of hundreds of directions, placed so that nearness means real similarity. And nobody has to place those points by hand. A word's company gives its position away, the way you guessed "wibble" was a drink from nothing but the words around it.
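
If it helps to see that contrast as numbers, here is a minimal sketch. Everything in it is made up for illustration: the three-word vocabulary, the helper named distance, and above all the toy vectors, which were written by hand rather than learned from any text. Real vectors have hundreds of directions, not three.

```python
import math

def distance(a, b):
    """Straight-line (Euclidean) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

words = ["dog", "puppy", "car"]

# One switch per word: each word gets a vector with a single 1 in its own slot.
one_hot = {
    w: [1.0 if i == j else 0.0 for j in range(len(words))]
    for i, w in enumerate(words)
}

# Hand-made toy vectors (an assumption for illustration, not learned values).
toy = {
    "dog":   [0.9, 0.8, 0.1],
    "puppy": [0.8, 0.9, 0.2],
    "car":   [0.1, 0.0, 0.9],
}

one_hot_dog_puppy = distance(one_hot["dog"], one_hot["puppy"])
one_hot_dog_car = distance(one_hot["dog"], one_hot["car"])
toy_dog_puppy = distance(toy["dog"], toy["puppy"])
toy_dog_car = distance(toy["dog"], toy["car"])

print("switches: dog-puppy", round(one_hot_dog_puppy, 3),
      "| dog-car", round(one_hot_dog_car, 3))
print("toy map:  dog-puppy", round(toy_dog_puppy, 3),
      "| dog-car", round(toy_dog_car, 3))
```

With the switches, "dog" is exactly as far from "puppy" as it is from "car", so distance tells the machine nothing. With the toy vectors, "dog" lands much closer to "puppy" than to "car", which is the property the chapter was after.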

That is the idea the rest of this module builds on. Before a machine can start placing words by their company, though, it needs to know what counts as a piece of text in the first place. A word? Part of a word? The text has to be broken into units the machine can handle, and those units are called tokens. That is where we go next.
