One-hot encoding
The first solution researchers tried is called one-hot encoding. Instead of assigning each word a single number, you give it a list of zeros with a single 1 in the position that corresponds to that word.
A vocabulary of 10,000 words means every word is represented as a list of 10,000 numbers — all zeros, except one.
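To make this concrete, here is a minimal sketch in plain Python. The five-word vocabulary and the `one_hot` helper are made up for illustration; a real vocabulary would be far larger, but the construction is the same.

```python
def one_hot(word, vocab):
    """Return a list of zeros with a single 1 at the word's position in vocab."""
    vector = [0] * len(vocab)
    vector[vocab.index(word)] = 1
    return vector

# Toy vocabulary (hypothetical); a realistic one might hold 10,000 words.
vocab = ["dog", "puppy", "skyscraper", "cat", "tree"]

dog_vec = one_hot("dog", vocab)
puppy_vec = one_hot("puppy", vocab)

print(dog_vec)    # [1, 0, 0, 0, 0]
print(puppy_vec)  # [0, 1, 0, 0, 0]
```

Each vector is as long as the vocabulary, and the only thing that distinguishes one word from another is where the 1 sits.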
This solves the "labels are arbitrary" problem: no word ends up numerically closer to some words than to others just because of the label it happened to get. But that is also the new problem: every word is exactly as far from every other word. "Dog" is no closer to "puppy" than it is to "skyscraper." The representation has no structure, so the positions tell the network nothing about what the words mean.
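The "equally far" claim can be checked directly. The sketch below assumes straight-line (Euclidean) distance as the measure of "far", which is one common choice but not the only one; the helper names and the three-word vocabulary are illustrative.

```python
import math

def one_hot(index, size):
    """Return a list of `size` zeros with a single 1 at `index`."""
    vector = [0] * size
    vector[index] = 1
    return vector

def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy vocabulary (hypothetical), one vector per word.
vocab = ["dog", "puppy", "skyscraper"]
vectors = {word: one_hot(i, len(vocab)) for i, word in enumerate(vocab)}

dog_to_puppy = euclidean(vectors["dog"], vectors["puppy"])
dog_to_skyscraper = euclidean(vectors["dog"], vectors["skyscraper"])

print(dog_to_puppy == dog_to_skyscraper)  # True
```

Any two different one-hot vectors differ in exactly two positions, so the distance between them is always the square root of 2, no matter which words they stand for or how large the vocabulary is.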
It also wastes space. 9,999 zeros and a single 1 is not an efficient way to represent a word.
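A rough way to see the waste, sketched in plain Python: build one vector for a 10,000-word vocabulary and count what is in it. The index `4_217` is an arbitrary, made-up position.

```python
VOCAB_SIZE = 10_000
word_index = 4_217  # hypothetical position of some word in the vocabulary

# The full one-hot vector for that word.
vector = [0] * VOCAB_SIZE
vector[word_index] = 1

zeros = vector.count(0)
ones = vector.count(1)
print(zeros, ones)  # 9999 1

# Everything the vector says can be recovered as a single integer.
recovered_index = vector.index(1)
print(recovered_index == word_index)  # True
```

The 10,000 stored numbers carry no more information than the one integer that says where the 1 is.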
The right approach has to do something harder: encode meaning in the numbers themselves.
<!-- TODO: simple illustration of a one-hot vector — 9999 zeros and a single 1 — would land the wastefulness intuitively -->