What we actually need
Here's what a useful word representation would look like:
Similar words should be close together. "Dog" and "puppy" should end up nearby. "Paris" and "London" should be near each other, in the same region as other capital cities.
Relationships should be consistent. If "king" is to "queen" as "man" is to "woman," that relationship should show up in the numbers — the same kind of shift, pointing in the same direction.
And the representation should be compact. Not 10,000 numbers per word. Maybe 300. Dense, not sparse.
This is what an embedding is. Not a label. A position in a space — a space where the geometry encodes something about meaning.
The question is: how do you learn that space? Nobody sits down and decides where "dog" should go. The positions have to be discovered from the data.
<!-- TODO: small conceptual diagram showing words as points in 2D space, clustered by meaning — abstract but intuitive -->