The same word, different meanings
Word2Vec had a limitation: each word got one fixed vector. But words often mean different things in different contexts.
"Bank" by a river is not the same as the "bank" you deposit money into. "Bright" describing a light and "bright" describing a person are related but distinct. In Word2Vec, every use of a word maps to the same point in space, so in effect that single vector is a blend of all the contexts where the word appeared.
Later models, such as ELMo and BERT, addressed this with contextual embeddings. Instead of one fixed vector per word, the embedding is computed from the surrounding context. The representation of "bank" shifts depending on whether "river" or "account" is nearby.
This is what modern language models do. Every token gets an embedding that's shaped by everything else in the context. The word isn't just placed in space; its position depends on where it sits in the sentence and on every other word around it.
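To make the contrast concrete, here is a minimal sketch in Python with NumPy. Everything in it is an assumption made for illustration: the 3-dimensional vectors are invented rather than trained, and the names `static_vectors`, `embed_static`, `embed_in_context` and `cosine` are hypothetical. The blending rule is a simple similarity-weighted average, used as a stand-in for what real models learn, not the computation any particular model performs.

```python
import numpy as np

# Hypothetical 3-dimensional vectors, made up for illustration (not trained).
static_vectors = {
    "the":     np.array([0.0, 0.0, 1.0]),
    "river":   np.array([1.0, 0.0, 0.0]),
    "account": np.array([0.0, 1.0, 0.0]),
    "bank":    np.array([0.5, 0.5, 0.0]),  # placed between its two senses
}

def embed_static(tokens):
    """Word2Vec-style lookup: one fixed vector per word, neighbours ignored."""
    return np.stack([static_vectors[t] for t in tokens])

def embed_in_context(tokens):
    """Toy contextual embedding: blend each token with the tokens around it."""
    X = embed_static(tokens)                        # shape (n_tokens, dim)
    scores = X @ X.T                                # pairwise dot-product similarity
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)   # each row sums to 1
    return weights @ X                              # similarity-weighted average

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

river_sentence = ["the", "river", "bank"]      # "bank" is at index 2
money_sentence = ["the", "bank", "account"]    # "bank" is at index 1

# Static lookup: "bank" gets the same vector in both sentences.
static_same = np.array_equal(embed_static(river_sentence)[2],
                             embed_static(money_sentence)[1])

# Contextual blend: "bank" now depends on its neighbours.
bank_by_river = embed_in_context(river_sentence)[2]
bank_with_money = embed_in_context(money_sentence)[1]
contextual_same = np.allclose(bank_by_river, bank_with_money)

print(static_same)      # True
print(contextual_same)  # False
print(cosine(bank_by_river, static_vectors["river"]) >
      cosine(bank_with_money, static_vectors["river"]))  # True
```

With the static lookup, "bank" is identical in both sentences. With the toy blend, the "bank" next to "river" ends up closer to "river", and the "bank" next to "account" ends up closer to "account".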
That process of building context-sensitive representations is what the transformer architecture is designed to do. That's the next module.