The same word, different meanings

Word2Vec had one stubborn limitation: each word got a single, fixed position. But words shift meaning depending on where they sit.

"Bank" by a river is not the "bank" you deposit money in. A "bright" light and a "bright" student are related but not the same. In a fixed embedding, both senses of a word collapse onto one point, an average smeared across every context the word ever appeared in. The model can't tell which "bank" you meant.

The fix is called . Instead of one position per word, the position is recalculated for each sentence, shaped by the words around it. "Bank" drifts toward one neighborhood when "river" is nearby and a different one when "account" is.

Every large language model works this way. A word isn't just dropped into the space of meaning, it's placed according to its whole sentence, weighed against every other word around it. But doing that, letting every word look at every other word and adjust, demands a specific machine built for the job. That machine is the transformer, and it is the entire next module.

The same word, different meanings

Word2Vec had one stubborn limitation: each word got a single, fixed position. But words shift meaning depending on where they sit.