Where does word order go?
Attention treats every token equally — it compares every token to every other token with no built-in sense of which comes first.
That's a problem. Word order matters enormously in language. "The dog bit the man" and "The man bit the dog" have the same words but opposite meanings.
The transformer solves this with positional encoding: a set of numbers added to each token's embedding that encodes its position in the sequence. Token 1 gets one pattern added to it. Token 2 gets a slightly different pattern. Token 100 gets a pattern that's clearly distinct from token 1.
The attention mechanism now has access to both what the word means and where it appears. It can learn to use either or both depending on what the task requires.
This is a small but essential detail: the transformer isn't order-blind. It just handles order through addition rather than through the structure of the network itself.