Something unexpected
The speedup was expected. What wasn't expected was what happened to the models themselves.
Researchers assumed that bigger, deeper networks would be better in roughly the way that a bigger engine is better: more of the same, proportionally stronger. But that's not what they found.
At a certain scale, something changed qualitatively. Networks started noticing things nobody had told them to look for. Features appeared in the hidden layers that nobody had anticipated. Tasks that had required painstaking hand-engineering started solving themselves, just from exposure to enough data at enough scale.
It wasn't that the algorithm got smarter. The algorithm was the same one from 1986. What changed was the environment it ran in. More data to learn from. More parameters to learn with. More compute to run the whole process.
The field hadn't anticipated that scale itself was a variable. That turning up the dial on size and data didn't just produce quantitatively better results — it produced qualitatively different ones.
That insight would keep compounding for the next decade.