The realization
The realization, when it came, felt almost obvious in retrospect.
Researchers at the University of Toronto and NYU noticed that the core math of neural network training, multiplying large tables of numbers together, was exactly the kind of thing GPUs were built to do. In 2007, NVIDIA released CUDA, a platform that let programmers write general-purpose code that ran directly on GPU hardware. Not just graphics. Anything.
Training that had taken weeks on a CPU took hours on a GPU. Some tasks were 10 to 50 times faster.
No new algorithm was invented. No new idea was discovered. Researchers just moved existing math to hardware that could run it in parallel — hardware that had been sitting in gaming computers and entertainment systems, used mostly to render explosions and racing games.
The speedup changed everything about what was practical to build.