The scale this enabled

The speedup didn't just make existing work faster. It opened up work that had never been possible.

Before GPU training, you could train a network on tens of thousands of examples. Now you could train on millions. You could build networks with dozens of layers instead of two or three. You could run experiments in hours that would have taken months.

Something unexpected happened as networks got bigger and datasets grew larger: the quality didn't just improve gradually, it jumped. Problems that shallow networks had barely made a dent in started yielding to deeper ones. Capabilities emerged that smaller-scale training had never hinted at.

Scale turned out to matter in a way nobody had fully predicted. The theory had been right for decades. What changed was the ability to test it at a size where it actually worked.