Data and compute
GPUs solved the compute problem. But compute alone wasn't enough.
The second ingredient was data. By the mid-2000s, the internet had been accumulating photographs, text, and video for more than a decade. Billions of images existed, indexed, searchable, and downloadable. The raw material for training was suddenly abundant in a way it had never been before.
But raw images aren't enough. A network learning to recognize cats needs to be told which images contain cats. Someone has to label the data.
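That's what Fei-Fei Li, a researcher who began the project at Princeton and continued it at Stanford, spent years doing. With her collaborators, she assembled a collection that eventually grew to more than 14 million labeled photographs. The subset used for its annual competition held about 1.2 million training images, each labeled with one of 1,000 categories (cats, buses, coffee mugs, everything). The labeling was done by thousands of online workers recruited through Amazon Mechanical Turk. The result was called ImageNet, and it gave the field something it had never had: a large, clean, standard benchmark that everyone could train and test on.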
Three things had been missing: the algorithm, the data, and the compute. By 2010, all three were finally in place.