Data and compute

GPUs solved the compute problem. But a fast machine with nothing to study is still stuck. Compute alone wasn't enough.

The missing ingredient was data. By the mid-2000s, the internet had been accumulating photographs, text, and video for more than a decade. Billions of images existed, indexed, searchable, and downloadable. The raw material for training was suddenly abundant in a way it had never been before.

But raw images aren't enough. Remember how training works: the network guesses, then checks its guess against the right answer, and the gap between the two is what it learns from. For a photo, that right answer is a label saying what the image actually contains. A network learning to recognize cats needs to be told which images are cats, or there is no gap to measure and nothing to correct. Someone has to label the data.
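The guess-check-correct loop above can be made concrete with a few lines of code. This is only a minimal sketch: the function name `gap`, the three categories, and the probabilities are all made up for illustration, and cross-entropy is just one common way to measure the gap, not necessarily the one any particular network used.

```python
import math

def gap(guess, label):
    """Cross-entropy between a guess and the labeled answer.

    guess: dict mapping each category name to a predicted probability.
    label: the category a human annotator assigned to the image.
    """
    return -math.log(guess[label])

# Hypothetical network output for one photo (probabilities sum to 1).
guess = {"cat": 0.30, "bus": 0.60, "coffee mug": 0.10}

# With a label, the gap is a number that training can push down.
loss_before = gap(guess, "cat")

# After a correction, the network might put more weight on "cat".
improved = {"cat": 0.80, "bus": 0.15, "coffee mug": 0.05}
loss_after = gap(improved, "cat")

print(f"gap before: {loss_before:.3f}, gap after: {loss_after:.3f}")
```

Notice that `gap` cannot be called at all without its second argument. Without a label there is no number to compute, which is the point of the paragraph above.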

That's what Fei-Fei Li, a computer scientist who started the project at Princeton and later moved to Stanford, spent years doing with her team. They collected millions of photographs and had them labeled by thousands of online workers. The subset that became the standard benchmark holds about 1.2 million training photographs, each labeled with one of 1,000 categories (cats, buses, coffee mugs, and so on). The project was called ImageNet, and it gave the field something it had never had: a large, clean, standard dataset that everyone could train and test on.

Three things had been missing: the algorithm (backpropagation), the data, and the compute. By 2010, all three were finally in place.
