Limits in the 1990s

Three things were missing.

The first was depth. Networks with many layers were much more powerful in theory, but in practice they could not be trained reliably. The correction signal that traveled backwards through the network to update the weights would fade as it passed through each layer, a problem now known as the vanishing gradient. By the time the signal reached the early layers, it had almost disappeared. Those weights barely changed, so the early layers learned almost nothing.
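The fading signal can be illustrated with a toy calculation. The sketch below is not how any particular 1990s network was built; it assumes a chain with a single sigmoid unit per layer and every weight fixed at 1.0, and the name gradient_magnitudes and its parameters are made up for this example. It measures how strongly the output responds to a small change at each layer.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gradient_magnitudes(num_layers, weight=1.0, x=0.5):
    """Return |d output / d pre-activation| per layer, first layer first.

    Toy assumption: one sigmoid unit per layer, all weights equal.
    """
    # Forward pass: feed the input through the chain of sigmoid units.
    activations = []
    a = x
    for _ in range(num_layers):
        a = sigmoid(weight * a)
        activations.append(a)

    # Backward pass: start at the last layer and walk towards the first.
    grads = [0.0] * num_layers
    signal = 1.0  # gradient of the output with respect to itself
    for layer in reversed(range(num_layers)):
        a = activations[layer]
        signal *= a * (1.0 - a)     # sigmoid slope, never larger than 0.25
        grads[layer] = abs(signal)  # correction signal arriving at this layer
        signal *= weight            # pass through the weight to the layer below
    return grads


if __name__ == "__main__":
    for number, g in enumerate(gradient_magnitudes(10), start=1):
        print(f"layer {number:2d}: {g:.2e}")
```

Because the slope of a sigmoid is never larger than 0.25, each extra layer in this setup shrinks the signal by at least a factor of four, so in a ten-layer chain the first layer receives less than a millionth of the original signal.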

The second was computing power. Even training a small network on a modest dataset took a very long time on the hardware available in the 1990s. Researchers often had to wait days for results. Bigger, more capable networks were simply out of reach.

The third was data. For a network to learn to recognize a cat, it needs to see thousands of examples of cats, each one labeled. Someone has to go through every single image and write down the answer. In the 1990s, that kind of labeled data barely existed. Collecting it was slow, expensive, and largely done by hand.

The idea was right. The math worked. But without a reliable way to train deep networks, the computing power to train large ones, and the data to feed them, neural networks stayed a promising theory rather than a practical tool.
