Why training is slow

Training a neural network is relentless arithmetic.

Every training example gets passed through every layer. Every weight gets multiplied by the number passing through it. Every error gets traced back through the whole network and every weight gets adjusted. Then the next example. Then the next. Across millions of examples, across millions of weights, repeated thousands of times.
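
To get a feel for the scale, here is a rough back-of-the-envelope count in Python. The layer sizes, the number of examples, and the number of passes are made-up figures for illustration only. The count covers just the forward-pass multiplications in a plain fully connected network; tracing errors back and adjusting the weights adds more work on top.

```python
# Hypothetical sizes, chosen only for illustration.
layer_sizes = [784, 512, 512, 10]   # inputs, two hidden layers, outputs
num_examples = 1_000_000            # training examples
num_passes = 10                     # times the whole data set is repeated


def forward_multiplications(sizes):
    """Count weight multiplications for one example in a fully connected network."""
    # Each pair of neighbouring layers holds n_in * n_out weights,
    # and each weight is used in exactly one multiplication per example.
    return sum(n_in * n_out for n_in, n_out in zip(sizes, sizes[1:]))


per_example = forward_multiplications(layer_sizes)
total = per_example * num_examples * num_passes

print(f"{per_example:,} multiplications per example")
print(f"{total:,} multiplications for the forward passes alone")
```

Even with these modest assumed numbers, the total runs into the trillions.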

The chip at the center of a computer, the CPU, was built for a different kind of work. CPUs are good at doing one complicated thing very quickly: a calculation that depends on the result of the previous one, then the next, then the next. They handle complexity in sequence, step by step.

Neural network training doesn't need that. Most of the calculations don't depend on each other. You could, in principle, run all of them at the same time.
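
A small sketch makes the independence concrete. It assumes a single toy layer with invented sizes and random weights, and the helper name `neuron_output` is hypothetical. Each neuron's result depends only on its own weights and the shared inputs, so the neurons can be computed in any order and the answers come out the same. Work that can be done in any order can, in principle, be done all at once.

```python
import random


def neuron_output(weights, inputs):
    """One neuron's weighted sum: uses only its own weights and the shared inputs."""
    return sum(w * x for w, x in zip(weights, inputs))


random.seed(0)
inputs = [random.random() for _ in range(4)]
# A toy layer: 6 neurons, each with 4 weights (sizes are arbitrary).
layer = [[random.random() for _ in range(4)] for _ in range(6)]

# Compute the neurons in their natural order...
in_order = [neuron_output(w, inputs) for w in layer]

# ...and again in a shuffled order, putting each result back in its own slot.
order = list(range(len(layer)))
random.shuffle(order)
shuffled = [None] * len(layer)
for i in order:
    shuffled[i] = neuron_output(layer[i], inputs)

# No neuron waited on another neuron's result, so the order made no difference.
print(in_order == shuffled)
```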

A CPU does only a few things at a time, each very fast. Neural network training needs something else entirely.