Stacking neurons

A single neuron can only draw one line.

Imagine sorting photographs into two piles: cats and dogs. A single neuron weighs each input, adds the results up, and makes a call. But the world is messier than one line can handle. Real patterns, such as fur texture, the shape of an ear, or the angle of a jaw, involve combinations that a single step can't capture.
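To make "adds them up and makes a call" concrete, here is a minimal sketch of one neuron in Python. The function name, the weights, and the cutoff at zero are illustrative assumptions, not anything fixed by the text. The neuron multiplies each input by a weight, sums the results with a bias, and answers 1 or 0 depending on which side of the line the total falls.

```python
def neuron(inputs, weights, bias):
    """Weigh each input, add everything up, and make a yes/no call."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if total > 0 else 0


# Hypothetical weights chosen by hand: the neuron fires only when
# both inputs are on, which is the logical AND of the two inputs.
and_weights = [1.0, 1.0]
and_bias = -1.5

for a in (0, 1):
    for b in (0, 1):
        print(a, b, neuron([a, b], and_weights, and_bias))
```

With these weights, the boundary between "yes" and "no" is the straight line a + b = 1.5. Changing the weights and bias moves or tilts that line, but it stays a single line.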

What if you chained neurons together? The first set looks at the raw inputs. The second set looks at what the first set found. The third set looks at what the second found. Each layer sees a slightly more abstract version of the problem than the one before.
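A small sketch of that chaining, again in Python and with hand-picked weights that are assumptions for illustration only. The target is "exactly one of the two inputs is on" (exclusive or), a pattern that no single straight line can separate. Two neurons in a first layer look at the raw inputs, and one neuron in a second layer looks only at what the first layer found.

```python
def neuron(inputs, weights, bias):
    """Weighted sum plus bias, followed by a yes/no threshold at zero."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if total > 0 else 0


def layer(inputs, weight_rows, biases):
    """Run several neurons side by side on the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]


def xor_network(a, b):
    # First layer sees the raw inputs.
    # Neuron 1 fires if either input is on (OR).
    # Neuron 2 fires only if both inputs are on (AND).
    hidden = layer([a, b], [[1, 1], [1, 1]], [-0.5, -1.5])

    # Second layer sees only the first layer's findings.
    # It fires when OR is on and AND is off.
    output = layer(hidden, [[1, -1]], [-0.5])
    return output[0]


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_network(a, b))
```

Each neuron here still draws only one line. The second layer gets its extra power from combining the answers of the first, not from any single neuron becoming more capable.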

The first layer might notice edges in a photo. The second combines those edges into shapes. The third combines shapes into something that functions like the idea of "cat." Nobody wrote those concepts down. They emerged from the layers.

The architecture has a name: a multilayer network, or a neural network with hidden layers. The middle layers are called hidden not because they're secret, but because you can't see them from outside. The inputs go in. The outputs come out. Everything in between is invisible unless you go looking.

The architecture was the easy part. Training it was the problem.