The problem of credit

In the last chapter, we saw what training a single neuron looks like: get something wrong, nudge the weights, repeat. We also saw why a whole network makes that harder: the mistake shows up at the output, but the weights that caused it are buried deep inside.

Now let's name that problem properly.

The network looks at an image and says "dog"; the answer was "cat." There are dozens of neurons across many layers, each with its own weights, each contributing a little to the final answer. Which weights caused the mistake? The ones near the output? The ones buried three layers back? The wrong answer is visible. The cause is hidden.

This became known as the credit assignment problem: when the network gets it wrong, which weights were responsible? Which weights needed adjusting?

Researchers knew the multi-layer architecture was more powerful. They just couldn't figure out how to train it. To train a network, you need to know which weights to adjust. But the error only shows up at the output. There was no known way to trace it back through the layers and work out which weights had caused it. The problem sat unsolved for two decades.
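
To make the difficulty concrete, here is a minimal sketch of a tiny two-layer network in plain Python. Everything in it is an assumption made for illustration: the sizes, the weight values, the sigmoid activation, and the names `forward`, `hidden_weights`, and `output_weights` are all hypothetical. It runs one input through the network and measures the error, which is all the information we get.

```python
import math


def sigmoid(x):
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, hidden_weights, output_weights):
    """Run a tiny two-layer network; return (hidden activations, output)."""
    hidden = [
        sigmoid(sum(w * x for w, x in zip(row, inputs)))
        for row in hidden_weights
    ]
    output = sigmoid(sum(w * h for w, h in zip(output_weights, hidden)))
    return hidden, output


# Hypothetical values, chosen only for illustration.
inputs = [0.5, -1.0, 0.25]
hidden_weights = [[0.2, -0.4, 0.1], [0.7, 0.3, -0.5]]  # 2 hidden neurons
output_weights = [0.6, -0.9]                            # 1 output neuron
target = 1.0  # assume 1.0 stands for "cat"

hidden, output = forward(inputs, hidden_weights, output_weights)

# The error is a single number, measured at the output only.
error = target - output

# Every one of these weights contributed to that single number.
n_weights = sum(len(row) for row in hidden_weights) + len(output_weights)

print(f"output={output:.3f}  error={error:.3f}  weights={n_weights}")
```

We have a target for `output`, so we can say how wrong it was. We have no target for the values in `hidden`, so nothing here tells us how wrong each hidden neuron was, or how its weights should change. That gap is the credit assignment problem in miniature, and it only widens as layers are added.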
