Gradient descent
Training is a search. You're looking for the combination of weights that makes the fewest mistakes. A large network can have billions of weights, so the number of possible combinations is astronomical.
Picture the error as a landscape of hills and valleys. Every combination of weights puts you somewhere on that landscape. High ground means a lot of mistakes. You want the lowest valley you can find.
Gradient descent is how you navigate. At each step, check which direction the ground slopes downward and take a small step that way. The slope is the gradient, and stepping down it is the descent. Then check again. Repeat, thousands of times, across millions of examples. The weights gradually settle toward a valley: a configuration that performs well.
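To make the walk concrete, here is a minimal sketch in Python. It assumes a made-up landscape with a single weight and a single valley at w = 3; the functions loss and gradient, the starting point, and the step count are all illustrative choices, not part of any real network.

```python
def loss(w):
    # Hypothetical error landscape: one bowl whose lowest point is at w = 3.
    return (w - 3.0) ** 2


def gradient(w):
    # Slope of the landscape at w. Positive means the ground rises to the right.
    return 2.0 * (w - 3.0)


def gradient_descent(w, learning_rate=0.1, steps=100):
    # Check the slope, take a small step downhill, and repeat.
    for _ in range(steps):
        w = w - learning_rate * gradient(w)
    return w


start = 0.0
w_final = gradient_descent(start)
print(f"started at {start}, ended near {w_final:.4f}")
print(f"error went from {loss(start):.4f} to {loss(w_final):.8f}")
```

A real network does the same thing with billions of weights at once, and its landscape has many valleys rather than one.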
The size of each step matters. Steps too large and you skip over the valley entirely. Steps too small and training takes forever. The step size is called the learning rate, and finding the right one is part of the craft of building networks.
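The effect of step size can be seen on a toy landscape. The sketch below assumes a single weight and a valley at w = 3, and the three learning rates are picked purely to show the three behaviors; real networks need very different values.

```python
def gradient(w):
    # Slope of a hypothetical bowl-shaped error landscape with its valley at w = 3.
    return 2.0 * (w - 3.0)


def distance_from_valley(learning_rate, steps=50, w=0.0):
    # Run gradient descent, then report how far the weight is from the valley.
    for _ in range(steps):
        w = w - learning_rate * gradient(w)
    return abs(w - 3.0)


too_large = distance_from_valley(1.1)     # each step overshoots and lands farther away
too_small = distance_from_valley(0.001)   # barely moves in 50 steps
reasonable = distance_from_valley(0.1)    # settles close to the valley

print(f"learning rate 1.1:   {too_large:.2f} away")
print(f"learning rate 0.001: {too_small:.2f} away")
print(f"learning rate 0.1:   {reasonable:.6f} away")
```

All three runs start 3.0 away from the valley. Only the middle-sized step ends up close to it within 50 steps.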
Backpropagation tells you which direction is downhill. Gradient descent tells you how to walk.
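The division of labor can be sketched with a tiny two-weight model. Everything here is an illustrative assumption: the model (prediction = w2 * w1 * x), the single training example, and the function names forward, backward, and step. The backward function plays the role of backpropagation, and step plays the role of gradient descent.

```python
def forward(w1, w2, x):
    # A two-layer "network" with one weight per layer.
    hidden = w1 * x
    prediction = w2 * hidden
    return hidden, prediction


def backward(w1, w2, x, target):
    # Backpropagation: work backward from the error with the chain rule
    # to find the slope of the squared error for each weight.
    hidden, prediction = forward(w1, w2, x)
    d_prediction = 2.0 * (prediction - target)
    grad_w2 = d_prediction * hidden
    grad_w1 = d_prediction * w2 * x
    return grad_w1, grad_w2


def step(w1, w2, grad_w1, grad_w2, learning_rate):
    # Gradient descent: move each weight a small distance downhill.
    return w1 - learning_rate * grad_w1, w2 - learning_rate * grad_w2


w1, w2 = 0.5, 0.5
x, target = 1.0, 2.0
for _ in range(200):
    g1, g2 = backward(w1, w2, x, target)
    w1, w2 = step(w1, w2, g1, g2, learning_rate=0.05)

final_error = (forward(w1, w2, x)[1] - target) ** 2
print(f"prediction {forward(w1, w2, x)[1]:.4f}, target {target}, error {final_error:.2e}")
```

Keeping the two functions separate mirrors the point above: one finds the downhill direction, the other decides how far to move.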