Depth vs. width
Why add more layers at all? You could make a single layer very wide instead: add enough neurons to it and, in theory, it can approximate almost any function.
The answer comes down to how complex patterns are built.
Think about how you recognize a face. You don't look at the raw pixels and immediately know whose face it is. You notice edges first. Then you see how those edges form shapes — a nose, an eye, a jawline. Then those shapes combine into a face. Then the face becomes a specific person.
That's a hierarchy: simple things combining into more complex things, step by step.
A single wide layer has to do all of that in one step. A deep network can do it the natural way: roughly one level of abstraction per layer. In this simplified picture, the first layer notices edges. The second notices shapes. The third notices faces. Each layer builds on what the previous one found.
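The result: for problems with this kind of layered structure, a deep network can often do the same job with far fewer neurons in total than a single wide layer would need. Depth isn't just possible. It's more efficient. It matches the structure of the problem.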
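To make the "fewer neurons" claim concrete, here is a minimal sketch on a toy problem, not on faces. It assumes ReLU activations and a single number as input, neither of which comes from the text above, and the names (tent_layer, deep_sawtooth, compare) are made up for this illustration. Each layer uses two ReLU units to fold the interval [0, 1] in half, so stacking layers doubles the number of straight-line segments each time. A one-hidden-layer ReLU network on a single input can add at most one bend per unit, so it needs at least (segments - 1) units to draw the same zigzag.

```python
def relu(x):
    """Standard ReLU activation on a single number."""
    return max(0.0, x)


def tent_layer(x):
    """One hidden layer with two ReLU units (hypothetical toy layer).

    On [0, 1] this rises from 0 to 1 over [0, 0.5], then falls back to 0.
    """
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5)


def deep_sawtooth(x, depth):
    """Stack `depth` copies of tent_layer: 2 * depth ReLU units in total."""
    for _ in range(depth):
        x = tent_layer(x)
    return x


def count_linear_pieces(depth):
    """Count the straight-line segments of deep_sawtooth on [0, 1].

    The grid spacing is a power of two, so every value here is exact in
    floating point and consecutive slopes can be compared with !=.
    """
    n = 2 ** (depth + 2)  # four grid cells per expected segment
    ys = [deep_sawtooth(i / n, depth) for i in range(n + 1)]
    steps = [ys[i + 1] - ys[i] for i in range(n)]
    changes = sum(1 for a, b in zip(steps, steps[1:]) if a != b)
    return changes + 1


def compare(depth):
    """Return (segments, units in the deep net, minimum units if shallow)."""
    pieces = count_linear_pieces(depth)
    deep_units = 2 * depth
    # A one-hidden-layer ReLU net on a scalar input gains at most one
    # bend per unit, so it needs at least pieces - 1 units.
    shallow_units_lower_bound = pieces - 1
    return pieces, deep_units, shallow_units_lower_bound


if __name__ == "__main__":
    for depth in (1, 2, 4, 8):
        print(depth, compare(depth))
```

At depth 8 the stacked version draws 256 segments with 16 units, while a single hidden layer would need at least 255. This is a hand-built function chosen to favor depth, so it shows that the gap can exist, not how large it is on any real task.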
Going deeper wasn't a stylistic choice. It was the right shape for the task.