What happens when you just make it bigger

The transformer existed by 2017. The obvious next question was the one a child asks about anything that works: how big can we make it?

Bigger has a precise meaning here. More layers in the tower we built last module. More parameters, those millions of little weights, now grown into billions. More dimensions in the space of meaning where each word lives. And more text to train on, eventually most of the readable internet.

You might expect a catch. In most of engineering, making a thing bigger runs into walls. A taller building needs a wider base. A bigger engine overheats. For most of AI's history before this, the same was true: pile on more and the gains tapered off, then stopped. We saw exactly that in the long-winter module, where each wave of optimism scaled up its methods and hit a ceiling.

So the surprise was not that bigger helped. It was that bigger kept helping, smoothly, far past the point where anyone expected a ceiling. To see why that was such a big deal, we need to look at the shape of the improvement, not just the fact of it.
