Train once, adapt anywhere

The transformer came with a way of working that has shaped AI ever since, and you have already met both halves of it. It comes in two stages with very different costs.

First, the big, expensive, general stage: pretraining. Take a transformer and a huge slice of the internet, books, articles, code, and set it the one simple game from the learning module: predict the next word, over and over, billions of times. This is the brutal, costly part, the months of GPUs running flat out that we saw turn random numbers into knowledge. Out of nothing but next-word practice, the model soaks up grammar, facts, reasoning patterns, the shape of language itself. What you get is a : fluent, knowledgeable, but raw. It only knows how to continue text, not how to be helpful. This is exactly the base model from the learning module.

Then the small, cheap, specific stage: fine-tuning. You take that one pretrained base and train it a little further on a small, targeted set of examples to shape its behaviour. Show it thousands of polite, helpful question-and-answer exchanges and it becomes a chat assistant. Show it medical texts and it sharpens for medicine. The heavy lifting is already done; fine-tuning just points the existing knowledge in a direction. This is the same fine-tuning that, back in the learning module, turned a raw text-continuer into the assistant you actually talk to.

The split mattered because the expensive part only had to happen once. That single costly base model can then be cheaply fine-tuned by dozens of different teams into dozens of different tools, none of them paying the enormous pretraining bill again. One foundation, many buildings on top. It is why a handful of base models now sit underneath a vast sprawl of AI products.

Which leaves one question hanging, the one the whole next module is about. We keep saying "make it bigger." What actually happened when these base models were scaled up far past anything tried before? The answer was not what anyone expected.

Train once, adapt anywhere

The transformer came with a way of working that has shaped AI ever since, and you have already met both halves of it. It comes in two stages with very different costs.