Billions of numbers

A large language model is, at its core, a very large collection of numbers.

GPT-3 has 175 billion parameters. GPT-4 is thought to have over a trillion, though OpenAI has not disclosed the figure. Each parameter is a number, a weight, that was adjusted in tiny increments over a long training run on hundreds of billions of tokens of text, until the model got good at predicting text.
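To make "adjusted in tiny increments" concrete, here is a minimal sketch. It assumes a toy model with just two weights, `w` and `b`, learning the hypothetical rule y = 2x + 1 by plain gradient descent on squared error. The function name `train`, the learning rate, and the data are illustrative choices, not anything taken from a real LLM. An LLM applies the same basic idea to billions of weights, uses next-token prediction as its objective, and relies on a more elaborate optimizer.

```python
# Toy illustration (an assumption, not a real LLM): a "model" with two weights,
# trained by plain gradient descent to fit the hypothetical rule y = 2x + 1.

def train(data, steps=2000, learning_rate=0.01):
    w, b = 0.0, 0.0  # weights start out knowing nothing
    n = len(data)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        # Each step nudges every weight a tiny amount in the direction that reduces error
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

data = [(x, 2 * x + 1) for x in range(5)]  # examples generated by the hidden rule
w, b = train(data)
print(round(w, 3), round(b, 3))  # w ends up close to 2, b close to 1
```

No single update "teaches" the model the rule; each one moves the weights by only a small fraction of the gradient, and thousands of these nudges add up to it.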

Those numbers are the model. Everything it "knows" is encoded in them.

This is a strange kind of knowledge. There's no database of facts. No lookup table. No list of things the model has memorized. The information is distributed across billions of weights in a form that no human can read directly. When the model recalls that the speed of light is approximately 300,000 kilometers per second, that fact isn't stored in any particular place; it emerges from many weights acting together as the prompt passes through the network.
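A loose analogy for "not stored in any particular place" is a single linear readout whose answer is spread across many weights. The sketch below assumes a hypothetical 1,000-weight readout, a random "prompt" vector, and a target of 300,000 standing in for the speed-of-light fact. The weights are nudged along the prompt direction until the readout hits the target. The names (`readout`, `TARGET`, `largest_share`) are invented for illustration. Real LLMs are deep, nonlinear networks and do not encode facts as one number like this.

```python
import random

# Toy illustration (an assumption, not how an LLM actually stores facts):
# one linear readout whose answer is spread across 1,000 weights.
random.seed(0)
N = 1000
prompt = [random.gauss(0.0, 1.0) for _ in range(N)]   # hypothetical input activations
weights = [random.gauss(0.0, 1.0) for _ in range(N)]  # random starting weights

def readout(w, x):
    """Dot product: the output depends on every weight at once."""
    return sum(wi * xi for wi, xi in zip(w, x))

# Shift all weights along the prompt direction so the readout equals the target,
# a stand-in for "speed of light in km/s".
TARGET = 300_000.0
step = (TARGET - readout(weights, prompt)) / sum(xi * xi for xi in prompt)
weights = [wi + step * xi for wi, xi in zip(weights, prompt)]

# How much of the answer does any single weight carry?
contributions = [wi * xi for wi, xi in zip(weights, prompt)]
largest_share = max(abs(c) for c in contributions) / TARGET

# Zero out the single most important weight and read out again.
k = max(range(N), key=lambda i: abs(contributions[i]))
damaged = weights[:k] + [0.0] + weights[k + 1:]

print(f"readout:              {readout(weights, prompt):,.0f}")
print(f"largest single share: {largest_share:.2%}")
print(f"after zeroing it:     {readout(damaged, prompt):,.0f}")
```

In this toy, no individual weight carries more than a small share of the answer, and deleting the most important one shifts the output only slightly. This is the sense in which the value lives in the whole pattern rather than in any one location.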

This distributed nature has consequences that are easy to miss if you're used to how computers normally store information.