What the model sees

Every time a language model generates a response, it starts from scratch. There's no ongoing memory between conversations. No accumulation of past interactions. No persistent awareness.

What the model gets is the context window: a single, long block of text that contains everything relevant to the current exchange. Your message. The conversation history. Any documents or instructions that were passed in. A system prompt from the application configuring the model's behavior. All of it, concatenated into one input.

The model reads that input and generates what comes next, piece by piece. Beyond what it learned in training, that's the complete picture of what it knows in the moment.
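To make the assembly described above concrete, here is a minimal sketch of how an application might build that single input. The `Turn` class, the `build_context` function, and the bracketed labels are all hypothetical names chosen for illustration. Real systems use their own formats, so treat this as a picture of the idea rather than any particular product's implementation.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    role: str  # "user" or "assistant" (hypothetical labels)
    text: str


def build_context(system_prompt: str, documents: list[str],
                  history: list[Turn], user_message: str) -> str:
    """Concatenate every piece into the single input the model receives."""
    parts = [f"[system]\n{system_prompt}"]
    parts += [f"[document]\n{doc}" for doc in documents]
    parts += [f"[{turn.role}]\n{turn.text}" for turn in history]
    parts.append(f"[user]\n{user_message}")
    return "\n\n".join(parts)


history = [
    Turn("user", "My name is Ada."),
    Turn("assistant", "Nice to meet you, Ada."),
]

# Same question, with and without the earlier turns passed in.
with_history = build_context("Be concise.", [], history, "What is my name?")
without_history = build_context("Be concise.", [], [], "What is my name?")

print("Ada" in with_history)     # True
print("Ada" in without_history)  # False
```

The earlier exchange is visible to the model only because the application sent it again. Leave it out, and from the model's side it never happened.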

Context windows have grown significantly. Early models handled a few thousand tokens (a token is roughly a word or a piece of one). Many current models handle hundreds of thousands, enough for entire books. But the limit still exists, and it shapes every interaction.
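One way the limit shapes an interaction: when a conversation outgrows the window, something has to be left out. The sketch below shows one simple strategy, dropping the oldest turns first. That strategy is an assumption for illustration, not something every application does. The `count_tokens` function is a crude whitespace-based stand-in, since real tokenizers split text differently, and `fit_history` is a hypothetical name.

```python
def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def fit_history(fixed_parts: list[str], history: list[str],
                max_tokens: int) -> list[str]:
    """Keep the most recent history turns that fit within max_tokens."""
    # The system prompt and current message always go in, so they spend budget first.
    budget = max_tokens - sum(count_tokens(part) for part in fixed_parts)
    kept: list[str] = []
    # Walk from newest to oldest, keeping turns while budget remains.
    for turn in reversed(history):
        cost = count_tokens(turn)
        if cost > budget:
            break
        kept.append(turn)
        budget -= cost
    kept.reverse()  # restore chronological order
    return kept


history = ["one two three", "four five", "six seven eight nine"]
fixed = ["system prompt here", "current user message"]

kept = fit_history(fixed, history, max_tokens=12)
print(kept)  # ['four five', 'six seven eight nine']
```

In this toy run the oldest turn no longer fits, so the model would never see it.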

<!-- TODO: a visual of the context window as a scroll or container with different colored zones (system prompt, history, current message) would help make the architecture tangible -->