What gets cut

When a conversation runs long enough, the context window fills up. Something has to go.

Different applications handle this differently. Some truncate the oldest messages — dropping the beginning of the conversation to make room for new turns. Some summarize earlier content. Some simply stop accepting new input.

Whatever the strategy, the result is the same: the model starts losing access to earlier parts of the conversation. It might forget that you told it to keep responses brief. It might repeat a suggestion it already made. It might lose track of a constraint you set twenty messages ago.

This is not a bug in the traditional sense. It's a structural limitation. The model can only reason about what it can currently see.

Understanding this has a practical implication: in long workflows, things you want the model to keep in mind throughout need to be present throughout — in the system prompt, or reintroduced periodically. Assumptions you set at the beginning may fade.

<!-- TODO: a "filling up" metaphor visualization — maybe a container that gradually fills and older items fall out — would make the mechanism feel intuitive -->