What gets cut
When a conversation runs long enough, the context window fills up. Something has to go.
Different applications handle this differently. Some truncate the oldest messages — dropping the beginning of the conversation to make room for new turns. Some summarize earlier content. Some simply stop accepting new input.
Whatever the strategy, the result is the same: the model starts losing access to earlier parts of the conversation. It might forget that you told it to keep responses brief. It might repeat a suggestion it already made. It might lose track of a constraint you set twenty messages ago.
This is not a bug in the traditional sense. It's a structural limitation. The model can only reason about what it can currently see.
Understanding this has a practical implication: in long workflows, things you want the model to keep in mind throughout need to be present throughout — in the system prompt, or reintroduced periodically. Assumptions you set at the beginning may fade.
<!-- TODO: a "filling up" metaphor visualization — maybe a container that gradually fills and older items fall out — would make the mechanism feel intuitive -->