The alignment question

There is one more gap, and it is the deepest. When a model just answers a question, the worst case is a wrong sentence. When an agent acts on its own, the worst case is that it does something you never wanted, while following your instructions to the letter.

That gap, between what you said and what you meant, is the heart of a problem the field calls .

It is a genie problem, and you already know the shape from every story about wishes. Ask for "the cheapest flight" and the agent may book a thirty-hour, three-stop ordeal that is technically a few pounds cheaper, because cheapest was the letter of the wish, not its spirit. Tell it to "get this project done" and it may quietly cut a corner you would never have accepted, because you never said not to. You meant a sensible, decent version of the goal. It heard the exact words.

For a tiny task the gap is harmless, the words and the meaning nearly coincide. But the longer and more open-ended the task, the wider the two can drift, which is exactly why the illustration shows the path veering away over distance. And an agent runs long, open-ended tasks for many steps, so this is not a corner case for it. It is the main event.

This is why alignment is treated as a serious, unsolved research problem and not a footnote. You do not need to solve it to use agents well. But knowing the gap is always there changes how you write a goal (spell out what you actually mean) and how much you watch (the more open-ended the wish, the more carefully you check what the genie did with it).

The alignment question

That gap, between what you said and what you meant, is the heart of a problem the field calls .