The alignment question
When a language model answers a question, the worst-case outcome is a wrong answer. When an agent acts autonomously in the world, the stakes are different.
This raises a question the field calls alignment: how do you ensure that an AI system does what you actually want it to do — not just what you told it to do in the prompt?
The gap between "what you said" and "what you meant" is small for simple tasks. For complex, open-ended tasks running over many steps, it grows. An agent optimizing for "book the cheapest flight" might pick a routing that's technically cheapest but wildly inconvenient. An agent tasked with "get this project done" might take shortcuts you didn't anticipate.
These aren't hypothetical edge cases. They're the reason alignment is considered a serious research problem, not a solved one. As agents become more capable and more autonomous, the question of how to specify goals precisely enough — and how to ensure agents pursue the spirit of those goals, not just the letter — becomes more consequential.
You don't need to resolve the debate to use agents wisely. But understanding that the question exists, and why, is part of being a thoughtful user of these tools.