Giving the model tools

Start with the gap. The model can describe a good search but cannot run it. It can write the code to fetch a file but cannot execute it. It can lay out a perfect plan but cannot take a single step of it. It is a brilliant mind with no hands.

An agent is that model, given hands.

The mechanism is simpler than it sounds. You hand the model a short list of tools.

A tool is just a button with a label saying what it does: "search the web," "run this code," "read this file," "send this email." The model does not press the button itself. It cannot. It is only a text machine.

Instead, when it decides a tool is needed, it writes out a request in a fixed format, like "search the web for X." A plain piece of software sitting around it watches for that request, actually presses the button, and drops the result back into the model's context window, the board from the last module.

The model reads the result and writes its next move.

So nothing mystical was added. The model still only produces text. What changed is that some of its text is now treated as a command by the software wrapped around it, and the answer comes back onto the board. That thin wrapper, text out, action taken, result back in, is the entire trick that turns a talker into a doer. Not just answering questions about a codebase, but editing and running it. Not just describing the steps, but taking them.

The tools themselves are ordinary, boring software that already existed. The new ingredient is a model sitting in the middle, deciding which button to ask for and when.

Giving the model tools

An agent is that model, given hands.

The mechanism is simpler than it sounds. You hand the model a short list of tools.

The model reads the result and writes its next move.

The tools themselves are ordinary, boring software that already existed. The new ingredient is a model sitting in the middle, deciding which button to ask for and when.