The shift from text-in, text-out
Until a year or two ago, an LLM was a black box that took text and returned text. Now, every major model supports tool use: you give it a list of functions with JSON schemas, and the model can decide to call one.
The flow is:
- You send the user message + a list of tools.
- The model either replies in text, or replies with `tool_use: { name: "weather", input: { city: "Paris" } }`.
- You notice the tool-use reply, actually execute the tool, and feed the result back with a `tool_result` message.
- The model continues, now having the tool's output to work with.
You are the runtime. The model is the orchestrator. This is the foundation of every "AI agent" framework you've seen.
A minimal tool definition
const tools = [
{
name: "get_weather",
description: "Returns current weather for a city.",
input_schema: {
type: "object",
properties: {
city: { type: "string" },
},
required: ["city"],
},
},
];
Three things to notice:
- Name is what the model uses to call it. Pick something specific.
- Description is what the model reads to decide when to call it. This is the most important field. Bad description → model calls it at wrong times or not at all.
- input_schema is standard JSON Schema. The model will produce input that conforms to it (mostly — always validate).
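That last caveat deserves code: the model's input mostly conforms to the schema, but "mostly" isn't "always". A hand-rolled check for the `get_weather` schema above (a sketch; in practice you might reach for a JSON Schema validator such as Ajv):

```typescript
// Hand-rolled check for get_weather's input_schema: an object with a
// required, non-empty string "city". A real app would likely use a
// JSON Schema validator instead of writing these by hand.
function validateWeatherInput(input: unknown): { city: string } {
  if (typeof input !== "object" || input === null) {
    throw new Error("tool input must be an object");
  }
  const city = (input as Record<string, unknown>).city;
  if (typeof city !== "string" || city.length === 0) {
    throw new Error('tool input needs a non-empty string "city"');
  }
  return { city };
}
```

Throwing here is deliberate: you can catch the error and send it back as a `tool_result` so the model gets a chance to retry with corrected input.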
Handling the call
const res = await client.messages.create({
model: "claude-sonnet-4-5",
max_tokens: 1024,
tools,
messages,
});
if (res.stop_reason === "tool_use") {
const call = res.content.find((b) => b.type === "tool_use");
const result = await runTool(call.name, call.input);
// Send the result back and let the model continue.
const next = await client.messages.create({
model: "claude-sonnet-4-5",
max_tokens: 1024,
tools,
messages: [
...messages,
{ role: "assistant", content: res.content },
{
role: "user",
content: [{
type: "tool_result",
tool_use_id: call.id,
content: JSON.stringify(result),
}],
},
],
});
}
Notice the back-and-forth: the assistant's tool-use reply becomes part of the conversation history, and you append a `tool_result` as the user's next turn.
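That single round-trip generalizes to a loop: keep executing tools and appending results until the model stops asking for them. A sketch with the provider call and the tool dispatcher abstracted as parameters (`createMessage` and `runTool` are placeholder names standing in for `client.messages.create` and your own dispatcher, not SDK APIs), so the loop itself is plain TypeScript:

```typescript
type Block =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown };

type Reply = { stop_reason: string; content: Block[] };

// Drives the tool-use loop: call the model, execute the requested tool,
// append the assistant turn plus a tool_result, and repeat until the
// model replies without asking for a tool.
async function runAgentLoop(
  createMessage: (messages: unknown[]) => Promise<Reply>, // stand-in for client.messages.create
  runTool: (name: string, input: unknown) => Promise<unknown>, // your dispatcher
  messages: unknown[],
): Promise<Reply> {
  let res = await createMessage(messages);
  while (res.stop_reason === "tool_use") {
    const call = res.content.find(
      (b): b is Extract<Block, { type: "tool_use" }> => b.type === "tool_use",
    );
    if (!call) break;
    const result = await runTool(call.name, call.input);
    messages = [
      ...messages,
      { role: "assistant", content: res.content },
      {
        role: "user",
        content: [
          { type: "tool_result", tool_use_id: call.id, content: JSON.stringify(result) },
        ],
      },
    ];
    res = await createMessage(messages);
  }
  return res;
}
```

In production you'd also cap the number of iterations so a confused model can't spin forever.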
What to expose — and what not to
A tool is a piece of your API the model can hit. Same threat model as any other untrusted caller:
- ✅ Read-only lookups (`get_user`, `search_docs`, `get_weather`).
- ✅ Scoped writes with a clear blast radius (`create_draft_email`, `log_challenge`): something the user can review before it ships.
- ⚠️ Anything that spends money or takes irreversible actions. Wrap it with confirmation, rate limits, and allowlists.
- ❌ Direct shell execution, `eval`, `DROP TABLE`. No amount of prompt engineering saves you here.
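For the ⚠️ category, one workable shape is a wrapper that refuses to run the underlying tool until a confirmation callback approves. A sketch (`requireConfirmation` and its signature are inventions for illustration, not a library API):

```typescript
type Tool = (input: unknown) => Promise<unknown>;

// Wraps a dangerous tool so it only executes after a confirmation
// callback approves. A refusal is returned as the tool result, so the
// model can explain to the user what didn't happen and why.
function requireConfirmation(
  tool: Tool,
  confirm: (name: string, input: unknown) => Promise<boolean>,
  name: string,
): Tool {
  return async (input) => {
    if (!(await confirm(name, input))) {
      return { error: `User declined to run ${name}.` };
    }
    return tool(input);
  };
}
```

Hook `confirm` up to whatever surface you have: a modal, a Slack approval, even a CLI prompt.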
The auth-gate tax
Every route in your backend that spends real money (LLM calls, Devin sessions, third-party APIs) needs an auth check — not because attackers go after your UI, but because scripted callers go after your API endpoints.
We found three unguarded routes in this project's MVP:
/api/devin, /api/agent-chat, /api/tutor. Each one burned real
Anthropic / Devin budget. The fix is always the same shape:
const session = await getServerSession(authOptions);
if (!session?.user) {
return new Response("Sign in.", { status: 401 });
}
Treat this as load-bearing. Any time you add a new route that calls an LLM, the auth check goes in first, before the provider-SDK call.
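One way to make the check hard to forget is to pull it into a tiny guard that every budget-spending route calls first. A sketch using the standard Web `Response` (`requireUser` is a hypothetical helper written for this lesson, not part of next-auth):

```typescript
// Returns a 401 Response when there is no signed-in user, else null.
// Call it first in every route handler that spends provider budget.
function requireUser(session: { user?: unknown } | null): Response | null {
  if (!session?.user) {
    return new Response("Sign in.", { status: 401 });
  }
  return null;
}

// Usage at the top of a route handler:
//   const denied = requireUser(await getServerSession(authOptions));
//   if (denied) return denied;
```

Because the guard returns the `Response` instead of throwing, the early-return pattern stays a two-liner and shows up clearly in code review when it's missing.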
Homework
Pick one of your keyboard-level skills — a thing you do a lot in your shell. Wrap it as a tool the model can call. Example: "given a filename, return the first 50 lines and a 3-bullet summary."
Write the tool description carefully — it's the only thing the model sees when deciding whether to call it. Share your tool description with the AI tutor in the right sidebar and ask: "Would you call this tool for the task X?"
Next: cost, errors, and putting it in production.
Inspired by Anthropic's "Building with the Claude API" and "Introduction to agent skills".