Building your own Tool Calling Agent

Any sufficiently advanced technology is indistinguishable from magic.
— Arthur C. Clarke

At times I feel like Claude Code is a magical black box: I give it tasks and get working code out. This post attempts to shed some light on that black box by building our own agent harness around a model.

To do this, I will use Anthropic’s API to build a simple code review harness, showing how to turn a simple chatbot interface into a working developer tool.

Why models need a harness

At its core, an LLM is a text completion engine. If you take a look at Anthropic’s Messages API, it reinforces this. There is essentially one endpoint for talking to the model!

POST /v1/messages
> Send a structured list of input messages with text and/or image
> content, and the model will generate the next message in the
> conversation.

That’s it. That’s the only interaction possible with the LLM. So how do we turn that into a useful tool for software development? The key is to write some additional software around the model’s API that combines repeated message turns with the ability to use tools to complete work. Anthropic describes this as the agentic loop, which has two core components: the model, which reasons about what to do based on a request, and tools, which take the actions the model requests. Those actions can gather additional context (for example, reading from a file) or complete a task (for example, POSTing a request to an API).

The agentic loop is the mechanism that turns the model from a message completion API into something that is useful for real work: the model provides the intelligence and reasoning, while the harness provides the tools, context, and environment to get work done in the real world.

Having a Simple Conversation with the Messages API

Let’s take a deeper look at the Messages API. As an example, you can send a message with the content {"role": "user", "content": "Hello, Claude"} to Claude and get back a completion of the conversation.

#!/bin/sh
curl https://api.anthropic.com/v1/messages \
     --header "x-api-key: $ANTHROPIC_API_KEY" \
     --header "anthropic-version: 2023-06-01" \
     --header "content-type: application/json" \
     --data \
'{
    "model": "claude-opus-4-8",
    "max_tokens": 1024,
    "messages": [
        {"role": "user", "content": "Hello, Claude"}
    ]
}'

This returns the response:

{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello!"
    }
  ],
  "model": "claude-opus-4-8",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 12,
    "output_tokens": 6
  }
}
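If you prefer Python to curl, the same request can be built with nothing but the standard library. This is a minimal sketch: build_request, send and extract_text are hypothetical helper names, while the URL, headers and body fields are the ones from the curl command above. extract_text joins the text blocks from the content array of a response like the one shown.

```python
import json
import os
import urllib.request

API_URL = "https://api.anthropic.com/v1/messages"


def build_request(messages, model="claude-opus-4-8", max_tokens=1024, api_key=None):
    """Build (but do not send) the same POST /v1/messages request as the curl example."""
    body = {"model": model, "max_tokens": max_tokens, "messages": messages}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "x-api-key": api_key or os.environ.get("ANTHROPIC_API_KEY", ""),
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        },
        method="POST",
    )


def send(request):
    """Send the request and decode the JSON response (needs network access and a valid key)."""
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read())


def extract_text(response):
    """Concatenate the text of every "text" content block in a Messages API response."""
    return "".join(b["text"] for b in response["content"] if b["type"] == "text")
```

Nothing goes over the network until you call send(build_request([{"role": "user", "content": "Hello, Claude"}])); passing the result to extract_text would give the assistant's text, "Hello!" in the response above.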

The Messages API is completely stateless, which means you must send the full conversation history with every request. To build up a conversation over time, you resend every previous message in the conversation, or a compressed version of the relevant parts.
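Because the API keeps no state, the harness owns the conversation. The sketch below assumes a hypothetical call_model function that receives the full message list and returns the assistant's reply text (in practice it would wrap the HTTP request). The point is that every call receives the entire history, not just the newest message.

```python
from typing import Callable


class Conversation:
    """Client-side conversation history for a stateless chat API (illustrative sketch)."""

    def __init__(self, call_model: Callable[[list], str]):
        # Hypothetical model function: takes the full history, returns the reply text.
        self.call_model = call_model
        self.messages: list = []

    def ask(self, text: str) -> str:
        # Record the new user turn, then send the *entire* history to the model.
        self.messages.append({"role": "user", "content": text})
        reply = self.call_model(list(self.messages))
        # Store the reply so the next request includes it as well.
        self.messages.append({"role": "assistant", "content": reply})
        return reply
```

With a stand-in model, the first ask sends one message and the second sends three (user, assistant, user), which is exactly the shape of the in-progress conversation that follows.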

You likely noticed that each message also carries a role. Claude’s models are trained to use this field to recognize who is talking at each point in the conversation: a message with role user represents user input, while assistant represents the model’s responses.

The back-and-forth becomes clearer once a request contains more than one turn. For example, the following set of messages represents a conversation in progress.

"messages": [
    {"role": "user", "content": "Hello, Claude"},
    {"role": "assistant", "content": "Hello!"},
    {"role": "user", "content": "Can you describe LLMs to me?"}
]

If we send this to the LLM, the response is the next statistically likely message in the conversation:

{
  "id": "msg_018gCsTGsXkYJVqYPxTgDHBU",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Sure, I'd be happy to describe..."
    }
  ],
  ...
}

The Messages API also accepts a system prompt, which lets you focus the LLM’s behavior and tone for your use case. The system prompt is passed as a top-level system parameter rather than as a message in the list, and you can think of it as an instruction that applies throughout the entire conversation. For example, an API call with the following content defines the role that the LLM should play throughout all turns of the conversation:

system="You are a helpful coding assistant specializing in Python.",
messages=[
    {"role": "user", "content": "How do I sort a list of dictionaries by key?"}
]
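In the raw JSON request, the system prompt sits beside messages as its own field rather than inside the list. A small sketch: build_body is a hypothetical helper, and the model ID is reused from the curl example earlier.

```python
import json

SYSTEM = "You are a helpful coding assistant specializing in Python."


def build_body(messages, system=SYSTEM, model="claude-opus-4-8", max_tokens=1024):
    """Return a Messages API request body with the system prompt as a top-level field."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "system": system,      # applies to the whole conversation
        "messages": messages,  # only "user" and "assistant" turns go in here
    }


body = build_body([{"role": "user", "content": "How do I sort a list of dictionaries by key?"}])
print(json.dumps(body, indent=2))
```

Because the system field is separate, a harness can keep it constant while the messages list grows from turn to turn.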

Using Tools

The model is great at reasoning through a conversation. But a conversation on its own is not enough to get real work done. For this purpose, model providers like Anthropic define standardized APIs that let a model use tools to complete its work.

When you use the Messages API, you can tell the model which tools it has access to and a description of what the tool does. Then, as the model reasons about the response for a conversation, it can choose to use a tool to gather more context that can help it understand the problem and solution better, or it can choose to use a tool to complete some task.

For example, a code review agent needs to read files from the local filesystem, so a file-reading tool is a natural fit. When I tell the model about a tool it can use, I describe what the tool does and what its input parameters are. In JSON format, a tool looks something like this:

{
  "name": "file_read",
  "description": "Read file content.",
  "input_schema": {
    "type": "object",
    "properties": {
      "file_path": {
        "type": "string"
      },
      "start_line": {
        "type": "integer",
        "default": 1
      },
      "end_line": {
        "type": "integer"
      }
    },
    "required": ["file_path"],
    "additionalProperties": false
  }
}
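The schema is also useful on the harness side: before running a tool, you can check that the model's input matches what you declared. Below is a deliberately minimal, hand-rolled check for the file_read parameters covering only required keys, unknown keys and basic types. FILE_READ_SCHEMA and check_input are hypothetical names, and this is not a full JSON Schema validator; a library such as jsonschema covers the whole specification.

```python
FILE_READ_SCHEMA = {
    "type": "object",
    "properties": {
        "file_path": {"type": "string"},
        "start_line": {"type": "integer", "default": 1},
        "end_line": {"type": "integer"},
    },
    "required": ["file_path"],
    "additionalProperties": False,
}

# Map the JSON Schema primitive type names to Python types.
JSON_TYPES = {"string": str, "integer": int, "number": (int, float),
              "boolean": bool, "object": dict, "array": list}


def check_input(schema, tool_input):
    """Return a list of problems with tool_input; an empty list means it passed."""
    problems = []
    props = schema.get("properties", {})
    for key in schema.get("required", []):
        if key not in tool_input:
            problems.append(f"missing required field: {key}")
    for key, value in tool_input.items():
        if key not in props:
            if schema.get("additionalProperties", True) is False:
                problems.append(f"unexpected field: {key}")
            continue
        type_name = props[key].get("type")
        expected = JSON_TYPES.get(type_name)
        # bool is a subclass of int in Python, so only accept it for "boolean".
        is_bad_bool = isinstance(value, bool) and type_name != "boolean"
        if expected and (is_bad_bool or not isinstance(value, expected)):
            problems.append(f"{key} should be {type_name}")
    return problems
```

One option is to return these problems as the tool's output, so the model sees what was wrong with its call and can try again.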

For Claude’s Messages API, a response that uses a tool includes the metadata stop_reason: "tool_use" and one or more tool_use content blocks. The application code is responsible for parsing each tool_use block, running the tool, and then sending the tool’s output back to the LLM.

For example, in the conversation below, the model decided that reading the file main.go would be helpful in completing the code review task, so it returns "stop_reason": "tool_use" along with a content block of type tool_use asking to use the tool.

{
  "id": "msg_01Aq9w938a90dw8q",
  "model": "claude-opus-4-8",
  "stop_reason": "tool_use",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "I'll read the file main.go to understand the diff"
    },
    {
      "type": "tool_use",
      "id": "toolu_01A09q90qw90lq917835lq9",
      "name": "file_read",
      "input": { "file_path": "./main.go" }
    }
  ]
}
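Pulling the requests out of a response like this one is a filter over its content blocks. A minimal sketch with a hypothetical tool_calls helper; the id of each block is kept because the result has to echo it back.

```python
def tool_calls(response):
    """Return (id, name, input) for each tool_use block, in the order the model emitted them."""
    return [
        (block["id"], block["name"], block["input"])
        for block in response["content"]
        if block["type"] == "tool_use"
    ]


# The assistant response from above, trimmed to the fields the helper reads.
response = {
    "stop_reason": "tool_use",
    "role": "assistant",
    "content": [
        {"type": "text", "text": "I'll read the file main.go to understand the diff"},
        {"type": "tool_use", "id": "toolu_01A09q90qw90lq917835lq9",
         "name": "file_read", "input": {"file_path": "./main.go"}},
    ],
}

for call_id, name, args in tool_calls(response):
    print(call_id, name, args)
```

The leading text block is skipped; only the file_read request comes back.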

The application executes the tool and then sends its output back in a user message, using a tool_result block whose tool_use_id matches the id of the request, to continue the conversation.

{
  "role": "user",
  "content": [
    { "type": "tool_result",
      "tool_use_id": "toolu_01A09q90qw90lq917835lq9",
      "content": [{"type": "text", "text": "<file contents>"}]
    },
    { "type": "text", "text": "Here is the content of the file. Continue reviewing the code." }
  ]
}
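Building that reply is just as mechanical: one tool_result block per tool_use request, each echoing the matching id. A sketch with a hypothetical tool_result_message helper. It also sets is_error on failed calls, an optional flag the Messages API accepts on tool_result blocks, and it keeps the tool_result blocks ahead of any extra text, the order Anthropic's tool use documentation asks for and the order used in the example above.

```python
def tool_result_message(results, note=None):
    """Build the user message that answers a batch of tool_use requests.

    results: list of (tool_use_id, output_text, failed) tuples.
    note: optional extra text, placed after the tool_result blocks.
    """
    content = []
    for tool_use_id, output, failed in results:
        block = {
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": [{"type": "text", "text": output}],
        }
        if failed:
            block["is_error"] = True  # tells the model the tool call did not succeed
        content.append(block)
    if note:
        content.append({"type": "text", "text": note})
    return {"role": "user", "content": content}


msg = tool_result_message(
    [("toolu_01A09q90qw90lq917835lq9", "<file contents>", False)],
    note="Here is the content of the file. Continue reviewing the code.",
)
```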

Claude decides, based on the current conversation history, whether to call a tool or respond directly. The model responds directly when it can answer from its training data or from information already in the conversation, and for creative tasks that do not require additional context. When it cannot answer with the information already available, and a tool’s description matches what the user is asking about, the model asks to run that tool to gather more data and context.

The Agentic Loop

We have now built up enough context to understand how to build a simple code review tool using an LLM. The overall workflow is something like this:

Initialize
- Start with a single user message and/or the existing conversation history.
Send
- On each turn, send the full conversation history, system prompt, and tool definitions to the model.
Receive
- Parse the response and check stop_reason:
  - end_turn: The model is done. Extract the text and return it.
  - tool_use: The model wants to call one or more tools. Check the tool_use block for the tool to call and any parameters.
Execute
- Run each requested tool locally (read a file, run a shell command) and collect the results.
Respond
- Send the tool results back as a user message with tool_result blocks.
Repeat
- The model now has more context. It may call more tools or produce its final response.
- The loop continues until the model returns end_turn.
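The steps above fit in a few lines once the model and the tools are passed in as plain functions. This sketch assumes a hypothetical call_model(messages) that returns a dict shaped like the Messages API responses shown earlier; the scripted fake_model below stands in for the real API so the control flow can run offline.

```python
def run_agent(user_message, call_model, tools, max_turns=20):
    """Generic agentic loop: send, receive, execute tools, respond, repeat."""
    messages = [{"role": "user", "content": user_message}]  # Initialize
    for _ in range(max_turns):
        response = call_model(messages)  # Send the full history, receive a reply
        messages.append({"role": "assistant", "content": response["content"]})
        if response["stop_reason"] != "tool_use":  # e.g. end_turn: the model is done
            return "".join(b["text"] for b in response["content"] if b["type"] == "text")
        results = []
        for block in response["content"]:  # Execute each requested tool
            if block["type"] != "tool_use":
                continue
            tool = tools.get(block["name"])
            output = tool(**block["input"]) if tool else f"Unknown tool: {block['name']}"
            results.append({"type": "tool_result", "tool_use_id": block["id"], "content": output})
        messages.append({"role": "user", "content": results})  # Respond, then repeat
    return "Max turns reached without a final response."


# A scripted fake model: it first asks for a tool, then answers using the tool result.
def fake_model(messages):
    if len(messages) == 1:
        return {"stop_reason": "tool_use", "content": [
            {"type": "tool_use", "id": "toolu_1", "name": "file_read",
             "input": {"file_path": "main.go"}}]}
    last_result = messages[-1]["content"][0]["content"]
    return {"stop_reason": "end_turn", "content": [{"type": "text", "text": f"Reviewed: {last_result}"}]}


fake_tools = {"file_read": lambda file_path: f"<contents of {file_path}>"}
print(run_agent("Please review main.go", fake_model, fake_tools))
```

Swapping fake_model for a function that calls the real API changes nothing else in the loop.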

One key to understanding how this works is that the model drives the workflow, not the programmer. The model decides which tools to call, and when it has enough context to respond accurately and end the current conversational turn. The coding harness only executes what the model asks for and feeds the results back.

Building a Code Review Agent

The following example uses Amazon Bedrock to access the Anthropic Messages API. The principles are the same no matter which access method you choose.
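On Bedrock, the request body is nearly the same as one sent to api.anthropic.com: the model moves out of the body into the modelId argument of invoke_model, and the body carries an anthropic_version of "bedrock-2023-05-31" in place of the anthropic-version header. A small sketch of that conversion; to_bedrock is a hypothetical helper, and the model ID is the Bedrock ID used in the full listing at the end of this post.

```python
import json


def to_bedrock(body):
    """Split a direct-API style body into (model_id, JSON body for Bedrock's invoke_model)."""
    bedrock_body = {k: v for k, v in body.items() if k != "model"}  # copy, minus "model"
    bedrock_body["anthropic_version"] = "bedrock-2023-05-31"
    return body["model"], json.dumps(bedrock_body)


direct = {
    "model": "us.anthropic.claude-sonnet-4-6",  # on Bedrock this must be a Bedrock model ID
    "max_tokens": 4096,
    "messages": [{"role": "user", "content": "Please review the code changes in this repository."}],
}
model_id, bedrock_json = to_bedrock(direct)
# With boto3, this pair would be sent as: client.invoke_model(modelId=model_id, body=bedrock_json)
```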

We start our implementation by defining the tools available to the model. These are in JSON format using the schema defined by Anthropic’s Messages API. To keep the example simple, the tools are scoped to just reading a file and executing a command.

TOOLS = [
    {
        "name": "file_read",
        "description": "Read the contents of a file.",
        "input_schema": {
            "type": "object",
            "properties": {
                "file_path": {
                    "type": "string",
                    "description": "Path to the file to read"
                }
            },
            "required": ["file_path"]
        }
    },
    {
        "name": "run_command",
        "description": "Run a shell command and return its output. Use for git commands like git diff.",
        "input_schema": {
            "type": "object",
            "properties": {
                "command": {
                    "type": "string",
                    "description": "The shell command to execute"
                }
            },
            "required": ["command"]
        }
    }
]

It’s up to us to actually implement the run_command and file_read tools. This is just standard Python code, so I will skip it for now and get on to the actual agent loop; the implementations are included in the full listing at the end of this post.

We also define a system prompt to frame the conversation. As a reminder, this system prompt creates a role for the model to follow when reasoning.

SYSTEM_PROMPT = """You are a code review agent. Your job is to review code changes in a git repository.
..."""  # abbreviated here; the full prompt is in the complete listing at the end of this post

Now we are ready for the agent loop. The loop takes a user_message as a parameter. This starts as a basic request for help with a code review. As the model requests additional context, we append those messages to the conversation history. This repeats until the model provides a final answer. The core loop is fairly small, and I added some comments around the key concepts.

def agent_loop(user_message):
    # Initialize with the starting message
    messages = [{"role": "user", "content": user_message}]

    # Allow the agent up to 20 round trips to answer the prompt.
    # If it has not finished by then, give up and return a fallback message.
    for turn in range(20):
        # Send a single request and gather the response.

        # Each request includes all messages so far, the system prompt,
        # and any available tools.

        # The model is completely stateless. You need to
        # provide all context to the model for every request.
        response = client.invoke_model(
            modelId=MODEL_ID,
            body=json.dumps({
                "anthropic_version": "bedrock-2023-05-31",
                "max_tokens": 4096,
                "system": SYSTEM_PROMPT,
                "tools": TOOLS,
                "messages": messages
            })
        )
        result = json.loads(response["body"].read())

        # Gather metadata
        stop_reason = result["stop_reason"]
        content = result["content"]

        # Add the model's response to the conversation history
        messages.append({"role": "assistant", "content": content})

        if stop_reason != "tool_use":
            # The model finished (end_turn) or stopped for another reason such as
            # max_tokens. Either way there are no tools to run, so gather the
            # text of the response and return it for output.
            return "".join(block["text"] for block in content if block["type"] == "text")

        # The model is continuing to reason and requires more context
        tool_results = []
        for block in content:
            # Find any requests for tool use
            if block["type"] == "tool_use":
                print(f"  Tool: {block['name']}({json.dumps(block['input'])[:80]})")
                # Execute the tool that the model asked for and store the result
                output = execute_tool(block["name"], block["input"])
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block["id"],
                    "content": output
                })

        # Add the result of tool use to the conversation history
        messages.append({"role": "user", "content": tool_results})

    return "Max turns reached without a final response."

What’s neat about this loop is that it is agnostic to the problem at hand: the same structure can be used for a code review agent, a calendar assistant, or anything else. As long as we can describe and implement the tools that the model has access to, and choose a suitable system prompt and starting message, the rest of the loop remains the same.

If you followed along this far, I hope this removed some of the magic from Claude Code by showing that an agent harness is really a tool for managing conversation history and executing tools on behalf of the model so it can interact with the real world. What’s interesting about this architecture is that the model itself decides the right workflow for resolving a particular problem, and the application developer is responsible for giving the model the right tools to carry that workflow out. The model is still where the magic happens; we are just helping it along the way.

The result is a simple working Python agent that can read files, run shell commands, and produce structured code reviews, all driven by a compact agentic loop where the model decides what context it needs and when it has enough to respond.

Putting everything from this post together, here is the full implementation of a basic tool-using agent harness.

import boto3
import json
import subprocess

session = boto3.Session(profile_name="dev")
client = session.client("bedrock-runtime", region_name="us-east-1")

MODEL_ID = "us.anthropic.claude-sonnet-4-6"

TOOLS = [
    {
        "name": "file_read",
        "description": "Read the contents of a file. Can optionally read a specific line range.",
        "input_schema": {
            "type": "object",
            "properties": {
                "file_path": {
                    "type": "string",
                    "description": "Path to the file to read"
                },
                "start_line": {
                    "type": "integer",
                    "description": "Optional 1-based start line"
                },
                "end_line": {
                    "type": "integer",
                    "description": "Optional 1-based end line"
                }
            },
            "required": ["file_path"]
        }
    },
    {
        "name": "run_command",
        "description": "Run a shell command and return its output. Use for git commands like git diff.",
        "input_schema": {
            "type": "object",
            "properties": {
                "command": {
                    "type": "string",
                    "description": "The shell command to execute"
                }
            },
            "required": ["command"]
        }
    }
]


def execute_tool(tool_name, tool_input):
    if tool_name == "file_read":
        return tool_file_read(tool_input)
    elif tool_name == "run_command":
        return tool_run_command(tool_input)
    return f"Unknown tool: {tool_name}"


def tool_file_read(tool_input):
    file_path = tool_input["file_path"]
    start_line = tool_input.get("start_line")
    end_line = tool_input.get("end_line")
    try:
        with open(file_path, "r") as f:
            lines = f.readlines()
        if start_line or end_line:
            start = (start_line or 1) - 1
            end = end_line or len(lines)
            lines = lines[start:end]
        numbered = [f"{i + (start_line or 1)}: {line}" for i, line in enumerate(lines)]
        return "".join(numbered)
    except FileNotFoundError:
        return f"Error: File not found: {file_path}"
    except Exception as e:
        return f"Error reading file: {e}"


def tool_run_command(tool_input):
    command = tool_input["command"]
    try:
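        # Runs the model-chosen command through the shell with a 30-second timeout.
        # This gives the model full shell access, so run it somewhere sandboxed.
        result = subprocess.run(
            command, shell=True, capture_output=True, text=True, timeout=30
        )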
        output = result.stdout
        if result.stderr:
            output += "\n" + result.stderr
        return output or "(no output)"
    except subprocess.TimeoutExpired:
        return "Error: Command timed out after 30 seconds"
    except Exception as e:
        return f"Error running command: {e}"


SYSTEM_PROMPT = """You are a code review agent. Your job is to review code changes in a git repository.

Start by running `git diff` to see the current unstaged changes. If there are no unstaged changes, try `git diff --cached` for staged changes.

For each changed file, read the relevant parts of the file to understand the surrounding context, then provide a thorough review.

Focus on: bugs, logic errors, security issues, performance problems, and readability.

Organize your review by file. Be specific and actionable."""


def agent_loop(user_message):
    messages = [{"role": "user", "content": user_message}]

    for turn in range(20):
        response = client.invoke_model(
            modelId=MODEL_ID,
            body=json.dumps({
                "anthropic_version": "bedrock-2023-05-31",
                "max_tokens": 4096,
                "system": SYSTEM_PROMPT,
                "tools": TOOLS,
                "messages": messages
            })
        )
        result = json.loads(response["body"].read())
        stop_reason = result["stop_reason"]
        content = result["content"]

        messages.append({"role": "assistant", "content": content})

        if stop_reason != "tool_use":
            # end_turn (or another stop such as max_tokens): no tools left to run
            return "".join(block["text"] for block in content if block["type"] == "text")

        tool_results = []
        for block in content:
            if block["type"] == "tool_use":
                print(f"  Tool: {block['name']}({json.dumps(block['input'])[:80]})")
                output = execute_tool(block["name"], block["input"])
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block["id"],
                    "content": output
                })
        messages.append({"role": "user", "content": tool_results})

    return "Max turns reached without a final response."


if __name__ == "__main__":
    print("Starting code review agent...\n")
    review = agent_loop("Please review the code changes in this repository.")
    print("\n--- Code Review ---\n")
    print(review)
