How to build your first AI agent in 2026 (tutorial)
A practical tutorial on building your first AI agent — choosing the right tools, setting up the agent loop, adding tools, and deploying it.
The Anthropic tool use documentation defines the same agent loop pattern — model, tools, and execution loop — as the foundation of production AI agents. This tutorial follows that exact architecture.
TL;DR: An agent is just an LLM in a loop with tools — this 20-line Python loop is all you need to get started. This tutorial builds a code review agent from scratch, covering the core loop, tool definitions, system prompts, and deployment. No ML experience required, just API orchestration.
The ReAct paper (Yao et al., 2022) formalized this LLM-in-a-loop pattern, showing that interleaving reasoning with tool actions can outperform standard prompting on tasks that require external knowledge and multi-step reasoning.
You’ve used ChatGPT. You’ve maybe used Claude or Copilot to help you code. But building an agent — something that takes actions on its own, loops on feedback, and makes decisions — feels like a different skill entirely.
It’s not. An agent is just an LLM in a loop with tools. That’s it. The magic is in the loop design, not the model.
This tutorial walks through building your first agent: a code review agent that reads files, analyzes them, and produces reports. By the end, you’ll have something running on your machine that does real work.
Key takeaways:
- An agent is just an LLM in a loop with tools — the magic is in the loop design, not the model
- Building from scratch teaches you the fundamentals before you layer on framework abstractions
- A working code review agent can be built in under 50 lines of Python
- The hardest part of production agents isn’t the loop — it’s reliability, cost control, and scope management
What is an agent, really?
Here’s the simplest definition I can give:
An agent = LLM + loop + tools
- The LLM makes decisions (what to do next)
- The loop keeps it going until some condition is met
- The tools let it interact with the world (read files, run commands, call APIs)
A chatbot is an LLM that responds once. An agent is an LLM that keeps going — observing, deciding, acting, and repeating until the job is done.
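To make the observe, decide, act cycle concrete before any API is involved, here is a toy sketch with no model at all. The names `toy_agent` and `scripted_decide` are made up for illustration, and the scripted function only stands in for the decision an LLM would make.

```python
from typing import Callable

def toy_agent(
    decide: Callable[[list[str]], tuple[str, str]],
    tools: dict[str, Callable[[str], str]],
    max_turns: int = 5,
) -> str:
    """Run a decide -> act -> observe loop until `decide` returns a final answer."""
    observations: list[str] = []
    for _ in range(max_turns):
        action, argument = decide(observations)       # the "LLM" picks the next step
        if action == "finish":
            return argument                           # a final answer ends the loop
        observations.append(tools[action](argument))  # act, then record what came back
    return "Max turns reached without completion."

# Hypothetical stand-in for the model: call one tool, then answer.
def scripted_decide(observations: list[str]) -> tuple[str, str]:
    if not observations:
        return ("shout", "hello")
    return ("finish", f"Tool said: {observations[-1]}")

print(toy_agent(scripted_decide, {"shout": str.upper}))  # → Tool said: HELLO
```

Swap the scripted function for a real model call and the dictionary for real tools, and the control flow stays the same.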
This is what I call the Vertical Agent Method — build narrow, purpose-built agents that replace one specific workflow, not general-purpose assistants. Our code review agent is a perfect example: it does one thing (review code) and is designed specifically for that workflow. The focus is what makes it reliable.
Choosing your stack
In 2026, you have three main approaches to building agents:
- Use an agentic IDE (Claude Code, Cursor) — great for coding tasks, limited customization
- Use a framework (LangGraph, CrewAI) — good for complex workflows, but adds abstraction overhead
- Build from scratch — full control, minimal dependencies, you understand every line
For this tutorial, we’re building from scratch. Not because frameworks are bad, but because you need to understand the loop before you let a framework manage it for you.
What you’ll need
- Python 3.11+
- An API key from Anthropic or OpenAI (I’ll use Anthropic’s Claude because, in my experience, it handles tool use well)
- Basic Python knowledge
The core agent loop
Here’s the simplest agent loop that actually works:
from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def run_agent(system_prompt, messages, tools, max_turns=10):
    messages = list(messages)  # copy so the caller's list is not mutated
    for turn in range(max_turns):
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=4096,
            system=system_prompt,  # the Messages API takes the system prompt here, not as a "system" message
            messages=messages,
            tools=tools,
        )
        messages.append({"role": "assistant", "content": response.content})
        # Check if the model wants to use a tool
        tool_uses = [b for b in response.content if b.type == "tool_use"]
        if not tool_uses:
            # No tool calls, so this is the final response
            return "".join(b.text for b in response.content if b.type == "text")
        # Execute each tool and send all results back in a single user message
        tool_results = []
        for tool_use in tool_uses:
            result = execute_tool(tool_use.name, tool_use.input)
            tool_results.append({
                "type": "tool_result",
                "tool_use_id": tool_use.id,
                "content": str(result),
            })
        messages.append({"role": "user", "content": tool_results})
    return "Max turns reached without completion."
That’s the entire loop. Twenty lines. The agent gets a system prompt, a list of messages, and a set of tools. It calls the LLM, checks if it used a tool, executes the tool if it did, and feeds the result back. Repeat until the LLM gives a final answer.
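It can help to see what the conversation looks like after one round trip. The sketch below writes the history as plain dicts (the SDK returns content block objects rather than dicts, and the id `toolu_01` and the directory listing are invented for illustration). The helper `unanswered_tool_calls` is a hypothetical debugging aid, not part of any library.

```python
# Shape of the message history after one tool round trip.
history = [
    {"role": "user", "content": "Review the code in ./src"},
    {"role": "assistant", "content": [
        {"type": "tool_use", "id": "toolu_01", "name": "list_directory",
         "input": {"path": "./src"}},
    ]},
    {"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": "toolu_01",
         "content": "main.py\nutils.py"},
    ]},
]

def unanswered_tool_calls(messages):
    """Return ids of tool_use blocks that have no matching tool_result yet."""
    requested, answered = [], set()
    for message in messages:
        if isinstance(message["content"], str):
            continue  # plain-text turns carry no tool blocks
        for block in message["content"]:
            if block["type"] == "tool_use":
                requested.append(block["id"])
            elif block["type"] == "tool_result":
                answered.add(block["tool_use_id"])
    return [call_id for call_id in requested if call_id not in answered]

print(unanswered_tool_calls(history))      # → []
print(unanswered_tool_calls(history[:2]))  # → ['toolu_01']
```

Every `tool_use` id the model emits needs a `tool_result` with the same id in the next user turn, which is the pairing the loop maintains.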
Adding tools
Tools are just functions with descriptions. The LLM decides which to call based on the description. Here are the tools for our code review agent:
tools = [
    {
        "name": "read_file",
        "description": "Read the contents of a file at the given path",
        "input_schema": {
            "type": "object",
            "properties": {
                "path": {"type": "string", "description": "Path to the file"}
            },
            "required": ["path"]
        }
    },
    {
        "name": "list_directory",
        "description": "List files in a directory",
        "input_schema": {
            "type": "object",
            "properties": {
                "path": {"type": "string", "description": "Directory path"}
            },
            "required": ["path"]
        }
    },
    {
        "name": "run_command",
        "description": "Run a shell command and get its output",
        "input_schema": {
            "type": "object",
            "properties": {
                "command": {"type": "string", "description": "Command to run"}
            },
            "required": ["command"]
        }
    }
]
And the corresponding execute function:
import os
import subprocess

def execute_tool(name, args):
    if name == "read_file":
        with open(args["path"], "r") as f:
            return f.read()
    elif name == "list_directory":
        return "\n".join(os.listdir(args["path"]))
    elif name == "run_command":
        # shell=True runs whatever string the model sends; only use this in a sandbox or on code you trust
        result = subprocess.run(
            args["command"], shell=True, capture_output=True, text=True, timeout=30
        )
        return f"STDOUT:\n{result.stdout}\nSTDERR:\n{result.stderr}"
    return f"Unknown tool: {name}"
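As the tool list grows, an if/elif chain gets harder to maintain. One possible alternative, sketched here under the assumption that each tool's schema property names match the Python parameter names, is a registry that maps tool names to functions. `TOOL_REGISTRY` and `dispatch_tool` are hypothetical names.

```python
import os

def read_file(path: str) -> str:
    with open(path, "r") as f:
        return f.read()

def list_directory(path: str) -> str:
    return "\n".join(sorted(os.listdir(path)))

# Hypothetical registry: tool name (as declared to the model) -> Python callable.
TOOL_REGISTRY = {
    "read_file": read_file,
    "list_directory": list_directory,
}

def dispatch_tool(name: str, args: dict) -> str:
    """Look up the tool by name and call it with the model-supplied arguments."""
    handler = TOOL_REGISTRY.get(name)
    if handler is None:
        return f"Unknown tool: {name}"
    # Keys in `args` become keyword arguments, so they must match the parameter names.
    return handler(**args)
```

Adding a tool then means writing one function and one registry entry, alongside its schema in the tools list.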
The system prompt
The system prompt is where you shape the agent’s behavior. For a code review agent:
system_prompt = """You are a code review agent. Your job is to analyze code and produce a structured review.
For each file you review:
1. Read the file
2. Analyze it for bugs, style issues, and security concerns
3. Note any patterns that could be improved
When you have reviewed all relevant files, produce a final report with:
- Summary of findings (bullet points)
- Critical issues (must fix)
- Warnings (should fix)
- Suggestions (nice to have)
Be thorough but practical. Not every style preference is a bug.
"""
Putting it together
review = run_agent(
    system_prompt=system_prompt,
    messages=[{"role": "user", "content": "Review the code in /path/to/project"}],
    tools=tools,
    max_turns=25,
)
print(review)
When you run this, the agent will typically:
- List the directory to understand the project
- Read each relevant file
- Run linting or type-checking commands
- Produce a structured review
What can go wrong
Building agents from scratch means you encounter every edge case personally. Here are the ones that hit me first:
Infinite loops. The agent keeps calling tools without converging. Fix: set max_turns and log tool call counts.
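Beyond max_turns, one way to log tool call counts is a small counter that also flags exact repeats, since an agent that reads the same file again and again is usually stuck. `ToolCallLog` and its threshold are assumptions for illustration.

```python
import json
from collections import Counter

class ToolCallLog:
    """Count tool calls and flag exact repeats (hypothetical helper)."""

    def __init__(self, max_repeats: int = 3):
        self.max_repeats = max_repeats
        self.counts = Counter()

    def record(self, name: str, args: dict) -> bool:
        """Record one call; return True once the same call has been made max_repeats times."""
        # Serialize the arguments with sorted keys so equal dicts produce equal keys.
        key = (name, json.dumps(args, sort_keys=True))
        self.counts[key] += 1
        return self.counts[key] >= self.max_repeats

    @property
    def total(self) -> int:
        """Total number of tool calls recorded so far."""
        return sum(self.counts.values())
```

Inside the loop you could call `record` before each tool execution and, when it returns True, stop or send the model a message saying it is repeating itself.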
Blowing through tokens. Reading a 5,000-line file fills the context window fast. Fix: truncate file reads to the first 200 lines, or read specific sections.
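A minimal sketch of the truncation fix follows. The function name `read_file_truncated` and the wording of the marker are assumptions, and the 200-line default comes from the suggestion above.

```python
from itertools import islice

def read_file_truncated(path: str, max_lines: int = 200) -> str:
    """Return at most max_lines lines, with a marker if the file was cut short."""
    with open(path, "r", errors="replace") as f:
        # Read one extra line so we can tell whether anything was left out.
        lines = list(islice(f, max_lines + 1))
    if len(lines) <= max_lines:
        return "".join(lines)
    return "".join(lines[:max_lines]) + f"\n[truncated after {max_lines} lines]"
```

The marker tells the model that more content exists, so it can ask for a specific section instead of assuming it has seen the whole file.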
Tool call failures. The agent tries to read a file that doesn’t exist. Fix: wrap tool execution in try/except and return a helpful error message.
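The try/except fix might look like the wrapper below. `safe_execute` is a hypothetical name, and it takes the executor as a parameter so the sketch stands alone; in this tutorial you would pass `execute_tool`.

```python
def safe_execute(execute, name, args):
    """Run a tool and turn any exception into an error string the model can read."""
    try:
        return str(execute(name, args))
    except FileNotFoundError as exc:
        return (
            f"Error: file not found: {exc.filename}. "
            "Use list_directory to check the path."
        )
    except Exception as exc:
        # Covers timeouts, permission errors, bad arguments, and so on.
        return f"Error running {name}: {type(exc).__name__}: {exc}"
```

Returning the error as a tool result, rather than crashing, gives the model a chance to correct itself on the next turn.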
Cost surprises. One agent run with Sonnet on a medium project costs about ₹30–₹80. For a demo it’s fine. For production, add cost tracking.
# Simple cost tracker
cost_per_input_token = 3e-06   # $0.003 per 1K input tokens (Sonnet)
cost_per_output_token = 15e-06  # $0.015 per 1K output tokens

def track_cost(usage):
    input_cost = usage.input_tokens * cost_per_input_token
    output_cost = usage.output_tokens * cost_per_output_token
    return input_cost + output_cost
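To turn per-call tracking into a guardrail, you could accumulate cost across turns and stop when a budget is exceeded. The sketch below assumes the same per-token rates as the tracker above (check current pricing before relying on them) and a usage object with `input_tokens` and `output_tokens`, as on `response.usage`. `CostBudget` is a hypothetical helper.

```python
class CostBudget:
    """Accumulate spend across turns and report when a limit is exceeded."""

    def __init__(self, limit_usd: float, input_rate: float = 3e-06,
                 output_rate: float = 15e-06):
        self.limit_usd = limit_usd
        self.input_rate = input_rate    # assumed USD per input token
        self.output_rate = output_rate  # assumed USD per output token
        self.spent = 0.0

    def add(self, usage) -> float:
        """Add one response's usage and return the running total in USD."""
        self.spent += (usage.input_tokens * self.input_rate
                       + usage.output_tokens * self.output_rate)
        return self.spent

    @property
    def exceeded(self) -> bool:
        return self.spent > self.limit_usd
```

In the loop, calling `budget.add(response.usage)` after each model call and breaking out when `budget.exceeded` is true caps what a single run can cost.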
Deploying it
For a personal agent, the simplest deployment is a CLI script. But if you want it running as a service:
# review_server.py: accepts requests and runs reviews
# Assumes run_agent, system_prompt, and tools from the sections above
# are defined in this file or imported from wherever you saved them.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class ReviewRequest(BaseModel):
    repo_path: str
    depth: str = "standard"  # quick, standard, thorough

@app.post("/review")
def run_review(req: ReviewRequest):
    result = run_agent(
        system_prompt=system_prompt,
        messages=[{"role": "user", "content": f"Review {req.repo_path}"}],
        tools=tools,
        max_turns=50 if req.depth == "thorough" else 15,
    )
    return {"review": result}
Deploy this on a Railway or Fly.io instance, and you have a code review API for ₹500/month including inference costs.
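For the CLI route mentioned at the start of this section, a minimal sketch could look like the following. The argument names mirror the service's request fields, and `run_review` is an assumed callable taking `(repo_path, max_turns)` that you would implement as a thin wrapper around `run_agent`.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Run the code review agent on a project.")
    parser.add_argument("repo_path", help="Path to the project to review")
    parser.add_argument("--depth", choices=["quick", "standard", "thorough"],
                        default="standard", help="How many turns the agent may take")
    return parser

def max_turns_for(depth: str) -> int:
    # Same rule as the service: extra turns only for thorough reviews.
    return 50 if depth == "thorough" else 15

def main(run_review, argv=None) -> int:
    """Parse arguments, run the review, and print the report."""
    args = build_parser().parse_args(argv)
    print(run_review(args.repo_path, max_turns_for(args.depth)))
    return 0
```

Keeping the agent call behind a parameter means the argument handling can be exercised without an API key.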
What’s next
This agent is basic. It has no memory across sessions, no caching, no parallel tool execution, no streaming. But it works. It reviews real code and produces useful reports.
The step from “prompting an LLM” to “building an agent” is smaller than most developers think. You’re not writing a new paradigm — you’re putting a loop around something you already know how to use.
The hard part is what comes after: making the agent reliable, cost-effective, and actually useful in production. That’s where the next 40 hours go. But the first hour — the one where you write the loop and watch it work — is the most important one. It proves the concept is real.
Copy the 20-line loop above, pick a tool (even a simple one like read_file), and run it. If you have an Anthropic API key, you'll have a working agent in 10 minutes. The gap between theory and practice in agents is mostly just running the loop once.