THINK · Jun 1, 2026

What is an AI agent? beginner's guide for developers

Everything a developer needs to understand about AI agents — the architecture, the tradeoffs, and how to decide if you should build one.

Agent-ready — drop this post into Claude Code or Codex

TL;DR: An AI agent is an LLM in a loop with tools that keeps going until a task is complete. This guide breaks down the three components — model, loop, and tools — with practical advice on choosing models, designing loops, and writing good tool descriptions. Plus when NOT to build an agent.

“AI agent” is one of those terms that everyone uses and nobody defines consistently. Ask five developers what an agent is and you’ll get six definitions.

I’ve been building agents for 18 months. Here’s my clearest definition: an AI agent is an LLM in a loop, with tools, that keeps going until a task is complete.

The Anthropic tool use documentation defines the same three-component pattern — model, loop, and tools — as the foundation of AI agent architecture. This guide echoes that framework.

That’s the core. Everything else — multi-agent systems, planning, memory — is an extension of this basic pattern.

Research on the ReAct agent loop (Yao et al., 2022) demonstrated that interleaving reasoning with tool use dramatically improves LLM performance on tasks requiring external knowledge.

Key takeaways:

  • An AI agent is an LLM in a loop, with tools, that keeps going until a task is complete
  • The three components are the model (LLM), the loop (control flow), and the tools (functions)
  • Model choice is the most important architectural decision — it affects cost, speed, and reliability
  • Most production agents aren’t general-purpose — they’re narrow, purpose-built vertical agents

What makes something an agent

The line between an LLM chat and an agent is fuzzy. Here’s how I draw it:

CapabilityLLM ChatSimple AgentProduction Agent
Responds to prompts
Maintains conversation context
Calls external tools
Executes multi-step plans
Recovers from errors
Tracks costs per run
Persists state across sessions
Runs autonomously

If your system calls an LLM and returns an answer, that’s a chatbot. If it calls an LLM, decides whether to call a tool or return a result, loops on tool results, and keeps going until a condition is met — that’s an agent.

The architecture

Every agent I’ve built shares the same core loop:

                 ┌─────────────────────────────┐
                 │          LLM                 │
                 │  (makes decisions based on   │
                 │   context + available tools) │
                 └──────────┬──────────────────┘

                ┌───────────┴───────────┐
                │                       │
         Tool call?              Final response?
                │                       │
                ▼                       ▼
      ┌─────────────────┐      ┌──────────────┐
      │ Execute tool    │      │ Return result │
      │ (read file,     │      └──────────────┘
      │  run command,   │
      │  call API, etc) │
      └────────┬────────┘


      ┌─────────────────┐
      │ Add result to   │
      │ context, loop   │
      └────────┬────────┘

               └──────────────→ back to LLM

This is the simplest possible architecture. It works for coding agents, research agents, data processing agents — most of what people call “agents” maps to this loop.

The three components

An agent has three components that you control directly. Everything else flows from these.

1. The model

The LLM that powers decisions. Different models have different tradeoffs:

ModelStrengthWeaknessCostBest For
Claude Sonnet 4Best tool-use, good reasoningSlower, pricier~$0.015/stepGeneral agent work
Claude Haiku 3.5Fast, cheapLess capable~$0.002/stepSimple extraction tasks
GPT-4oGood all-rounderMore verbose~$0.01/stepChat-based agents
GPT-4o-miniCheapStruggles with complex tasks~$0.0005/stepHigh-volume, simple tasks
Gemini 2.5 FlashVery fast, cheapLess reliable tool use~$0.0003/stepReal-time applications
DeepSeek V3Very cheapInconsistent quality~$0.001/stepBudget-constrained projects

The model choice is the most important architectural decision. It affects cost, speed, reliability, and what your agent can actually do.

2. The loop

The loop is the control flow — how the agent decides what to do next. The simplest loop is “call LLM → check for tool calls → execute tools → repeat.” More complex loops add:

  • Conditional branching — “if the tool returns an error, try an alternative approach”
  • Sub-goals — “break the main task into sub-tasks and tackle them sequentially”
  • Human-in-the-loop — “pause and ask for confirmation before executing a destructive action”
  • Timeout handling — “if this step takes too long, fail gracefully”

The loop defines your agent’s behavior more than the model does. A smart model with a bad loop produces unreliable results. A good loop with a weaker model still produces solid output.

3. The tools

Tools are how the agent interacts with the world. Each tool is a function with a name, description, and input schema that the LLM can understand.

Common tool categories:

# File operations
read_file(path) -> content
write_file(path, content) -> status
list_directory(path) -> files

# Code execution
run_command(command) -> output
evaluate_code(code) -> result

# Web access
search_web(query) -> results
fetch_url(url) -> content
scrape_page(url) -> structured_data

# Data operations
query_database(sql) -> rows
call_api(endpoint, payload) -> response
transform_data(input, spec) -> output

The tool descriptions are critical. The LLM decides which tool to call based on the description. A vague description produces wrong tool choices. A specific, well-written description produces correct tool choices.

# Bad description — LLM will misuse this
{
    "name": "search",
    "description": "Search for things"
}

# Good description — LLM uses this correctly
{
    "name": "search_web",
    "description": "Search the web for information. Use this when you need current data, documentation, or external references. Returns top 10 results with titles and snippets. Limit 1000 characters per result."
}

Different types of agents

Not all agents look the same. Here are the common patterns I’ve seen in production:

Coding agents

Claude Code, Cursor agent mode, and similar tools. They read files, write code, run commands, and loop on feedback. The key challenge: knowing when to stop and ask for human input.

Research agents

These take a question and gather information. They search the web, read pages, synthesize findings, and produce a report. The key challenge: evaluating source quality and avoiding hallucinated citations.

Workflow agents

These automate business processes: processing invoices, generating reports, triaging support tickets. They have clear inputs and outputs, and limited scope. The key challenge: handling real-world data variability.

Vertical agents

These are what I build for clients. Specialized agents that automate one specific workflow for one specific business. They’re the most reliable because their scope is tightly constrained.

When NOT to build an agent

Agents have real costs and complexity. Sometimes the right tool is simpler:

Use a prompt, not an agent, if: The task is a single LLM call that produces good enough output. Classification, summarization, simple extraction — these don’t need a loop.

Use a rule-based system, not an agent, if: The task has clear rules and doesn’t need LLM reasoning. Data validation, format conversion, scheduled tasks — write a script, not an agent.

Use a human, not an agent, if: The task requires judgment that you can’t clearly define, or the cost of an error is very high. Legal review, medical diagnosis, anything involving real money — right now, humans are safer.

The cost of agents

Agents are more expensive to run than chatbots because they make multiple LLM calls per task. Here’s a realistic breakdown:

  • Simple 3-step agent (read → analyze → respond): ~$0.05–$0.15 per run
  • Complex 15-step agent (research + analysis + generation): ~$0.50–$2.00 per run
  • Production agent processing 100 tasks/day: ~$5–$30/day for LLM costs

These numbers assume Sonnet-level models. Using Haiku or GPT-4o-mini can reduce costs by 5–10x, with some quality loss.

The cost that surprises most people: debugging. A buggy agent loop that keeps retrying can burn through $20 before you notice. Always set cost limits.


Related: How to build your first AI agent — a step-by-step tutorial from scratch, and Best AI agent frameworks for 2026 — comparing LangChain, CrewAI, and custom builds.

Where agents are going

The field is moving in three directions:

Better tool use. Models are getting better at choosing and using tools. The next generation of models (Claude Opus 5, GPT-5) will handle tool choice more reliably, reducing the work you need to do in system prompts.

Agent-to-agent communication. Multi-agent systems are getting practical. Instead of one agent doing everything, specialized agents will pass work to each other. The challenge is coordination — knowing when to hand off, what to pass, and how to verify.

Reliability engineering. The biggest problem with agents today isn’t capability — it’s reliability. An agent works 80% of the time on its own but 95% with the right infrastructure. The companies that solve reliability will win the market.

If you’re a developer who understands the core loop — LLM + tools + loop — you’re already ahead of most people calling themselves “AI engineers.” The rest is just engineering the details.

Related: The Vertical Agent Method — the framework behind how we build and ship AI agents.

Newsletter

Get the brief on AI agents

Practical posts on shipping agents, automating work, and building in public. No hype, no fluff.