What is OpenAI function calling?

Function calling is an OpenAI API feature where the model can request execution of functions you define. Instead of generating only text, the model outputs structured JSON tool-call requests; your application executes them and returns the results.

How do I handle parallel function calls?

When the model requests multiple function calls in a single response, iterate through all tool calls, execute each one, append all results to the messages array, and send back to the API. The model processes all results together.

Does function calling work with streaming?

Yes. In streaming mode, tool_calls arrive as chunks with a unique index per call. You accumulate the full function name and arguments by aggregating chunks with the same index. Once all chunks are processed, you execute the function and send the result.

How is OpenAI function calling different from Anthropic tool use?

OpenAI uses JSON Schema for tool definitions and returns tool_calls as structured objects. Anthropic uses a similar schema format but returns tool requests as tool_use content blocks, expects results back as tool_result blocks, and can optionally emit a 'thinking' block before tool calls. OpenAI's approach is more straightforward for simple use cases.

What breaks function calling?

Three things, mainly: ambiguous schemas (overlapping or poorly described parameters), contradictory system instructions about when to call functions, and parameters the model omits or fills with invented values. The usual symptom is a model that calls the wrong function, or refuses to call one despite clear instructions. Good tool descriptions and explicit reasoning prompts help significantly.

OpenAI function calling tutorial: building tools for GPT

A complete guide to OpenAI function calling — defining tools, handling parallel calls, streaming, and building a tool-using agent from scratch.

TL;DR: Function calling turns a chat model into something that can actually do things — query databases, call APIs, and compute results. This guide covers defining tools with JSON Schema, handling parallel calls, streaming with tool call deltas, and building a complete agent loop in 80 lines of Python with no frameworks.

Function calling is the single most important primitive in building AI agents. It’s what turns a chat model from a text generator into something that can actually do things — query databases, call APIs, send emails, compute results.

I’ve built agents using both OpenAI’s and Anthropic’s tool use APIs. Here’s my complete guide to OpenAI function calling, built from production experience rather than documentation examples.

Key takeaways:

Function calling lets the model request structured function execution — it doesn’t execute functions itself, it asks you to do it

Define tools as JSON Schema objects in the tools parameter alongside messages

Parallel function calling means the model can request multiple tools in a single response — handle them all before returning results

Streaming with function calls works by collecting partial tool_calls delta chunks by index

A complete agent loop needs just OpenAI’s SDK — no frameworks required

OpenAI’s function calling API established a pattern that most tool-use APIs now follow: the model accepts structured tool definitions and returns structured function invocations. Its format is widely supported by other providers and by OpenAI-compatible servers.

What function calling actually is

The name is misleading. OpenAI’s function calling doesn’t mean the model calls functions on your computer. The model outputs a structured request that says “I want to call this function with these arguments.” Your code decides whether to execute it.

The flow looks like this:

User: "What's the weather in Bengaluru?"

Model: "I should check the weather API."
       ↓
Model outputs: { "function": "get_weather", "args": { "location": "Bengaluru" } }
       ↓
Your code executes get_weather("Bengaluru") → "26°C, partly cloudy"
       ↓
You send the result back to the model

Model: "The weather in Bengaluru is 26°C and partly cloudy."

The model never touches your API keys, never executes code on your server. It just requests tool execution. You control what runs.
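
To make the "you control what runs" point concrete, here is a minimal sketch with no API call. The request string mimics the simplified shape shown in the flow above, not the exact wire format, and get_weather, ALLOWED_FUNCTIONS, and handle_request are hypothetical names used only for illustration.

```python
import json

def get_weather(location: str) -> str:
    # Hypothetical stand-in for a real weather lookup
    return f"26°C, partly cloudy in {location}"

# Only functions listed here can ever run, whatever the model asks for
ALLOWED_FUNCTIONS = {"get_weather": get_weather}

def handle_request(raw_request: str) -> str:
    """Parse a model's function request and decide whether to execute it."""
    request = json.loads(raw_request)
    func = ALLOWED_FUNCTIONS.get(request["function"])
    if func is None:
        return f"Refused: {request['function']} is not an allowed function"
    return func(**request["args"])

model_request = '{"function": "get_weather", "args": {"location": "Bengaluru"}}'
print(handle_request(model_request))
print(handle_request('{"function": "delete_database", "args": {}}'))
```

The allow-list is the design choice that matters here: a request for anything outside it is refused before any code runs.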

Defining tools

Tools are defined as JSON Schema objects. Each tool has a name, description, and parameters schema. The description is critical — it’s how the model knows when to call the tool.

# Responses API format: name, description, and parameters sit at the top level.
# (Chat Completions nests the same fields under a "function" key.)
tools = [
    {
        "type": "function",
        "name": "get_weather",
        "description": "Get the current weather for a given location. Returns temperature, conditions, humidity, and wind speed.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City name, e.g. 'Bengaluru, India' or 'San Francisco, CA'"
                },
                "units": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "Temperature units. Defaults to celsius for India, fahrenheit for US."
                }
            },
            "required": ["location"]
        }
    },
    {
        "type": "function",
        "name": "get_air_quality",
        "description": "Get air quality index and PM2.5 data for a location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City name, e.g. 'Bengaluru, India'"
                }
            },
            "required": ["location"]
        }
    }
]

Rule of thumb for descriptions: Describe when to call the function, not just what it does. A function name like get_weather is obvious. The description should clarify edge cases:

“Call when user asks about weather, temperature, or climate conditions”
“Call for both current conditions and short-term forecasts”
“Does NOT support historical weather data”

This prevents the model from calling the wrong tool or calling a tool for tasks it can’t handle.
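
As a small illustration, the three kinds of guidance above can be folded into a single description string. This is only a sketch: the wording is an example, and WEATHER_DESCRIPTION and weather_tool_fragment are hypothetical names. The fragment shows just the name and description fields, which are the same whichever API shape you use.

```python
# Hypothetical description for get_weather, assembled from the guidance above:
# when to call, what is covered, and what is explicitly not supported.
WEATHER_DESCRIPTION = " ".join([
    "Get the weather for a given location.",
    "Call when the user asks about weather, temperature, or climate conditions.",
    "Call for both current conditions and short-term forecasts.",
    "Does NOT support historical weather data.",
])

# Only the fields relevant to the rule of thumb are shown here
weather_tool_fragment = {
    "name": "get_weather",
    "description": WEATHER_DESCRIPTION,
}

print(weather_tool_fragment["description"])
```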

The basic function calling loop

Here’s a working agent loop from scratch — no frameworks, just OpenAI’s SDK:

import json
import openai

def agent_loop(user_input: str, tools: list, system_prompt: str = None):
    messages = []

    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})

    messages.append({"role": "user", "content": user_input})

    while True:
        response = openai.responses.create(
            model="gpt-4o",
            input=messages,
            tools=tools,
            tool_choice="auto"
        )

        output = response.output

        # Check if the model wants to call tools
        if output and output[0].type == "function_call":
            tool_call = output[0]

            # Extract function name and arguments
            func_name = tool_call.name
            func_args = json.loads(tool_call.arguments)

            print(f"  → Calling: {func_name}({func_args})")

            # Execute the function
            if func_name == "get_weather":
                result = get_weather(**func_args)
            elif func_name == "get_air_quality":
                result = get_air_quality(**func_args)
            else:
                result = {"error": f"Unknown function: {func_name}"}

            # Add the function call and its result to the input list.
            # The Responses API takes typed items here, not Chat Completions
            # style "assistant" and "tool" messages.
            messages.append(tool_call)
            messages.append({
                "type": "function_call_output",
                "call_id": tool_call.call_id,
                "output": json.dumps(result)
            })

            # Continue the loop — the model will use the tool result
            continue

        # No tool calls: return the text response.
        # output_text joins the text parts of the response into one string.
        return response.output_text

This is the core pattern. The loop:

Sends messages to the model with available tools
If the model requests a function call, executes it and sends the result back
If the model returns text, we’re done
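
The loop calls get_weather and get_air_quality, which this guide never defines. Here is a minimal sketch of stand-ins so the loop can be exercised end to end. They return canned data instead of calling a real API, and every field name and value is made up for illustration.

```python
import json

def get_weather(location: str, units: str = "celsius") -> dict:
    """Hypothetical stand-in: returns canned data instead of calling a weather API."""
    temp_c = 26
    temp = temp_c if units == "celsius" else round(temp_c * 9 / 5 + 32)
    return {
        "location": location,
        "temperature": temp,
        "units": units,
        "conditions": "partly cloudy",
    }

def get_air_quality(location: str) -> dict:
    """Hypothetical stand-in: returns a fixed air quality reading."""
    return {"location": location, "aqi": 85, "pm25": 28.5}

# Results must be JSON-serializable, because the loop passes them to json.dumps
print(json.dumps(get_weather("Bengaluru, India")))
print(json.dumps(get_air_quality("Bengaluru, India")))
```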

Note

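I'm using the newer Responses API here (openai.responses.create()), which is cleaner for agent loops than the older Chat Completions API. If you're on Chat Completions (client.chat.completions.create() in current SDKs; the legacy openai.ChatCompletion.create() was removed in version 1.0 of the Python SDK), the structure is similar, but tool calls arrive in response.choices[0].message.tool_calls and results go back as role "tool" messages.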

Parallel function calling

One of the biggest improvements in recent OpenAI models is parallel function calling: the model can request multiple function calls in a single response. This saves a full model round trip for every extra call.

When a user asks “What’s the weather and air quality in Bengaluru?”, the model can request both get_weather and get_air_quality in one response instead of one per turn.

def agent_loop_parallel(user_input: str, tools: list):
    messages = [{"role": "user", "content": user_input}]

    while True:
        response = openai.responses.create(
            model="gpt-4o",
            input=messages,
            tools=tools,
            tool_choice="auto"
        )

        output = response.output

        # Collect all function calls
        function_calls = [item for item in output if item.type == "function_call"]

        if function_calls:
            # Execute ALL function calls (these could run in parallel)
            tool_messages = []
            for fc in function_calls:
                func_name = fc.name
                func_args = json.loads(fc.arguments)
                print(f"  → Calling: {func_name}({func_args})")

                if func_name == "get_weather":
                    result = get_weather(**func_args)
                elif func_name == "get_air_quality":
                    result = get_air_quality(**func_args)
                else:
                    result = {"error": f"Unknown function: {func_name}"}

                # Record each result as a function_call_output item
                tool_messages.append({
                    "type": "function_call_output",
                    "call_id": fc.call_id,
                    "output": json.dumps(result)
                })

            # Add the model's output items (including every function call) first,
            # then all of the results that answer them
            messages.extend(output)
            messages.extend(tool_messages)
            continue

        return response.output_text

The key insight: execute all parallel calls before returning to the model. The model expects to receive all results together.

For performance, I run parallel calls with concurrent.futures.ThreadPoolExecutor:

import concurrent.futures

def execute_parallel_calls(function_calls):
    """Execute multiple function calls in parallel using threads."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        future_to_call = {
            executor.submit(execute_function, fc): fc
            for fc in function_calls
        }
        results = []
        for future in concurrent.futures.as_completed(future_to_call):
            fc = future_to_call[future]
            try:
                result = future.result()
                results.append((fc.call_id, result))
            except Exception as e:
                results.append((fc.call_id, {"error": str(e)}))
        return results
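
The helper above relies on an execute_function that is not shown, and as_completed yields results in completion order rather than request order. Here is a minimal sketch of one way to fill both gaps. FUNCTIONS, execute_in_order, and the SimpleNamespace objects are hypothetical; the fake objects stand in for the SDK's function call items.

```python
import concurrent.futures
import json
from types import SimpleNamespace

# Hypothetical registry mapping tool names to local callables
FUNCTIONS = {
    "get_weather": lambda location: {"location": location, "temperature": 26},
    "get_air_quality": lambda location: {"location": location, "aqi": 85},
}

def execute_function(fc) -> dict:
    """Run one function call; fc needs .name and .arguments (a JSON string)."""
    func = FUNCTIONS.get(fc.name)
    if func is None:
        return {"error": f"Unknown function: {fc.name}"}
    try:
        return func(**json.loads(fc.arguments))
    except Exception as e:
        # Errors are returned, not raised, so one bad call cannot sink the batch
        return {"error": str(e)}

def execute_in_order(function_calls) -> list:
    """Run calls concurrently but return (call_id, result) in request order."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        # executor.map yields results in the order of its inputs
        results = list(executor.map(execute_function, function_calls))
    return [(fc.call_id, r) for fc, r in zip(function_calls, results)]

# Fake call objects standing in for the SDK's function_call items
calls = [
    SimpleNamespace(call_id="call_1", name="get_weather",
                    arguments='{"location": "Bengaluru"}'),
    SimpleNamespace(call_id="call_2", name="get_air_quality",
                    arguments='{"location": "Bengaluru"}'),
]
print(execute_in_order(calls))
```

Threads suit this because tool calls are usually I/O-bound (HTTP requests, database queries), so the calls overlap while each one waits on the network.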

Streaming with function calls

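Streaming complicates function calling because the arguments arrive as a series of delta events instead of one complete JSON object. In the Responses API, each response.function_call_arguments.delta event carries an output_index that identifies which function call the fragment belongs to, while the function name and call_id arrive once, in the response.output_item.added event. (Chat Completions streams the same information as tool_calls deltas keyed by index.)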

def agent_loop_streaming(user_input: str, tools: list):
    messages = [{"role": "user", "content": user_input}]

    while True:
        stream = openai.responses.create(
            model="gpt-4o",
            input=messages,
            tools=tools,
            tool_choice="auto",
            stream=True
        )

        # Collect streaming events
        # (event names follow the Responses API; verify against your SDK version)
        text_content = ""
        output_items = []  # completed output items, in order
        calls = {}  # output_index → {call_id, name, arguments}

        for event in stream:
            if event.type == "response.output_text.delta":
                text_content += event.delta

            elif event.type == "response.output_item.added":
                # The function name and call_id arrive once, with the item itself
                if event.item.type == "function_call":
                    calls[event.output_index] = {
                        "call_id": event.item.call_id,
                        "name": event.item.name,
                        "arguments": ""
                    }

            elif event.type == "response.function_call_arguments.delta":
                # Argument JSON arrives in fragments; event.delta is a string
                calls[event.output_index]["arguments"] += event.delta

            elif event.type == "response.output_item.done":
                output_items.append(event.item)

        # After streaming completes, process tool calls
        if calls:
            tool_messages = []
            for call in calls.values():
                func_args = json.loads(call["arguments"])

                if call["name"] == "get_weather":
                    result = get_weather(**func_args)
                elif call["name"] == "get_air_quality":
                    result = get_air_quality(**func_args)
                else:
                    result = {"error": f"Unknown function: {call['name']}"}

                tool_messages.append({
                    "type": "function_call_output",
                    "call_id": call["call_id"],
                    "output": json.dumps(result)
                })

            # Echo the model's output items back, then the results
            messages.extend(output_items)
            messages.extend(tool_messages)
            continue

        return text_content

Pro tip

When streaming, always check the stream event type before accessing fields. Different SDK versions structure streaming events differently. I've been burnt by this twice — test your stream parsing against the actual SDK version you're using.
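
One way to test stream parsing without a live API call is to replay hand-written events through the accumulation logic. The sketch below does that. The event shapes are modeled on the Responses API streaming events (response.output_item.added and response.function_call_arguments.delta); treat them as an assumption to check against your SDK version. accumulate_calls and fake_stream are hypothetical names.

```python
import json
from types import SimpleNamespace as NS

def accumulate_calls(events) -> dict:
    """Group streamed argument fragments by output_index."""
    calls = {}
    for event in events:
        if event.type == "response.output_item.added":
            if event.item.type == "function_call":
                calls[event.output_index] = {
                    "call_id": event.item.call_id,
                    "name": event.item.name,
                    "arguments": "",
                }
        elif event.type == "response.function_call_arguments.delta":
            calls[event.output_index]["arguments"] += event.delta
    return calls

# Hand-written events standing in for a live stream
fake_stream = [
    NS(type="response.output_item.added", output_index=0,
       item=NS(type="function_call", call_id="call_1", name="get_weather")),
    NS(type="response.function_call_arguments.delta", output_index=0,
       delta='{"loca'),
    NS(type="response.function_call_arguments.delta", output_index=0,
       delta='tion": "Bengaluru"}'),
    NS(type="response.output_item.added", output_index=1,
       item=NS(type="function_call", call_id="call_2", name="get_air_quality")),
    NS(type="response.function_call_arguments.delta", output_index=1,
       delta='{"location": "Bengaluru"}'),
]

calls = accumulate_calls(fake_stream)
for call in calls.values():
    # Arguments are only valid JSON once every fragment has arrived
    print(call["name"], json.loads(call["arguments"]))
```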

Error handling for function calls

Function calls fail. APIs return 500s. Network drops. Invalid arguments. Your agent needs to handle these gracefully.

def safe_execute_function(func_name: str, func_args: dict) -> dict:
    """Execute a function with error handling. Returns a result dict regardless of outcome."""
    try:
        if func_name == "get_weather":
            return get_weather(**func_args)
        elif func_name == "get_air_quality":
            return get_air_quality(**func_args)
        else:
            return {"error": f"Unknown function: {func_name}", "success": False}
    except KeyError as e:
        return {"error": f"Missing required parameter: {e}", "success": False}
    except TypeError as e:
        return {"error": f"Invalid arguments: {e}", "success": False, "args": func_args}
    except Exception as e:
        return {"error": f"Function execution failed: {str(e)}", "success": False}

When a function fails, return a structured error message to the model. The model can then:

Explain the error to the user
Try again with corrected arguments
Try a different approach

Models handle errors surprisingly well if you return clear error messages. I’ve had the model suggest fixes for API credential issues based on the error text alone.
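
Arguments can also fail before your function ever runs, for example when the JSON is malformed or a required field is missing. Here is a minimal sketch of a pre-check that turns those cases into the same kind of structured error. parse_arguments is a hypothetical helper, and the required list is assumed to come from your own tool schema.

```python
import json

def parse_arguments(raw_arguments: str, required: list):
    """Return (args, None) on success or (None, error_dict) on failure."""
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError as e:
        return None, {"error": f"Arguments were not valid JSON: {e.msg}",
                      "success": False}
    if not isinstance(args, dict):
        return None, {"error": "Arguments must be a JSON object",
                      "success": False}
    missing = [name for name in required if name not in args]
    if missing:
        return None, {"error": f"Missing required parameter(s): {', '.join(missing)}",
                      "success": False}
    return args, None

# A well-formed call, then a truncated one and one with a missing field
print(parse_arguments('{"location": "Bengaluru"}', ["location"]))
print(parse_arguments('{"loc', ["location"]))
print(parse_arguments('{}', ["location"]))
```

Sending the error dict back as the tool result gives the model enough detail to retry with corrected arguments.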

Comparison with Anthropic tool use

I build with both providers. Here’s how they compare for function calling:

Aspect	OpenAI	Anthropic
Tool definition	JSON Schema in `tools` parameter	JSON Schema in `tools` parameter
Response format	`tool_calls` array on message (Chat Completions); `function_call` output items (Responses)	`content` blocks with `tool_use` type
Parallel calls	Native in one response	Native in one response
Streaming	Delta chunks with index	Content block deltas
Thinking before tools	No, calls directly	Optional `thinking` block before tool calls
Error recovery	Good with clear messages	Better — Claude is more cautious about retrying

Anthropic’s key difference: Claude can optionally think before calling tools, which produces better results for complex multi-step reasoning. OpenAI’s models tend to call tools more eagerly, which sometimes means calling them prematurely.

I use OpenAI for simpler tool use (fetch data, compute results) and Anthropic when the agent needs to reason deeply before acting (multi-step analysis, research agents).
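
To show how close the two definition formats are, this sketch converts an OpenAI tool definition into the shape Anthropic's Messages API expects, where the JSON Schema lives under input_schema. to_anthropic_tool is a hypothetical helper that handles only plain function tools.

```python
def to_anthropic_tool(openai_tool: dict) -> dict:
    """Convert an OpenAI function tool definition to Anthropic's tool shape."""
    # Chat Completions nests the fields under "function";
    # the Responses API keeps them at the top level. Accept either.
    fn = openai_tool.get("function", openai_tool)
    return {
        "name": fn["name"],
        "description": fn.get("description", ""),
        "input_schema": fn.get("parameters", {"type": "object", "properties": {}}),
    }

openai_tool = {
    "type": "function",
    "function": {
        "name": "get_air_quality",
        "description": "Get air quality index and PM2.5 data for a location.",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
}

print(to_anthropic_tool(openai_tool))
```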

When function calling breaks

After months of production use, here’s what causes function calling to fail:

Ambiguous schemas. If two functions have overlapping descriptions (e.g., search_documents and search_web), the model gets confused about which to call. I’ve seen the model call search_documents when it should call search_web simply because the descriptions weren’t distinct enough.

Fix: Make descriptions mutually exclusive. “Use for searching the local document store” vs “Use for searching the internet.”

Contradictory instructions. If your system prompt says “Never make up information” but you also have a generate_report function that expects complete data, the model may refuse to call the function because it can’t satisfy both constraints.

Fix: Review your system prompt for conflicts with tool descriptions.

Omitted optional parameters. The model sometimes omits optional parameters it should include. Marking the parameter as required in the JSON Schema forces the model to provide it, but increases the chance of hallucinated values.

Fix: Accept reasonable defaults in your function implementation instead of requiring the model to provide every parameter.
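
A minimal sketch of that fix, assuming the units convention from the get_weather description earlier (fahrenheit for US locations, celsius otherwise). resolve_units is a hypothetical helper, and the suffix check is a deliberately naive heuristic, not a real geocoder.

```python
from typing import Optional

def resolve_units(location: str, units: Optional[str] = None) -> str:
    """Pick temperature units when the model leaves them out."""
    if units in ("celsius", "fahrenheit"):
        return units
    # Anything else (missing or unrecognized) falls back to a default.
    # Naive heuristic: treat a few US-looking suffixes as fahrenheit.
    us_markers = (", US", ", USA", ", CA", ", NY", ", TX")
    return "fahrenheit" if location.endswith(us_markers) else "celsius"

print(resolve_units("Bengaluru, India"))
print(resolve_units("San Francisco, CA"))
print(resolve_units("San Francisco, CA", "celsius"))
```

Because the default lives in your code, the parameter can stay optional in the schema and the model is never pushed to invent a value.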


Building a simple agent from scratch

Here’s the complete agent pattern I use for production. It’s about 80 lines of Python with no framework dependencies:

import json
import openai
from datetime import datetime

class FunctionCallingAgent:
    def __init__(self, tools: list, functions: dict, model="gpt-4o", max_steps=10):
        self.tools = tools
        self.functions = functions  # {"function_name": callable}
        self.model = model
        self.max_steps = max_steps
        self.steps = 0
        self.messages = []

    def run(self, user_input: str) -> str:
        self.steps = 0  # reset so the same agent can be reused across runs
        self.messages = [
            {"role": "system", "content": f"You are a helpful assistant. Today is {datetime.now().strftime('%Y-%m-%d')}. Use tools when needed."},
            {"role": "user", "content": user_input}
        ]

        while self.steps < self.max_steps:
            self.steps += 1

            response = openai.responses.create(
                model=self.model,
                input=self.messages,
                tools=self.tools,
                tool_choice="auto"
            )

            output = response.output
            function_calls = [o for o in output if o.type == "function_call"]

            if not function_calls:
                return response.output_text

            # Add the model's output items (including its function calls) to the
            # history before the results that answer them
            self.messages.extend(output)

            # Execute all tool calls
            for fc in function_calls:
                func = self.functions.get(fc.name)
                if not func:
                    result = {"error": f"Unknown function: {fc.name}"}
                else:
                    try:
                        args = json.loads(fc.arguments)
                        result = func(**args)
                    except Exception as e:
                        result = {"error": str(e)}

                # Results go back as function_call_output items keyed by call_id
                self.messages.append({
                    "type": "function_call_output",
                    "call_id": fc.call_id,
                    "output": json.dumps(result)
                })

        return "Agent stopped: max steps reached."

# Usage
tools = [...]  # Your tool definitions
functions = {
    "get_weather": get_weather,
    "get_air_quality": get_air_quality,
}

agent = FunctionCallingAgent(tools, functions)
result = agent.run("What's the weather and air quality in Bengaluru?")

That’s it. No LangChain. No LangGraph. One class, 80 lines, production-ready if you add logging and error handling on top.
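
For the logging part, one lightweight option is to wrap each tool function before registering it. This is a sketch only: logged_tool and the agent.tools logger name are hypothetical, and the wrapper assumes tools are called with keyword arguments, as the agent does via func(**args).

```python
import functools
import logging
import time

logger = logging.getLogger("agent.tools")

def logged_tool(func):
    """Wrap a tool so every call is logged and failures become error dicts."""
    @functools.wraps(func)
    def wrapper(**kwargs):
        start = time.perf_counter()
        try:
            result = func(**kwargs)
            logger.info("%s(%s) ok in %.3fs", func.__name__, kwargs,
                        time.perf_counter() - start)
            return result
        except Exception as e:
            # logger.exception records the traceback along with the message
            logger.exception("%s(%s) failed", func.__name__, kwargs)
            return {"error": str(e), "success": False}
    return wrapper

@logged_tool
def get_weather(location: str) -> dict:
    # Hypothetical stand-in for a real weather lookup
    return {"location": location, "temperature": 26}

functions = {"get_weather": get_weather}
print(functions["get_weather"](location="Bengaluru"))
```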

Function calling is the foundation. Everything else — state machines, multi-agent orchestration, monitoring — is built on top of this pattern. Master this first, and you can build anything.