Level 3 Agents: Tool Calling Mastery
Build AI agents that select and invoke external tools. Learn tool definition, parameter extraction, result handling, and error recovery patterns.
Level 2 router agents decide between predefined paths. But what if your agent needs to interact with external systems? Query a database? Call an API? Execute a shell command?
Level 3 agents gain the ability to select and invoke tools. The LLM does not just choose a path; it decides which tool to use, extracts the correct parameters from context, and interprets the results. This transforms agents from decision-makers into actors who can affect the world.
This guide covers tool-calling agents comprehensively. You will learn how to define tools, enable intelligent tool selection, handle results and errors, and build reliable tool-using systems.
Understanding Tool Calling
What Makes a Tool-Calling Agent?
A tool-calling agent has access to external capabilities and the intelligence to use them appropriately. Three components distinguish these agents:
**Tool Selection:** The LLM examines the current task and selects which tool (or tools) would help accomplish it. This requires understanding what each tool does and when it applies.
**Parameter Extraction:** Once a tool is selected, the LLM must provide the correct arguments. This means extracting relevant values from the conversation context and formatting them appropriately.
**Result Interpretation:** Tools return data. The LLM must understand what the result means, whether the operation succeeded, and what to do next based on the outcome.
The Tool Calling Loop
Tool-calling agents follow an extended version of the agent loop:
Observe → Think → Select Tool → Extract Parameters → Call Tool →
Interpret Result → Think → (Select Another Tool or Complete)
This loop can repeat multiple times as the agent uses tools to gather information and take actions.
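A minimal sketch of this loop in Python may help make it concrete. The `llm` client and the `tools` registry here are hypothetical stand-ins, not a specific vendor API:

```python
# Minimal sketch of the tool-calling loop. The `llm` client and the `tools`
# registry (name -> callable) are hypothetical stand-ins, not a vendor API.
def run_agent(task: str, llm, tools: dict, max_steps: int = 10) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        decision = llm.next_step(messages)            # Think
        if decision.kind == "final_answer":
            return decision.content                   # Complete
        tool = tools[decision.tool_name]              # Select tool
        result = tool(**decision.arguments)           # Call tool with extracted parameters
        messages.append({                             # Feed the result back for interpretation
            "role": "tool",
            "name": decision.tool_name,
            "content": str(result),
        })
    return "Stopped: step limit reached before the task completed."
```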
Tools vs. Routes
The distinction matters:
| Aspect | Routes (Level 2) | Tools (Level 3) |
|---|---|---|
| Selection | Choose one path | Choose tool(s) |
| Parameters | None | LLM must provide |
| Execution | Workflow continues | External system called |
| Results | Path-dependent | Tool returns data |
| Chaining | Sequential paths | Multiple tools per step |
Level 3 agents often incorporate routing, using it to decide high-level strategy while tools handle specific actions.
Defining Tools for Agents
Tool Definition Structure
Tools need clear specifications that help the LLM understand when and how to use them:
## Available Tools
### Tool: search_database
**Purpose:** Query the customer database for information
**Parameters:**
- `query` (string, required): The search query
- `table` (string, required): Which table to search (customers, orders, products)
- `limit` (integer, optional): Maximum results to return (default: 10)
**Returns:** Array of matching records with id, name, and relevant fields
**Use when:**
- User asks about specific customer information
- Need to look up order history
- Searching for product details
**Do not use when:**
- Information is already in context
- User is asking a general question not requiring data lookup
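The same specification can also be expressed as a machine-readable schema. Here is a sketch in Python using a JSON-Schema-style parameter block, which most tool-calling APIs accept; the exact field names are illustrative rather than tied to a specific vendor:

```python
# Illustrative schema for the search_database tool described above.
search_database_tool = {
    "name": "search_database",
    "description": (
        "Query the customer database for information. Use when the user asks "
        "about specific customer, order, or product records. Do not use when "
        "the information is already in context."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "The search query"},
            "table": {
                "type": "string",
                "enum": ["customers", "orders", "products"],
                "description": "Which table to search",
            },
            "limit": {
                "type": "integer",
                "default": 10,
                "description": "Maximum results to return",
            },
        },
        "required": ["query", "table"],
    },
}
```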
Tool Categories
Different tool types serve different purposes:
Information Retrieval Tools
### Tool: web_search
Query external knowledge sources
- Input: Search query
- Output: Relevant search results
- Side effects: None
### Tool: read_file
Access local file content
- Input: File path
- Output: File contents
- Side effects: None
Action Tools
### Tool: send_email
Deliver messages to recipients
- Input: Recipient, subject, body
- Output: Confirmation or error
- Side effects: Email is sent (irreversible)
### Tool: create_ticket
Open a support ticket in the system
- Input: Title, description, priority
- Output: Ticket ID
- Side effects: Ticket created in database
Transformation Tools
### Tool: format_date
Convert date formats
- Input: Date string, target format
- Output: Formatted date string
- Side effects: None
### Tool: calculate
Perform mathematical calculations
- Input: Expression
- Output: Numeric result
- Side effects: None
Tool Descriptions That Work
The tool description guides the LLM. Make it specific:
Vague (problematic):
### Tool: process
Processes data
Specific (effective):
### Tool: extract_invoice_data
Parses a PDF invoice and extracts structured data including vendor name,
invoice number, line items with quantities and prices, subtotal, tax,
and total amount. Works with standard invoice formats from major vendors.
Implementing Tool-Calling Agents
Basic Tool-Calling Agent
Here is a research assistant that uses tools:
---
description: Research assistant with web search and file access
version: 1.0.0
tools:
- web_search
- read_file
- write_file
---
# Research Assistant Agent
## Objective
Help users research topics by searching the web, reading relevant documents,
and compiling findings into structured reports.
## Available Tools
### web_search
Search the web for current information on any topic.
- **Input:** `query` (string) - The search query
- **Output:** Array of results with title, snippet, and URL
- **Use for:** Current events, facts, documentation, tutorials
### read_file
Read contents of a local file.
- **Input:** `path` (string) - Path to the file
- **Output:** File contents as string
- **Use for:** Accessing local documents, reading previous research
### write_file
Save content to a local file.
- **Input:** `path` (string), `content` (string)
- **Output:** Confirmation of write
- **Use for:** Saving research notes, creating reports
## Tool Selection Strategy
When user asks a research question:
1. **Check local files first**
- If topic might have existing research, use read_file
- Check for relevant documents in ./research/ directory
2. **Search for current information**
- Use web_search for facts, data, current events
- Use specific, targeted queries
- Search multiple times if needed for comprehensive coverage
3. **Synthesize and save**
- Compile findings from multiple sources
- Use write_file to save structured research
- Include sources for all facts
## Execution Pattern
### For research requests:
1. Clarify scope if query is ambiguous
2. Check for existing local research (read_file)
3. Conduct web searches (web_search)
4. Synthesize findings
5. Save compiled research (write_file)
6. Present summary to user
### For follow-up questions:
1. Check if answer is in existing research (read_file)
2. If not, conduct targeted search (web_search)
3. Add new findings to research file (write_file)
4. Answer user's question
Multi-Tool Workflows
Complex tasks require multiple tools in sequence:
## Workflow: Customer Issue Resolution
### Step 1: Gather Context
Tools: search_database, read_file
- Search customer database for account info
- Read any attached documents or previous tickets
### Step 2: Diagnose Issue
Tools: check_system_status, run_diagnostic
- Check if issue relates to known system problems
- Run diagnostic checks if applicable
### Step 3: Take Action
Tools: update_ticket, send_notification, apply_fix
- Apply resolution if automated fix is available
- Update ticket with findings
- Notify customer of status
### Step 4: Document
Tools: write_file, update_knowledge_base
- Document resolution for future reference
- Update knowledge base if new issue pattern found
Parallel Tool Calling
When tools are independent, call them in parallel:
## Parallel Execution Rules
### Can run in parallel:
- Multiple read operations
- Independent searches
- Status checks for different systems
Example:
User: "Compare pricing between competitor A and competitor B" → web_search("competitor A pricing") AND web_search("competitor B pricing") → Wait for both results → Synthesize comparison
### Must run sequentially:
- Write operations that depend on read results
- Actions that depend on previous action success
- Operations with side effects that affect each other
Example:
User: "Update the customer record then send confirmation" → update_customer(data) → Wait for success → send_email(confirmation)
Parameter Extraction
Explicit Parameters
When users provide clear values:
## Parameter Extraction Rules
### Direct Mapping
User: "Search for Python tutorials"
→ web_search(query="Python tutorials")
User: "Read the file at /docs/readme.md"
→ read_file(path="/docs/readme.md")
### Implicit Formatting
User: "Find orders from last week"
→ search_database(
query="orders",
date_from=<calculated: today - 7 days>,
date_to=<calculated: today>
)
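The "last week" calculation can be done deterministically with the standard library rather than left to the model. A sketch (the `search_database` call is the hypothetical tool from earlier):

```python
from datetime import date, timedelta

def last_week_range(today=None):
    """Return ISO date strings for (today - 7 days, today)."""
    today = today or date.today()
    return (today - timedelta(days=7)).isoformat(), today.isoformat()

date_from, date_to = last_week_range()
# search_database(query="orders", date_from=date_from, date_to=date_to)
```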
Inferred Parameters
When values must be derived from context:
## Parameter Inference
### From Conversation Context
If user previously mentioned "customer Acme Corp":
→ Customer-related queries use customer_id for Acme Corp
### From Environment
If operation requires current date:
→ Use system date, do not ask user
### From Defaults
If optional parameter not specified:
→ Use documented default value
### When Ambiguous
If required parameter is unclear:
→ Ask user for clarification before calling tool
Parameter Validation
Validate before calling:
## Pre-Call Validation
### Required Parameters
- All required parameters must have values
- If missing, ask user rather than guessing
### Type Checking
- Numeric fields must be numbers
- Dates must be valid date formats
- Enums must match allowed values
### Range Validation
- Quantities must be positive
- Dates must be reasonable (not year 3000)
- Limits must be within allowed maximums
### Security Validation
- Paths must not escape allowed directories
- Queries must not contain injection patterns
- IDs must match expected formats
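A sketch of a pre-call validator covering these checks, assuming the JSON-Schema-style tool definition shown earlier (the allowed root directory is an illustrative value):

```python
from pathlib import Path

ALLOWED_ROOT = Path("/srv/agent-workspace").resolve()  # illustrative sandbox root

def validate_args(tool_schema: dict, args: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the call may proceed."""
    errors = []
    params = tool_schema["parameters"]["properties"]

    # Required parameters must be present.
    for name in tool_schema["parameters"].get("required", []):
        if name not in args:
            errors.append(f"Missing required parameter: {name}")

    for name, value in args.items():
        spec = params.get(name)
        if spec is None:
            errors.append(f"Unexpected parameter: {name}")
            continue
        # Type checking (simplified to the common JSON types).
        expected = {"string": str, "integer": int, "number": (int, float)}.get(spec["type"])
        if expected and not isinstance(value, expected):
            errors.append(f"{name} should be {spec['type']}, got {type(value).__name__}")
        # Enum values must match the allowed set.
        if "enum" in spec and value not in spec["enum"]:
            errors.append(f"{name} must be one of {spec['enum']}")
        # Range validation: simple positivity check for counts and limits.
        if spec["type"] == "integer" and isinstance(value, int) and value < 0:
            errors.append(f"{name} must not be negative")
        # Security: path parameters must stay inside the allowed directory.
        if name == "path" and ALLOWED_ROOT not in Path(str(value)).resolve().parents:
            errors.append(f"{name} escapes the allowed directory")
    return errors
```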
Handling Tool Results
Success Handling
When tools return successfully:
## Success Processing
### Information Results
- Extract relevant data from result
- Summarize for user if result is large
- Use data to inform next steps
### Action Results
- Confirm action completed
- Note any IDs or references returned
- Proceed to dependent actions
### Empty Results
- Distinguish between "not found" and "error"
- Report to user that search found nothing
- Consider alternative approaches
Error Handling
When tools fail:
## Error Recovery
### Transient Errors (retry appropriate)
- Network timeout
- Rate limiting
- Temporary unavailability
Action: Wait and retry (max 3 attempts)
### Permanent Errors (do not retry)
- Invalid parameters
- Resource not found
- Permission denied
Action: Report error, ask for corrected input
### Partial Success
- Some operations succeeded, others failed
Action: Report what succeeded, handle failures individually
### Unknown Errors
- Unexpected error format or message
Action: Log details, report generic error, suggest manual intervention
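This retry policy is straightforward to encode. A sketch in Python, with standard exception classes standing in for whatever the real tool raises:

```python
import time

TRANSIENT = (TimeoutError, ConnectionError)                       # retry these
PERMANENT = (ValueError, PermissionError, FileNotFoundError)      # report, do not retry

def call_with_retry(tool, max_attempts: int = 3, backoff_s: float = 1.0, **kwargs):
    """Retry transient failures with backoff; surface permanent failures immediately."""
    for attempt in range(1, max_attempts + 1):
        try:
            return tool(**kwargs)
        except TRANSIENT as exc:
            if attempt == max_attempts:
                raise RuntimeError(f"Gave up after {max_attempts} attempts: {exc}") from exc
            time.sleep(backoff_s * attempt)                        # simple linear backoff
        except PERMANENT:
            # Invalid parameters, missing resources, permission problems:
            # retrying will not help, so report and ask for corrected input.
            raise
```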
Result Interpretation
Make sense of tool output:
## Result Interpretation
### Numeric Results
- Understand units and context
- Compare to expected ranges
- Format appropriately for user
### Structured Results
- Parse JSON/XML responses
- Extract fields relevant to query
- Handle missing or null fields
### Boolean Results
- Map to meaningful messages
- Handle edge cases (operation succeeded but with warnings)
### Collection Results
- Handle empty collections gracefully
- Paginate or summarize large collections
- Identify patterns or anomalies
Advanced Tool Patterns
Tool Chaining
Tools that feed into each other:
## Chain: Research and Summarize
### Step 1: Search
Tool: web_search
Input: user query
Output: search_results
### Step 2: Extract Content
Tool: fetch_page (for each top result)
Input: url from search_results
Output: page_content[]
### Step 3: Summarize
Tool: summarize_text
Input: combined page_content
Output: summary
### Step 4: Save
Tool: write_file
Input: summary, path
Output: confirmation
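The same chain as a short Python sketch; the four tool functions are the hypothetical ones named in the steps above:

```python
def research_and_summarize(query: str, output_path: str, top_n: int = 3) -> str:
    results = web_search(query)                               # Step 1: search
    pages = [fetch_page(r["url"]) for r in results[:top_n]]   # Step 2: extract content
    summary = summarize_text("\n\n".join(pages))              # Step 3: summarize
    write_file(path=output_path, content=summary)             # Step 4: save
    return summary
```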
Conditional Tool Use
Select tools based on intermediate results:
## Conditional Logic
### Pattern: Check Then Act
1. Use read_tool to check current state
2. Based on result:
- If condition A: use tool_x
- If condition B: use tool_y
- If unexpected: ask user
### Example: File Update
1. read_file(path) → content
2. If file exists and contains expected structure:
→ write_file(path, updated_content)
3. If file missing:
→ create_file(path, initial_content)
4. If file has unexpected format:
→ ask user how to proceed
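A check-then-act sketch for the file-update example, assuming the `read_file`, `write_file`, and `create_file` tools from earlier and a caller-supplied `has_expected_structure` predicate:

```python
class NeedsUserInput(Exception):
    """Raised when the agent should stop and ask the user how to proceed."""

def update_config(path, updated_content, initial_content, has_expected_structure):
    try:
        content = read_file(path)                      # Step 1: check current state
    except FileNotFoundError:
        return create_file(path, initial_content)      # File missing: create it
    if has_expected_structure(content):
        return write_file(path, updated_content)        # Expected format: safe to update
    # Unexpected format: do not overwrite; stop and ask the user.
    raise NeedsUserInput(f"{path} has an unexpected format; ask the user how to proceed")
```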
Tool Fallbacks
When primary tool fails:
## Fallback Strategy
### Primary: API Tool
Use the dedicated API for accurate data
### Fallback 1: Cache Tool
If API unavailable, check local cache
### Fallback 2: Web Search
If no cache, search web for public data
### Fallback 3: User Input
If all else fails, ask user for data
### Rules:
- Try fallbacks in order
- Log which source was used
- Note if data might be stale
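A sketch of the fallback order in Python. All four data sources (`pricing_api`, `local_cache`, `web_search`, `ask_user_for_pricing`) are hypothetical callables; the return value records which source answered so staleness can be flagged:

```python
import logging

log = logging.getLogger(__name__)

def get_pricing_data(product_id: str) -> dict:
    sources = [
        ("api", lambda: pricing_api.get(product_id)),         # Primary: dedicated API
        ("cache", lambda: local_cache.get(product_id)),       # Fallback 1: local cache
        ("web", lambda: web_search(f"{product_id} price")),   # Fallback 2: public data
    ]
    for name, fetch in sources:
        try:
            data = fetch()
            if data:
                return {"source": name, "stale": name != "api", "data": data}
        except Exception as exc:
            log.warning("Source %s failed: %s", name, exc)     # Log which source was tried
    # Fallback 3: ask the user directly.
    return {"source": "user", "stale": False, "data": ask_user_for_pricing(product_id)}
```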
MCP Integration
The Model Context Protocol provides standardized tool access:
MCP Tool Discovery
## MCP Server Integration
### Available MCP Servers
#### filesystem
Local file operations
- list_directory
- read_file
- write_file
- search_files
#### github
GitHub API operations
- search_repos
- get_issue
- create_issue
- list_pull_requests
#### database
Database operations
- query
- insert
- update
- delete
### Tool Selection with MCP
When task requires file operations:
→ Use filesystem MCP server tools
When task requires GitHub data:
→ Use github MCP server tools
MCP Tool Calling
## MCP Tool Invocation
### Format
mcp_call(
server="server_name",
tool="tool_name",
arguments={...}
)
### Example
mcp_call(
server="github",
tool="search_repos",
arguments={
"query": "language:python stars:>1000",
"limit": 10
}
)
### Response Handling
- MCP returns standardized response format
- Check `success` field for operation status
- Extract `result` field for tool output
- Handle `error` field for failure details
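A sketch of a response-handling wrapper around the `mcp_call` pseudo-function above. The field names follow the description in this guide, not a specific MCP SDK:

```python
def call_mcp_tool(server: str, tool: str, arguments: dict):
    """Call an MCP tool and normalize success/error handling."""
    response = mcp_call(server=server, tool=tool, arguments=arguments)
    if response.get("success"):
        return response.get("result")          # Tool output on success
    error = response.get("error", {})
    raise RuntimeError(
        f"MCP tool {server}.{tool} failed: {error.get('message', 'unknown error')}"
    )

# Usage, mirroring the GitHub example above:
# repos = call_mcp_tool("github", "search_repos",
#                       {"query": "language:python stars:>1000", "limit": 10})
```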
Building a Complete Example
Let us build a code review agent that uses multiple tools:
---
description: Automated code review with multi-tool analysis
version: 1.0.0
tools:
- git_diff
- run_linter
- run_tests
- search_codebase
- write_comment
---
# Code Review Agent
## Objective
Perform comprehensive code review on pending changes using multiple
analysis tools. Identify issues, suggest improvements, and document
findings.
## Available Tools
### git_diff
Get the diff of pending changes
- **Input:** `base_branch` (string, default: "main")
- **Output:** Diff with changed files and content
- **Use for:** Understanding what changed
### run_linter
Execute linting on changed files
- **Input:** `files` (array of file paths)
- **Output:** Linting results with issues
- **Use for:** Catching style and basic errors
### run_tests
Execute test suite
- **Input:** `scope` (string: "all", "changed", "related")
- **Output:** Test results with pass/fail status
- **Use for:** Ensuring changes do not break existing functionality
### search_codebase
Search for patterns in code
- **Input:** `pattern` (string), `file_type` (string, optional)
- **Output:** Matching code locations
- **Use for:** Finding related code, checking consistency
### write_comment
Add review comment to specific location
- **Input:** `file` (string), `line` (number), `comment` (string)
- **Output:** Confirmation
- **Use for:** Recording review findings
## Review Process
### Phase 1: Understand Changes
1. Call git_diff to get all changes
2. Categorize changes by type:
- New files
- Modified files
- Deleted files
3. Identify change scope:
- Small (< 100 lines): Standard review
- Medium (100-500 lines): Thorough review
- Large (> 500 lines): Suggest breaking up
### Phase 2: Automated Checks
1. Run linter on changed files
run_linter(files=changed_files)
2. Run tests with related scope
run_tests(scope="related")
3. Collect all automated findings
### Phase 3: Contextual Analysis
For each significant change:
1. Search for similar patterns
search_codebase(pattern=<pattern from change>)
2. Check for consistency with existing code
3. Identify potential issues:
- Missing error handling
- Hardcoded values
- Duplicated logic
- Performance concerns
### Phase 4: Write Review Comments
For each finding:
1. Determine severity (error, warning, suggestion)
2. Write clear, actionable comment
3. Add comment to specific location
write_comment( file="path/to/file", line=42, comment="[Warning] Consider adding null check here" )
### Phase 5: Summary
Compile review summary:
- Total files reviewed
- Issues found by severity
- Test results
- Overall assessment (approve, request changes, needs discussion)
## Error Handling
### Linter Fails to Run
- Check if linter is installed
- Fall back to basic syntax checking
- Note limitation in review summary
### Tests Fail
- Distinguish test failures from broken tests
- If related tests fail, mark as blocking issue
- If unrelated tests fail, note for investigation
### Large Diffs
- If diff is too large, review in chunks
- Prioritize high-risk files
- Suggest splitting PR in review
Testing Tool-Calling Agents
Tool Mock Testing
Test without calling real tools:
## Mock Testing Strategy
### Setup
Create mock implementations for each tool:
- Return predictable responses
- Simulate success and failure cases
- Record all calls for verification
### Test Cases
#### Happy Path
- Mock tools return expected results
- Verify agent completes task correctly
- Check correct tools were called with correct params
#### Error Path
- Mock tools return errors
- Verify agent handles gracefully
- Check fallback behavior works
#### Edge Cases
- Empty results
- Large results (pagination needed)
- Timeout responses
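A sketch of a happy-path mock test with `unittest.mock`, assuming the `run_agent(task, llm, tools)` loop sketched earlier in this guide; `make_scripted_llm` is a hypothetical test helper:

```python
from types import SimpleNamespace
from unittest.mock import MagicMock

def make_scripted_llm(steps):
    """Fake LLM that replays a fixed sequence of decisions (hypothetical test helper)."""
    it = iter(steps)
    return SimpleNamespace(next_step=lambda messages: next(it))

def test_research_happy_path():
    # Mock tools return predictable results and record every call.
    tools = {
        "web_search": MagicMock(return_value=[{"title": "Doc", "url": "https://example.com"}]),
        "write_file": MagicMock(return_value={"success": True}),
    }
    llm = make_scripted_llm([
        SimpleNamespace(kind="tool_call", tool_name="web_search",
                        arguments={"query": "solar panels"}),
        SimpleNamespace(kind="tool_call", tool_name="write_file",
                        arguments={"path": "research/solar.md", "content": "..."}),
        SimpleNamespace(kind="final_answer", content="Research saved."),
    ])
    answer = run_agent("Research solar panels", llm, tools)

    # Verify the correct tools were called with the correct parameters.
    tools["web_search"].assert_called_once_with(query="solar panels")
    tools["write_file"].assert_called_once()
    assert answer == "Research saved."
```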
Integration Testing
Test with real tools in safe environment:
## Integration Testing
### Test Environment
- Use sandbox/test databases
- Use test API keys with limited scope
- Create reversible actions only
### Test Scenarios
1. End-to-end workflow with all tools
2. Recovery from real tool failures
3. Performance under realistic conditions
### Cleanup
- Revert any changes made during tests
- Clear test data created
- Reset to known state
Best Practices
Tool Granularity
Too coarse (problematic):
Tool: do_everything
Handles all operations
Too fine (problematic):
Tool: get_first_name
Tool: get_last_name
Tool: get_email
Tool: get_phone
...
Right level:
Tool: get_customer_profile
Returns complete customer information
Tool: update_customer_profile
Updates specified customer fields
Clear Tool Boundaries
Each tool should have a single, clear purpose:
### Good: Single Responsibility
- search_customers: Find customers matching criteria
- get_customer: Get one customer by ID
- update_customer: Modify customer data
### Bad: Mixed Responsibilities
- handle_customer: Searches, gets, or updates based on parameters
Safe Tool Design
Build safety into tools:
### Safety Principles
1. **Idempotent when possible**
Running the same operation twice has the same effect as running it once
2. **Reversible when possible**
Provide undo capability for destructive operations
3. **Scoped permissions**
Tools only access what they need
4. **Rate limited**
Prevent accidental overuse
5. **Logged**
All tool calls recorded for audit
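The logging principle is easy to enforce uniformly with a decorator. A minimal sketch, assuming tools are plain Python callables invoked with keyword arguments:

```python
import functools
import logging

audit_log = logging.getLogger("tool_audit")

def audited(tool_fn):
    """Wrap a tool so every call and its outcome are recorded for audit."""
    @functools.wraps(tool_fn)
    def wrapper(**kwargs):
        audit_log.info("call %s %s", tool_fn.__name__, kwargs)
        try:
            result = tool_fn(**kwargs)
            audit_log.info("ok   %s", tool_fn.__name__)
            return result
        except Exception as exc:
            audit_log.warning("fail %s: %s", tool_fn.__name__, exc)
            raise
    return wrapper

@audited
def create_ticket(title: str, description: str, priority: str = "normal"):
    ...  # real implementation lives elsewhere
```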
Summary
Level 3 agents bridge the gap between reasoning and acting. By giving agents the ability to select and invoke tools, you enable them to interact with external systems, gather real information, and take meaningful actions.
Key principles:
- Define tools clearly with specific purposes and parameters
- Enable intelligent selection by describing when each tool applies
- Extract parameters carefully using conversation context
- Handle results robustly including errors and edge cases
- Chain tools effectively for complex multi-step operations
Tool-calling agents form the foundation for most practical agent applications. Master this pattern, and you can build agents that research, analyze, communicate, and automate real workflows.
Ready to coordinate multiple agents working together? Continue to Level 4 Agents: Multi-Agent Orchestration to learn how to build agent teams.