Level 4 Agents: Multi-Agent Orchestration
Design and implement multi-agent systems. Learn orchestration patterns, agent communication, task delegation, and building collaborative AI teams.
Single agents have limits. They can reason, route, and call tools, but complex tasks often require specialized expertise that spans multiple domains. A code review might need a security specialist, a performance expert, and a style guide enforcer. A research project might need a data gatherer, an analyst, and a writer.
Level 4 agents solve this through orchestration. A manager agent coordinates multiple specialist agents, delegating tasks to those best equipped to handle them. Each specialist focuses on its area of expertise while the manager ensures coherent overall execution.
This guide covers multi-agent system design. You will learn orchestration patterns, agent communication, conflict resolution, and how to build effective agent teams.
Understanding Multi-Agent Systems
Why Multiple Agents?
Single agents face inherent constraints:
**Context Limits**: Every agent works within a context window. Complex tasks with multiple knowledge domains can exhaust available context, degrading performance.
**Expertise Dilution**: An agent that tries to be an expert at everything often masters nothing. Specialized agents outperform generalists in their domain.
**Cognitive Load**: Complex reasoning across multiple domains simultaneously leads to errors. Decomposing into focused subtasks improves reliability.
**Parallel Execution**: Some subtasks are independent. Multiple agents can work simultaneously, reducing total completion time.
Multi-Agent Architecture
A typical multi-agent system has three components:
**Manager Agent**: Coordinates the team. Receives high-level goals, decomposes them into tasks, delegates to specialists, aggregates results, and handles conflicts.
**Specialist Agents**: Each focuses on a specific domain. A security agent reviews security implications, a performance agent analyzes efficiency, a documentation agent ensures clarity.
**Communication Layer**: How agents exchange information, including task assignments, progress updates, results, and queries between agents.
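These three components can be sketched as a minimal Python skeleton. The class and method names (`SpecialistAgent`, `ManagerAgent.run`, the `Message` fields) are illustrative, not from any particular framework:

```python
from dataclasses import dataclass

@dataclass
class Message:
    """One item on the communication layer."""
    sender: str
    recipient: str
    type: str      # e.g. "task_assignment", "task_result"
    payload: dict

class SpecialistAgent:
    """Focuses on a single domain and reports findings back to the manager."""
    def __init__(self, name: str, domain: str):
        self.name = name
        self.domain = domain

    def handle(self, task: dict) -> Message:
        # A real agent would call an LLM here; this stub just labels the work.
        findings = [f"{self.domain} review of {task['objective']}"]
        return Message(self.name, "manager", "task_result", {"findings": findings})

class ManagerAgent:
    """Decomposes a goal, delegates to specialists, and aggregates results."""
    def __init__(self, specialists: list[SpecialistAgent]):
        self.specialists = {a.domain: a for a in specialists}

    def run(self, goal: str) -> dict:
        results = {}
        for domain, agent in self.specialists.items():
            reply = agent.handle({"objective": goal})
            results[domain] = reply.payload["findings"]
        return results
```

A production manager would layer monitoring, retries, and conflict resolution on top of this loop.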
Orchestration vs. Collaboration
Two primary patterns:
Orchestration (Hierarchical)

        Manager
       /   |   \
  Agent1 Agent2 Agent3

The manager directs all work; agents report to the manager.
Collaboration (Peer-to-Peer)

  Agent1 ←→ Agent2
    ↕         ↕
  Agent3 ←→ Agent4

Agents communicate directly; coordination is emergent.
Most production systems use orchestration for predictability. We focus on this pattern.
Designing Agent Teams
Identifying Specialist Roles
Start by analyzing the task domain:
## Task Analysis: Code Review
### Required Expertise
1. **Security**
- Vulnerability detection
- Authentication/authorization review
- Input validation
- Secrets management
2. **Performance**
- Algorithmic complexity
- Database query efficiency
- Memory management
- Concurrency issues
3. **Architecture**
- Design pattern adherence
- Separation of concerns
- API contract compliance
- Dependency management
4. **Style**
- Coding standards
- Naming conventions
- Documentation completeness
- Test coverage
### Agent Team
- Security Specialist Agent
- Performance Specialist Agent
- Architecture Specialist Agent
- Style Specialist Agent
- Review Manager Agent (coordinator)
Defining Agent Boundaries
Each specialist needs clear scope:
## Security Agent Scope
### Responsible For
- Identifying security vulnerabilities (OWASP Top 10)
- Checking authentication/authorization logic
- Validating input sanitization
- Reviewing cryptographic usage
- Detecting hardcoded secrets
### Not Responsible For
- General code style
- Performance optimization
- Architecture decisions (unless security-related)
- Test coverage (unless security tests)
### Interfaces With
- Architecture Agent: For security-related architecture concerns
- Manager Agent: For findings and queries
Agent Skill Definitions
Each specialist agent gets a focused skill:
---
description: Security specialist for code review
version: 1.0.0
expertise: security
tools:
- code_search
- vulnerability_scan
- secret_detection
---
# Security Review Agent
## Expertise
I am a security specialist with deep knowledge of:
- OWASP Top 10 vulnerabilities
- Authentication and authorization patterns
- Cryptographic best practices
- Input validation and sanitization
- Secure session management
## Review Focus
When reviewing code, I focus on:
1. Direct security vulnerabilities
2. Security anti-patterns
3. Missing security controls
4. Potential attack vectors
## Output Format
For each finding:
SECURITY ISSUE: [Severity: Critical/High/Medium/Low]
Location: [file:line]
Finding: [Description of security issue]
Risk: [What could go wrong]
Recommendation: [How to fix]
## Constraints
I do not review:
- General code style
- Performance (unless security-impacting)
- Business logic (unless security-impacting)
Building the Manager Agent
Manager Responsibilities
The manager agent handles:
## Manager Agent Responsibilities
### Task Decomposition
- Receive high-level goal
- Break into delegable subtasks
- Identify which specialist handles each
### Task Assignment
- Match tasks to appropriate specialists
- Provide necessary context to each
- Track assignment status
### Progress Monitoring
- Receive updates from specialists
- Detect stuck or failing agents
- Intervene when needed
### Result Aggregation
- Collect results from all specialists
- Resolve conflicts between findings
- Compile coherent final output
### Quality Control
- Verify completeness
- Check for missed areas
- Ensure consistency
Manager Agent Implementation
---
description: Orchestrates multi-agent code review
version: 1.0.0
role: manager
agents:
- security_agent
- performance_agent
- architecture_agent
- style_agent
---
# Code Review Manager
## Objective
Coordinate comprehensive code review by delegating to specialist
agents and synthesizing their findings into a coherent review.
## Orchestration Process
### Phase 1: Analysis
1. Receive code changes to review
2. Analyze scope and complexity
3. Identify which specialists are needed
4. Determine task parallelization potential
### Phase 2: Delegation
For each required specialist:
1. Prepare context package:
- Relevant code sections
- Focus areas for this specialist
- Any specific concerns to investigate
2. Assign task:
delegate_to(agent="security_agent", task={
    context: <prepared context>,
    focus: ["authentication", "input validation"],
    deadline: "within current review"
})
3. Track assignment
### Phase 3: Monitoring
While specialists are working:
1. Monitor for completion signals
2. Handle clarification requests
3. Detect timeouts or failures
4. Provide additional context if needed
### Phase 4: Aggregation
When all specialists complete:
1. Collect all findings
2. Deduplicate overlapping issues
3. Resolve conflicting recommendations
4. Organize by severity and type
5. Generate unified review
### Phase 5: Delivery
1. Format final review output
2. Prioritize critical issues
3. Include all relevant details
4. Present to user
## Conflict Resolution
When specialists disagree:
### Different Perspectives
Both findings are valid from different angles.
- Include both perspectives in final review
- Note the different viewpoints
### Contradictory Recommendations
Specialists recommend opposite actions.
- Analyze the reasoning of each
- Determine which priority takes precedence
- If unclear, escalate to user
### Overlapping Findings
Same issue found by multiple specialists.
- Merge into single finding
- Include all relevant details from each
- Credit all identifying specialists
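The merge step for overlapping findings can be sketched in Python. This assumes each finding is a dict with `location`, `title`, `severity`, and `agent` keys (illustrative field names, not a fixed schema):

```python
SEVERITY_ORDER = ["low", "medium", "high", "critical"]

def merge_findings(findings: list[dict]) -> list[dict]:
    """Collapse duplicate findings, crediting every specialist that reported one."""
    merged = {}
    for f in findings:
        key = (f["location"], f["title"])
        if key in merged:
            m = merged[key]
            m["reported_by"].append(f["agent"])
            # When ratings disagree, keep the more severe one.
            if SEVERITY_ORDER.index(f["severity"]) > SEVERITY_ORDER.index(m["severity"]):
                m["severity"] = f["severity"]
        else:
            merged[key] = {**f, "reported_by": [f["agent"]]}
    return list(merged.values())
```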
## Escalation Rules
Escalate to user when:
- Critical security issue found
- Specialists fundamentally disagree
- Required information is missing
- Task exceeds allocated time
- Confidence in finding is low
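These escalation rules reduce to a simple predicate over the aggregated review. A hedged sketch, where the flag names and the 0.5 confidence threshold are assumptions to tune per system:

```python
def should_escalate(review: dict) -> bool:
    """Return True when any escalation rule fires for an aggregated review."""
    return (
        # Critical issue found
        any(f["severity"] == "critical" for f in review["findings"])
        # Specialists fundamentally disagree
        or review.get("specialists_disagree", False)
        # Required information is missing
        or review.get("missing_information", False)
        # Task exceeded its allocated time
        or review.get("elapsed_seconds", 0) > review.get("time_budget_seconds", float("inf"))
        # Low confidence in any finding (threshold is an assumption)
        or any(f.get("confidence", 1.0) < 0.5 for f in review["findings"])
    )
```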
Communication Patterns
Task Assignment Messages
Manager to specialist:
## Task Assignment Format
```json
{
"type": "task_assignment",
"from": "manager",
"to": "security_agent",
"task_id": "review-001-security",
"content": {
"objective": "Review authentication changes",
"files": ["auth.js", "session.js"],
"focus_areas": ["session handling", "token validation"],
"context": "<relevant code snippets>",
"constraints": {
"time_limit": "2 minutes",
"output_format": "structured findings"
}
}
}
```

Progress Updates
Specialist to manager:
## Progress Update Format
```json
{
"type": "progress_update",
"from": "security_agent",
"to": "manager",
"task_id": "review-001-security",
"status": "in_progress",
"completion": 60,
"findings_so_far": 2,
"estimated_remaining": "45 seconds",
"blocked": false
}
```

Result Delivery
Specialist to manager:
## Result Delivery Format
```json
{
"type": "task_result",
"from": "security_agent",
"to": "manager",
"task_id": "review-001-security",
"status": "completed",
"findings": [
{
"id": "SEC-001",
"severity": "high",
"title": "Missing CSRF protection",
"location": "auth.js:45",
"description": "...",
"recommendation": "..."
}
],
"summary": "Found 2 issues, 1 high severity",
"confidence": 0.9
}
```

Clarification Requests
When a specialist needs more information:
## Clarification Request Format
```json
{
"type": "clarification_request",
"from": "performance_agent",
"to": "manager",
"task_id": "review-001-performance",
"question": "Is this endpoint expected to handle >1000 RPS?",
"context": "Database query pattern differs based on expected load",
"blocking": true
}
```

Manager Response
```json
{
"type": "clarification_response",
"from": "manager",
"to": "performance_agent",
"task_id": "review-001-performance",
"answer": "Yes, this is a high-traffic endpoint (2000+ RPS)",
"additional_context": "Consider caching strategies"
}
```

Orchestration Patterns
Sequential Orchestration
Agents work one after another:
## Sequential Pattern
Use when:
- Later agents depend on earlier results
- Order matters for correctness
- Resource constraints require serialization
Example: Documentation Pipeline
1. Code Analyzer Agent → Extracts structure
2. Documentation Writer Agent → Writes docs (uses #1 output)
3. Style Checker Agent → Reviews docs (uses #2 output)
4. Manager → Compiles final documentation
Parallel Orchestration
Agents work simultaneously:
## Parallel Pattern
Use when:
- Tasks are independent
- Speed is important
- Agents do not need each other's output
Example: Code Review
1. Manager assigns tasks to all specialists simultaneously:
- Security Agent → Security review
- Performance Agent → Performance review
- Style Agent → Style review
- Architecture Agent → Architecture review
2. All agents work in parallel
3. Manager collects all results when complete
4. Manager synthesizes unified review
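The parallel pattern maps naturally onto async fan-out. A minimal sketch using Python's asyncio, where `run_specialist` is a stand-in for a real agent invocation (e.g. an LLM API request):

```python
import asyncio

async def run_specialist(name: str, code: str) -> dict:
    # Stand-in for a real agent call; real implementations await network I/O here.
    await asyncio.sleep(0)
    return {"agent": name, "findings": [f"{name} reviewed {len(code)} chars"]}

async def parallel_review(code: str) -> list[dict]:
    specialists = ["security", "performance", "style", "architecture"]
    # Fan out to all specialists at once; total latency tracks the slowest agent.
    return await asyncio.gather(*(run_specialist(s, code) for s in specialists))

results = asyncio.run(parallel_review("def login(user): ..."))
```

`asyncio.gather` preserves input order, so the manager can match each result back to its specialist by position as well as by the `agent` field.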
Hierarchical Orchestration
Multi-level agent hierarchy:
## Hierarchical Pattern
Use when:
- Complex domain with sub-domains
- Need multiple levels of coordination
- Very large tasks
Example: Full Application Audit
Level 1: Audit Manager
Level 2:
├── Frontend Review Manager
│   ├── React Component Agent
│   ├── CSS/Styling Agent
│   └── Accessibility Agent
├── Backend Review Manager
│   ├── API Agent
│   ├── Database Agent
│   └── Security Agent
└── Infrastructure Review Manager
    ├── Docker Agent
    ├── CI/CD Agent
    └── Monitoring Agent
Pipeline Orchestration
Assembly-line processing:
## Pipeline Pattern
Use when:
- Each stage transforms output for next stage
- Clear processing phases
- Consistent structure through stages
Example: Content Pipeline
Document → [Extraction Agent] → Raw Content
Raw Content → [Analysis Agent] → Structured Data
Structured Data → [Enhancement Agent] → Rich Data
Rich Data → [Formatting Agent] → Final Output
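The pipeline above is just function composition over stages. A sketch with toy stand-ins for the agents (the lambdas are placeholders, not real extraction or analysis logic):

```python
def run_pipeline(document, stages):
    """Feed each stage's output into the next, assembly-line style."""
    artifact = document
    for name, stage in stages:
        artifact = stage(artifact)   # `name` is useful for logging and tracing
    return artifact

# Toy stand-ins; real stages would be agent calls.
stages = [
    ("extraction", lambda doc: doc.upper()),                 # Document -> Raw Content
    ("analysis",   lambda raw: {"text": raw}),               # Raw Content -> Structured Data
    ("formatting", lambda data: f"<p>{data['text']}</p>"),   # Structured Data -> Final Output
]
```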
Handling Failures
Agent Failure Detection
## Failure Detection
### Timeout
Agent does not respond within expected time
- Detection: Timer expires without completion
- Response: Retry or reassign
### Error Response
Agent explicitly reports failure
- Detection: Error message in response
- Response: Analyze error, retry if transient
### Invalid Output
Agent returns but output is malformed
- Detection: Output validation fails
- Response: Request correction or reassign
### Stuck State
Agent reports progress but never completes
- Detection: Progress updates but no completion
- Response: Request status, may need intervention
Recovery Strategies
## Recovery Strategies
### Retry
Try the same task again with same agent
- When: Transient failures (network, timeout)
- How: Resend task assignment
- Limit: Max 3 retries
### Reassign
Give task to different agent
- When: Agent-specific failure
- How: Assign to backup agent or manager fallback
- Fallback: Manager handles directly if no alternative
### Degrade
Complete without failed component
- When: Non-critical specialist fails
- How: Continue with other agents, note gap
- Report: Indicate incomplete coverage
### Escalate
Human intervention required
- When: Critical failure with no recovery path
- How: Report issue with full context
- Wait: For human guidance before proceeding
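The retry, reassign, and degrade strategies chain naturally into one recovery wrapper. A hedged sketch; `TransientError` and the callable-per-agent interface are assumptions for illustration:

```python
class TransientError(Exception):
    """Raised by an agent call for retryable failures (network, timeout)."""

MAX_RETRIES = 3  # matches the retry limit above

def run_with_recovery(task: dict, primary, backup=None) -> dict:
    """Retry the primary agent, then reassign to a backup, then degrade."""
    for _ in range(MAX_RETRIES):
        try:
            return primary(task)
        except TransientError:
            continue  # transient failure: resend the task assignment
    if backup is not None:
        return backup(task)  # reassign to a different agent
    # Degrade: continue without this specialist, noting the coverage gap.
    return {"status": "degraded", "note": f"{task['name']} coverage skipped"}
```

Escalation to a human would sit one level up, triggered when even the degraded result is unacceptable.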
Graceful Degradation
## Degradation Levels
### Level 0: Full Operation
All specialists available and functioning
- Result: Complete, high-quality output
### Level 1: Minor Degradation
One non-critical specialist unavailable
- Result: Complete output, noted gap in one area
- Example: Style agent down, security still covered
### Level 2: Significant Degradation
Multiple specialists unavailable
- Result: Partial output, significant gaps
- Action: Warn user about limitations
- Example: Only security and performance covered
### Level 3: Minimal Operation
Manager alone or one specialist
- Result: Basic output, many gaps
- Action: Strongly recommend human review
- Example: Only basic scan possible
### Level 4: Complete Failure
Cannot produce meaningful output
- Result: Failure message
- Action: Escalate to human, provide context
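These levels can be computed mechanically from which specialists are available. A sketch, where treating some specialists as "critical" is an assumption for the Level 1 vs. Level 2 distinction:

```python
def degradation_level(unavailable: set[str], critical: set[str], team: set[str]) -> int:
    """Map unavailable specialists to a degradation level (0-4)."""
    if unavailable >= team:
        return 4  # complete failure: no specialist can run
    if len(team - unavailable) <= 1:
        return 3  # minimal operation: manager alone or one specialist
    if not unavailable:
        return 0  # full operation
    if len(unavailable) == 1 and not (unavailable & critical):
        return 1  # minor degradation: one non-critical specialist down
    return 2      # significant degradation
```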
Building a Complete Multi-Agent System
Let us build a document processing system with multiple agents:
---
description: Multi-agent document processing system
version: 1.0.0
role: orchestrator
---
# Document Processing Orchestrator
## Agent Team
### Document Parser Agent
- Input: Raw document (PDF, DOCX, etc.)
- Output: Extracted text and structure
- Expertise: Document format handling, OCR
### Content Analyzer Agent
- Input: Extracted text
- Output: Entities, topics, sentiment
- Expertise: NLP, entity recognition
### Summarization Agent
- Input: Analyzed content
- Output: Concise summaries at various lengths
- Expertise: Abstractive summarization
### Quality Checker Agent
- Input: All processing outputs
- Output: Quality scores and issues
- Expertise: Validation, error detection
## Orchestration Flow
### Step 1: Document Intake
1. Receive document from user
2. Identify document type and characteristics
3. Determine processing strategy
4. Initialize processing context
### Step 2: Parallel Initial Processing
Launch simultaneously:
**Parser Task:**
delegate_to(agent="parser_agent", task={
    document: <raw document>,
    extract: ["text", "structure", "metadata"]
})
**Initial Quality Check:**
delegate_to(agent="quality_agent", task={
    document: <raw document>,
    check: ["format validity", "completeness"]
})
### Step 3: Content Analysis
After parsing completes:
delegate_to(agent="analyzer_agent", task={
    content: <parsed content>,
    analyze: ["entities", "topics", "sentiment", "key phrases"]
})
### Step 4: Summarization
After analysis completes:
delegate_to(agent="summary_agent", task={
    content: <parsed content>,
    analysis: <analysis results>,
    generate: ["one_sentence", "paragraph", "full_page"]
})
### Step 5: Final Quality Check
delegate_to(agent="quality_agent", task={
    all_outputs: {
        parsed: <parser output>,
        analyzed: <analyzer output>,
        summarized: <summary output>
    },
    validate: ["consistency", "completeness", "accuracy"]
})
### Step 6: Result Compilation
1. Collect all agent outputs
2. Merge into unified result
3. Include quality assessment
4. Format for delivery
## Result Format
```json
{
"document": {
"type": "PDF",
"pages": 12,
"words": 5420
},
"extracted_content": {
"text": "...",
"structure": {...},
"metadata": {...}
},
"analysis": {
"entities": [...],
"topics": [...],
"sentiment": "neutral",
"key_phrases": [...]
},
"summaries": {
"one_sentence": "...",
"paragraph": "...",
"full_page": "..."
},
"quality": {
"overall_score": 0.92,
"issues": [],
"confidence": 0.88
}
}
```

Error Handling
Parser Failure
- Retry with alternative parsing strategy
- If OCR-related, try different OCR engine
- If format unsupported, report to user
Analyzer Failure
- Can proceed with parsed content only
- Note missing analysis in output
- Suggest manual analysis if needed
Summary Failure
- Can provide parsed content and analysis
- Generate basic extractive summary as fallback
- Note limitation in output
Quality Check Failure
- Proceed with other outputs
- Flag as "not quality-verified"
- Recommend manual review
Best Practices
Keep Specialists Focused
Each agent should excel in one domain:
### Good: Focused Agent
Security Agent: Only security concerns
Performance Agent: Only performance concerns
### Bad: Generalist Agent
General Review Agent: Security, performance, style, architecture
(Too broad, loses specialist advantage)
Define Clear Interfaces
Agents should have predictable inputs and outputs:
### Agent Contract
Input Schema:
{
  type: "task_type",
  content: "what to process",
  options: { ... }
}
Output Schema:
{
  status: "completed" | "failed",
  findings: [...],
  confidence: 0-1,
  metadata: { ... }
}
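Before aggregating, the manager can validate each specialist's output against this contract. A minimal sketch, assuming the output schema above (field names are the contract's, the checks themselves are illustrative):

```python
def validate_result(result: dict) -> list[str]:
    """Return a list of contract violations; empty means the result is valid."""
    errors = []
    if result.get("status") not in ("completed", "failed"):
        errors.append("status must be 'completed' or 'failed'")
    if not isinstance(result.get("findings"), list):
        errors.append("findings must be a list")
    conf = result.get("confidence")
    if not (isinstance(conf, (int, float)) and 0 <= conf <= 1):
        errors.append("confidence must be a number in [0, 1]")
    return errors
```

Results that fail validation feed the "Invalid Output" recovery path described above: request a correction or reassign the task.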
Enable Graceful Degradation
System should work even when components fail:
### Degradation Strategy
1. Identify critical vs. optional agents
2. Define fallback for each agent
3. Test degraded operation modes
4. Communicate limitations to user
Monitor Agent Performance
Track how agents perform:
### Metrics to Track
Per Agent:
- Task completion rate
- Average completion time
- Error rate
- Confidence distribution
System-Wide:
- End-to-end completion time
- Overall quality scores
- Escalation rate
- User satisfaction
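The per-agent metrics above need only a small tracker. A sketch (`AgentMetrics` is an illustrative name, not a library class):

```python
from collections import defaultdict

class AgentMetrics:
    """Tracks per-agent completion rate and average completion time."""
    def __init__(self):
        self.records = defaultdict(list)  # agent -> list of (succeeded, seconds)

    def record(self, agent: str, succeeded: bool, seconds: float):
        self.records[agent].append((succeeded, seconds))

    def completion_rate(self, agent: str) -> float:
        runs = self.records[agent]
        return sum(ok for ok, _ in runs) / len(runs)

    def avg_time(self, agent: str) -> float:
        runs = self.records[agent]
        return sum(t for _, t in runs) / len(runs)
```

System-wide metrics like escalation rate aggregate the same records across the whole team.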
Summary
Multi-agent systems unlock capabilities beyond single agents. By distributing specialized tasks to focused agents and coordinating through a manager, you can tackle complex problems that would overwhelm any individual agent.
Key principles:
- Specialize agents for focused expertise
- Define clear boundaries between responsibilities
- Coordinate through a manager for coherent execution
- Handle failures gracefully with degradation strategies
- Communicate clearly with structured messages
Multi-agent orchestration represents the current practical limit of agent complexity. Level 5 agents push further into full autonomy, but Level 4 systems are where most production applications live.
Ready for full autonomy? Continue to Level 5 Agents: Fully Autonomous AI Systems to explore agents that generate and execute their own code.