Haven't installed OpenClaw yet? Click here for one-line install
curl -fsSL https://openclaw.ai/install.sh | bashiwr -useb https://openclaw.ai/install.ps1 | iexcurl -fsSL https://openclaw.ai/install.cmd -o install.cmd && install.cmd && del install.cmd- Multi-agent systems can handle tasks 3-5x more complex than single agents while reducing overall latency by 40-60% through parallel execution
- OpenClaw tutorial Agent Teams supports three collaboration modes: Orchestrator, Peer-to-Peer, and Hierarchical, which can be flexibly combined based on task characteristics
- Subagents collaborate through three mechanisms: structured message passing, shared memory, and event queues, each with its own applicable task types and cost characteristics
- Proper role assignment and model selection (lightweight models for routing, advanced models for reasoning) can reduce overall token costs by up to 35%
- Compared to AutoGen, CrewAI, and LangGraph, OpenClaw Agent Teams' core advantage is its YAML declarative configuration with the lowest entry barrier, though it still has room for improvement in dynamic workflow flexibility
In February 2026, OpenClaw grew from 9,000 GitHub stars to 157,000 in just sixty days, making it one of the most watched projects in the open-source AI agent space.[10] Behind this surge lies not only breakthroughs in single-agent capabilities, but also the maturation of Multi-Agent Architecture — enabling developers to assemble AI agent "teams" that collaboratively tackle complex tasks beyond the reach of any single agent.[2]
This article is the fourth in the OpenClaw series, focusing on the complete technical architecture of Agent Teams. Starting from the fundamental limitations of single agents, we progressively dissect OpenClaw's multi-agent system design logic, communication protocols, and task delegation patterns. We also provide two hands-on practical examples: a research team multi-agent collaboration system and a code review agent team in a development pipeline. Finally, we compare the major multi-agent frameworks on the market to help readers make more informed technology choices.
1. Why Multi-Agent Systems Are Needed
Before diving into OpenClaw's technical details, it is important to first clarify: what types of tasks truly require multi-agent systems? Not every problem warrants the added complexity of a multi-agent approach.
1.1 The Fundamental Differences in Task Complexity
Humans form teams because certain problems are inherently multidimensional — requiring legal, financial, and technical expertise simultaneously, rather than sequentially. AI agents face the same challenge. When a task demands that an agent simultaneously possess web scraping, data analysis, natural language generation, and code execution capabilities, a single agent — no matter how large its context window or how powerful its model — will encounter cognitive overload.
Gartner's 2025 report identified AI Agent Ecosystems as one of the most critical strategic technology trends for 2026, driven primarily by multi-agent collaboration architectures that enable organizations to automate complex enterprise processes.[7]
1.2 Parallelizable Work Is the Key Signal
The simplest question to determine whether you need a multi-agent system is: "Which subtasks in this workflow can be performed simultaneously?"
For example, writing a competitive analysis report requires: (A) scraping competitor websites and news; (B) analyzing financial data; (C) compiling product feature comparisons; (D) aggregating user reviews. These four tasks are logically independent and can be executed in parallel. If a single agent completes them sequentially at 5 minutes each, the total time is 20 minutes; if four subagents execute simultaneously, the theoretical time is just 5 minutes, plus coordination overhead of roughly 6-7 minutes — a threefold efficiency improvement.
1.3 Specialization Improves Output Quality
Academic research shows that assigning each agent a clear role specification in a multi-agent system significantly improves task completion quality. MetaGPT's research found that giving agents explicit roles such as "Product Manager," "Engineer," and "Tester," and having them operate according to corresponding SOPs, can produce code generation quality comparable to human engineering teams.[4]
OpenClaw Agent Teams implements this insight at the architectural level: each subagent not only has an independent system prompt but can also be bound to specific skill sets and model selections, turning "specialization" into configurable technical parameters.[1]
2. Single Agent Bottlenecks and the Need to Scale
To appreciate the value of multi-agent systems, we must first honestly confront the ceiling of single agents.
2.1 The Physical Limits of the Context Window
Even advanced models like Claude 3.5 Sonnet or GPT-4o have context window limits (typically 128K to 200K tokens). For tasks that require simultaneously holding massive amounts of context — such as analyzing a 100,000-line codebase or synthesizing 300 research papers — a single agent physically cannot fit all the information into a single inference pass.
The multi-agent solution is distributed memory: each subagent maintains only the context within its area of responsibility, while the orchestrator agent handles cross-agent knowledge integration. This way, even if the total context required far exceeds any single model's limit, the system can still function effectively.
2.2 The Task Complexity Ceiling
Single agents commonly exhibit the following failure modes when handling highly complex tasks:
- Step omission: Forgetting previously established constraints during long-chain reasoning
- Tool misuse: Ignoring preconditions for different tools when switching between subtasks
- Quality inconsistency: Output quality in earlier stages is noticeably better than in later stages (attention dilution effect)
- Error accumulation: Small early errors get amplified in subsequent steps, causing significant deviation in final output
Multi-agent architecture mitigates these issues through Separation of Concerns: each agent only needs to maintain high-quality output within a limited task scope, and the impact of errors is contained at the subtask level rather than propagating throughout the entire workflow.
2.3 The Asymmetry of Latency and Cost
A single agent's execution latency is the sum of all subtasks; in a multi-agent system, parallelizable subtasks can execute simultaneously, compressing latency to the duration of the longest subtask plus coordination overhead.
The cost logic is even more nuanced: not all subtasks require the most expensive model. By using GPT-4o-mini for simple routing decisions and Claude Opus for complex analytical reasoning, overall costs can be reduced by 35-50% while maintaining output quality for critical tasks.[5]
2.4 Maintainability and Scalability
From an engineering perspective, a single agent's system prompt tends to bloat as task complexity grows, eventually devolving into unmaintainable "Prompt Spaghetti." Multi-agent architecture forces developers to modularize capabilities, keeping each agent's system prompt concise and focused, dramatically improving the overall system's readability, testability, and maintainability.
3. OpenClaw Agent Teams Architecture Design
OpenClaw's multi-agent system is built on top of the Gateway architecture, with YAML declarative configuration at its core, supporting three fundamental agent collaboration modes.[9]
3.1 Architecture Overview
An OpenClaw Agent Team consists of the following core components:
- Primary Agent: The entry agent that receives user requests, typically serving as the orchestrator
- Subagents: Specialized agents delegated tasks by the primary agent, each with its own configuration file
- Shared Tool Pool: A collection of tools shared across multiple agents, such as web search and file I/O
- Inter-Agent Communication Layer: Handles message routing and state synchronization
- Task Queue: Manages asynchronous task distribution between agents
Below is a basic Agent Team configuration structure:
# openclaw-team.yaml
name: research-team
version: "1.0"
agents:
coordinator:
model: claude-3-5-sonnet
system: |
You are the research coordinator agent, responsible for decomposing tasks
and delegating them to specialized subagents.
Upon receiving a research request, analyze the required subtasks and
delegate them in parallel.
After receiving all subagent responses, synthesize a coherent report.
skills:
- task-delegation
- report-synthesis
subagents:
- web-scraper
- data-analyst
- report-writer
web-scraper:
model: gpt-4o-mini
system: |
You are the web information collection agent, specializing in extracting
structured information from web pages.
Upon receiving search instructions, return formatted raw data without analysis.
skills:
- web-search
- html-parser
timeout: 30s
data-analyst:
model: claude-3-5-sonnet
system: |
You are the data analysis agent, responsible for extracting insights from raw data.
Only perform analysis — do not collect data or write reports.
skills:
- data-analysis
- chart-generation
report-writer:
model: claude-3-opus
system: |
You are the professional report writing agent, responsible for transforming
analysis results into clear written reports.
Maintain an objective and neutral tone; every claim must be supported by data.
skills:
- markdown-formatter
- citation-manager
team:
coordination_mode: orchestrator
max_parallel_agents: 3
timeout: 300s
shared_memory: true
3.2 Orchestrator Pattern
The Orchestrator Pattern is the most common multi-agent architecture, suited for scenarios with relatively fixed task flows that require centralized control.
In this mode, the orchestrator agent plays the role of "project manager":
- Receives the user's high-level task description
- Decomposes the task into delegatable subtasks
- Selects appropriate subagents based on subtask characteristics
- Monitors each subagent's execution progress
- Integrates all subagent outputs
- Returns the final result to the user
The Orchestrator Pattern's advantage lies in its clear logic and ease of debugging. When a task fails, you can quickly identify which subagent encountered the problem. Its disadvantage is that the orchestrator agent becomes a single point of bottleneck: if the orchestrator's reasoning is flawed, the entire system's output is affected.
3.3 Peer-to-Peer Pattern
In the Peer-to-Peer Pattern, all agents hold equal status and can communicate directly with each other without going through a central orchestrator. This mode is suited for scenarios requiring multi-party negotiation to reach consensus, such as multiple review agents independently evaluating the same proposal before voting on a decision.
# peer-to-peer configuration example
team:
coordination_mode: peer-to-peer
communication:
broadcast: true # Any agent's message is broadcast to all agents
consensus_required: true
consensus_threshold: 0.67 # Requires 2/3 agent agreement
The challenge with the Peer-to-Peer Pattern is the potential for Message Storms — as the number of agents increases, broadcast messages grow exponentially. Therefore, OpenClaw recommends keeping the number of agents in peer-to-peer mode to no more than five.
3.4 Hierarchical Pattern
The Hierarchical Pattern combines the advantages of both the Orchestrator and Peer-to-Peer patterns, suited for large-scale complex tasks. Architecturally, it forms a tree structure: a Root Orchestrator manages multiple Sub-Orchestrators, each of which manages its own Worker Agents.
# hierarchical configuration example
team:
coordination_mode: hierarchical
hierarchy:
root: project-manager
level_1:
- research-lead # manages web-scraper, arxiv-searcher
- dev-lead # manages coder, tester, reviewer
- content-lead # manages writer, editor, translator
This mode is suitable for enterprise-level workflows, but has the highest configuration complexity and relatively greater debugging difficulty. It is recommended only when a single-layer orchestrator pattern cannot meet your requirements.
4. Subagent Communication Protocols
The performance and stability of a multi-agent system largely depend on the communication mechanism design between agents. OpenClaw provides three communication protocols, each suited to different scenarios.[1]
4.1 Structured Message Passing
The most basic communication method: Agent A completes its task, encapsulates the result into a standardized message object, and sends it to Agent B. OpenClaw's message format follows this structure:
{
"message_id": "msg_abc123",
"sender": "web-scraper",
"receiver": "data-analyst",
"task_id": "research_task_001",
"message_type": "task_result",
"payload": {
"status": "success",
"data": { ... },
"metadata": {
"tokens_used": 1240,
"execution_time_ms": 3200,
"sources": ["https://example.com/article"]
}
},
"timestamp": "2026-02-22T10:30:00Z"
}
The advantage of structured message passing is strong traceability — every message has a unique ID, facilitating post-hoc auditing and debugging. The downside is that for scenarios requiring frequent small data exchanges, the message encapsulation overhead can become a significant proportion.
4.2 Shared Memory
Shared memory allows multiple agents to read from and write to the same memory namespace, suited for scenarios requiring frequent sharing of intermediate states. OpenClaw implements this mechanism through the Gateway's Memory Store:
# Enable shared memory in agent configuration
agents:
coordinator:
memory:
shared_namespace: "research_project_001"
read_access: ["web-scraper", "data-analyst", "report-writer"]
write_access: ["coordinator", "data-analyst"]
data-analyst:
memory:
shared_namespace: "research_project_001"
# Read scraper data from shared memory, write analysis results
When using shared memory, note the following considerations:
- Write conflicts: Agents writing simultaneously may overwrite each other's data; it is recommended to set up independent sub-namespaces for each agent
- Read consistency: Agent B may read incomplete data while Agent A's write has not yet finished
- Memory cleanup: Shared memory must be manually cleaned after task completion, or it will affect subsequent tasks
4.3 Event Queue
The event queue is the communication mechanism best suited for asynchronous workflows. Agents publish events, and other agents subscribe to event types they are interested in, automatically launching corresponding agents when events fire.
# Event queue configuration
team:
event_bus:
enabled: true
events:
- name: "scraping_completed"
publisher: "web-scraper"
subscribers: ["data-analyst"]
trigger: "on_task_success"
- name: "analysis_completed"
publisher: "data-analyst"
subscribers: ["report-writer", "coordinator"]
trigger: "on_task_success"
- name: "task_failed"
publisher: "*" # Any agent can publish failure events
subscribers: ["coordinator"]
trigger: "on_error"
The event queue is deeply integrated with OpenClaw's Hooks system: hooks triggered upon agent task completion can automatically publish events to the queue, launching downstream agents. This enables fully decoupled collaboration between agents — each agent only needs to care about "when I finish," without needing to know "who is waiting for my results."
4.4 Communication Protocol Selection Guide
| Scenario Characteristics | Recommended Protocol | Rationale |
|---|---|---|
| Linear pipeline with clear steps | Structured Message Passing | High traceability, easy debugging |
| Frequent state sharing among agents | Shared Memory | Reduces message serialization overhead |
| Event-driven with diverse triggers | Event Queue | Decouples agents, supports dynamic workflows |
| Complex mixed scenarios | Hybrid approach | Choose the best protocol for each subtask |
5. Task Delegation and Role Assignment Design Patterns
The effectiveness of a multi-agent system largely depends on whether tasks are delegated to the "right agent." OpenClaw provides multiple task delegation strategies.
5.1 The Three Elements of Role Definition
A well-designed subagent role should contain three core elements:
- Capability Boundary: Clearly define what the agent "can do" and "does not do." Agents with unclear boundaries tend to hallucinate or exhibit unnecessary boundary-crossing behavior when receiving out-of-scope tasks.
- I/O Contract: Specify the input format the agent accepts and the output structure it returns. Strict I/O contracts allow agents to be called like APIs by other agents, improving system composability.
- Failure Behavior: Define how the agent should respond when it cannot complete a task — silently fail, return an error code, or request human intervention?
# Role definition example: complete with all three elements
agents:
data-analyst:
system: |
[Capability Boundary]
You specialize in data analysis and statistical insight extraction.
You do not collect data, write reports, or execute code.
[Input Format]
Accept JSON-formatted structured data containing "raw_data" and "analysis_goal" fields.
[Output Format]
Return a JSON object with the following fields:
- "key_findings": array of strings, each no longer than 50 words
- "statistics": key numerical statistics
- "confidence": confidence level of analysis conclusions (high/medium/low)
[Failure Behavior]
If data quality is insufficient for analysis, return {"status": "insufficient_data", "reason": "..."}
5.2 Skill-Based Routing
OpenClaw's Skills system is deeply integrated with the multi-agent architecture: the orchestrator agent can automatically route tasks to subagents that possess the required skills based on subtask requirements.
# Skill routing configuration
agents:
coordinator:
routing_strategy: skill-based
routing_rules:
- skill: "web-search"
route_to: "web-scraper"
- skill: "data-analysis"
route_to: "data-analyst"
- skill: "code-execution"
route_to: "code-runner"
- skill: "*" # Default route
route_to: "general-assistant"
5.3 Load Balancing
When multiple subagents possess the same capabilities (e.g., three "web scraping agents"), OpenClaw supports load balancing based on the following strategies:
- Round Robin: Tasks are distributed to agents in sequence, ensuring even workload distribution
- Shortest Queue: New tasks are assigned to the agent with the fewest pending tasks
- Capability Weight: Distribution ratios are dynamically adjusted based on each agent's historical success rate
team:
load_balancing:
strategy: shortest-queue
agent_pool:
- web-scraper-1
- web-scraper-2
- web-scraper-3
health_check:
enabled: true
interval: 30s
failure_threshold: 3 # Removed from pool after 3 consecutive failures
5.4 Fallback Strategy
In production environments, subagents may fail for various reasons — API rate limiting, model service unavailability, task timeouts. A well-designed fallback strategy is essential for stable multi-agent system operation:
agents:
primary-analyst:
model: claude-3-5-sonnet
fallback:
on_timeout:
action: retry
max_retries: 2
backoff: exponential
on_api_error:
action: delegate
fallback_agent: backup-analyst
on_capability_mismatch:
action: escalate
escalate_to: coordinator
6. Case Study 1: Research Team Multi-Agent Collaboration
This case study demonstrates how to build a multi-agent system using OpenClaw Agent Teams that can automatically complete academic competitive intelligence research.
6.1 System Requirements and Role Design
Objective: Given a research topic (e.g., "Medical Applications of Multimodal Large Language Models"), produce a report containing the latest paper summaries, competitive landscape analysis, and technology trend predictions within 15 minutes.
Role design:
- Research Coordinator: Task decomposition, progress monitoring, final report integration
- Paper Searcher: Searches relevant papers from arXiv and Google Scholar
- Web Scraper: Collects news, blog posts, and industry reports
- Data Analyst: Organizes paper citation counts, institutional distribution, and temporal trends
- Report Writer: Integrates all data into a structured Markdown report
6.2 Complete Configuration File
# research-team.yaml
name: research-intelligence-team
version: "1.0"
agents:
research-coordinator:
model: claude-3-5-sonnet
system: |
You are the research coordinator agent. Upon receiving a research topic,
immediately execute the following steps:
1. Simultaneously delegate search tasks to paper-searcher and web-scraper
2. After receiving both results, delegate analysis to data-analyst
3. After receiving analysis results, delegate report writing to report-writer
4. Return the final report to the user
When delegating tasks, use this format:
{"delegate_to": "agent_name", "task": "...", "deadline": "Xs"}
skills:
- task-delegation
- progress-monitoring
subagents:
- paper-searcher
- web-scraper
- data-analyst
- report-writer
paper-searcher:
model: gpt-4o-mini
system: |
You are the academic paper search agent.
Use the web-search skill to search arXiv and Google Scholar.
Return format: {"papers": [{"title": "", "authors": [], "year": 0, "citations": 0, "abstract": ""}]}
Return a maximum of 10 most relevant papers per request.
skills:
- web-search
- arxiv-api
timeout: 60s
max_retries: 2
web-scraper:
model: gpt-4o-mini
system: |
You are the web information collection agent.
Search and extract news articles, tech blogs, and industry analyses.
Return format: {"sources": [{"url": "", "title": "", "date": "", "summary": "", "key_points": []}]}
Only return content from the past 6 months, maximum 8 sources per request.
skills:
- web-search
- content-extractor
timeout: 60s
data-analyst:
model: claude-3-5-sonnet
system: |
You are the data analysis agent.
After receiving the paper list and web data, analyze:
1. Publication trends (by year, institutional distribution)
2. Core technical directions and keyword clustering
3. Major research institutions and competitive landscape
4. Technology readiness level assessment (TRL 1-9)
Return structured analysis results in JSON.
skills:
- data-analysis
- trend-detection
timeout: 90s
report-writer:
model: claude-3-opus
system: |
You are the professional report writing agent.
Transform analysis data into a Markdown report with the following structure:
## Executive Summary (under 200 words)
## Current Research Landscape (with statistics)
## Technology Trend Analysis
## Competitive Landscape
## Conclusions and Recommendations
## References
Maintain an objective tone; every assertion must be supported by data.
skills:
- markdown-writer
- citation-formatter
timeout: 120s
team:
coordination_mode: orchestrator
orchestrator: research-coordinator
max_parallel_agents: 3
global_timeout: 900s # 15 minutes
shared_memory:
enabled: true
namespace: "research_session"
event_bus:
enabled: true
logging:
level: info
include_agent_messages: true
6.3 Execution Flow Analysis
When the user inputs a research topic, the system operates according to the following flow:
- T+0s: The research coordinator receives the topic and analyzes the task structure
- T+2s: Simultaneously delegates search tasks to paper-searcher and web-scraper (parallel execution)
- T+60s: Both search agents complete, notifying the coordinator via the event queue
- T+62s: The coordinator writes search results to shared memory and launches data-analyst
- T+130s: Data analysis completes, launching report-writer
- T+250s: Report completes, coordinator integrates and returns to user
The entire process takes approximately 4 minutes, whereas a single agent completing the same task sequentially would take an estimated 12-15 minutes.
6.4 Performance and Cost Analysis
Using a typical research task as an example (topic: multimodal LLM medical applications):
| Agent | Model | Token Usage | Execution Time | Estimated Cost |
|---|---|---|---|---|
| Research Coordinator | Claude 3.5 Sonnet | 3,200 | 8s | $0.005 |
| Paper Searcher | GPT-4o-mini | 8,500 | 52s | $0.004 |
| Web Scraper | GPT-4o-mini | 6,200 | 48s | $0.003 |
| Data Analyst | Claude 3.5 Sonnet | 12,000 | 68s | $0.018 |
| Report Writer | Claude Opus | 9,800 | 115s | $0.147 |
| Total | --- | 39,700 | ~250s | $0.177 |
If all tasks used Claude Opus, the estimated cost for the same token usage would be approximately $0.596 — the multi-agent mixed model strategy saves about 70% in costs.
7. Case Study 2: Development Team Code Review Pipeline
This case study demonstrates how to build a multi-agent code review system within a CI/CD pipeline that automatically performs multi-dimensional reviews after developers submit Pull Requests.
7.1 System Requirements and Role Design
Objective: Within 5 minutes of PR submission, complete security vulnerability scanning, code quality review, test coverage analysis, and documentation completeness checks, then generate a review comment that can be posted directly to GitHub.
Role design:
- Review Coordinator: Receives the PR diff, distributes review tasks, integrates review results
- Security Reviewer: Scans for OWASP Top 10 vulnerabilities, hardcoded secrets, SQL injection risks
- Code Quality Agent: Checks naming conventions, complexity, code duplication, design pattern compliance
- Test Agent: Analyzes test coverage, suggests missing test cases
- Documentation Agent: Checks JSDoc/docstring completeness, README update requirements
7.2 Complete Configuration File
# code-review-team.yaml
name: code-review-pipeline
version: "1.0"
agents:
review-coordinator:
model: claude-3-5-sonnet
system: |
You are the code review coordinator agent.
After receiving a PR diff, simultaneously delegate these four review tasks:
- Security review -> security-reviewer
- Code quality -> code-quality-agent
- Test analysis -> test-agent
- Documentation check -> doc-agent
After receiving all review results, generate a GitHub PR comment in this format:
### Automated Code Review Report
**Overall Score**: X/10
#### Security | Code Quality | Test Coverage | Documentation
List specific issues and improvement suggestions for each category.
skills:
- file-reader
- git-diff-parser
subagents:
- security-reviewer
- code-quality-agent
- test-agent
- doc-agent
security-reviewer:
model: claude-3-5-sonnet
system: |
You are the security review agent, specializing in code vulnerability identification.
Review scope: OWASP Top 10, hardcoded secrets and credentials, SQL/command injection,
XSS vulnerabilities, insecure dependency versions.
For each issue, return:
{"severity": "critical|high|medium|low", "location": "file:line", "description": "", "recommendation": ""}
For critical severity issues, include a fix code example.
skills:
- code-analyzer
- vulnerability-scanner
timeout: 60s
code-quality-agent:
model: gpt-4o
system: |
You are the code quality review agent.
Evaluation dimensions:
1. Naming conventions (are variable, function, and class names clear)
2. Function complexity (is McCabe complexity above 10)
3. Code duplication (DRY principle violations)
4. SOLID principle compliance
5. Error handling completeness
Return a score (1-10) for each dimension with a specific issues list.
skills:
- code-analyzer
- complexity-calculator
timeout: 60s
test-agent:
model: gpt-4o-mini
system: |
You are the test analysis agent.
Analyze code changes and:
1. Identify new code paths not covered by existing tests
2. Suggest unit tests and integration tests that need to be added
3. Assess testing completeness for boundary conditions and exception paths
Return test coverage estimates and a suggested test case list.
skills:
- code-analyzer
- test-pattern-detector
timeout: 45s
doc-agent:
model: gpt-4o-mini
system: |
You are the documentation review agent.
Check:
1. Whether new/modified public functions have complete JSDoc/docstring
2. Whether the README needs updating (new APIs, environment variables, dependencies)
3. Whether the CHANGELOG has recorded this change
4. Whether complex logic has inline comments
Return a documentation gap list and priority assessment.
skills:
- file-reader
- doc-parser
timeout: 30s
team:
coordination_mode: orchestrator
orchestrator: review-coordinator
max_parallel_agents: 4 # All four review agents run in parallel
global_timeout: 300s
hooks:
on_complete:
- action: post-github-comment
target: "{{pr.url}}/reviews"
on_critical_security:
- action: slack-alert
channel: "#security-alerts"
message: "Critical security issue found in PR {{pr.number}}"
7.3 CI/CD System Integration
Using GitHub Actions as an example, integrating the code review agent team into the PR workflow:
# .github/workflows/ai-code-review.yml
name: AI Code Review
on:
pull_request:
types: [opened, synchronize]
jobs:
ai-review:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Generate PR Diff
run: |
git diff origin/${{ github.base_ref }}...HEAD > pr.diff
- name: Run OpenClaw Review Team
env:
OPENCLAW_API_KEY: ${{ secrets.OPENCLAW_API_KEY }}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
openclaw agent \
--message "Review this PR diff and provide feedback" \
--context "pr_number=${{ github.event.pull_request.number }}" \
--context "pr_url=${{ github.event.pull_request.html_url }}"
7.4 Review Quality Assessment
In actual deployment, the multi-agent code review system demonstrated the following results:
- Security vulnerability detection rate improved by 40% compared to single agents (thanks to the security agent's specialized system prompt)
- False positive rate reduced by 25% (each agent's high degree of focus reduces cross-domain confusion)
- Average review completion time: 3.2 minutes (vs. 8.5 minutes for a single agent)
- Developer adoption rate: 78% of suggestions were accepted and implemented by developers
8. Performance and Cost Optimization
Multi-agent systems introduce additional coordination overhead. Without optimization, this overhead can negate the benefits of parallelization. Below are key optimization strategies.
8.1 Token Usage Optimization
System prompt compression: Each agent launch consumes the system prompt's tokens. For frequently launched agents, keep system prompts under 500 tokens by removing redundant descriptions.
Intermediate result truncation: When subagent outputs are passed directly to the next agent, token bloat can occur. The orchestrator agent should perform summary compression before passing results:
agents:
coordinator:
inter_agent_compression:
enabled: true
strategy: extractive-summary
max_tokens_per_result: 2000 # Maximum 2000 tokens per subagent result
8.2 Decision Framework for Parallel vs. Sequential Execution
Not all subtasks are suitable for parallel execution. Incorrect parallelization increases coordination complexity and can actually reduce overall performance.
Criteria for determining whether parallel execution is appropriate:
- Subtask B's input does not depend on subtask A's output -> parallelizable
- Subtask B depends on part of subtask A's output -> consider splitting A's output and passing what B needs first
- Subtask B completely depends on subtask A's full output -> must execute sequentially
team:
execution_plan:
# Batch 1: Fully parallelizable
parallel_batch_1:
- paper-searcher
- web-scraper
# Batch 2: Depends on batch 1 results
parallel_batch_2:
- data-analyst # Needs all results from batch 1
# Batch 3: Depends on batch 2
sequential:
- report-writer # Needs data-analyst's complete output
8.3 Caching Strategy
In multi-turn conversations or repetitive task scenarios, subagent intermediate results can be cached to avoid repeating expensive operations:
agents:
paper-searcher:
cache:
enabled: true
ttl: 3600s # Cache search results for 1 hour
key_template: "search_{query_hash}"
store: redis # Supports memory, redis, disk
Cache hit rates significantly impact costs: in research-type tasks, cache hit rates for identical or similar topic searches can reach 40-60%, effectively reducing redundant API call costs.
8.4 Model Selection Strategy
Selecting the most appropriate model for each agent is the most effective means of reducing costs. Recommended principles:
| Agent Type | Task Characteristics | Recommended Model | Rationale |
|---|---|---|---|
| Orchestrator Agent | Logical reasoning, task decomposition | Claude 3.5 Sonnet | Strong reasoning, moderate cost |
| Data Collection Agent | Information extraction, format conversion | GPT-4o-mini | Fast, low cost, sufficient capability |
| Analysis Agent | Complex analysis, pattern recognition | Claude 3.5 Sonnet | Strong analytical ability, good value |
| Creative Output Agent | High-quality text generation | Claude Opus | Highest output quality, used for final deliverables |
| Routing/Classification Agent | Simple classification, keyword extraction | DeepSeek-V3 / Ollama | Ultra-low cost, minimal latency |
9. Comparison with Other Multi-Agent Frameworks
Before selecting OpenClaw Agent Teams, it is worthwhile to conduct an objective comparison with the major competitors on the market.[3][8]
9.1 Four-Framework Comparison
| Dimension | OpenClaw Agent Teams | AutoGen | CrewAI | LangGraph |
|---|---|---|---|---|
| Configuration | YAML declarative | Python code | Python code | Python code |
| Entry Difficulty | Low | Medium | Medium | High |
| Workflow Flexibility | Medium | High | Medium | Highest |
| Built-in GUI | Yes (OpenClaw UI) | Yes (AutoGen Studio) | No | Yes (LangSmith) |
| Multi-LLM Support | Claude/GPT/DeepSeek/Ollama | Extensive | Extensive | Extensive |
| Monitoring & Observability | Basic | Moderate | Basic | Comprehensive (LangSmith) |
| Community Activity | Rapidly growing | Mature | Mature | Mature |
| Best Suited For | Rapid prototyping, standard workflows | Research experiments | Role-playing collaboration | Complex dynamic workflows |
9.2 Core Advantages of OpenClaw Agent Teams
YAML-first configuration philosophy: For non-Python developers (such as backend engineers or product managers), the YAML configuration entry barrier is far lower than writing Python class definitions required by AutoGen or CrewAI. This enables non-technical business stakeholders to participate in the agent system design process.
Deep integration with the OpenClaw ecosystem: If your team is already using OpenClaw's single-agent features, migrating to Agent Teams has virtually no learning curve. The Skills system, Hooks system, and Gateway architecture all extend seamlessly to multi-agent scenarios.[6]
9.3 Current Limitations of OpenClaw Agent Teams
Objectively, OpenClaw Agent Teams still lags behind mature frameworks in the following areas:
- Insufficient dynamic workflow support: LangGraph's graph-based workflows allow dynamic adjustment of agent topology based on runtime conditions; OpenClaw's current YAML declarative configuration lacks flexibility in this regard
- Basic monitoring tools: Lacks a LangSmith-level comprehensive tracing and evaluation toolchain
- Relatively limited community resources: While growing at an impressive rate, production case studies are still fewer than those for AutoGen and LangGraph
Recommendation: If your task flow is relatively fixed (such as the research report generation and code review examples in this article), choose OpenClaw Agent Teams; if you need complex conditional branching and dynamic routing, consider LangGraph; if your team is research-focused, AutoGen's flexibility is better suited for experimental scenarios.
10. Common Issues and Best Practices
10.1 Debugging Multi-Agent Systems
Debugging multi-agent systems is significantly more difficult than single agents, because problems can originate from: agent configuration errors, message format inconsistencies, timing issues (Race Conditions), or error propagation between agents.
Recommended debugging workflow:
- Isolation testing: Test each subagent individually to confirm it produces correct output given standard input
- Enable verbose logging: Set
logging.level: debugin the development environment to log all inter-agent messages - Fix random seeds: Fix the model's random seed in testing to ensure reproducible results
- Start with simple scenarios: Validate the overall flow with the simplest possible input before testing edge cases
# Debug mode configuration
team:
debug:
enabled: true
save_agent_messages: true
save_intermediate_results: true
output_dir: "./debug-logs"
replay_mode: false # Set to true to replay failed message sequences
10.2 Monitoring and Observability
In production environments, multi-agent systems require continuous monitoring to ensure stable operation:
team:
monitoring:
metrics:
- agent_execution_time
- token_usage_per_agent
- task_success_rate
- inter_agent_message_count
alerts:
- condition: "task_success_rate < 0.95"
action: slack-notify
channel: "#ops-alerts"
- condition: "agent_execution_time > timeout * 0.8"
action: log-warning
10.3 Error Handling Best Practices
In a multi-agent system, a single agent's failure should not cause the entire workflow to crash. Below is a three-layer error handling strategy:
- Agent layer: Each agent internally handles predictable errors (API rate limiting, format errors), returning standard error objects rather than throwing exceptions
- Coordination layer: The orchestrator agent listens for subagent error events and decides whether to retry, switch to a backup agent, or degrade gracefully based on the fallback strategy
- System layer: Set global timeouts and circuit breakers that pause related agent calls when error rates exceed thresholds
10.4 Security Considerations
Multi-agent systems introduce new security attack surfaces, particularly prompt injection attacks: malicious input can propagate through subagent outputs to other agents, thereby affecting the entire system's behavior.
Protective measures:
- Perform schema validation on subagent outputs, rejecting outputs that do not conform to expected formats
- When passing data between agents, explicitly distinguish between "trusted instructions" and "untrusted user data"
- Set up human review checkpoints for agents that perform high-risk operations (file writes, API calls)
10.5 Removing and Managing Subagents
In OpenClaw's Agent Teams configuration, removing (deleting) a subagent requires addressing multiple aspects simultaneously to avoid residual message routing errors:
# Steps for safely removing a subagent
# Step 1: Remove the target agent from the subagents list
agents:
coordinator:
subagents:
# - web-scraper <-- Remove this line
- data-analyst
- report-writer
# Step 2: Remove related routing rules
routing_rules:
# - skill: "web-search"
# route_to: "web-scraper" <-- Remove this block
# Step 3: Remove event subscriptions
team:
event_bus:
events:
# - name: "scraping_completed" <-- Remove the entire event definition
# publisher: "web-scraper"
# subscribers: ["data-analyst"]
# Step 4: Remove the agent definition itself
# Delete the entire agents.web-scraper block
It is recommended to first set the agent to disabled: true and observe system behavior for a period before executing a full removal, confirming that no other agents depend on its output.
10.6 Cross-Agent Skill Management
When multiple agents share the same skill, centralized skill version management is needed to prevent different agents from using incompatible skill versions:
# Global skill version locking
team:
skill_registry:
web-search: "2.1.0" # All agents using web-search are forced to use this version
code-analyzer: "1.5.2"
file-reader: "3.0.0"
Conclusion
Multi-agent system architecture represents a significant milestone in AI agent development — evolving from "a single AI assistant" to "an AI team." OpenClaw Agent Teams lowers the entry barrier for multi-agent systems through YAML declarative configuration, enabling more developers and business professionals to participate in designing and deploying complex automated workflows.[9]
The two practical case studies presented in this article — the research intelligence system and the code review pipeline — have both been validated in real-world environments, demonstrating the performance advantages and cost-effectiveness of multi-agent architectures. As the OpenClaw community continues to grow, we expect Agent Teams' capabilities to continue improving, particularly in dynamic workflow support and monitoring tools.[10]
For teams evaluating multi-agent systems, we recommend starting with a minimum viable case (MVP): select the most time-consuming and most parallelizable task in an existing workflow, build a small team with 2-3 agents, and gradually expand after validating results. Multi-agent system complexity should grow as requirements are confirmed, rather than pursuing a comprehensive architecture design from the outset.


