April 2, 2025
Research
The field of autonomous agents is experiencing a renaissance. These AI systems, designed to reason, interact with tools, and complete complex tasks, are making rapid and tangible progress, from cutting-edge research frameworks to powerful platforms that enable agents to manage intricate workflows. These systems are no longer just promising demos; they are beginning to reshape how we think about digital labor and automation.
A key enabler of this progress is the Model Context Protocol (MCP), introduced by Anthropic. MCP serves as an open standard for connecting AI assistants to the systems where data lives, including content repositories, business tools, and development environments. It has quickly gained traction, especially after its integration into Cursor and Windsurf, and OpenAI recently announced support for MCP in its Agents SDK, a significant step for the ecosystem. We have also integrated it into the CAMEL framework to embrace the MCP ecosystem.
Despite these advancements, agents still face a fundamental limitation: they struggle with long-term decision-making and adaptation. While they can execute well-scoped tasks, they falter on multi-step objectives that require learning, revising plans, or reacting to change. Current agents follow instructions but don’t truly evolve through experience.
This gap stems from the static nature of internet training data. Language models learn from passive text, not from interaction. To gain real autonomy, agents must operate and evolve within environments—digital or physical spaces where they can perceive, act, and learn from experience. Only through this feedback loop can agents begin to improve through trial and error.
To address this “last mile” challenge in agent automation, we introduce OWL and CRAB, two agent automation projects, along with MCP integration, designed specifically for interactive environments.
OWL (Optimized Workforce Learning), built on top of the CAMEL-AI Framework, is our recently released project for real-world task automation. OWL has shown promise in task automation, achieving an impressive average score of 58.18 on the GAIA benchmark—ranking #1 among open-source submissions.
OWL is a multi-agent system for automating digital tasks through the use of a browser, terminal, code execution, function calls, and MCP tools.
The core of OWL’s functionality is built on the CAMEL framework’s RolePlaying module, which creates unique initial settings for different agents through predefined prompts. The system primarily relies on two main agents: a user agent that decomposes the task and issues instructions, and an assistant agent that carries those instructions out using the available tools.
This architecture enables OWL to handle complex workflows through dynamic agent interactions, making it particularly effective for task automation across diverse domains.
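Condensed into code, the interaction loop behind this architecture looks roughly like the sketch below, built on CAMEL’s public RolePlaying API; the task prompt and turn limit here are purely illustrative.

from camel.societies import RolePlaying

society = RolePlaying(
    task_prompt="Collect today's top AI news into a short brief.",  # illustrative task
    user_role_name="user",
    assistant_role_name="assistant",
)

# The user agent instructs; the assistant agent executes; repeat until done.
input_msg = society.init_chat()
for _ in range(10):  # illustrative turn limit
    assistant_response, user_response = society.step(input_msg)
    if assistant_response.terminated or user_response.terminated:
        break
    if "CAMEL_TASK_DONE" in user_response.msg.content:  # CAMEL's completion signal
        break
    input_msg = assistant_response.msg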
Furthermore, OWL employs a multi-agent system with context isolation for handling long-horizon tasks. Specialized sub-agents maintain isolated context windows for their domain (e.g., WebAgent keeps browser interaction history separate from main agent context).
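As an illustration of the idea (a minimal sketch, not OWL’s actual classes), context isolation can be as simple as each sub-agent owning its own message history and returning only a compact result to the main agent:

class SubAgent:
    """Illustrative sub-agent that owns an isolated context window."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.history: list[dict] = []  # private context; never shared upward

    def act(self, instruction: str) -> str:
        self.history.append({"role": "user", "content": instruction})
        result = f"[{self.name}] completed: {instruction}"  # model call would go here
        self.history.append({"role": "assistant", "content": result})
        return result  # only this compact result leaves the sub-agent


class MainAgent:
    """Delegates browser work; keeps its own context free of page-level noise."""

    def __init__(self) -> None:
        self.history: list[dict] = []
        self.web_agent = SubAgent("WebAgent")

    def browse(self, instruction: str) -> str:
        summary = self.web_agent.act(instruction)
        self.history.append({"role": "tool", "content": summary})  # summary only
        return summary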
MCP has emerged as the “USB interface” of the LLM field, becoming a universal solution for addressing AI information silos, with its ecosystem growing daily. OWL supports the MCP protocol to call MCP servers within its ecosystem, achieving more standardized and efficient tool invocation.
1. Setting Up MCP Servers
First, install the required MCP servers:
# Install MCP Playwright Server
npm install -g @executeautomation/playwright-mcp-server
npx playwright install-deps
2. Configure MCP Servers
Create a configuration file named `mcp_servers_config.json` with the following structure:
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["-y", "@executeautomation/playwright-mcp-server"]
    }
  }
}
3. Implementation in OWL
Here’s how to integrate OWL with MCP in your code:
import asyncio
import sys

from camel.logger import set_log_level
from camel.models import ModelFactory
from camel.societies import RolePlaying
from camel.toolkits import MCPToolkit
from camel.types import ModelPlatformType, ModelType

from owl.utils.enhanced_role_playing import arun_society

set_log_level(level="DEBUG")


async def main():
    # Initialize MCP toolkit and connect
    mcp_toolkit = MCPToolkit(config_path="mcp_servers_config.json")
    try:
        await mcp_toolkit.connect()

        # Get task from command line or use default
        task = sys.argv[1] if len(sys.argv) > 1 else (
            "Using a web browser, search Google Scholar for Andrew Ng's "
            "academic profile. Create a comprehensive report that includes: "
            "(1) his main research directions in AI and machine learning, "
            "(2) at least five of his most influential published papers with "
            "citation counts, (3) his affiliated institutions throughout his "
            "career, and (4) a summary of his impact on the field."
        )

        # Set up the model
        model = ModelFactory.create(
            model_platform=ModelPlatformType.OPENAI,
            model_type=ModelType.GPT_4O,
        )

        # Create and run the role-playing society
        society = RolePlaying(
            task_prompt=task,
            user_role_name="user",
            user_agent_kwargs={"model": model},
            assistant_role_name="assistant",
            assistant_agent_kwargs={
                "model": model,
                "tools": mcp_toolkit.get_tools(),
            },
        )
        answer, chat_history, token_count = await arun_society(society)
        print(f"\033[94mAnswer: {answer}\033[0m")
    finally:
        try:
            await mcp_toolkit.disconnect()
        except Exception:
            print("Disconnect failed")


if __name__ == "__main__":
    asyncio.run(main())
Consider this task: “Using a web browser, search Google Scholar for Andrew Ng's academic profile. Create a comprehensive report that includes: (1) his main research directions in AI and machine learning, (2) at least five of his most influential published papers with citation counts, (3) his affiliated institutions throughout his career, and (4) a summary of his impact on the field.”
The OWL framework with MCP can handle this end to end: the Playwright MCP server provides standardized browser control, the user agent breaks the task into concrete steps, and the assistant agent executes each step through MCP tool calls before compiling the final report.
OWL’s development roadmap focuses on enhancing its capabilities in several key areas.
The recent integration of MCPToolkit, FileWriteToolkit, and TerminalToolkit represents significant progress toward these goals, enhancing OWL agents with MCP tool calling, file writing capabilities, and terminal command execution.
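In CAMEL, each toolkit exposes its functions through get_tools(), so the three can be handed to an agent together. A minimal sketch, assuming default toolkit constructors:

from camel.toolkits import FileWriteToolkit, MCPToolkit, TerminalToolkit

mcp_toolkit = MCPToolkit(config_path="mcp_servers_config.json")
# await mcp_toolkit.connect() first, as in the example above

tools = [
    *mcp_toolkit.get_tools(),         # standardized MCP tool calling
    *FileWriteToolkit().get_tools(),  # writing reports and artifacts to disk
    *TerminalToolkit().get_tools(),   # executing terminal commands
]
# Hand the combined list to the assistant agent:
#   assistant_agent_kwargs={"model": model, "tools": tools}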
CRAB (CRoss-environment Agent Benchmark) is the first agent framework that supports cross-device task execution. The project aims to build a benchmark that enables agents to perform tasks across multiple environments. For instance, within the CRAB framework, an agent can read a message on a smartphone and then operate a PC based on the message content.
The term environment is crucial in CRAB. In the example above, there are two environments: an Ubuntu PC and an Android smartphone. In fact, an environment can be any device, application, or even a more complex multi-device system—as long as it has a well-defined action space and observation space.
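In code, an environment in this sense reduces to a small interface. The sketch below is illustrative rather than CRAB’s actual API:

from abc import ABC, abstractmethod
from typing import Any

class Environment(ABC):
    """Anything with a well-defined action space and observation space:
    a device, an application, or a multi-device system."""

    @abstractmethod
    def observe(self) -> Any:
        """Return the current observation, e.g. a screenshot or UI tree."""

    @abstractmethod
    def step(self, action: dict[str, Any]) -> Any:
        """Execute one action from the action space; return the new observation."""

class AndroidPhone(Environment):
    def observe(self) -> Any: ...   # e.g. capture the screen via adb
    def step(self, action: dict[str, Any]) -> Any: ...  # e.g. tap, swipe, type

class UbuntuDesktop(Environment):
    def observe(self) -> Any: ...   # e.g. take a desktop screenshot
    def step(self, action: dict[str, Any]) -> Any: ...  # e.g. mouse and keyboard events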
Cross-environment capability is a crucial consideration in our framework, enabling agents to interact simultaneously with multiple devices or applications. This involves coordinating across environments, leveraging information between them, and passing messages. The capability is vital because, much like humans, agents must navigate diverse environments, each with its own action/observation spaces and logic, to solve complex problems. It also stands in contrast to most existing agent benchmarks, which are typically limited to interactions within a single device or application.
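Built on an interface like the one sketched above, the phone-to-PC example reduces to a short loop; note that the agent helper methods here are hypothetical:

def run_cross_environment_task(phone, desktop, agent) -> None:
    """Read a message on the phone, then operate the desktop based on it.

    `phone` and `desktop` are Environment instances as sketched above;
    `agent` is any planner exposing the two hypothetical helpers used here.
    """
    phone_obs = phone.observe()                 # e.g. a screenshot of the inbox
    message = agent.extract_message(phone_obs)  # hypothetical perception step
    plan = agent.plan_desktop_actions(message)  # hypothetical planning step
    for action in plan:
        desktop.step(action)                    # act in the second environment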
CRAB introduces the first cross-environment agent benchmark, CRAB Benchmark v0, which includes 120 tasks spanning more than 20 applications on Ubuntu desktops and Android smartphones. We believe that scaling agent environments is a key step toward building capable and practical agents.
The cross-environment capability unlocks tremendous potential for real-world applications. One exciting possibility is applying CRAB to IoT scenarios—imagine controlling all your devices through a single intelligent agent assistant. In industries such as networking and cloud computing, managing a large number of heterogeneous devices is a constant challenge. Our cross-environment paradigm offers a promising path forward in these domains.
We are actively improving CRAB and planning several key upgrades for the upcoming version.
We’ll also be integrating more components into our official GitHub repo.
The integration of OWL and CRAB creates a potent ecosystem for developing, testing, and scaling agents.
OWL and CRAB complement each other in several important ways: OWL contributes the multi-agent automation capabilities, while CRAB contributes the rigorously specified environments and benchmarks in which those agents can be exercised and evaluated.
Combining these projects enables the generation of high-quality training data. Once established, the environments can be used to run agents on benchmark tasks, record their interaction trajectories, and score the outcomes.
This data generation capability creates a virtuous cycle where agent performance continuously improves through iterative testing and refinement.
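Concretely, each benchmark run can be logged as a training example. The schema below is hypothetical; the field names are purely illustrative:

import json

def record_run(task: str, trajectory: list[dict], score: float,
               path: str = "trajectories.jsonl") -> None:
    """Append one benchmark run as a JSONL training example."""
    example = {
        "task": task,              # the benchmark task description
        "trajectory": trajectory,  # observation/action pairs from the run
        "score": score,            # evaluator result, used to filter data
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(example) + "\n")

# Keep only high-scoring runs for fine-tuning, then re-test the improved
# agent on the same environments to continue the cycle.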
CAMEL-AI has identified the environment as one of the three key dimensions in its hypothesis on the scaling laws of agents.
This highlights how crucial environment design is to advancing agent technology.
Environments provide the context in which agents operate and learn. They define the actions an agent can take, the observations it receives, and the feedback that signals success or failure.
As environments become more diverse and complex, they drive the development of more sophisticated agent capabilities. This creates a scaling effect—better environments lead to better agents, which in turn can handle more complex environments.
The ability to operate across different environments represents a significant leap in agent capabilities. It requires adapting to different action and observation spaces, carrying information from one environment into another, and coordinating actions across devices.
CRAB’s focus on cross-environment benchmarking directly addresses these challenges, providing a structured way to measure and improve these critical capabilities.
CAMEL-AI’s hypothesis on the scaling laws of agents emphasizes that intelligence emerges from the interplay between agents and their environments. This aligns with Marvin Minsky’s Society of Mind concept—suggesting that intelligence is not monolithic, but emerges from diverse interactions. Environments serve as crucial testing grounds, stretching and refining agent capabilities. By developing increasingly complex environments, we drive the creation of more sophisticated agents—mirroring how human intelligence evolved through natural and social interactions.
As agent technology advances, environment design will likely focus on greater diversity, realism, and complexity.
The combination of OWL's advanced agent capabilities and CRAB's rigorous environment specifications offers an ideal platform for exploring these frontiers.
The integration of OWL, CRAB, and MCP represents a significant step forward in solving the “last mile” challenge of agent automation.
By creating environments where agents can learn from experience, operate across platforms, and leverage standardized tool interfaces, we’re building the foundation for truly autonomous systems. As these projects continue to evolve, they promise to unlock new possibilities for AI agents—from more effective task automation to cross-environment coordination and continuous improvement through interaction. The future of agent technology lies not just in better models, but in better environments—environments that allow those models to learn, adapt, and grow through experience.
Join us in exploring this frontier of AI research and development, where the boundaries between environments dissolve and agents gain the power to navigate our complex digital world with increasing autonomy and effectiveness.
OWL GitHub: https://github.com/camel-ai/owl
CRAB GitHub: https://github.com/camel-ai/crab