Introducing the Hybrid Browser Toolkit: Faster, Smarter Web Automation for MCP

All you need to know about the Hybrid Browser Toolkit

October 2, 2025 | 32 min read

If you've been working with our original Camel BrowserToolkit, you might have noticed a pattern: it got the job done but had some limitations. It operated mainly by taking screenshots and injecting custom IDs into pages to find elements. It was a single-mode, monolithic Python setup that worked for basic tasks, but we knew we could do better. The screenshot-based approach meant you were essentially teaching an AI to click on pictures, which worked but felt a bit like using a smartphone with thick gloves on. Plus, the quality of those visual snapshots wasn't always great, and you couldn't easily access the underlying page structure when you needed it.

Enter the Hybrid Browser Toolkit

That's where our new Hybrid Browser Toolkit comes in. We've rebuilt everything from the ground up using a TypeScript-Python architecture that gives you the best of both worlds. Why TypeScript? It's not just about Playwright's native AI-friendly snapshot features – TypeScript is fundamentally better suited for efficient browser operations with its event-driven nature, native async/await support, and direct access to browser APIs without the overhead of language bridges. The TypeScript server handles all the heavy lifting of browser control while Python provides the familiar interface you love.

But we didn't stop there. We've added support for CDP (Chrome DevTools Protocol) mode to connect to existing browser instances, and MCP (Model Context Protocol) integration for enhanced AI agent capabilities. You're not just limited to visual operations anymore – now you can seamlessly switch between visual and text-based interactions, access detailed DOM information, and enjoy snapshots that are crisp, accurate, and actually make sense.

We've obsessed over the details too, from better element detection to smarter action handling, making the whole experience feel more natural and reliable. It's like upgrading from that smartphone-with-gloves setup to having direct, precise control over everything you need to do in a browser.

This blog is organized into four main chapters:

Table of Contents

  1. Architecture Overview
  2. Tools Reference
  3. Operating Modes
  4. Connection Modes

Architecture Overview

This chapter provides a comprehensive comparison between the legacy BrowserToolkit and the new HybridBrowserToolkit, highlighting the architectural improvements, new features, and enhanced capabilities.

Table of Contents

  • Architecture Evolution

  • Core Architectural Improvements

    1. Multi-Mode Operation System
    2. TypeScript Framework Integration
    3. Enhanced Element Identification
    4. _snapshotForAI and ARIA Mapping Mechanism
    5. Enhanced Stealth Mechanism
    6. Tool Registration and Screenshot Handling
    7. Form Filling Optimization

Architecture Evolution

The name "Hybrid" hints at the key change: we combined Python and TypeScript to get the best of both worlds. Instead of one heavy Python process doing everything, we now have a layered architecture where Python and a Node.js (TypeScript) server work together.

In the hybrid_browser_toolkit architecture, when you issue a command, it goes through a WebSocket to a TypeScript server that is tightly integrated with Playwright's Node.js API. This server manages a pool of browser instances and routes commands asynchronously. Python remains the interface (so you still write Python code to use the toolkit), but the heavy lifting happens in Node.js. Why is this good news? Because direct Node.js calls to Playwright are faster, and Playwright's latest features (like new selectors or the _snapshotForAI function) are fully available to us.

This layered design also makes the system modular. We have:

  • A Python layer for the API you call and configuration management.
  • A WebSocket bridge connecting the Python and TypeScript layers.
  • A TypeScript server layer that acts as the controller (routing commands, managing sessions).
  • A Browser control layer with controllers for different modes (text, visual, hybrid) and connection types.
  • The Playwright integration layer where actual browser actions happen with Playwright's Node.js capabilities (including things like _snapshotForAI and ARIA selectors).

In simpler terms, Python is the brain giving high-level instructions, and TypeScript is the brawn executing them efficiently. By splitting responsibilities this way, the toolkit can do more in parallel and handle complicated tasks without getting stuck.
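
To make the request/response flow a bit more concrete, here is a minimal sketch of how a Python layer might tag each command and match it with the reply that comes back. Everything in it is an assumption for illustration: the message shape (`id`, `action`, `params`), the `CommandBridge` class, and the in-memory `fake_server` that stands in for the TypeScript server. The toolkit's real wire protocol is not shown in this post, and a real bridge would send over a WebSocket rather than call a coroutine directly.

```python
import asyncio
import itertools
import json


class CommandBridge:
    """Hypothetical bridge: tags each command with an id and awaits the matching reply."""

    def __init__(self, transport):
        self._transport = transport          # coroutine: str -> str (stands in for a WebSocket)
        self._ids = itertools.count(1)

    async def send(self, action, **params):
        request = {"id": next(self._ids), "action": action, "params": params}
        raw_reply = await self._transport(json.dumps(request))
        reply = json.loads(raw_reply)
        if reply["id"] != request["id"]:
            raise RuntimeError("reply does not match request")
        if "error" in reply:
            raise RuntimeError(reply["error"])
        return reply["result"]


async def fake_server(raw_request):
    """Stand-in for the TypeScript server: echoes the action back as a result."""
    request = json.loads(raw_request)
    await asyncio.sleep(0)                   # yield control, as real I/O would
    return json.dumps({"id": request["id"], "result": f"done: {request['action']}"})


async def demo():
    bridge = CommandBridge(fake_server)
    # Two commands in flight at once; each reply is matched to its own request.
    return await asyncio.gather(
        bridge.send("visit_page", url="https://example.com"),
        bridge.send("click", ref="2"),
    )


results = asyncio.run(demo())
print(results)
```

The point of the id field is that Python never has to block on one command before issuing the next, which is what lets the server side route work asynchronously.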

HybridBrowserToolkit Architecture

The HybridBrowserToolkit introduces a modular, multi-layer architecture made up of the five layers described above.

Core Architectural Improvements

1. Multi-Mode Operation System

The HybridBrowserToolkit supports three distinct operating modes:

  • Text Mode: Pure textual snapshot from _snapshotForAI
  • Visual Mode: Text snapshot filtered and visualized as SoM screenshot
  • Hybrid Mode: Intelligent switching between text and visual outputs

2. TypeScript Framework Integration

| Aspect | Legacy Python Approach | TypeScript Framework | Benefits |
| --- | --- | --- | --- |
| Browser API Integration | Python → JS bridge with overhead | Direct native Playwright API calls | Lower latency; better performance; access to latest features |
| Asynchronous Operations | Limited async support | Native async/await throughout | Non-blocking operations; better concurrency; efficient resource usage |
| Element Interaction | Custom JavaScript injection | Native Playwright methods | More reliable; better error handling; cleaner code |
| Real-time Events | Polling-based updates | WebSocket event streaming | Instant updates; lower resource usage; better responsiveness |
| Type Safety | Runtime type checking only | Compile-time type checking | Catch errors early; better IDE support; safer refactoring |
| Performance | Multiple language contexts | Single runtime environment | Low-latency calls; lower CPU usage |
| Browser Features | Limited to Python bindings | Full Playwright API access | Playwright _snapshotForAI; advanced debugging |
| Error Handling | Cross-language error propagation | Native error boundaries | Clearer stack traces; better error recovery; easier debugging |

3. Enhanced Element Identification

Legacy System:

# Custom ID injection
page.evaluate("__elementId = '123'")
target = page.locator("[__elementId='123']")

New ARIA Mapping System

// Native Playwright ARIA selectors
await page.locator('[aria-label="Submit"]').click();
await page.getByRole('button', { name: 'Submit' }).click();

// _snapshotForAI integration
const snapshot = await page._snapshotForAI();
// Returns structured element data with ref mappings

4. _snapshotForAI and ARIA Mapping Mechanism

Pipeline:

  • _snapshotForAI analyzes the DOM and extracts ARIA properties
  • Elements are classified by their semantic roles
  • A unified ref ID system maps to ARIA selectors
  • The same foundation serves both text and visual modes
  • Visual mode is built on top of the text snapshot by filtering and adding markers
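
The ref-to-ARIA mapping step can be sketched in a few lines of Python. This is only an illustration of the idea, not the toolkit's internal code: the line format is taken from the snapshot examples in this post, while `build_ref_map`, `to_role_selector`, and the selector string format are hypothetical names and choices made for the sketch.

```python
import re

# Matches lines such as: - button "Sign In" [ref=2]
SNAPSHOT_LINE = re.compile(
    r'^\s*-\s+(?P<role>[\w-]+)\s+"(?P<name>[^"]*)"\s+\[ref=(?P<ref>[^\]]+)\]'
)


def build_ref_map(snapshot: str) -> dict:
    """Map each ref ID to its (role, name) pair."""
    ref_map = {}
    for line in snapshot.splitlines():
        match = SNAPSHOT_LINE.match(line)
        if match:
            ref_map[match["ref"]] = (match["role"], match["name"])
    return ref_map


def to_role_selector(role: str, name: str) -> str:
    """Render a role-based selector string (format assumed for illustration)."""
    escaped = name.replace('"', '\\"')
    return f'role={role}[name="{escaped}"]'


snapshot = '''- link "Home" [ref=1]
- button "Sign In" [ref=2]
- textbox "Search" [ref=3]'''

ref_map = build_ref_map(snapshot)
selector = to_role_selector(*ref_map["2"])
print(selector)
```

Because the agent only ever sees the short ref, the same lookup table can back both the text snapshot and the markers drawn on a screenshot.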

5. Enhanced Stealth Mechanism

Key Stealth Enhancements:

  1. Legacy Approach:
    • Single flag
    • Hardcoded user agent string
    • Applied only during browser launch
    • No flexibility for different contexts
  2. HybridBrowserToolkit Approach:
    • Comprehensive Flag Set: Multiple anti-detection browser arguments
    • Configurable System: StealthConfig object allows customization
    • Context Adaptation: Different behavior for CDP vs standard launch
    • Dynamic Headers: Can set custom HTTP headers and user agents
    • Persistent Context Support: Maintains stealth across sessions

6. Tool Registration and Screenshot Handling

Key Differences from Legacy:

  • Legacy: Screenshot stored in memory, passed as object
  • Hybrid: Screenshot saved to disk, agent accesses via file path
  • Memory Efficiency: Only file path in memory, not entire image
  • Agent Integration: Uses registered agent pattern for clean separation

7. Form Filling Optimization

New Features:

  • Multi-input support in single command
  • Intelligent dropdown detection
  • Diff snapshot for dynamic content
  • Error recovery mechanisms
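
One way to picture multi-input handling with per-field error reporting is the sketch below. It is not the toolkit's implementation: `type_many`, `fake_type_one`, and the shape of the returned `details` dictionary are all hypothetical, chosen only to show how one failing field can be reported without aborting the rest.

```python
import asyncio


async def type_many(type_one, inputs):
    """Apply each input in order and record success or failure per ref."""
    details = {}
    for item in inputs:
        try:
            await type_one(item["ref"], item["text"])
            details[item["ref"]] = {"success": True}
        except Exception as exc:   # keep going so one bad field does not abort the rest
            details[item["ref"]] = {"success": False, "error": str(exc)}
    return details


async def fake_type_one(ref, text):
    """Stand-in for a single typing action; ref '4' simulates a missing element."""
    if ref == "4":
        raise LookupError(f"element {ref} not found")


inputs = [
    {"ref": "3", "text": "username123"},
    {"ref": "4", "text": "SecurePass123!"},
    {"ref": "5", "text": "john.doe@example.com"},
]
details = asyncio.run(type_many(fake_type_one, inputs))
print(details)
```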


Tools Reference

This chapter provides a comprehensive reference for all tools available in the HybridBrowserToolkit. Each tool is designed for specific browser automation tasks, from basic navigation to complex interactions.

Browser Session Management

browser_open

Opens a new browser session. This must be the first browser action before any other operations.

Parameters:

  • None

Returns:

  • result (str): Confirmation message
  • snapshot (str): Initial page snapshot (unless in full_visual_mode)
  • tabs (List[Dict]): Information about all open tabs
  • current_tab (int): Index of the active tab
  • total_tabs (int): Total number of open tabs

Example:

# Basic browser opening
from camel.toolkits import HybridBrowserToolkit  # import path assumed; check your CAMEL version

toolkit = HybridBrowserToolkit(headless=False)
result = await toolkit.browser_open()

print(f"Browser opened: {result['result']}")
print(f"Initial page snapshot: {result['snapshot']}")
print(f"Total tabs: {result['total_tabs']}")

# With default URL configuration
toolkit = HybridBrowserToolkit(
    default_start_url="https://www.google.com"
)
result = await toolkit.browser_open()
# Browser opens directly to Google

browser_close

Closes the browser session and releases all resources. Should be called at the end of automation tasks.

Parameters:

  • None

Returns:

  • (str): Confirmation message

Example:

# Always close the browser when done
try:
    await toolkit.browser_open()
    # ... perform automation tasks ...
finally:
    result = await toolkit.browser_close()
    print(result)  # "Browser session closed."
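
If you use the try/finally pattern often, it may be convenient to wrap it in an async context manager. The sketch below assumes only the two methods shown in this section (`browser_open` and `browser_close`); the `browser_session` helper and the `FakeToolkit` stub are hypothetical and exist so the example runs without a real browser.

```python
import asyncio
from contextlib import asynccontextmanager


@asynccontextmanager
async def browser_session(toolkit):
    """Open the browser on entry and always close it on exit."""
    try:
        await toolkit.browser_open()
        yield toolkit
    finally:
        await toolkit.browser_close()


class FakeToolkit:
    """Stub standing in for HybridBrowserToolkit so the sketch runs offline."""

    def __init__(self):
        self.calls = []

    async def browser_open(self):
        self.calls.append("open")

    async def browser_close(self):
        self.calls.append("close")
        return "Browser session closed."


async def demo():
    fake = FakeToolkit()
    try:
        async with browser_session(fake):
            raise ValueError("simulated failure mid-task")
    except ValueError:
        pass                      # the error still propagates; close has already run
    return fake.calls


calls = asyncio.run(demo())
print(calls)
```

With a real toolkit instance you would write `async with browser_session(toolkit):` around your automation steps and get the same cleanup guarantee.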


Navigation Tools

browser_visit_page

Opens a URL in a new browser tab and switches to it. Creates a new tab each time it’s called.

Parameters:

  • url (str): The web address to load

Returns:

  • result (str): Confirmation message
  • snapshot (str): Page snapshot after navigation
  • tabs (List[Dict]): Updated tab information
  • current_tab (int): Index of the new active tab
  • total_tabs (int): Updated total number of tabs

Example:

# Visit a single page
result = await toolkit.browser_visit_page("https://example.com")
print(f"Navigated to: {result['result']}")
print(f"Page elements: {result['snapshot']}")

# Visit multiple pages (creates multiple tabs)
sites = ["https://github.com", "https://google.com", "https://stackoverflow.com"]
for site in sites:
    result = await toolkit.browser_visit_page(site)
    print(f"Tab {result['current_tab']}: {site}")
print(f"Total tabs open: {result['total_tabs']}")

browser_back

Navigates back to the previous page in browser history for the current tab.

Parameters:

  • None

Returns:

  • result (str): Confirmation message
  • snapshot (str): Snapshot of the previous page
  • tabs (List[Dict]): Current tab information
  • current_tab (int): Index of active tab
  • total_tabs (int): Total number of tabs

Example:

# Navigate through history
await toolkit.browser_visit_page("https://example.com")
await toolkit.browser_visit_page("https://example.com/about")

# Go back
result = await toolkit.browser_back()
print(f"Navigated back to: {result['result']}")

browser_forward

Navigates forward to the next page in browser history for the current tab.

Parameters:

  • None

Returns:

  • Same as browser_back

Example:

# Navigate forward after going back
await toolkit.browser_visit_page("https://example.com")
await toolkit.browser_visit_page("https://example.com/about")
await toolkit.browser_back()  # Back to homepage

# Go forward again
result = await toolkit.browser_forward()
print(f"Navigated forward to: {result['result']}")

Information Retrieval Tools

browser_get_page_snapshot

Note: This is a passive tool that must be explicitly called to retrieve page information. It does not trigger any page actions.

Gets a textual snapshot of all interactive elements on the current page. Each element is assigned a unique ref ID for interaction.

Parameters:

  • None (uses viewport_limit setting from toolkit initialization)

Returns:

  • (str): Formatted string listing all interactive elements with their ref IDs

Example:

# Get full page snapshot
snapshot = await toolkit.browser_get_page_snapshot()
print(snapshot)
# Output:
# - link "Home" [ref=1]
# - button "Sign In" [ref=2]
# - textbox "Search" [ref=3]
# - link "Products" [ref=4]

# With viewport limiting
toolkit_limited = HybridBrowserToolkit(viewport_limit=True)
await toolkit_limited.browser_open()  # a session must be open before requesting a snapshot
visible_snapshot = await toolkit_limited.browser_get_page_snapshot()
# Only returns elements currently visible in viewport

browser_get_som_screenshot

Captures a screenshot with interactive elements highlighted and marked with ref IDs (Set of Marks). This tool uses an advanced injection-based approach with browser-side optimizations for accurate element detection.

Technical Features:

  1. Injection-based Implementation: The SoM (Set of Marks) functionality is injected directly into the browser context, ensuring accurate element detection and positioning

  2. Efficient Occlusion Detection: Browser-side algorithms detect when elements are hidden behind other elements, preventing false positives

  3. Parent-Child Element Fusion: Intelligently merges parent and child elements when they represent the same interactive component (e.g., a button containing an icon and text)

  4. Smart Label Positioning: Automatically finds optimal positions for ref ID labels to avoid overlapping with page content

Parameters:

  • read_image (bool, optional): If True, uses AI to analyze the screenshot. Default: True
  • instruction (str, optional): Specific guidance for AI analysis

Returns:

  • (str): Confirmation message with file path and optional AI analysis

Example:

# Basic screenshot capture
result = await toolkit.browser_get_som_screenshot(read_image=False)
print(result)
# "Screenshot captured with 42 interactive elements marked (saved to: ./assets/screenshots/page_123456_som.png)"

# With AI analysis
result = await toolkit.browser_get_som_screenshot(
    read_image=True,
    instruction="Find all form input fields"
)
# "Screenshot captured... Agent analysis: Found 5 form fields: username [ref=3], password [ref=4], email [ref=5], phone [ref=6], submit button [ref=7]"

# For visual verification
result = await toolkit.browser_get_som_screenshot(
    read_image=True,
    instruction="Verify the login button is visible and properly styled"
)

# Complex UI with overlapping elements
result = await toolkit.browser_get_som_screenshot(read_image=False)
# The tool automatically handles:
# - Dropdown menus that overlay other content
# - Modal dialogs
# - Nested interactive elements
# - Elements with transparency

# Parent-child fusion example
# A button containing an icon and text will be marked as one element, not three
# <button [ref=5]>
#   <i class="icon"></i>
#   <span>Submit</span>
# </button>
# Will appear as single "button Submit [ref=5]" instead of separate elements
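
Since the tool returns a plain string, you may want to pull the element count and file path out of it before handing the image to another step. The sketch below assumes the message looks like the example output above; that format is not a documented contract and may differ between versions, so the helper (a hypothetical `parse_som_result`) returns None when the text does not match.

```python
import re

RESULT_PATTERN = re.compile(
    r"with (?P<count>\d+) interactive elements marked\s+\(saved to: (?P<path>[^)]+)\)"
)


def parse_som_result(message: str):
    """Return (element_count, file_path), or None if the message has another shape."""
    match = RESULT_PATTERN.search(message)
    if match is None:
        return None
    return int(match["count"]), match["path"]


message = (
    "Screenshot captured with 42 interactive elements marked "
    "(saved to: ./assets/screenshots/page_123456_som.png)"
)
parsed = parse_som_result(message)
print(parsed)
```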


browser_get_tab_info

Note: This is a passive information retrieval tool that provides current tab state without modifying anything.

Gets information about all open browser tabs including titles, URLs, and which tab is active.

Parameters:

  • None

Returns:

  • tabs (List[Dict]): List of tab information, each containing:
    • id (str): Unique tab identifier
    • title (str): Page title
    • url (str): Current URL
    • is_current (bool): Whether this is the active tab
  • current_tab (int): Index of the active tab
  • total_tabs (int): Total number of open tabs

Example:

# Check all open tabs
tab_info = await toolkit.browser_get_tab_info()

print(f"Total tabs: {tab_info['total_tabs']}")
print(f"Active tab index: {tab_info['current_tab']}")

for i, tab in enumerate(tab_info['tabs']):
    status = "ACTIVE" if tab['is_current'] else ""
    print(f"Tab {i}: {tab['title']} - {tab['url']} {status}")

# Find a specific tab
github_tab = next(
    (tab for tab in tab_info['tabs'] if 'github.com' in tab['url']),
    None
)
if github_tab:
    await toolkit.browser_switch_tab(tab_id=github_tab['id'])

Interaction Tools

browser_click

Performs a click action on an element identified by its ref ID.

Parameters:

  • ref (str): The ref ID of the element to click

Returns:

  • result (str): Confirmation of the action
  • snapshot (str): Updated page snapshot after click
  • tabs (List[Dict]): Current tab information
  • current_tab (int): Index of active tab
  • total_tabs (int): Total number of tabs
  • newTabId (str, optional): ID of newly opened tab if click opened a new tab

Example:

# Simple click
result = await toolkit.browser_click(ref="2")
print(f"Clicked: {result['result']}")

# Click that opens new tab
result = await toolkit.browser_click(ref="external-link")
if 'newTabId' in result:
    print(f"New tab opened with ID: {result['newTabId']}")
    # Switch to the new tab
    await toolkit.browser_switch_tab(tab_id=result['newTabId'])

# Click with error handling
try:
    result = await toolkit.browser_click(ref="submit-button")
except Exception as e:
    print(f"Click failed: {e}")

browser_type

Types text into input elements. Supports both single and multiple inputs with intelligent dropdown detection and automatic child element discovery.

Special Features:

  1. Intelligent Dropdown Detection:
    • When typing into elements that might trigger dropdown options (such as combobox, search fields, or autocomplete inputs), the tool automatically:
      • Detects if new options appear after typing
      • Returns only the newly appeared options via diffSnapshot instead of the full page snapshot
      • This optimization reduces noise and makes it easier to interact with dynamic dropdowns
  2. Automatic Child Element Discovery:
    • If the specified ref ID points to a container element that cannot accept text input directly, the tool automatically:
      • Searches through child elements to find an input field
      • Attempts to type into the first suitable child input element found
      • This is particularly useful for complex UI components where the visible element is a wrapper around the actual input

Parameters (Single Input):

  • ref (str): The ref ID of the input element (or container with input child)
  • text (str): The text to type

Parameters (Multiple Inputs):

  • inputs (List[Dict[str, str]]): List of dictionaries with 'ref' and 'text' keys

Returns:

  • result (str): Confirmation message
  • snapshot (str): Updated page snapshot (full snapshot for regular inputs)
  • diffSnapshot (str, optional): For dropdowns, shows only newly appeared options
  • details (Dict, optional): For multiple inputs, success/error status for each
  • Tab information fields

Example:

# Single input
result = await toolkit.browser_type(ref="3", text="john.doe@example.com")

# Handle dropdown/autocomplete with intelligent detection
result = await toolkit.browser_type(ref="search", text="laptop")
if 'diffSnapshot' in result:
    print("Dropdown options appeared:")
    print(result['diffSnapshot'])
    # Example output:
    # - option "Laptop Computers" [ref=45]
    # - option "Laptop Bags" [ref=46]
    # - option "Laptop Accessories" [ref=47]

    # Click on one of the options
    await toolkit.browser_click(ref="45")
else:
    # No dropdown appeared, continue with regular snapshot
    print("Page snapshot:", result['snapshot'])

# Autocomplete example with diff detection
result = await toolkit.browser_type(ref="city-input", text="San")
if 'diffSnapshot' in result:
    # Only shows newly appeared suggestions
    print("City suggestions:")
    print(result['diffSnapshot'])
    # - option "San Francisco" [ref=23]
    # - option "San Diego" [ref=24]
    # - option "San Antonio" [ref=25]

# Multiple inputs at once
inputs = [
    {'ref': '3', 'text': 'username123'},
    {'ref': '4', 'text': 'SecurePass123!'},
    {'ref': '5', 'text': 'john.doe@example.com'}
]
result = await toolkit.browser_type(inputs=inputs)
print(result['details'])  # Success/failure for each input

# Clear and type
await toolkit.browser_click(ref="3")  # Focus
await toolkit.browser_press_key(keys=["Control+a"])  # Select all
await toolkit.browser_type(ref="3", text="new_value")  # Replaces content

# Working with combobox elements
async def handle_searchable_dropdown():
    # Type to search/filter options
    result = await toolkit.browser_type(ref="country-select", text="United")

    if 'diffSnapshot' in result:
        # Shows only countries containing "United"
        print("Filtered countries:", result['diffSnapshot'])
        # - option "United States" [ref=87]
        # - option "United Kingdom" [ref=88]
        # - option "United Arab Emirates" [ref=89]

        # Select one of the filtered options
        await toolkit.browser_click(ref="87")

# Automatic child element discovery
# When the ref points to a container, browser_type finds the input child
result = await toolkit.browser_type(ref="search-container", text="product name")
# Even though ref="search-container" might be a <div>, the tool will find
# and type into the actual <input> element inside it

# Complex UI component example
# The visible element might be a styled wrapper
result = await toolkit.browser_type(ref="styled-date-picker", text="2024-03-15")
# Tool automatically finds the actual input field within the date picker component
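
To give a feel for what a diff snapshot contains, here is a minimal sketch that compares two text snapshots and keeps only the lines that are new. It is an assumption about the general idea, not the toolkit's actual algorithm: `diff_snapshot` is a hypothetical helper and the sample snapshots are made up in the same format as the examples above.

```python
def diff_snapshot(before: str, after: str) -> str:
    """Return the lines of `after` that were not present in `before`, in order."""
    seen = {line.strip() for line in before.splitlines()}
    added = [
        line for line in after.splitlines()
        if line.strip() and line.strip() not in seen
    ]
    return "\n".join(added)


before = '''- textbox "Search" [ref=3]
- button "Go" [ref=4]'''

after = '''- textbox "Search" [ref=3]
- button "Go" [ref=4]
- option "Laptop Computers" [ref=45]
- option "Laptop Bags" [ref=46]'''

new_options = diff_snapshot(before, after)
print(new_options)
```

Returning only the new lines keeps the agent's context small: after typing, it sees just the options it can click next instead of the whole page again.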


browser_select

Selects an option in a dropdown (

CAMEL-AI Team

Contributor
