Browser Automation

Yunque Agent can control a headless or headed browser to navigate websites, extract content, fill forms, take screenshots, and perform complex web automation tasks.

Architecture

Agent (Planner)
  ↓ browser_exec skill
BrowserHub (WebSocket server)
  ↓ commands (navigate, click, type, screenshot)
Browser Extension (Chrome/Edge/Firefox)
  ↓ CDP (Chrome DevTools Protocol)
Browser Tab

The browser is controlled via a companion browser extension that connects to the agent's WebSocket hub. This approach works with any Chromium-based browser and doesn't require running a separate headless Chrome instance.

Setup

1. Install the Extension

The browser extension is located in browser-extension/. Load it as an unpacked extension in Chrome:

  1. Open chrome://extensions
  2. Enable Developer Mode
  3. Click "Load unpacked" → select browser-extension/
  4. The extension connects to ws://localhost:9090/v1/browser/ws
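If the extension does not appear to connect, a quick check is whether anything is listening on the hub's port. The sketch below uses only the Python standard library and the endpoint from step 4. The helper names `hub_address` and `hub_is_listening` are illustrative, not part of Yunque Agent, and the check is only a TCP probe: it does not perform a WebSocket handshake.

```python
import socket
from urllib.parse import urlparse

# Endpoint taken from the setup steps above.
HUB_WS_URL = "ws://localhost:9090/v1/browser/ws"


def hub_address(ws_url: str = HUB_WS_URL) -> tuple:
    """Return (host, port) for a ws:// or wss:// URL."""
    parsed = urlparse(ws_url)
    if parsed.scheme not in ("ws", "wss"):
        raise ValueError(f"not a WebSocket URL: {ws_url}")
    default_port = 443 if parsed.scheme == "wss" else 80
    return parsed.hostname or "localhost", parsed.port or default_port


def hub_is_listening(ws_url: str = HUB_WS_URL, timeout: float = 2.0) -> bool:
    """True if a plain TCP connection to the hub's host and port succeeds."""
    host, port = hub_address(ws_url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("hub reachable:", hub_is_listening())
```

A `False` result usually means the agent is not running or is bound to a different port. A `True` result only shows that the port is open, not that the extension has attached.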

2. Enable Browser Skills

Browser skills are automatically registered at startup. No additional configuration is needed.

Capabilities

Action       Description
navigate     Open a URL in the browser
click        Click on an element by CSS selector
type         Type text into an input field
screenshot   Capture a full-page or element screenshot
extract      Extract text content from the page
scroll       Scroll to a position or element
wait         Wait for an element or condition
evaluate     Run JavaScript in the page context
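To make the action list concrete, here is a minimal sketch of how a command might be represented and validated before it is sent to the extension. Only the action names come from the table above. The `BrowserCommand` class, the `params` field, and the JSON envelope are assumptions for illustration, since the hub's actual wire format is not documented here.

```python
import json
from dataclasses import asdict, dataclass, field

# Action names taken from the capabilities table.
ACTIONS = frozenset(
    {"navigate", "click", "type", "screenshot",
     "extract", "scroll", "wait", "evaluate"}
)


@dataclass
class BrowserCommand:
    """Hypothetical command envelope; not the hub's real message schema."""

    action: str
    params: dict = field(default_factory=dict)  # assumed free-form parameters

    def __post_init__(self) -> None:
        # Reject anything that is not one of the documented actions.
        if self.action not in ACTIONS:
            raise ValueError(f"unknown browser action: {self.action!r}")

    def to_json(self) -> str:
        """Serialize with sorted keys so the output is stable."""
        return json.dumps(asdict(self), sort_keys=True)


if __name__ == "__main__":
    cmd = BrowserCommand("navigate", {"url": "https://example.com"})
    print(cmd.to_json())
```

Validating the action name up front keeps typos such as "screenshoot" from reaching the browser at all.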

Usage in Chat

Describe in plain language what you want the agent to do in the browser:

"Go to Hacker News and find the top 3 stories about AI"

"Open my GitHub dashboard and screenshot my contribution graph"

"Fill out the contact form on example.com with my name and email"

The agent's planner automatically delegates to the browser_exec skill when web interaction is needed.

E2B Desktop Sandbox

For cloud-based browser automation without a local browser, Yunque supports E2B Desktop sandboxes:

  • Full desktop environment: Ubuntu with browser, terminal, and desktop apps
  • noVNC streaming: Real-time desktop view in the Dashboard via iframe
  • Automatic URL generation: https://6080-{sandboxID}.e2b.app/vnc.html
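The URL pattern above can be reproduced with a one-line helper. This is a sketch only: the function name `novnc_url` is hypothetical, and the sandbox ID in the usage example is a placeholder rather than a real ID.

```python
def novnc_url(sandbox_id: str) -> str:
    """Build the noVNC viewer URL for a sandbox, following the documented pattern."""
    if not sandbox_id:
        raise ValueError("sandbox_id must not be empty")
    # Port 6080 is part of the documented host name pattern.
    return f"https://6080-{sandbox_id}.e2b.app/vnc.html"


if __name__ == "__main__":
    print(novnc_url("abc123"))  # "abc123" is a placeholder ID
```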

Configuration

Variable                 Description
SANDBOX_CLOUD_ENABLED    Enable E2B cloud sandbox
SANDBOX_CLOUD_API_KEY    E2B API key
SANDBOX_CLOUD_BASE_URL   E2B API base URL
SANDBOX_CLOUD_TEMPLATE   Sandbox template ID
SANDBOX_CLOUD_TIMEOUT    Sandbox timeout (seconds)
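For scripts that need to inspect these settings, the sketch below reads the five variables into a small dataclass. The variable names come from the table. The accepted truthy values for SANDBOX_CLOUD_ENABLED ("1", "true", "yes") and the use of `None` for unset values are assumptions, as the page does not state value formats or defaults.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class CloudSandboxConfig:
    enabled: bool
    api_key: Optional[str]
    base_url: Optional[str]
    template: Optional[str]
    timeout_seconds: Optional[int]


def load_cloud_sandbox_config(
    env: Mapping[str, str] = os.environ,
) -> CloudSandboxConfig:
    """Read the SANDBOX_CLOUD_* variables; unset or empty values become None."""
    raw_timeout = env.get("SANDBOX_CLOUD_TIMEOUT")
    raw_enabled = env.get("SANDBOX_CLOUD_ENABLED", "")
    return CloudSandboxConfig(
        # Assumed truthy spellings; the real parser may differ.
        enabled=raw_enabled.strip().lower() in ("1", "true", "yes"),
        api_key=env.get("SANDBOX_CLOUD_API_KEY") or None,
        base_url=env.get("SANDBOX_CLOUD_BASE_URL") or None,
        template=env.get("SANDBOX_CLOUD_TEMPLATE") or None,
        timeout_seconds=int(raw_timeout) if raw_timeout else None,
    )


if __name__ == "__main__":
    print(load_cloud_sandbox_config())
```

Passing the environment in as a mapping makes the loader easy to exercise with a plain dictionary instead of real environment variables.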

API

POST /v1/sandbox/desktop           # Create desktop sandbox
GET  /v1/sandbox/desktop/status    # Check status
POST /v1/sandbox/desktop/destroy   # Destroy sandbox
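The three endpoints can be called with any HTTP client. The sketch below builds the requests with Python's standard `urllib.request`. The base URL `http://localhost:9090` is an assumption (it reuses the host and port of the WebSocket hub), and request bodies, response shapes, and authentication are not documented on this page, so none are shown.

```python
import urllib.request

# Assumption: the REST API is served on the same host and port as the hub.
BASE_URL = "http://localhost:9090"

# Methods and paths taken from the endpoint list above.
ENDPOINTS = {
    "create": ("POST", "/v1/sandbox/desktop"),
    "status": ("GET", "/v1/sandbox/desktop/status"),
    "destroy": ("POST", "/v1/sandbox/desktop/destroy"),
}


def sandbox_request(operation: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) the request for one sandbox operation."""
    method, path = ENDPOINTS[operation]
    return urllib.request.Request(base_url.rstrip("/") + path, method=method)


if __name__ == "__main__":
    # Sending requires a running agent; the response format is not assumed here.
    with urllib.request.urlopen(sandbox_request("status"), timeout=10) as resp:
        print(resp.status, resp.read().decode("utf-8", errors="replace"))
```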

Connection Stability

The browser bridge includes reliability features:

  • Exponential backoff reconnection: 2s → 60s with jitter
  • CDP command retry: automatic reattach on failure (2 retries)
  • Dynamic timeouts: navigate (60s), screenshot (20s), others (45s)
  • Health monitoring: error count tracking and health status API
  • Content script injection: automatic re-injection on page navigation
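The first three behaviors above can be sketched in a few lines. This is an illustration of the general techniques, not the bridge's actual code: the 2s start, 60s cap, retry count, and per-command timeouts come from the list, while the doubling factor, the ±25% jitter range, and all function names are assumptions.

```python
import random
from typing import Callable, Iterator, Optional, TypeVar

T = TypeVar("T")

# Per-command timeouts in seconds, from the list above.
COMMAND_TIMEOUTS = {"navigate": 60.0, "screenshot": 20.0}
DEFAULT_TIMEOUT = 45.0


def timeout_for(action: str) -> float:
    """Return the timeout for a command, falling back to the default."""
    return COMMAND_TIMEOUTS.get(action, DEFAULT_TIMEOUT)


def reconnect_delays(
    base: float = 2.0,
    cap: float = 60.0,
    jitter: float = 0.25,  # assumed jitter fraction
    rng: Optional[random.Random] = None,
) -> Iterator[float]:
    """Yield reconnect delays that double from `base` up to `cap`, with jitter."""
    rng = rng or random.Random()
    delay = base
    while True:
        # Spread each delay by up to +/- jitter, never exceeding the cap.
        yield min(cap, delay * (1 + rng.uniform(-jitter, jitter)))
        delay = min(cap, delay * 2)


def run_with_retries(
    command: Callable[[], T],
    reattach: Callable[[], None],
    retries: int = 2,
) -> T:
    """Run command(); on failure, reattach and try again up to `retries` times."""
    for attempt in range(retries + 1):
        try:
            return command()
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the last error
            reattach()
    raise AssertionError("unreachable")
```

Jitter keeps many clients from reconnecting in lockstep after an outage, and the cap bounds how long a recovered hub waits for the extension to return.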

Dashboard

The Browser page in the Dashboard provides:

  • Connection status indicator
  • Active session list with screenshots
  • Manual navigation controls
  • E2B Desktop tab for cloud sandbox management

© 2025 云鸢科技(青岛)有限公司 × Dream Lab