Browser Automation
Yunque Agent can control a headless or headed browser to navigate websites, extract content, fill forms, take screenshots, and perform complex web automation tasks.
Architecture
Agent (Planner)
↓ browser_exec skill
BrowserHub (WebSocket server)
↓ commands (navigate, click, type, screenshot)
Browser Extension (Chrome/Edge/Firefox)
↓ CDP (Chrome DevTools Protocol)
Browser Tab
The browser is controlled via a companion browser extension that connects to the agent's WebSocket hub. This approach works with any Chromium-based browser and doesn't require running a separate headless Chrome instance.
Setup
1. Install the Extension
The browser extension is located in browser-extension/. Load it as an unpacked extension in Chrome:
- Open chrome://extensions
- Enable Developer Mode
- Click "Load unpacked" → select browser-extension/
- The extension connects to ws://localhost:9090/v1/browser/ws
2. Enable Browser Skills
Browser skills are automatically registered at startup. No additional configuration is needed.
Capabilities
| Action | Description |
|---|---|
| navigate | Open a URL in the browser |
| click | Click on an element by CSS selector |
| type | Type text into an input field |
| screenshot | Capture a full-page or element screenshot |
| extract | Extract text content from the page |
| scroll | Scroll to a position or element |
| wait | Wait for an element or condition |
| evaluate | Run JavaScript in the page context |
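As a rough illustration of how one of these actions might travel from the hub to the extension, the sketch below serializes a command as a JSON message. The envelope fields (`id`, `action`, `params`) and the required-parameter table are assumptions made for this example; the actual wire format used by BrowserHub is not documented here and may differ.

```python
import json
from itertools import count

# Hypothetical required parameters per action; the real schema may differ.
REQUIRED_PARAMS = {
    "navigate": {"url"},
    "click": {"selector"},
    "type": {"selector", "text"},
    "screenshot": set(),
    "extract": set(),
    "scroll": set(),
    "wait": set(),
    "evaluate": {"script"},
}

_command_ids = count(1)  # simple incrementing message IDs (an assumption)


def build_command(action: str, **params) -> str:
    """Validate an action and serialize it as a JSON command message."""
    if action not in REQUIRED_PARAMS:
        raise ValueError(f"unknown action: {action}")
    missing = REQUIRED_PARAMS[action] - set(params)
    if missing:
        raise ValueError(f"{action} is missing parameters: {sorted(missing)}")
    return json.dumps({"id": next(_command_ids), "action": action, "params": params})


if __name__ == "__main__":
    print(build_command("navigate", url="https://example.com"))
    print(build_command("type", selector="#email", text="me@example.com"))
```

Rejecting unknown actions and missing parameters before sending keeps malformed commands from reaching the browser at all.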
Usage in Chat
Simply describe what you want the agent to do with a browser:
"Go to Hacker News and find the top 3 stories about AI"
"Open my GitHub dashboard and screenshot my contribution graph"
"Fill out the contact form on example.com with my name and email"
The agent's planner automatically delegates to the browser_exec skill when web interaction is needed.
E2B Desktop Sandbox
For cloud-based browser automation without a local browser, Yunque supports E2B Desktop sandboxes:
- Full desktop environment: Ubuntu with browser, terminal, and desktop apps
- noVNC streaming: Real-time desktop view in the Dashboard via iframe
- Automatic URL generation: https://6080-{sandboxID}.e2b.app/vnc.html
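The URL pattern above can be reproduced with a one-line helper. This is only a sketch: the function name `vnc_url` is hypothetical, and the example sandbox ID is made up.

```python
def vnc_url(sandbox_id: str) -> str:
    """Build the noVNC URL for a sandbox, following the documented pattern."""
    if not sandbox_id:
        raise ValueError("sandbox_id must be non-empty")
    # Port 6080 is part of the hostname in the documented pattern.
    return f"https://6080-{sandbox_id}.e2b.app/vnc.html"


if __name__ == "__main__":
    print(vnc_url("abc123"))  # "abc123" is a made-up sandbox ID
```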
Configuration
| Variable | Description |
|---|---|
| SANDBOX_CLOUD_ENABLED | Enable E2B cloud sandbox |
| SANDBOX_CLOUD_API_KEY | E2B API key |
| SANDBOX_CLOUD_BASE_URL | E2B API base URL |
| SANDBOX_CLOUD_TEMPLATE | Sandbox template ID |
| SANDBOX_CLOUD_TIMEOUT | Sandbox timeout (seconds) |
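To show how these variables fit together, here is a minimal sketch that reads them into a typed config object. The variable names come from the table above; the accepted boolean spellings, the empty-string defaults, and the `SandboxCloudConfig` class are assumptions for illustration, not Yunque's actual parsing logic.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class SandboxCloudConfig:
    enabled: bool
    api_key: str
    base_url: str
    template: str
    timeout_seconds: Optional[int]  # None when the variable is unset


def load_sandbox_config(env: Mapping[str, str] = os.environ) -> SandboxCloudConfig:
    """Read the SANDBOX_CLOUD_* variables from an environment mapping."""
    raw_timeout = env.get("SANDBOX_CLOUD_TIMEOUT", "").strip()
    # Assumed truthy spellings; the real parser may accept others.
    enabled = env.get("SANDBOX_CLOUD_ENABLED", "").strip().lower() in {"1", "true", "yes"}
    return SandboxCloudConfig(
        enabled=enabled,
        api_key=env.get("SANDBOX_CLOUD_API_KEY", ""),
        base_url=env.get("SANDBOX_CLOUD_BASE_URL", ""),
        template=env.get("SANDBOX_CLOUD_TEMPLATE", ""),
        timeout_seconds=int(raw_timeout) if raw_timeout else None,
    )


if __name__ == "__main__":
    print(load_sandbox_config())
```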
API
POST /v1/sandbox/desktop # Create desktop sandbox
GET /v1/sandbox/desktop/status # Check status
POST /v1/sandbox/desktop/destroy # Destroy sandbox
Connection Stability
The browser bridge includes reliability features:
- Exponential backoff reconnection: 2s → 60s with jitter
- CDP command retry: automatic reattach on failure (2 retries)
- Dynamic timeouts: navigate (60s), screenshot (20s), others (45s)
- Health monitoring: error count tracking and health status API
- Content script injection: automatic re-injection on page navigation
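The reconnection and timeout rules above can be sketched as follows. The 2s and 60s bounds and the per-command timeouts come from the list; the doubling factor and the jitter formula (scaling each delay into 50% to 100% of its capped value) are assumptions, since the exact strategy is not specified here.

```python
import random
from typing import Callable

BASE_DELAY = 2.0   # seconds, first reconnect delay
MAX_DELAY = 60.0   # seconds, upper bound on the delay

COMMAND_TIMEOUTS = {"navigate": 60.0, "screenshot": 20.0}
DEFAULT_TIMEOUT = 45.0  # applies to every other command


def reconnect_delay(attempt: int, rng: Callable[[], float] = random.random) -> float:
    """Delay before reconnect attempt `attempt` (0-based), capped with jitter."""
    # Clamp the exponent so very large attempt counts cannot overflow.
    capped = min(MAX_DELAY, BASE_DELAY * (2 ** min(attempt, 10)))
    # Assumed jitter: pick a value between 50% and 100% of the capped delay.
    return capped * (0.5 + 0.5 * rng())


def command_timeout(action: str) -> float:
    """Return the timeout in seconds for a given command."""
    return COMMAND_TIMEOUTS.get(action, DEFAULT_TIMEOUT)


if __name__ == "__main__":
    for attempt in range(7):
        print(attempt, round(reconnect_delay(attempt), 2))
    print(command_timeout("navigate"), command_timeout("click"))
```

Jitter spreads reconnect attempts out so that several clients dropping at once do not all retry at the same instant.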
Dashboard
The Browser page in the Dashboard provides:
- Connection status indicator
- Active session list with screenshots
- Manual navigation controls
- E2B Desktop tab for cloud sandbox management