Browser Automation
How agents interact with the web using accessibility-tree snapshots instead of screenshots.
Overview
The browser plugin gives agents full web browsing capability inside the sandbox. Instead of screenshots (images of around 5 MB that consume large token budgets), OIAB uses accessibility-tree snapshots: text representations of page structure in which each element carries a ref=N ID.
A typical snapshot looks like:
[page] Acme Corp — Dashboard
  [nav ref=1] Main Navigation
    [link ref=2] Home
    [link ref=3] Reports
  [main ref=4]
    [heading ref=5] Q2 2026 Pipeline
    [button ref=6] Export CSV
    [table ref=7] Opportunities
      [row ref=8] Acme Deal | $50k | Closing Q2
The agent reads the text and issues browser_click ref=6; the plugin translates that into a Playwright page.click() call. In token terms this is roughly 100× cheaper than sending screenshots.
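To make the ref lookup concrete, here is a minimal TypeScript sketch that parses a snapshot like the one above and finds an element's ref by role and label. The line format it assumes ("[role ref=N] label", with ref optional) is inferred from the example only, and SnapshotNode, parseSnapshot, and findRef are hypothetical names, not part of the plugin.

```typescript
// Assumed line format: "[role ref=N] label". The ref is optional,
// since the [page] line in the example carries none.
interface SnapshotNode {
  role: string;
  ref: number | null;
  label: string;
}

const LINE_PATTERN = /^\s*\[([A-Za-z]+)(?:\s+ref=(\d+))?\]\s*(.*)$/;

function parseSnapshot(text: string): SnapshotNode[] {
  const nodes: SnapshotNode[] = [];
  for (const line of text.split("\n")) {
    const match = LINE_PATTERN.exec(line);
    if (match === null) continue; // skip blank or unrecognized lines
    nodes.push({
      role: match[1],
      ref: match[2] !== undefined ? Number(match[2]) : null,
      label: match[3].trim(),
    });
  }
  return nodes;
}

// Return the ref of the first element with the given role and exact label.
function findRef(nodes: SnapshotNode[], role: string, label: string): number | null {
  for (const node of nodes) {
    if (node.role === role && node.label === label) return node.ref;
  }
  return null;
}

const exampleSnapshot = [
  "[page] Acme Corp — Dashboard",
  "[nav ref=1] Main Navigation",
  "[link ref=2] Home",
  "[link ref=3] Reports",
  "[main ref=4]",
  "[heading ref=5] Q2 2026 Pipeline",
  "[button ref=6] Export CSV",
  "[table ref=7] Opportunities",
  "[row ref=8] Acme Deal | $50k | Closing Q2",
].join("\n");

const exampleNodes = parseSnapshot(exampleSnapshot);
const exportRef = findRef(exampleNodes, "button", "Export CSV");
```

Here exportRef is the value an agent would pass to browser_click. Leading whitespace is ignored, so the sketch works whether or not the snapshot is indented.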
Available Tools
| Tool | Arguments | Description |
|---|---|---|
| browser_navigate | url | Open URL; returns accessibility snapshot |
| browser_click | ref | Click element by ref ID; returns new snapshot |
| browser_type | ref, text | Type into element; returns new snapshot |
| browser_snapshot | (none) | Return current page accessibility tree |
| browser_screenshot | (none) | Return base64 PNG screenshot (use sparingly) |
| browser_scroll | direction, amount? | Scroll page; returns new snapshot |
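The argument shapes in the table can be written down as a TypeScript discriminated union, which lets the compiler reject a call with missing or misspelled arguments. This is only a sketch: BrowserToolCall and formatToolCall are hypothetical names, the types of direction and amount are assumptions (the table does not specify them), and the bracketed output for browser_scroll is an assumed extension of the notation used in this document's examples.

```typescript
// Argument names mirror the tools table. The types of `direction` and
// `amount` are assumptions; the table does not state them.
type BrowserToolCall =
  | { tool: "browser_navigate"; url: string }
  | { tool: "browser_click"; ref: number }
  | { tool: "browser_type"; ref: number; text: string }
  | { tool: "browser_snapshot" }
  | { tool: "browser_screenshot" }
  | { tool: "browser_scroll"; direction: string; amount?: number };

// Render a call in the bracketed notation used in this document's examples.
function formatToolCall(call: BrowserToolCall): string {
  switch (call.tool) {
    case "browser_navigate":
      return `[browser_navigate url="${call.url}"]`;
    case "browser_click":
      return `[browser_click ref=${call.ref}]`;
    case "browser_type":
      return `[browser_type ref=${call.ref} text="${call.text}"]`;
    case "browser_scroll":
      return call.amount === undefined
        ? `[browser_scroll direction=${call.direction}]`
        : `[browser_scroll direction=${call.direction} amount=${call.amount}]`;
    case "browser_snapshot":
    case "browser_screenshot":
      return `[${call.tool}]`;
  }
}

const clickCall = formatToolCall({ tool: "browser_click", ref: 6 });
```

Because the switch covers every member of the union, adding a new tool to BrowserToolCall without handling it in formatToolCall becomes a compile-time error.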
Example Agent Interaction
User: Go to acme.com and download the latest invoice
Agent: [browser_navigate url="https://acme.com"]
→ snapshot shows login form
Agent: [browser_type ref=12 text="alice@acme.com"]
Agent: [browser_type ref=13 text="***"]
Agent: [browser_click ref=14] (Sign In button)
→ snapshot shows dashboard
Agent: [browser_click ref=22] (Invoices link)
→ snapshot shows invoice list
Agent: [browser_click ref=31] (Download PDF for latest invoice)
→ File saved to /workspace/user/invoice-2026-03.pdf
Sandbox Requirements
Browser automation requires the sandbox image, which already includes:
- Chromium + chromium-codecs-ffmpeg
- Playwright (installed in the sandbox via bun add playwright)
- Xvfb virtual display (:1)
- noVNC at :6080 for visual debugging
The browser plugin uses a lazy singleton: one Chromium instance per sandbox process, created on the first browser_navigate call and closed on process exit.
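The lazy-singleton pattern can be sketched in a few lines of TypeScript. This is an illustration of the general technique, not the plugin's actual code: lazySingleton, Lazy, and FakeBrowser are hypothetical names, and a plain object stands in for the browser so the example runs without Playwright installed.

```typescript
// Hypothetical helper: wraps a factory so the resource is created at most
// once, on first use, and can be torn down explicitly.
interface Lazy<T> {
  get(): T; // creates on first call, reuses afterwards
  isCreated(): boolean;
  dispose(close: (value: T) => void): void;
}

function lazySingleton<T>(factory: () => T): Lazy<T> {
  let created = false;
  let instance: T | undefined;
  return {
    get(): T {
      if (!created) {
        instance = factory();
        created = true;
      }
      return instance as T;
    },
    isCreated(): boolean {
      return created;
    },
    dispose(close: (value: T) => void): void {
      if (created) {
        close(instance as T);
        instance = undefined;
        created = false;
      }
    },
  };
}

// Stand-in for a browser handle. With Playwright, the factory would instead
// return the promise from chromium.launch(), so the promise itself is cached.
interface FakeBrowser {
  id: number;
  closed: boolean;
}

let launches = 0;
const sharedBrowser = lazySingleton<FakeBrowser>(() => ({ id: ++launches, closed: false }));
```

Nothing is launched until sharedBrowser.get() is first called, which corresponds to the first browser_navigate; every later call returns the same instance. A process-exit hook would call dispose with a function that closes the browser.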
Debugging Visually
Open http://localhost:6080 in your browser to see the live Chromium session via noVNC. This is invaluable for debugging complex web interactions.
Audit Trail
Every browser action is logged to the audit log:
| Action | Logged Fields |
|---|---|
| browser.navigate | url, sessionId |
| browser.click | ref, url |
| browser.type | ref (content redacted) |
| browser.screenshot | url |
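As a rough illustration of the redaction rule, the TypeScript sketch below maps browser events to audit entries that contain only the fields listed in the table. The BrowserEvent and AuditEntry types and the toAuditEntry helper are hypothetical, and dropping the typed text entirely (rather than masking it) is an assumption about what "content redacted" means.

```typescript
// Field names follow the audit table; the types and helper are hypothetical.
type BrowserEvent =
  | { action: "browser.navigate"; url: string; sessionId: string }
  | { action: "browser.click"; ref: number; url: string }
  | { action: "browser.type"; ref: number; text: string }
  | { action: "browser.screenshot"; url: string };

interface AuditEntry {
  action: string;
  fields: Record<string, string | number>;
}

function toAuditEntry(event: BrowserEvent): AuditEntry {
  switch (event.action) {
    case "browser.navigate":
      return { action: event.action, fields: { url: event.url, sessionId: event.sessionId } };
    case "browser.click":
      return { action: event.action, fields: { ref: event.ref, url: event.url } };
    case "browser.type":
      // Assumption: typed content is omitted so secrets never reach the log.
      return { action: event.action, fields: { ref: event.ref } };
    case "browser.screenshot":
      return { action: event.action, fields: { url: event.url } };
  }
}

const typeEntry = toAuditEntry({ action: "browser.type", ref: 13, text: "s3cret" });
```

Building each entry from an explicit allow-list of fields, instead of copying the event and deleting sensitive keys, means a newly added event field stays out of the log until someone deliberately includes it.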
Performance Tips
- Prefer browser_snapshot over browser_screenshot: snapshots are text and don't consume image tokens
- Use browser_click with specific ref IDs rather than broad selectors
- Close unnecessary tabs: the plugin supports a single page instance; navigate to a new URL to replace it
