
Vibecoder1
Chrome offers an unusually deep programmatic surface to AI agents: the DevTools Protocol, the accessibility tree, JavaScript evaluation, network interception, and multi-tab, multi-context isolation. For most of the web, that surface is enough to skip visual reasoning entirely from the first action, replacing screenshot-and-guess with structured targeting that survives DOM changes. This project drives all of it through a single MCP server, using accessibility references and a six-rung escalation ladder that auto-selects the targeting strategy. Vision and OCR remain available as fallbacks for canvas apps, custom-drawn UIs, and anything else without useful structure.

What makes the architecture more than a Chrome controller is the workflow layer alongside it. While the agent works, network traffic is captured and the underlying API patterns are extracted into replayable flows. On subsequent runs the agent skips the browser entirely and executes direct HTTP calls: millisecond execution at a fraction of the inference cost. Credentials and TOTP seeds live in an OS-level vault, so replay works across sessions and machines without secrets ever touching chat context. An optional personal-assistant mode extends the same engine beyond Chrome, to native Windows applications via UI Automation and to anything else via OCR-based control, for workflows where browser scope isn't enough.

Most agentic browser tooling today is either tied to a vendor's cloud or wraps a single automation library. This stack stays fully open, MCP-native, and self-hostable. It runs over local STDIO, which removes the network hop between agent and tool at every step and meaningfully cuts per-step latency on multi-action tasks, and it composes with any compatible host and any other MCP tool. The reasoning layer runs on AMD Developer Cloud with ROCm-hosted open vision and language models; there are no proprietary inference dependencies anywhere in the stack.
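To make the targeting layer concrete, here is a minimal sketch of a tiered escalation ladder, written against Puppeteer for brevity. The rung names, the `TargetSpec` shape, and the three rungs shown are illustrative assumptions, not the project's actual six-rung API.

```typescript
import type { ElementHandle, Page } from "puppeteer";

// Hypothetical target description an agent might carry between steps.
interface TargetSpec {
  axRole?: string; // accessibility role, e.g. "button"
  axName?: string; // accessible name, e.g. "Submit order"
  css?: string;    // CSS selector fallback
  text?: string;   // visible-text fallback
}

type Rung = (page: Page, t: TargetSpec) => Promise<ElementHandle | null>;

// Ordered cheapest-and-most-durable first: structure before pixels.
const ladder: Rung[] = [
  // Rung 1: accessibility role + name, which survives most DOM churn.
  async (page, t) =>
    t.axRole && t.axName
      ? page.$(`::-p-aria([role="${t.axRole}"][name="${t.axName}"])`)
      : null,
  // Rung 2: a plain CSS selector recorded on a previous visit.
  async (page, t) => (t.css ? page.$(t.css) : null),
  // Rung 3: visible text as a last structured resort.
  async (page, t) => (t.text ? page.$(`::-p-text(${t.text})`) : null),
];

// Walk the rungs until one resolves; only then fall back to vision/OCR.
async function resolveTarget(page: Page, t: TargetSpec): Promise<ElementHandle> {
  for (const rung of ladder) {
    const handle = await rung(page, t).catch(() => null);
    if (handle) return handle;
  }
  throw new Error("ladder exhausted: escalate to screenshot + vision fallback");
}
```

The point is the shape rather than the specifics: each rung is cheap to try, and vision is the last rung instead of the first.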
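The record-and-replay layer can be sketched the same way. Assuming a Puppeteer-driven session and Node's built-in `fetch`, the recorder below logs the XHR/fetch calls made while the agent works, and the replayer executes them later as direct HTTP with no browser; the `RecordedCall` shape is an assumption.

```typescript
import puppeteer, { Page } from "puppeteer";

interface RecordedCall {
  method: string;
  url: string;
  headers: Record<string, string>;
  body?: string;
}

// First run: drive the browser normally while capturing the API traffic
// underneath the UI interactions.
async function recordFlow(
  run: (page: Page) => Promise<void>
): Promise<RecordedCall[]> {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  const calls: RecordedCall[] = [];
  page.on("request", (req) => {
    if (req.resourceType() === "xhr" || req.resourceType() === "fetch") {
      calls.push({
        method: req.method(),
        url: req.url(),
        headers: req.headers(),
        body: req.postData(),
      });
    }
  });
  await run(page);
  await browser.close();
  return calls;
}

// Subsequent runs: no browser, no screenshots, no per-step inference.
async function replayFlow(calls: RecordedCall[]): Promise<void> {
  for (const call of calls) {
    const res = await fetch(call.url, {
      method: call.method,
      headers: call.headers,
      body: call.body,
    });
    if (!res.ok) throw new Error(`${call.method} ${call.url} -> ${res.status}`);
  }
}
```

A real extractor also has to template out session tokens and rotating parameters rather than replay them verbatim, which is exactly where the vault comes in.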
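For the vault, one plausible shape uses the OS keychain via the `keytar` package (DPAPI on Windows, Keychain on macOS, libsecret on Linux) plus `otplib` for TOTP. Both package choices and the service/account naming are assumptions, not the project's actual integration.

```typescript
import keytar from "keytar";            // OS-level keychain bindings
import { authenticator } from "otplib"; // RFC 6238 TOTP codes

const SERVICE = "agent-vault"; // hypothetical vault namespace

// Enroll once, out of band; secrets never pass through chat context.
async function enroll(account: string, password: string, totpSeed: string) {
  await keytar.setPassword(SERVICE, `${account}:password`, password);
  await keytar.setPassword(SERVICE, `${account}:totp`, totpSeed);
}

// At replay time, read secrets from the vault and mint a fresh TOTP code.
async function credentialsFor(account: string) {
  const password = await keytar.getPassword(SERVICE, `${account}:password`);
  const seed = await keytar.getPassword(SERVICE, `${account}:totp`);
  if (!password || !seed) throw new Error(`no vault entry for ${account}`);
  return { password, totp: authenticator.generate(seed) };
}
```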
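Finally, the STDIO claim is the standard MCP server pattern: the host spawns the server as a child process and speaks JSON-RPC over stdin/stdout, so each tool call costs a pipe write rather than a network round trip. A skeletal server using the official TypeScript SDK might look like this; the server and tool names are placeholders.

```typescript
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({ name: "chrome-agent", version: "0.1.0" });

// Placeholder tool; the real server would expose snapshot/click/type/etc.
server.tool(
  "click",
  { ref: z.string().describe("accessibility reference from a prior snapshot") },
  async ({ ref }) => {
    // ...resolve ref through the escalation ladder and dispatch the click...
    return { content: [{ type: "text" as const, text: `clicked ${ref}` }] };
  }
);

// Local STDIO: the MCP host pipes JSON-RPC over stdin/stdout, so there is
// no network hop between agent and tool on any step.
await server.connect(new StdioServerTransport());
```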
10 May 2026