1
1
1 year of experience

AXON is an open-source, vision-based autonomous desktop agent built on the paradigm of "GitHub Copilot for your entire Operating System." Unlike traditional RPA or automation tools that rely on brittle element selectors, fixed coordinate macros, or application-specific APIs, AXON interacts with computers exactly like a human does: by looking at the screen. Using an advanced cognitive loop, AXON captures live screen feeds, analyzes the visual layout using Vision Language Models (VLMs) and OCR, dynamically charts a multi-step execution plan, and fires native system-level commands to control the cursor and keyboard. If a user can see a task on their display, AXON can automate it. Key Technical Architecture & Features: Multi-LLM Integration: Seamlessly supports cloud-based frontier models (Google Gemini, Anthropic Claude) alongside fully local, privacy-focused deployments via Ollama. Fluid PyQt6 Overlay: Uses a hardware-accelerated transparent canvas with a color-coded visual reticle that tracks agent states (Idle, Thinking, Moving, Clicking) without taking over the screen. Safety Engineering: Built with a global hardware F12 Kill Switch to instantly halt background threads, paired with algorithmic stuck-loop detection to prevent runaway inputs. By turning complex system workflows into a simple natural language interface, AXON bridges the gap between AI reasoning and native OS execution—democratizing desktop automation for everyone.
17 May 2026