AI app: Visual Agents — Autonomous AI Computer Control for AI Agent Olympics Hackathon

Visual Agents — Autonomous AI Computer Control

Created by team Tenzero on May 18, 2026

Gemini 3 Flash, Gemini 3 Pro, Gemini AI, AI Studio, Generative AI Studio, Generative Agents, AgentOps

Intelligent Reasoning, Agentic Workflows, Multimodal Intelligence, Enterprise Utility, Google Track

Visual Agents is an autonomous AI system that moves beyond chatbots into an agent that literally sees your screen, understands your intent, and completes complex tasks across ANY application on your OS, not just browsers.

WHAT IT DOES

Unlike browser-only or sandboxed tools like Comet or Claude Computer Use, Visual Agents operates across your entire operating system: Chrome, Excel, Photoshop, Slack, Terminal, SAP, desktop apps, internal enterprise tools, anything visible on screen.

Give it a real-world instruction like "Pull last quarter's sales data from our ERP system, cross-reference it in Excel, build a summary chart, then email the final report via Outlook to the leadership team" and it plans every step, switches between apps, reads live UI state, handles errors mid-task, and delivers the result. No APIs. No plugins. No scripts.

THE ARCHITECTURE: SEE, THINK, ACT, REMEMBER

SEE: Gemini Live API streams real-time screen capture. OmniParser and SOM visual grounding identify interactive elements with pixel-level precision across any UI, any app, any OS state.

THINK: A Task Planner powered by Gemini breaks goals into executable steps using state-aware planning (OSCAR-inspired), detecting failures and replanning autonomously without human input.

ACT: The Action Executor performs clicks, typing, scrolling, app-switching, and keyboard shortcuts with post-action screenshot verification after every step.

REMEMBER: A hierarchical memory system stores successful action trajectories. The agent gets smarter with every completed task.

KEY HIGHLIGHTS

Full OS control, not just browser automation
V4 Mode: SOM grounding, trajectory memory, adaptive replanning, Gemini Live voice
Real-Time Voice: Speak your task, no typing required
Privacy-Aware: Never stores credentials or sensitive data

TECH STACK

Gemini Live API, Gemini 3 Pro, OmniParser, PyAutoGUI, MSS, PyAudio, Python 3.11

Open-source under MIT license. The age of manual computing is ending.
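The SEE, THINK, ACT, REMEMBER loop described above can be sketched as a small control loop. This is a minimal illustration and not the project's actual code: `Step`, `TrajectoryMemory`, `run_task`, and the `see` / `plan` / `act` callables are hypothetical names introduced here, the screen is modeled as plain text instead of a parsed screenshot, and the memory is a flat dictionary, not the hierarchical store the description mentions.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Step:
    action: str  # e.g. "click" or "type" (hypothetical action names)
    target: str  # label of the UI element to act on
    expect: str  # text that should be visible once the action has worked


@dataclass
class TrajectoryMemory:
    """REMEMBER: keeps the verified steps of each successful run, keyed by goal."""
    runs: dict = field(default_factory=dict)

    def recall(self, goal: str) -> Optional[list]:
        return self.runs.get(goal)

    def store(self, goal: str, steps: list) -> None:
        self.runs[goal] = list(steps)


def run_task(
    goal: str,
    see: Callable[[], str],              # SEE: returns a description of the screen
    plan: Callable[[str, str], list],    # THINK: (goal, screen) -> remaining steps
    act: Callable[[Step], None],         # ACT: performs one step
    memory: TrajectoryMemory,
    max_replans: int = 2,
) -> bool:
    """Run one goal; return True if every step was verified on screen."""
    recalled = memory.recall(goal)
    # Reuse a stored trajectory when one exists, otherwise plan from the screen.
    steps = list(recalled) if recalled else list(plan(goal, see()))
    verified = []
    replans = 0
    while steps:
        step = steps.pop(0)
        act(step)
        # Post-action verification: look at the screen again after every step.
        if step.expect in see():
            verified.append(step)
            continue
        if replans >= max_replans:
            return False  # give up without storing a failed trajectory
        replans += 1
        # Replan the remaining work from the current screen state.
        steps = list(plan(goal, see()))
    memory.store(goal, verified)
    return True
```

In a real setup, `see` would presumably wrap a screenshot plus a UI parser and `act` would wrap an input-automation library; both are passed in as plain callables here so the sketch runs with only the standard library and can be exercised against a fake screen.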

Category tags:

Productivity, Desktop Application, Virtual Assistant, Voice Assistant, Operating System (OS) Utility, Web Scraping & Data Extraction, ProjectFromScratch


