SuperHuman AI: Automating web-based tasks with AI-driven intelligence for improved efficiency

Objective:
SuperHuman AI automates complex browser-based activities from user-defined objectives. It combines AI models for vision and text to streamline tasks such as job applications, form-filling, and data entry. The aim is to reduce manual effort and human error so that tasks finish faster and users are more productive.

Key Features:
- Visual Intelligence: Powered by the Llama 3.2 11B Multimodal Vision Instruct Model, which processes visual data to automate tasks that require an understanding of on-screen elements.
- Automation through Selenium: Uses Python Selenium to navigate web pages, inspect DOM elements, and mimic user interactions.
- Advanced RAG Pipeline: Integrates Vectara and Chroma DB for retrieval-augmented generation (RAG), supplying real-time, accurate answers and suggestions during automation.
- AI Agent with CrewAI: Built with CrewAI to orchestrate complex tasks in sequence, breaking objectives down into manageable steps and executing them efficiently.

AI Agent Steps:
1. Initiates the browser and navigates to the target website.
2. Uses Selenium to analyze the site's DOM elements.
3. Applies visual intelligence to visual data for more advanced activities (e.g., interpreting dynamic content).
4. Gathers the user's overall objective and breaks it into implementable steps.
5. Executes each step while adapting to the dynamic web environment.
6. Provides analytics and feedback to the user at the end of the task.

Core Technologies:
- AI Agent: CrewAI, LangGraph
- AI Tools: Llama 3.2 11B Multimodal Vision Instruct Model, Llama 3.2 3B Lightweight Text Model, Advanced RAG Pipeline

Use-cases:
- LinkedIn Job Automation
- Google Forms Filler
- KYC Processing
- Data Entry Automation
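
The agent steps described above (break an objective into steps, execute each one, then report back) can be illustrated with a minimal sketch. This is not the project's code: every name here (`Browser`, `FakeBrowser`, `plan_form_fill`, `run`, `summarize`) is hypothetical, the planner is hard-coded where the described system would use CrewAI and a Llama model, and an in-memory stand-in replaces Selenium so the example runs without a real browser.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Protocol


class Browser(Protocol):
    """Hypothetical minimal interface; a Selenium wrapper could implement it."""
    def open(self, url: str) -> None: ...
    def fill(self, field_name: str, value: str) -> None: ...
    def click(self, element_name: str) -> None: ...


@dataclass
class Step:
    description: str
    action: Callable[[Browser], None]


@dataclass
class StepResult:
    description: str
    ok: bool
    error: str = ""


class FakeBrowser:
    """In-memory stand-in for a real browser (assumption for this sketch)."""

    def __init__(self, known_fields: Iterable[str]) -> None:
        self.known_fields = set(known_fields)
        self.url = None
        self.values: Dict[str, str] = {}
        self.clicked: List[str] = []

    def open(self, url: str) -> None:
        self.url = url

    def fill(self, field_name: str, value: str) -> None:
        # Simulate a form field that is missing from the page.
        if field_name not in self.known_fields:
            raise ValueError(f"no such field: {field_name}")
        self.values[field_name] = value

    def click(self, element_name: str) -> None:
        self.clicked.append(element_name)


def plan_form_fill(url: str, answers: Dict[str, str]) -> List[Step]:
    """Hard-coded planner: turn a form-filling objective into ordered steps."""
    steps = [Step(f"open {url}", lambda b: b.open(url))]
    for name, value in answers.items():
        # Default arguments bind the current name/value to each lambda.
        steps.append(Step(f"fill {name}", lambda b, n=name, v=value: b.fill(n, v)))
    steps.append(Step("submit", lambda b: b.click("submit")))
    return steps


def run(steps: List[Step], browser: Browser) -> List[StepResult]:
    """Execute steps in order; record failures and continue with the rest."""
    results = []
    for step in steps:
        try:
            step.action(browser)
            results.append(StepResult(step.description, True))
        except Exception as exc:
            results.append(StepResult(step.description, False, str(exc)))
    return results


def summarize(results: List[StepResult]) -> Dict[str, int]:
    """End-of-task feedback: counts of attempted, succeeded, and failed steps."""
    succeeded = sum(r.ok for r in results)
    return {"total": len(results), "succeeded": succeeded,
            "failed": len(results) - succeeded}


if __name__ == "__main__":
    browser = FakeBrowser(known_fields=["name", "email"])
    answers = {"name": "Ada", "email": "ada@example.com", "phone": "555"}
    results = run(plan_form_fill("https://example.com/form", answers), browser)
    for r in results:
        print("OK  " if r.ok else "FAIL", r.description, r.error)
    print(summarize(results))
```

Continuing past a failed step is only one possible policy, chosen here to keep the sketch short. An agent that adapts to a changing page might instead re-inspect the DOM, retry, or re-plan the remaining steps.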