BlindAssist is an assistive application designed to empower blind and visually impaired individuals by integrating advanced artificial intelligence technologies. At its core, the application uses Falcon AI71 to process and analyze information from the user's surroundings, providing real-time, actionable insights. Through a combination of computer vision and natural language processing, BlindAssist transforms visual data into comprehensible text and audio feedback, enhancing the user's ability to navigate and interact with their environment. The application aims to bridge accessibility gaps and give visually impaired users greater independence and confidence.
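The core pipeline described above (visual data in, comprehensible text and audio feedback out) can be sketched as a small composition of two stages. This is a minimal illustration, not BlindAssist's actual code: `describe` and `speak` are hypothetical stand-ins for the Falcon AI71 vision/language call and a text-to-speech engine, injected as callables so the pipeline logic itself runs offline.

```python
from typing import Callable

def frame_to_feedback(frame: bytes,
                      describe: Callable[[bytes], str],
                      speak: Callable[[str], None]) -> str:
    """Turn one camera frame into spoken feedback.

    `describe` stands in for a vision + language model analyzing the
    frame; `speak` stands in for a text-to-speech engine. Both are
    injected dependencies, not real BlindAssist APIs.
    """
    description = describe(frame)   # visual data -> comprehensible text
    speak(description)              # text -> audio feedback
    return description

# Example run with stub dependencies in place of the real services:
spoken: list[str] = []
text = frame_to_feedback(
    b"\x89fake-image-bytes",
    describe=lambda f: "A crosswalk ahead, signal is green.",
    speak=spoken.append,
)
print(text)  # the same string is both returned and "spoken"
```

Separating the model call from the audio output this way keeps each stage swappable, e.g. replacing the stub `speak` with a real TTS backend without touching the pipeline.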
SuperHuman AI: Automating web-based tasks with AI-driven intelligence for improved efficiency

Objective: SuperHuman AI automates complex activities in web browsers based on user-defined objectives. By leveraging advanced AI models for vision and text intelligence, the project streamlines tasks such as job applications, form filling, and data entry, reducing manual effort and human error and enabling faster execution and greater user productivity.

Key Features:
- Visual Intelligence: Powered by the Llama 3.2 11B Multimodal Vision Instruct Model, which processes visual data to automate complex tasks that require an understanding of on-screen elements.
- Automation through Selenium: Uses Python Selenium to navigate and interact with web pages, understand DOM elements, and mimic user interactions.
- Advanced RAG Pipeline: Integrates Vectara and Chroma DB for retrieval-augmented generation (RAG), providing real-time, accurate answers and suggestions during automation.
- AI Agent with CrewAI: Built with CrewAI to orchestrate complex tasks in sequence, breaking objectives into manageable steps and executing them efficiently.

AI Agent Steps:
1. Initiates the browser and navigates to the target website.
2. Uses Selenium to analyze and understand the site's DOM elements.
3. Applies visual intelligence to process visual data for more advanced activities (e.g., interpreting dynamic content).
4. Gathers the user's overall objective and breaks it into implementable steps.
5. Executes each step while adapting to the dynamic web environment.
6. Provides analytics and feedback to the user at the end of the task.

Core Technologies:
- AI Agent: CrewAI, LangGraph
- AI Tools: Llama 3.2 11B Multimodal Vision Instruct Model, Llama 3.2 3B Lightweight Text Model, Advanced RAG Pipeline

Use-cases:
- LinkedIn Job Automation
- Google Forms Filler
- KYC Processing
- Data Entry Automation
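The agent loop above (gather the objective, break it into steps, execute each one, report analytics) can be sketched in a few lines. This is a simplified illustration, not the project's CrewAI implementation: `planner` and `executor` are hypothetical stand-ins for the LLM planner and the Selenium layer, injected as plain callables.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class WebAgent:
    """Minimal sketch of the agent loop: plan steps from an objective,
    execute each against a (stubbed) browser layer, collect analytics."""
    planner: Callable[[str], list[str]]   # objective -> implementable steps
    executor: Callable[[str], bool]       # run one step, report success
    log: list[tuple[str, bool]] = field(default_factory=list)

    def run(self, objective: str) -> dict:
        steps = self.planner(objective)
        for step in steps:
            ok = self.executor(step)      # a real agent would adapt on failure
            self.log.append((step, ok))
        succeeded = sum(ok for _, ok in self.log)
        # end-of-task analytics reported back to the user
        return {"steps": len(steps), "succeeded": succeeded}

agent = WebAgent(
    planner=lambda obj: ["open site", "fill form", "submit"],
    executor=lambda step: True,
)
report = agent.run("apply to a LinkedIn job")
print(report)  # {'steps': 3, 'succeeded': 3}
```

In the real project the executor would drive Selenium and the planner would call the Llama 3.2 text model; the loop structure is the same.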
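The retrieval half of the RAG pipeline can also be illustrated in miniature. This sketch uses a toy bag-of-words "embedding" and cosine similarity purely for illustration; the actual pipeline uses Chroma DB or Vectara with dense vector embeddings, which this pure-Python stand-in only approximates.

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real pipeline would use dense
    embeddings stored in Chroma DB or Vectara."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k documents most relevant to the query: the
    'retrieval' step that grounds the generated answer."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

docs = [
    "Upload your resume before submitting the job application.",
    "KYC processing requires a government-issued ID card.",
    "Google Forms supports required questions and validation.",
]
best = retrieve("which documents are needed for kyc", docs)
print(best[0])
```

During automation, the retrieved passage would be handed to the text model as context so that form answers and suggestions stay grounded in the user's own documents.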