16
2
Pakistan
2 years of experience
I am a Software Engineer with 2 years of industry experience. I have been working in Python for 4 years, using it to develop desktop and web applications, work with the Odoo ERP system, build AI models, and do cloud engineering. While my expertise lies in Python, I am also experienced in TypeScript, Java, and C. I can work as a full-stack developer, but I am currently focusing on professional development in AI and Data Science and would prefer working in these areas.
**Analyze It** takes data as input, cleans it, and generates analysis, including visualizations, to make it easy for the user to understand. The system is built on CrewAI, Llama 3.1, Streamlit, and Groq. There are two crews involved: the Engineering Crew and the Analysis Crew. The Analysis Crew's work begins after the Engineering Crew finishes, so a pipeline is established to streamline the process.

The Engineering Crew performs data engineering, cleaning the data through a series of tasks. It consists of multiple agents, each responsible for a specific task and equipped with a set of tools:

- **Data observation**: getting the full data, or getting data by column.
- **Formatting / type casting**: formatting columns to datetime or number, or casting to a specific type.
- **Null handling**: removing nulls, or replacing them with the mean, median, mode, or a specific value.
- **Bad-data cleaning**: removing bad data or replacing it with a correct value.
- **Deduplication**: removing duplicate rows.

The Analysis Crew is responsible for analyzing the data, creating explanatory plots, and presenting the findings to the user.

Here is a breakdown of the steps Analyze It takes to analyze your data (a sketch of the pipeline follows this list):

1. The user inserts data.
2. The pipeline starts working.
3. The Engineering Crew members clean the data using the available tools.
4. Once the data is clean, it is handed to the Analysis Crew.
5. The Analysis Crew analyzes the data, creates plots to visualize trends and patterns, and explains the plots to the user.
6. A report is generated for the user to review.
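The post does not show the actual agent, tool, or task definitions, but a minimal sketch of chaining two sequential crews with CrewAI might look like the following. All names, the tool body, and the task descriptions are illustrative assumptions, and the import path for the `@tool` decorator assumes a recent `crewai` release:

```python
# Minimal sketch of the two-crew pipeline (names and tasks are assumptions).
import pandas as pd
from crewai import Agent, Task, Crew, Process
from crewai.tools import tool

@tool("Remove null values")
def remove_nulls(csv_path: str) -> str:
    """Drops rows containing null values and saves the result in place."""
    cleaned = pd.read_csv(csv_path).dropna()
    cleaned.to_csv(csv_path, index=False)
    return f"Removed nulls; {len(cleaned)} rows remain."

# The agents' LLM (e.g., Llama 3.1 served via Groq) is assumed to be
# configured through the environment; only the structure is shown here.
cleaner = Agent(
    role="Data Engineer",
    goal="Clean the dataset so it is ready for analysis",
    backstory="An engineer focused on null handling and deduplication.",
    tools=[remove_nulls],
)
analyst = Agent(
    role="Data Analyst",
    goal="Analyze the cleaned dataset and explain trends",
    backstory="An analyst who summarizes patterns for non-technical users.",
)

clean_task = Task(
    description="Clean the data at {csv_path} using your tools.",
    expected_output="A confirmation that the data is clean.",
    agent=cleaner,
)
analyze_task = Task(
    description="Analyze the cleaned data at {csv_path} and report findings.",
    expected_output="A short written analysis of the data.",
    agent=analyst,
)

# Engineering runs first, then analysis, mirroring the pipeline above.
engineering_crew = Crew(agents=[cleaner], tasks=[clean_task], process=Process.sequential)
analysis_crew = Crew(agents=[analyst], tasks=[analyze_task], process=Process.sequential)

engineering_crew.kickoff(inputs={"csv_path": "data.csv"})
result = analysis_crew.kickoff(inputs={"csv_path": "data.csv"})
print(result)
```

Running the crews back to back reflects the prerequisite ordering described above: the Analysis Crew only ever sees data that the Engineering Crew has already cleaned.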
## Introduction

In the software industry, creating software is a passion, but dealing with bugs is a headache. Similarly, in the data industry, working with clean data is a joy, while cleaning it is a chore. This year, many voiced concerns about AI taking over their jobs. They want AI to automate tasks they dislike, freeing up time for what they love. As one person put it, "I don't want AI to automate creative designing so that I can do my laundry, I want it to do my laundry so that I can design." This sentiment inspired me to create **Clean It**, a data cleaning service that tackles the industry's problem at its core.

## How It Works

**Clean It** is a multi-agent system that streamlines the data cleaning process through a sequential pipeline. Each agent has a specific set of tools and tasks, and the agents work in a linear fashion, which ensures that prerequisites are met and prevents data loss. All activities are logged for transparency. Simply input your data, click "clean it," and the logs remain available for review until you refresh the application, eliminating the need for constant monitoring.

## Key Features

- **Automated Data Cleaning**: Clean It handles missing value imputation, outlier detection and removal, data standardization, and error correction.
- **Accuracy**: Powered by the o1-mini model for high-quality results.
- **User-Friendly Interface**: A simple, intuitive interface for all users.

## How Clean It Is Built

Clean It uses Streamlit, a popular Python library, for its user-friendly interface. Its reasoning is powered by OpenAI's o1-mini large language model, which identifies and resolves complex data issues. (A minimal sketch of this setup follows the conclusion.)

## Conclusion

With **Clean It**, the data industry gains a powerful tool to automate data cleaning, allowing professionals to spend more time extracting valuable insights. This automation empowers informed decisions and drives data innovation.
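To ground the "How It Works" and "How Clean It Is Built" sections, here is a minimal sketch of a Streamlit front end with the described behavior: upload data, click "clean it," and review logs that persist until the page is refreshed. The `run_cleaning_pipeline` helper and all widget labels are hypothetical stand-ins; in the real app, the cleaning steps are delegated to the o1-mini-powered agents rather than hard-coded:

```python
# Minimal sketch of the Clean It front end (helper names are assumptions).
import pandas as pd
import streamlit as st

def run_cleaning_pipeline(df: pd.DataFrame) -> tuple[pd.DataFrame, list[str]]:
    """Hypothetical stand-in for the agent pipeline: each step logs its action."""
    logs: list[str] = []
    before = len(df)
    df = df.drop_duplicates()
    logs.append(f"Removed {before - len(df)} duplicate rows.")
    before = len(df)
    df = df.dropna()
    logs.append(f"Dropped {before - len(df)} rows with missing values.")
    return df, logs

st.title("Clean It")
uploaded = st.file_uploader("Upload a CSV file", type="csv")

if uploaded is not None and st.button("clean it"):
    cleaned, logs = run_cleaning_pipeline(pd.read_csv(uploaded))
    # Session state survives reruns but is cleared when the page is refreshed,
    # matching the "available until you refresh" behavior described above.
    st.session_state["logs"] = logs
    st.session_state["cleaned"] = cleaned

if "logs" in st.session_state:
    st.subheader("Cleaning log")
    for line in st.session_state["logs"]:
        st.text(line)
    st.download_button(
        "Download cleaned data",
        st.session_state["cleaned"].to_csv(index=False),
        file_name="cleaned.csv",
    )
```

Keeping the logs in `st.session_state` is what lets the user walk away during cleaning and review the full history afterwards, since Streamlit re-executes the script on every interaction.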