Problem Statement:
1) Over 70% of PDFs contain critical data in images such as charts and tables, especially research articles.
2) Gemini is currently released for English only.

Can we build a solution for:
1) Answering natural language questions based on images in PDFs?
2) Making Gemini accessible to non-English speakers?

By leveraging Spire, OpenAI GPT-3.5, Gemini Pro Vision, and TruLens, I have built an application that solves both problems:
- Spire for image extraction
- OpenAI for translation to English (optional)
- Gemini Pro Vision for the answer
- TruLens for monitoring
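
The four stages above can be sketched as one small orchestration function. This is a minimal illustration, not the application's actual code: the function name, the `PipelineResult` type, and the four callables are hypothetical placeholders, and the real Spire, OpenAI, Gemini Pro Vision, and TruLens SDK calls are deliberately left out and passed in by the caller.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class PipelineResult:
    """Hypothetical container for the pipeline output."""
    question_en: str    # the question after optional translation
    answers: List[str]  # one answer per extracted image


def answer_question_about_pdf(
    pdf_path: str,
    question: str,
    extract_images: Callable[[str], List[bytes]],       # assumed wrapper around Spire
    answer_from_image: Callable[[bytes, str], str],     # assumed wrapper around Gemini Pro Vision
    translate_to_english: Optional[Callable[[str], str]] = None,  # assumed wrapper around OpenAI
    monitor: Optional[Callable[[str, str], None]] = None,         # assumed wrapper around TruLens
) -> PipelineResult:
    # Translation is optional: skip it when the question is already in English.
    question_en = translate_to_english(question) if translate_to_english else question

    answers: List[str] = []
    for image in extract_images(pdf_path):
        answer = answer_from_image(image, question_en)
        # Report each question/answer pair to the monitoring hook, if one is given.
        if monitor is not None:
            monitor(question_en, answer)
        answers.append(answer)

    return PipelineResult(question_en=question_en, answers=answers)
```

Passing the stages in as callables keeps the sketch independent of any particular SDK version and lets each stage be swapped for a stub during testing.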
"Great use of Gemini to make PDFs and images more accessible + use of trulens to make sure it's safe.
Areas of improvement:
- A narrower use case can often be more impactful than a general one, and bring a lot of value! Focus on selling to your first customers, not the whole market.
- It would have been nice to see evaluations that validated the core capabilities of the app in addition to the harmlessness evaluations you completed."
Josh Reini
DevRel
"excellent work. amazing and very useful idea"
Walaa Nasr Elghitany
Lablab Head Judge