We use the 2022-2023 Football Player Stats dataset from Kaggle, which covers nearly 2,500 players across the Premier League, Ligue 1, Bundesliga, Serie A, and La Liga. With 125 metrics, ranging from basic player information such as name, age, and nationality to performance statistics like goals and pass completion rates, the dataset is extensive and diverse. To organize this wealth of information, we use Cohere embeddings and Weaviate Cloud Service (WCS) for vector transformation, storage, and indexing.

The focal points of our application are the Chat and Compare Player features, each powered by Cohere language models (Cohere and ChatCohere). Both rely on Retrieval-Augmented Generation (RAG), but with different components. For the Chat feature, we build a compression retriever around the Cohere RAG Retriever, enabling its web-search connector and using CohereRerank as the compressor. Combined with ConversationBufferMemory, the conversational chain takes the chat history (a list of messages) and a new question and produces a response in three steps: first, it condenses the chat history and the new question into a "standalone question"; second, it passes that question to the retriever to fetch relevant documents; finally, it uses the retrieved documents, together with either the standalone question or the original question and chat history, to generate a comprehensive response (see the first sketch below).

The Player Comparison feature, by contrast, uses the Weaviate Hybrid Search Retriever to pull each player's statistics from WCS by name, then summarizes the retrieved documents with an LLM chain into a comparison report (see the second sketch below). Together, these components give users a robust, dynamic platform for nuanced insights into player performance across the top football leagues.
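A minimal sketch of the Chat feature's retrieval chain, assuming a `COHERE_API_KEY` is set in the environment; the import paths, model name, and `top_n` value are illustrative and may differ across langchain versions:

```python
# Chat feature: Cohere RAG retriever (web-search connector) + CohereRerank compressor,
# wired into a conversational chain with buffer memory. Names/params are illustrative.
from langchain.chat_models import ChatCohere
from langchain.retrievers import CohereRagRetriever, ContextualCompressionRetriever
from langchain.retrievers.document_compressors import CohereRerank
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain

llm = ChatCohere(model="command", temperature=0)

# Cohere RAG retriever with the web-search connector, wrapped in a compression
# retriever that reranks the fetched documents with CohereRerank.
rag_retriever = CohereRagRetriever(llm=llm, connectors=[{"id": "web-search"}])
compression_retriever = ContextualCompressionRetriever(
    base_compressor=CohereRerank(top_n=5),
    base_retriever=rag_retriever,
)

# The conversational chain keeps chat history in a buffer, condenses history + new
# question into a standalone question, retrieves documents, then answers from them.
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
chat_chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=compression_retriever,
    memory=memory,
)

print(chat_chain({"question": "How many goals did Erling Haaland score in 2022-23?"})["answer"])
```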
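A sketch of the Player Comparison flow, assuming a Weaviate class named `PlayerStats` holding the per-player rows; the collection name, prompt wording, and credential variables are placeholders for our actual setup:

```python
# Player Comparison: hybrid (BM25 + vector) retrieval from WCS by player name,
# followed by an LLM chain that writes a comparison report.
import os
import weaviate
from langchain.chat_models import ChatCohere
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.retrievers.weaviate_hybrid_search import WeaviateHybridSearchRetriever

client = weaviate.Client(
    url=os.environ["WEAVIATE_URL"],
    auth_client_secret=weaviate.AuthApiKey(api_key=os.environ["WEAVIATE_API_KEY"]),
    additional_headers={"X-Cohere-Api-Key": os.environ["COHERE_API_KEY"]},
)

retriever = WeaviateHybridSearchRetriever(
    client=client,
    index_name="PlayerStats",   # placeholder collection name in WCS
    text_key="text",
)

prompt = PromptTemplate.from_template(
    "Using the statistics below, write a comparison report for {player_a} and {player_b}.\n\n{stats}"
)
report_chain = LLMChain(llm=ChatCohere(model="command"), prompt=prompt)

def compare_players(player_a: str, player_b: str) -> str:
    # Retrieve each player's rows by name, then summarize them with the LLM chain.
    docs = retriever.get_relevant_documents(player_a) + retriever.get_relevant_documents(player_b)
    stats = "\n".join(doc.page_content for doc in docs)
    return report_chain.run(player_a=player_a, player_b=player_b, stats=stats)

print(compare_players("Erling Haaland", "Kylian Mbappé"))
```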
Embark on a journey into the intricate world of company 10-K filings with The Financial Pandalist, your premier ally in deciphering financial intricacies! Seeking expert advice or in-depth analysis of a particular company's 10-K filing? Look no further: upload the 10-K document of your choice and dive into a personalized chat about it. That is what our project aims to deliver.

Our project is built around Retrieval-Augmented Generation (RAG). The technology stack comprises TruLens as the evaluation tool, an LLM and embedding model from Vertex AI, LangChain for building the different chain types, and ChromaDB as the vector store. Our goal is to compare four chain-type techniques: Stuff, Map Reduce, Refine, and Map Re-rank (first sketch below). To assess their effectiveness, we apply the RAG Triad concept and evaluate each chain type along three dimensions: Context Relevance, Groundedness, and Answer Relevance, using roughly 20 financial-analysis questions (second sketch below). With TruLens, we observe that the Stuff and Refine chain types score notably high on Groundedness and Answer Relevance, respectively, while the others fall short. Several factors, such as varying the prompts, need further investigation to improve the overall scores.
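A sketch of how the four chain types are built over the 10-K chunks, assuming the documents have already been embedded into a Chroma collection; the collection name, `k`, and model defaults are placeholders and depend on your Vertex AI project:

```python
# Build one RetrievalQA chain per chain type under comparison, over a Chroma store
# populated with 10-K chunks embedded with Vertex AI embeddings.
from langchain.llms import VertexAI
from langchain.embeddings import VertexAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA

embeddings = VertexAIEmbeddings()                 # Vertex AI embedding model
vectorstore = Chroma(
    collection_name="tenk_filing",                # placeholder collection of 10-K chunks
    embedding_function=embeddings,
    persist_directory="./chroma_db",
)
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
llm = VertexAI(temperature=0)                     # Vertex AI text model

chains = {
    chain_type: RetrievalQA.from_chain_type(
        llm=llm,
        chain_type=chain_type,
        retriever=retriever,
        return_source_documents=True,
    )
    for chain_type in ["stuff", "map_reduce", "refine", "map_rerank"]
}

result = chains["stuff"]({"query": "What are the company's main risk factors?"})
print(result["result"])
```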
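A sketch of the RAG Triad evaluation with TruLens, reusing the `chains` dict from the previous sketch. The feedback provider shown here is OpenAI as a stand-in, and the selectors follow the trulens_eval LangChain quickstart; both the provider and the selector/feedback names (e.g. `qs_relevance`) are version-sensitive and may need adjusting:

```python
# Wrap each chain type in a TruChain recorder with the three RAG Triad feedbacks,
# run the evaluation questions through it, and compare scores on the leaderboard.
import numpy as np
from trulens_eval import Tru, TruChain, Feedback
from trulens_eval.feedback import Groundedness
from trulens_eval.feedback.provider.openai import OpenAI
from trulens_eval.app import App

tru = Tru()
provider = OpenAI()                                  # stand-in feedback provider
grounded = Groundedness(groundedness_provider=provider)

# Select the retrieved context from the chain's record (quickstart-style selector).
context = App.select_context(chains["stuff"])

f_groundedness = (
    Feedback(grounded.groundedness_measure_with_cot_reasons, name="Groundedness")
    .on(context.collect())
    .on_output()
    .aggregate(grounded.grounded_statements_aggregator)
)
f_answer_relevance = Feedback(provider.relevance, name="Answer Relevance").on_input_output()
f_context_relevance = (
    Feedback(provider.qs_relevance, name="Context Relevance")
    .on_input()
    .on(context)
    .aggregate(np.mean)
)

eval_questions = [
    "Summarize the company's revenue trends.",
    "What are the company's main risk factors?",
    # ... roughly 20 financial-analysis questions in total
]

# `chains` is the dict of RetrievalQA chains from the previous sketch.
for name, chain in chains.items():
    recorder = TruChain(
        chain,
        app_id=f"10k-qa-{name}",
        feedbacks=[f_groundedness, f_answer_relevance, f_context_relevance],
    )
    with recorder:
        for question in eval_questions:
            chain({"query": question})

tru.get_leaderboard(app_ids=[])   # aggregate Triad scores per chain type
```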