## Problem

1. AI coding assistants (Copilot, Cursor, Aider.chat) accelerate software development.
2. Developers typically write code not by reading documentation but by asking Llama, ChatGPT, Claude, or other LLMs.
3. LLMs struggle to apply documentation directly, because turning reference text into working code requires reasoning.
4. New projects and updated documentation are often overshadowed by the legacy code the models have already seen.

## Solution

- To help LLMs comprehend new documentation, we generate a large number of usage examples from it.

## How we do it

1. Download the documentation from the URL and clean it by removing menus, headers, footers, tables of contents, and other boilerplate.
2. Analyze the documentation to extract its main ideas, tools, use cases, and target audiences.
3. Brainstorm relevant use cases.
4. Refine each use case.
5. Conduct a human review of the code.
6. Organize the validated use cases into a dataset or RAG system.

## Tools we used

Repository: https://github.com/kirilligum/synth-dev

- **Restack**: To run, debug, log, and restart all steps of the pipeline.
- **TogetherAI**: For the LLM API and example usage. See: https://github.com/kirilligum/synth-dev/blob/main/streamlit_fastapi_togetherai_llama/src/functions/function.py
- **Llama**: We used Llama 3.2 3B and broke the pipeline into smaller steps so that a more cost-effective model could handle each one. Research suggests that generating more data with a smaller model can be more efficient than generating it with a larger model. See: https://github.com/kirilligum/synth-dev/blob/main/streamlit_fastapi_togetherai_llama/src/functions/function.py
- **LlamaIndex**: For LLM calls, prototyping, initial web crawling, and RAG. See: https://github.com/kirilligum/synth-dev/blob/main/streamlit_fastapi_togetherai_llama/src/functions/function.py
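
Steps 1 through 4 under "How we do it" could be sketched roughly as follows. This is a minimal illustration, not the code from the synth-dev repository: `clean_html`, `generate_examples`, `SKIP_TAGS`, and the prompt strings are hypothetical names chosen for this example. The cleaner uses only Python's standard `html.parser`, and `llm` stands for any function that maps a prompt to text, such as a wrapper around a hosted Llama model.

```python
from html.parser import HTMLParser
from typing import Callable, List

# Tags whose contents are treated as boilerplate (an assumption; tune per site).
SKIP_TAGS = {"nav", "header", "footer", "aside", "script", "style"}


class DocTextExtractor(HTMLParser):
    """Collects visible text, ignoring anything nested inside SKIP_TAGS."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def clean_html(html: str) -> str:
    """Step 1: strip menus, headers, footers, and scripts from a page."""
    parser = DocTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def generate_examples(
    doc_text: str, llm: Callable[[str], str], n_use_cases: int = 3
) -> List[dict]:
    """Steps 2-4: analyze the docs, brainstorm use cases, refine each one."""
    # Step 2: analyze the documentation.
    summary = llm(
        f"Summarize the main ideas, tools, use cases, and audiences:\n{doc_text}"
    )
    # Step 3: brainstorm use cases, expected one per line.
    brainstorm = llm(f"List {n_use_cases} use cases, one per line, for:\n{summary}")
    use_cases = [ln.strip() for ln in brainstorm.splitlines() if ln.strip()]
    examples = []
    # Step 4: refine each use case into a usage example.
    for use_case in use_cases[:n_use_cases]:
        code = llm(
            f"Write a usage example for this use case:\n{use_case}\n\nDocs:\n{doc_text}"
        )
        # Step 5 (human review) happens later, so every example starts unreviewed.
        examples.append({"use_case": use_case, "code": code, "reviewed": False})
    return examples
```

Each returned record carries a `reviewed` flag set to `False`, so that only examples a human has approved are moved into the final dataset or RAG index.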