"Accelerating Research in Artificial Intelligence for Structural Biology." The project's main goal is to use Llama 3.2 to analyze scientific papers on molecular docking, specifically how ligands bind to protein targets. Studying protein-small molecule docking can reveal binding affinities and specific molecular interactions such as hydrogen bonds and hydrophobic contacts, and analyzing these binding patterns helps researchers optimize ligand design for better therapeutic outcomes.

PLINDER, a comprehensive dataset of 449,383 protein-ligand interaction (PLI) systems, aims to improve predictions in small molecule drug design by addressing the limited size and diversity of existing datasets. Despite its scale, citation information was missing for certain structures. The team raised this with the PLINDER team on Discord, who recommended accessing the Protein Data Bank (PDB). To close the gap, the bioAI team built "Plinderp" for the hackathon: a website and database at plinderp.doi.bio that provides citations, metadata links to relevant papers, and sometimes full-text articles for all PLINDER entries. One example entry, the TEM beta-lactamase enzyme from Escherichia coli, shows its citation links and full-text availability. An API gives easy access to the Plinderp functionality, and a Python package, "plinderpdoibio," hosted on PyPI, gives the community fast access to the original PLINDER dataset.

With a focus on reducing barriers for Llama developers, the team highlights the complexities of working with large language models in structural biology, particularly in data preparation and evaluation.
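
As a rough illustration of the citation lookup through the PDB described above, the following Python sketch maps a PLINDER-style system ID to the primary citation of its PDB entry. It is not the Plinderp implementation or the plinderpdoibio API. It assumes that a system ID begins with the four-character PDB ID followed by a double underscore, that the public RCSB Data API serves entries at the URL shown, and that the entry JSON carries an `rcsb_primary_citation` object with a title and a DOI field. The function names and the sample ID are hypothetical.

```python
import json
from urllib.request import urlopen

# Assumed endpoint of the public RCSB PDB Data API (not part of Plinderp).
RCSB_ENTRY_URL = "https://data.rcsb.org/rest/v1/core/entry/{pdb_id}"


def pdb_id_from_system_id(system_id: str) -> str:
    """Return the PDB ID, assuming the system ID starts with '<pdb_id>__'."""
    pdb_id = system_id.split("__", 1)[0].lower()
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError(f"cannot find a 4-character PDB ID in {system_id!r}")
    return pdb_id


def extract_citation(entry: dict) -> dict:
    """Pull the title and DOI out of an entry record.

    The key names are assumptions about the RCSB JSON layout; keys are
    matched case-insensitively so a casing difference does not break it.
    """
    citation = entry.get("rcsb_primary_citation") or {}
    lowered = {key.lower(): value for key, value in citation.items()}
    return {
        "title": lowered.get("title"),
        "doi": lowered.get("pdbx_database_id_doi"),
    }


def fetch_citation(system_id: str, timeout: float = 10.0) -> dict:
    """Download the PDB entry for a system ID and return its primary citation."""
    pdb_id = pdb_id_from_system_id(system_id)
    with urlopen(RCSB_ENTRY_URL.format(pdb_id=pdb_id), timeout=timeout) as response:
        entry = json.load(response)
    return extract_citation(entry)
```

Calling `fetch_citation("1abc__1__1.A__1.B")` with a real system ID in place of the hypothetical one would return a dictionary with `title` and `doi` keys, either of which may be `None` when the entry has no recorded primary citation.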