Achieving value with Large Language Models (LLMs) hinges on a reliable data foundation. This is becoming increasingly relevant with the introduction of conversational AI agents that use retrieval-augmented generation (RAG) techniques to extract information from biomedical data. What isn't emphasized enough is the crucial role that well-annotated data, and its accessibility to the models, plays.
In this webinar, we look at how data quality affects the performance of LLMs. To do so, we assess how LLM-powered AI agents query three versions of the same gene expression corpus, each with a different degree of quality:
- Unstructured data from the Gene Expression Omnibus (GEO)
- Structured data from the CREEDS project
- ML-ready data annotated using Elucidata's Polly
Speaker: Abhishek Jha, CEO & Co-Founder at Elucidata