Data Quality for LLMs: Building a Reliable Data Foundation

Achieving value with Large Language Models (LLMs) hinges on a reliable data foundation. This is becoming increasingly relevant with the introduction of conversational AI agents that use retrieval-augmented generation (RAG) techniques to extract information from biomedical data. What isn’t emphasized enough is the crucial role played by well-annotated data and its accessibility to the models.

In this webinar, we look at how data quality affects the performance of LLMs. To do so, we assess how LLM-powered AI agents query three versions of the same gene expression corpus, each with a different degree of data quality:

  • Unstructured Data from GEO (Gene Expression Omnibus)
  • Structured Data from the CREEDS project
  • ML-ready data, annotated using Elucidata’s Polly

Speaker

  • Abhishek Jha, CEO & Co-Founder at Elucidata

 
Please click here to view a recording of this event and other past webinars.