PyData Seattle 2023

Monitoring in the era of Generative AI, LLVMs, and embeddings – why truly scalable approaches matter
04-27, 15:30–16:15 (America/Los_Angeles), St. Helens

Monitoring data science and AI applications is different in the era of generative AI, large language and vision models (LLVMs), and embeddings, especially given the massive datasets involved. We discuss how to monitor this increasingly common data in a truly scalable way using the open source data logging library whylogs.

Machine learning and data science for high-dimensional, complex, and unstructured data types like images, text, and embeddings are becoming increasingly common, thanks in part to ChatGPT and similar offerings. Whether you’re hoping to train your own large language or computer vision model (LLVM) or to use an existing offering via an API, it is critical that you understand how your system works and how it is being used in training and production.

At WhyLabs, we’ve created an open source data logging library, whylogs, so that data scientists and engineers can take advantage of our highly accurate and lightweight approach to aggregating batch and/or streaming data in a way that works for generative AI, LLVMs, embeddings data, and tabular data.

In this talk, we (1) discuss why monitoring that’s scalable and mergeable is crucial for data science in the generative AI era; (2) show how you can practically monitor generative AI and unstructured data; and (3) discuss how we approach these issues in an open source context with whylogs.
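
To make "mergeable" concrete, here is a minimal sketch of a summary that can be built separately for each batch and then combined without going back to the raw rows. It is not the whylogs API or its implementation. whylogs builds on approximate statistics techniques, while this stand-in keeps exact count, mean, variance, min, and max. The `ColumnSummary` class and the `summarize` helper are hypothetical names chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class ColumnSummary:
    """Hypothetical fixed-size summary of one numeric column."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean
    minimum: float = math.inf
    maximum: float = -math.inf

    def track(self, value: float) -> None:
        # Welford's streaming update: one pass, constant memory.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    def merge(self, other: "ColumnSummary") -> "ColumnSummary":
        # Combine two summaries using the pairwise (parallel) variance formula.
        total = self.count + other.count
        if total == 0:
            return ColumnSummary()
        delta = other.mean - self.mean
        mean = self.mean + delta * other.count / total
        m2 = self.m2 + other.m2 + delta * delta * self.count * other.count / total
        return ColumnSummary(
            total,
            mean,
            m2,
            min(self.minimum, other.minimum),
            max(self.maximum, other.maximum),
        )

    @property
    def variance(self) -> float:
        # Population variance; 0.0 for an empty summary.
        return self.m2 / self.count if self.count else 0.0


def summarize(values) -> ColumnSummary:
    """Build a summary from an iterable of numbers (one batch or one stream)."""
    summary = ColumnSummary()
    for value in values:
        summary.track(value)
    return summary


# Two batches profiled independently, e.g. on different machines or days.
batch_a = [1.0, 2.0, 3.0]
batch_b = [10.0, 20.0]
merged = summarize(batch_a).merge(summarize(batch_b))
whole = summarize(batch_a + batch_b)
```

Because `merged` matches `whole` up to floating-point error, each worker or time window only needs to ship its small summary rather than its data, which is the property that lets this style of monitoring scale across batch and streaming workloads.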

Prior Knowledge Expected

No previous knowledge expected

Bernease Herman is a data scientist at WhyLabs and a research scientist at the University of Washington eScience Institute. At WhyLabs, she is building model and data monitoring solutions using approximate statistics techniques. Her academic research focuses on evaluation metrics and interpretable ML, with a specialty in synthetic data and societal implications. She is a PhD student at the University of Washington and holds a Bachelor’s degree in mathematics and statistics from the University of Michigan.