PyData Seattle 2023

Trust Fall: Hidden Gems in MLFlow that Improve Experiment Reproducibility
04-28, 11:00–11:45 (America/Los_Angeles), Rainier

When it comes to data-driven projects, verifying and trusting experiment results is a particularly grueling challenge. This talk will explore how we can use Python to instill confidence in performance metrics for data science experiments and how to keep experiments versioned to increase transparency and accessibility across the team. The tactics demonstrated will help data scientists and machine learning engineers save precious development time and increase transparency by incorporating metric tracking early on.


Data-driven projects are already difficult because of the complexity of handling large datasets and versioning data, but verifying and trusting experiment results across a team can be a particularly grueling challenge, and it only grows with team size and project scope. Data scientists and engineers need to integrate metric tracking throughout the project lifecycle to ensure that their experiment results are reliable, reproducible, and transparent. This talk will explore how we can use Python to instill confidence in data-driven projects and how to keep experiments versioned to increase transparency and accessibility across the team.

In this talk, I'll present the following three MLFlow features that can improve transparency and credibility in experiment results. Engineers who aren't applying these techniques risk presenting false or irreproducible model results and wasting precious development time retracing their steps to find the best experiment trial.

* MLFlow Autologging: This feature requires minimal code and stores all relevant configurations, parameters, and metrics, reducing the likelihood that some aspect of the project goes uncaptured. It makes it easy to compare the results of data-driven experiments over a long period of time.
* MLFlow System Tags: A small but mighty feature, these reserved tags store important metadata about the state of the codebase at the time of the run, such as the git branch, commit hash, and developer name. They help solve a familiar issue for machine learning engineers: how to rerun the exact data pipeline code another developer used to generate the best metrics in an experiment.
* MLFlow Model Registry: As the project nears the finish line, it can be daunting to sift through experiments and runs. Leveraging MLFlow's Python API, we'll show how engineers can version their best experiment, making it easy to share experiment performance and encourage other engineers to reproduce the results. (A short code sketch illustrating all three features follows this list.)
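The sketch below is a minimal, illustrative example of how these three features fit together; it is not material from the talk itself. It assumes scikit-learn, a local SQLite-backed tracking store (so the Model Registry is available), and invented experiment and model names ("reproducibility-demo", "demo-classifier").

```python
import mlflow
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Hypothetical local setup: a SQLite-backed store so the Model Registry works.
mlflow.set_tracking_uri("sqlite:///mlflow.db")
mlflow.set_experiment("reproducibility-demo")  # hypothetical experiment name

# 1. Autologging: one call captures parameters, metrics, and the model artifact.
mlflow.sklearn.autolog()

X, y = load_iris(return_X_y=True)
with mlflow.start_run() as run:
    RandomForestClassifier(n_estimators=100).fit(X, y)

# 2. System tags: MLFlow reserves "mlflow.*" tags for run metadata such as the
#    git commit and the user who launched the run (populated when MLFlow can
#    detect them, e.g. when the script runs from inside a git repository).
tags = mlflow.tracking.MlflowClient().get_run(run.info.run_id).data.tags
print(tags.get("mlflow.source.git.commit"), tags.get("mlflow.user"))

# 3. Model Registry: promote the autologged model to a named, versioned entry
#    so teammates can find and reproduce the best run.
mlflow.register_model(f"runs:/{run.info.run_id}/model", "demo-classifier")
```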

By the end of this talk, Python developers will know how to leverage these techniques to increase the credibility of model performance results, integrate tracking to enhance collaboration over the project lifecycle, and version models so they are always ready for production.


Prior Knowledge Expected

No previous knowledge expected

Krishi Sharma is a software developer at KUNGFU.AI, where she builds software applications that power machine learning models and deliver data for a broad range of services. As a former data scientist and machine learning engineer, she is passionate about building tools that ease infrastructure dependencies and reduce the potential technical debt around handling data. She helped build and maintains an internal Python tool, Potluck, which lets machine learning engineers bootstrap a containerized, production-ready application from data pipelining templates so that her team can focus on the data and metrics without squandering too much time finagling with deployment and software.
