PyData Seattle 2023

MLOps Deployment Patterns with Delta Lake and MLflow
04-27, 16:15–17:00 (America/Los_Angeles), St. Helens

Would you be better off deploying an ML model or the code that generates the model? This talk, aimed at practitioners, covers different deployment patterns for machine learning applications. Beyond introducing these patterns, we’ll discuss the downstream implications of each for reproducibility, audit tracing, and CI/CD. To demonstrate solution-driven architecture, we’ll lean on Delta Lake and MLflow as core technologies for tracking lineage and managing the deployment strategy. The goal of this session is to empower practitioners to design efficient, automated, and robust machine learning systems.
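The central question can be made concrete with a deliberately tiny sketch, assuming a "model" is nothing more than the mean of its training data. `ENV_DATA`, `train`, `deploy_model`, and `deploy_code` are hypothetical names invented for illustration; nothing here is Delta Lake or MLflow API. Under "deploy model", one artifact trained in development is promoted unchanged; under "deploy code", the training code is promoted and rerun against each environment's own data.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean

# Hypothetical per-environment training data; in practice each environment
# (dev, staging, prod) would read from its own tables.
ENV_DATA = {
    "dev": [1.0, 2.0, 3.0],
    "staging": [2.0, 4.0, 6.0],
    "prod": [10.0, 20.0, 30.0],
}


@dataclass(frozen=True)
class Model:
    """Toy model that always predicts the mean of its training data."""

    trained_in: str
    prediction: float


def train(env: str) -> Model:
    # The training code itself: this is what "deploy code" promotes.
    return Model(trained_in=env, prediction=mean(ENV_DATA[env]))


def deploy_model(envs: list[str]) -> dict[str, Model]:
    # Deploy model: train once in the first environment, then promote
    # the same artifact unchanged through every later environment.
    artifact = train(envs[0])
    return {env: artifact for env in envs}


def deploy_code(envs: list[str]) -> dict[str, Model]:
    # Deploy code: promote the training code and retrain in each
    # environment against that environment's data.
    return {env: train(env) for env in envs}


if __name__ == "__main__":
    envs = ["dev", "staging", "prod"]
    print(deploy_model(envs)["prod"])  # Model(trained_in='dev', prediction=2.0)
    print(deploy_code(envs)["prod"])   # Model(trained_in='prod', prediction=20.0)
```

The trade-off is visible in the output: the promoted artifact serving production reflects development data, while the retrained one reflects production data but requires training (and validation) to run in every environment.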

The traditional CI/CD process has been used in software engineering for decades; however, deploying a machine learning model involves several additional steps, and MLOps describes these nuanced differences. There’s no MLOps deployment golden ticket: different paradigms require different approaches. When evaluating each approach, it’s important to consider how the entire system will be affected, and each approach comes with its own risks and challenges. Delta Lake, an open-source storage framework, and MLflow, an open-source Python library for managing the machine learning lifecycle, are technologies that mitigate these risks and challenges. Delta mitigates risks associated with ML model deployment by providing transactional consistency guarantees and easy versioning of datasets. Similarly, MLflow mitigates risks associated with deploying machine learning models by providing reproducible experiment tracking, model versioning, and deployment flexibility.
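To make the dataset-versioning claim concrete, the sketch below models the idea behind Delta's transaction log in plain Python: every write is an ordered commit, and any past version can be reconstructed by replaying the log ("time travel"). `ToyVersionedTable` is a hypothetical teaching class, not part of Delta Lake, and it omits concurrency control, deletes, and overwrites. With the real `deltalake` Python package, an older snapshot is loaded with `DeltaTable(path, version=0)`; Spark users can read with `.option("versionAsOf", 0)`.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field


@dataclass
class ToyVersionedTable:
    """Toy append-only table inspired by Delta Lake's transaction log.

    Each commit is one log entry; reading at version N replays commits
    0..N. Illustration only: not the Delta Lake API.
    """

    log: list[list[dict]] = field(default_factory=list)

    @property
    def latest_version(self) -> int:
        return len(self.log) - 1

    def append(self, rows: list[dict]) -> int:
        # Store the whole batch as a single commit and return its version.
        self.log.append(copy.deepcopy(rows))
        return self.latest_version

    def read(self, version: int | None = None) -> list[dict]:
        # "Time travel": replay the log up to and including `version`.
        if version is None:
            version = self.latest_version
        if not 0 <= version <= self.latest_version:
            raise ValueError(f"version {version} does not exist")
        rows: list[dict] = []
        for commit in self.log[: version + 1]:
            rows.extend(copy.deepcopy(commit))
        return rows


if __name__ == "__main__":
    table = ToyVersionedTable()
    v0 = table.append([{"x": 1.0, "y": 2.0}])
    table.append([{"x": 2.0, "y": 4.1}])
    # Pin the version used for training so the run can be reproduced
    # later, even though the table has since moved on.
    print(v0, table.read(v0))  # 0 [{'x': 1.0, 'y': 2.0}]
    print(len(table.read()))   # 2
```

Recording the pinned version alongside the experiment (for example with `mlflow.log_param("data_version", v0)`) is what lets a later audit retrain on exactly the same data.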

Problem framing and introductions
  What are Delta Lake and MLflow?
  Why do we care about MLOps?
Deployment patterns
  Deploy code, deploy model, hybrid
Reproducibility and audit tracing
  Delta Lake, time travel, transaction log, MLflow Model Registry, etc.
MLflow deployment optimizations
  pyfunc using Apache Arrow
Working with Delta Lake via Python bindings (Rust-based)
  Python-centric toolset overview
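The reproducibility and audit-tracing items come down to one habit: every registered model version should point back to the exact code and data that produced it. The sketch below is a hypothetical, stdlib-only stand-in for a model registry; `LineageRecord`, `ToyRegistry`, the commit SHA, and the table path are invented for illustration. In MLflow, the corresponding pieces would be run parameters and tags (`mlflow.log_param`, `mlflow.set_tag`) plus the Model Registry (`mlflow.register_model`).

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class LineageRecord:
    """Hypothetical audit record tying a model version to its inputs."""

    model_name: str
    model_version: int
    code_version: str  # assumed: a git commit SHA
    data_table: str    # assumed: path of the training table
    data_version: int  # table version read at training time
    params: dict

    def fingerprint(self) -> str:
        # Hash only the training inputs, so retraining with identical code,
        # data version, and params yields the same fingerprint.
        inputs = {
            "code": self.code_version,
            "table": self.data_table,
            "data_version": self.data_version,
            "params": self.params,
        }
        payload = json.dumps(inputs, sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


class ToyRegistry:
    """Toy stand-in for a model registry; not MLflow's API."""

    def __init__(self) -> None:
        self._versions: dict[str, list[LineageRecord]] = {}

    def register(self, name: str, code_version: str, data_table: str,
                 data_version: int, params: dict) -> LineageRecord:
        versions = self._versions.setdefault(name, [])
        record = LineageRecord(name, len(versions) + 1, code_version,
                               data_table, data_version, dict(params))
        versions.append(record)
        return record

    def audit(self, name: str, version: int) -> LineageRecord:
        # Versions are 1-based, as in MLflow's Model Registry.
        versions = self._versions[name]
        if not 1 <= version <= len(versions):
            raise KeyError(f"{name} has no version {version}")
        return versions[version - 1]


if __name__ == "__main__":
    registry = ToyRegistry()
    registry.register("churn_model", "3f2a9c1", "/tables/features", 7,
                      {"max_depth": 5})
    record = registry.audit("churn_model", 1)
    print(record.code_version, record.data_version)  # 3f2a9c1 7
```

Comparing fingerprints across versions gives a quick audit check: two versions with the same fingerprint were trained from identical code, data, and parameters.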

The presentation will last 30 minutes, including 5 minutes for Q&A, and is oriented toward practitioners. Some familiarity with CI/CD and ML development is beneficial but not required. The audience will leave this talk able to evaluate their specific ML deployment needs and architect the appropriate deployment pattern.

Prior Knowledge Expected

No previous knowledge expected

Mary Grace Moesta is a senior data science consultant at Databricks. She has worked in the big data and data science space for several years, collaborating across multiple verticals, with most of her work focused on the retail and CPG space. Before joining Databricks, Mary Grace contributed to several machine learning applications, namely personalization use cases, forecasting, recommendation engines, and customer experience measures.