04-26, 13:30–15:00 (America/Los_Angeles), St. Helens
As a data science and machine learning practitioner, you'll learn how Flyte, an open source data- and machine-learning-aware orchestration tool, is designed to overcome the challenges of building and maintaining ML models in production. You'll use Flyte hands-on to build ML pipelines of increasing complexity and scale!
Machine learning models present unique challenges beyond those encountered in the traditional software development lifecycle. This tutorial will examine five key obstacles that arise during ML model development: scalability, data quality, reproducibility, recoverability, and auditability. Using Flyte, an open-source orchestration tool designed for data science and machine learning workflows, we will demonstrate how to overcome these challenges, and we will generalize the techniques so you come away with a deeper understanding of the underlying principles.
First, we will define and describe the five challenges in the context of ML model development. Next, we will show how Flyte addresses each of them, explaining the reasoning behind Flyte's data-centric and ML-aware design, including:
Flyte tasks and workflows: the building blocks for expressing execution graphs
Dynamic workflows: define execution graphs at runtime
Map tasks: scale embarrassingly parallel workflows
Plugins: extend Flyte's core functionality
Type system: benefit from static type safety
DataFrame types: validate dataframe-like objects at runtime
Reproducibility: containerize and harden your execution graph
Caching: avoid wasting precious compute re-running nodes
Recovering executions: build fault-tolerant pipelines
Checkpointing: checkpoint progress within a node
Flyte Decks: create rich static reports associated with your data/model artifacts
Participants will learn how Flyte distributes and scales computation, enforces static and runtime type safety, uses Docker to ensure reproducible results, applies caching and checkpointing to recover from failed model training runs, and tracks data lineage for complete auditability of data pipelines.
Previous knowledge expected