PyData Seattle 2023

Plant a Touch-Me-Not: Train Models Without Anyone Touching Your Data with Flower
04-27, 10:15–11:00 (America/Los_Angeles), Hood

In the world of machine learning, more data and more diverse datasets usually lead to better training, particularly with human-centered products such as self-driving cars, IoT devices and medical applications. However, privacy and ethical concerns can make it difficult to effectively leverage many different datasets, particularly in medical and legal services. How can a data scientist or machine learning engineer leverage multiple data sources to train a model without centralizing the data in one place? How can one benefit from multiple datasets without breaching data privacy and security?

In this talk, we’ll build a server using a Python library called Flower that listens in on different data sources and trains machine learning models while maintaining privacy among datasets. We’ll explore the advantages of federated learning versus classical machine learning and explain the importance that an easy-to-use federated learning library has in a world that increasingly relies on personal data to power our products.

After that, we’ll jump into live coding and demonstrate how, with minimal code, a data scientist can orchestrate a training job using multiple data sources. We’ll walk through different parameters that give data scientists the power to control and fine-tune the server without needing to know infrastructure or cloud architecture.
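To make the idea concrete before the live coding, here is a minimal sketch of federated averaging in plain Python. It is not Flower's API and not the code from the talk; the function names (`local_update`, `federated_average`, `run_federated_training`), the toy one-parameter model, and the parameters `num_rounds`, `lr` and `local_epochs` are all illustrative assumptions. It only shows the general pattern: each client trains on data that never leaves it, and the server sees model weights only.

```python
from typing import List, Tuple

# Hypothetical toy setup: each client holds private (x, y) pairs and we fit
# y = w * x. The raw pairs are only ever read inside local_update.
Dataset = List[Tuple[float, float]]


def local_update(w: float, data: Dataset, lr: float, epochs: int) -> float:
    """Run a few gradient-descent steps on one client's private data."""
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_average(updates: List[Tuple[float, int]]) -> float:
    """Combine client weights, weighting each by its number of samples."""
    total = sum(n for _, n in updates)
    return sum(w * n for w, n in updates) / total


def run_federated_training(
    clients: List[Dataset],
    num_rounds: int = 20,
    lr: float = 0.05,
    local_epochs: int = 5,
) -> float:
    """Server loop: send the global weight out, average what comes back."""
    w = 0.0
    for _ in range(num_rounds):
        # Only (weight, sample count) pairs reach the server, never the data.
        updates = [
            (local_update(w, data, lr, local_epochs), len(data))
            for data in clients
        ]
        w = federated_average(updates)
    return w


# Three clients whose data all follow y = 3x, split unevenly.
clients = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(3.0, 9.0)],
    [(0.5, 1.5), (1.5, 4.5), (2.5, 7.5)],
]
global_w = run_federated_training(clients)
print(f"learned weight: {global_w:.4f}")
```

In this sketch, the number of rounds, the local epochs and the learning rate play the role of the server parameters mentioned above; a library such as Flower exposes comparable settings and handles the networking between server and clients.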

By the end of this talk, you’ll be able to:
* Tell the difference between federated learning and classical machine learning
* Know how to design your project so that it can leverage multiple data sources without centralizing the data or finagling with infrastructure
* Build and fine-tune your server to meet your application’s needs
* Scale your data pipeline with federated learning so that you can continuously train your model as you get access to more data sources
* Understand the importance that federated learning has when it comes to protecting personal data and maintaining privacy and access rights

Prior Knowledge Expected

No previous knowledge expected

Krishi Sharma is a software developer at KUNGFU.AI where she builds software applications that power machine learning models and deliver data for a broad range of services. As a former data scientist and machine learning engineer, she is passionate about building tools that ease infrastructure dependencies and reduce potential technical debt around handling data. She helped build and maintains an internal Python tool, Potluck, which allows machine learning engineers to bootstrap a containerized, production-ready application with data pipelining templates so that her team can focus on the data and metrics without squandering too much time finagling with deployment and software.
