04-26, 13:30–15:00 (America/Los_Angeles), Kodiak Theatre
We will build an end-to-end ML system to predict air quality that includes a feature pipeline to scrape new data and provide historical data (air quality observations and weather forecasts), a training pipeline to produce a model using the air quality observations and features, and a batch inference pipeline that updates a UI for Seattle. The system will be hosted on free serverless services: Modal, Hugging Face Spaces, and Hopsworks. It will be a continually improving ML system that keeps collecting more data, making better predictions, and providing a hindcast with insights into its historical performance.
This tutorial will produce three different Python programs that, when plugged together, make a production ML system. First, we will understand the data sources: public, crowd-sourced air quality measurements that can be retrieved with an API key or scraped from a web page, and weather predictions/observations that can be retrieved with free API services. The prediction problem is to predict air quality at the location of existing air quality sensors, using weather forecast data as the primary features. We will show you how to write a Python program as a feature pipeline that can both scrape new data and provide historical data (air quality observations and weather forecasts). We will show you how to schedule this feature pipeline to run daily using Modal (you could also use GitHub Actions or any of the many free Python orchestration services available today).
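The core of such a feature pipeline is a join: each day's air quality observation is matched with that day's weather features to form one feature row. The sketch below shows only that step, in plain Python with illustrative field names (`pm25`, `temp_c`, `wind_kph` are assumptions, not the tutorial's schema); a scheduled Modal function would call something like this once a day before writing the rows to the feature store.

```python
from datetime import date

def build_feature_rows(aq_observations, weather_forecasts):
    """Join air quality observations with weather features by date.

    aq_observations: {date: pm25_value}
    weather_forecasts: {date: {"temp_c": ..., "wind_kph": ...}}
    Returns a list of feature rows ready to insert into a feature group.
    """
    rows = []
    for day, pm25 in sorted(aq_observations.items()):
        wx = weather_forecasts.get(day)
        if wx is None:  # skip days with no matching forecast
            continue
        rows.append({"date": day.isoformat(), "pm25": pm25, **wx})
    return rows

# Toy data standing in for scraped observations and a free weather API
aq = {date(2024, 4, 1): 12.0, date(2024, 4, 2): 18.5}
wx = {date(2024, 4, 1): {"temp_c": 11.0, "wind_kph": 9.0},
      date(2024, 4, 2): {"temp_c": 14.0, "wind_kph": 4.0}}
rows = build_feature_rows(aq, wx)  # two joined feature rows
```

In the tutorial itself, the daily run would be scheduled declaratively (for example, Modal supports attaching a daily schedule to a function), so no cron server is needed.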
Our feature pipeline will store its features in a free serverless feature store (Hopsworks). We will then write a training pipeline that reads features and air quality observations (labels) and trains a model to predict air quality given a weather forecast (a set of weather features).
Finally, we will develop a UI using Hugging Face Spaces that includes a batch inference program to retrieve the latest weather forecast features and the model, and predict air quality.
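The batch inference step is then just: fetch the newest forecast features, load the trained model, and apply it. A minimal sketch, continuing the illustrative slope/intercept model from above (the field name `wind_kph` and the numbers are assumptions, not the tutorial's schema):

```python
def predict_pm25(model, forecast):
    """Apply a (slope, intercept) model to one forecast feature row."""
    a, b = model
    return a * forecast["wind_kph"] + b

model = (-2.0, 24.0)  # parameters produced by the training step (illustrative)
forecast = {"date": "2024-04-03", "wind_kph": 5.0}  # latest forecast features
pred = predict_pm25(model, forecast)  # 14.0
```

In the tutorial, this program runs behind the Hugging Face Spaces UI, pulling the model and the latest forecast features from Hopsworks instead of local variables.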
We will show you how to log predictions, so that you can build a continually improving ML system that provides hindcasts with insights into its historical performance.
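Logging each prediction alongside its date is what makes a hindcast possible: once the true observation for that date arrives, logged predictions can be scored against it. A small sketch of that comparison (the dict-based log and MAE metric are illustrative choices, not the tutorial's exact mechanism):

```python
def hindcast_mae(prediction_log, observations):
    """Mean absolute error of logged predictions vs. later observations."""
    errors = [abs(pred - observations[day])
              for day, pred in prediction_log.items()
              if day in observations]
    return sum(errors) / len(errors)

# Predictions logged at inference time, observations scraped later
log = {"2024-04-01": 14.0, "2024-04-02": 10.0}
obs = {"2024-04-01": 12.0, "2024-04-02": 13.0}
mae = hindcast_mae(log, obs)  # (2.0 + 3.0) / 2 == 2.5
```

Because the feature pipeline keeps collecting observations daily, this score can be recomputed over a growing window, giving the UI its view of historical performance.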
For this tutorial, you will need experience with programming in Python, a laptop and Internet access.
No previous knowledge expected
Jim Dowling is CEO of Hopsworks and an Associate Professor at KTH Royal Institute of Technology. He is a developer of the open-source Hopsworks platform, a horizontally scalable data platform for machine learning that includes the industry’s first Feature Store.