PyData Seattle 2023

Leveraging Text, Images, and the Kitchen Sink to solve complex ML problems in a few lines of code with AutoGluon
04-26, 09:00–10:30 (America/Los_Angeles), St. Helens

AutoGluon is an open-source AutoML framework developed by AWS. It can train models on multimodal image-text-tabular data with a few lines of code, producing a powerful multi-layer stack ensemble of transformer image models, BERT language models, and a suite of tabular models, all working in tandem. This tutorial will give an overview of AutoGluon, followed by a deep dive into how (and why) it has proven to be so effective, and will finish with code examples that demonstrate how you can revolutionize your ML workflow.


In this tutorial, we demonstrate the fundamental techniques that enable modern AutoML systems. We will start with an overview of AutoGluon, follow it with a deep dive into how (and why) it has proven to be so effective, and finish with code examples that demonstrate how you can revolutionize your ML workflow. This talk is aimed at software engineers and data scientists familiar with basic Python and pandas; no prior machine learning knowledge is required. Each topic covered in the tutorial is accompanied by a hands-on Jupyter notebook that implements best practices (the notebooks will be available on GitHub before and after the tutorial).
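
As a rough illustration of what "a few lines of code" can look like, here is a minimal sketch using AutoGluon's `TabularPredictor`. The helper name `train_tabular`, the file name, the label column, and the 60-second time limit are assumptions made for this example, not material from the tutorial notebooks, and the `autogluon` package must be installed for the call to work.

```python
def train_tabular(train_df, label, time_limit=60):
    """Fit an AutoGluon tabular predictor on a pandas DataFrame.

    `train_tabular` is a hypothetical helper for this sketch.
    `label` names the column to predict; `time_limit` caps training
    time in seconds (one way to trade accuracy against compute).
    """
    # Imported lazily so that defining this helper does not require AutoGluon.
    from autogluon.tabular import TabularPredictor

    predictor = TabularPredictor(label=label).fit(train_df, time_limit=time_limit)
    return predictor


# Example usage (assumes a CSV file with a "class" column exists):
#   import pandas as pd
#   predictor = train_tabular(pd.read_csv("train.csv"), label="class")
#   predictions = predictor.predict(pd.read_csv("test.csv"))
```

The time limit is one of the knobs that lets a user state a budget up front rather than tune individual models by hand.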

The talk will cover the following topics:

Tabular AutoML:
- AutoML Basics: Discussion of core AutoML principles and historical background (including early AutoML toolkits such as Auto-WEKA and auto-sklearn)
- History of competition ML and how it influenced the design of modern AutoML systems
- Discussion of model combination strategies (stacking, bagging, model aggregation)
- Constraint satisfaction and engineering for a performance envelope (accuracy, speed, compute resources)
- Benchmark comparisons showcasing the advancement of AutoML systems in recent years, compared with both earlier AutoML systems and human data scientists (4 AutoML frameworks, 104 OpenML datasets, 10 Kaggle datasets)
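
The model combination strategies listed above can be sketched in plain Python. This is a toy illustration under stated assumptions, not AutoGluon's implementation: the two base models, the synthetic data, and every function name are made up for the example. It shows k-fold bagging (one model copy per fold, producing out-of-fold predictions) and a weighted blend fit on those out-of-fold predictions, which is a simple form of stacking.

```python
import random
from statistics import mean


def fit_mean_model(xs, ys):
    """Baseline model: always predict the training mean."""
    m = mean(ys)
    return lambda x: m


def fit_linear_model(xs, ys):
    """One-dimensional least-squares line."""
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var if var else 0.0
    return lambda x: my + slope * (x - mx)


def bagged_oof(fit, xs, ys, k=5):
    """k-fold bagging: train k copies, each on k-1 folds.

    Returns out-of-fold predictions (each row is predicted by the copy
    that never saw it) together with the k fitted models.
    """
    n = len(xs)
    oof, models = [0.0] * n, []
    for fold in (list(range(i, n, k)) for i in range(k)):
        held_out = set(fold)
        train_x = [xs[i] for i in range(n) if i not in held_out]
        train_y = [ys[i] for i in range(n) if i not in held_out]
        model = fit(train_x, train_y)
        models.append(model)
        for i in fold:
            oof[i] = model(xs[i])
    return oof, models


def mse(preds, ys):
    return mean((p - y) ** 2 for p, y in zip(preds, ys))


def best_blend_weight(oof_a, oof_b, ys, steps=20):
    """Grid-search w in [0, 1] for the blend w*a + (1-w)*b on out-of-fold predictions."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(
        candidates,
        key=lambda w: mse([w * a + (1 - w) * b for a, b in zip(oof_a, oof_b)], ys),
    )


# Synthetic data: y = 2x + 1 plus Gaussian noise (fixed seed for repeatability).
rng = random.Random(0)
xs = [i / 10 for i in range(100)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 0.5) for x in xs]

oof_lin, lin_models = bagged_oof(fit_linear_model, xs, ys)
oof_mean, mean_models = bagged_oof(fit_mean_model, xs, ys)
w = best_blend_weight(oof_lin, oof_mean, ys)


def ensemble_predict(x):
    """Average each model's bagged copies, then apply the learned blend weight."""
    lin = mean(m(x) for m in lin_models)
    base = mean(m(x) for m in mean_models)
    return w * lin + (1 - w) * base
```

Fitting the blend weight on out-of-fold predictions rather than in-sample predictions is the design choice that matters here: the combiner only sees predictions for rows the base models were not trained on, so it is not rewarded for base models that memorize their training data.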

Multimodal AutoML:
- Foundation models for image and text
- Real-world multimodal problems
- Fusion techniques and multimodal distillation

Advanced Topics:
- Time series
- Exploratory data analysis


Prior Knowledge Expected

No previous knowledge expected

Alexander Shirkov is a Senior SDE at Amazon AI. He is the co-author and maintainer of the open-source AutoML framework AutoGluon. AutoGluon started as a personal competition ML toolkit in 2018; Alexander continually expanded its capabilities and joined Amazon AI in 2020 to work full time on advancing the state of the art in AutoML.

Nick Erickson is a Senior Applied Scientist at Amazon AI. He obtained his master's degree in Computer Science and Engineering from the University of Minnesota Twin Cities. He is the co-author and lead developer of the open-source AutoML framework AutoGluon. AutoGluon started as a personal competition ML toolkit in 2018; Nick continually expanded its capabilities and joined Amazon AI in 2019 to open-source the project and work full time on advancing the state of the art in AutoML.