Han Wang
Han Wang is the tech lead of the Lyft Machine Learning Platform, focusing on distributed computing solutions. Before joining Lyft, he worked at Microsoft, Hudson River Trading, Amazon, and Quantlab. Han is the creator of the Fugue project, which aims to democratize distributed computing and machine learning.
Sessions
Using Spark, Dask, or Ray is not an all-or-nothing decision. For new practitioners, translating existing Pandas pipelines to these big data frameworks can seem daunting. In reality, distributed computing can be adopted incrementally: in many use cases, only one or two steps of a pipeline require expensive computation. This talk covers strategies and best practices for moving portions of a workload to distributed computing with the open-source Fugue project. The Fugue API provides a suite of standalone functions compatible with Pandas, Spark, Dask, and Ray. Together, these functions let users scale any part of their pipeline when they are ready to run full-scale production workloads on big data.