Kevin Kho PyData Seattle 2023

Kevin Kho

Kevin Kho is a maintainer of the Fugue project, an abstraction layer for distributed computing. Previously, he was an Open Source Community Engineer at Prefect, a workflow orchestration system. Before working on data tooling, he was a data scientist for four years.

Sessions

04-26, 11:00, 90 min

Fugue: Porting Existing Python and Pandas Code to Spark, Dask, and Ray

Kevin Kho, Anthony Holten

When Pandas becomes a bottleneck for data workloads, data practitioners turn to distributed computing frameworks such as Spark, Dask, and Ray. The problem is that porting existing code usually requires substantial rewrites. Drop-in replacements exist that only require changing the import statement, but the resulting code is still tied to the Pandas interface, which is not a good grammar for many distributed computing problems. In this tutorial, we will go over some scenarios where the Pandas interface doesn't scale, and we'll show how to port existing code to a distributed backend with minimal rewrites.

Kodiak Theatre
