Kevin Kho
Kevin Kho is a maintainer for the Fugue project, an abstraction layer for distributed computing. Previously, he was an Open Source Community Engineer at Prefect, an workflow orchestration management system. Before working on data tooling, he was a data scientist for 4 years.
Sessions
When Pandas starts to become a bottleneck for data workloads, data practitioners seek out distributed computing frameworks such as Spark, Dask, and Ray. The problem is porting over existing code would take a lot of rewrites. Though drop-in replacements exist where you can just change the import statement, the resulting code is still attached to the Pandas interface, which is not a good grammar for a lot of distributed computing problems. In this tutorial, we will go over some scenarios where the Pandas interface can't scale, and we'll show how to port the existing code to distributed backend with minimal rewrites.