PyData Seattle 2023

The Python Data Ecosystem: Navigating a Fragmented Landscape
04-28, 11:45–12:30 (America/Los_Angeles), Kodiak Theatre

The Python data landscape is constantly evolving and has become increasingly fragmented. This makes it difficult for data teams to pick the right tools and to adapt their existing tooling as their needs change. With so many options available, how can teams make better decisions? More importantly, how can they choose tools that will not need to be replaced frequently down the road? This talk will serve as a guide for those who are overwhelmed by the current state of data tools.


In this talk, Ketan Umare (CEO and co-founder of Union.ai and creator of Flyte) and Yee Tong (founding engineer at Union.ai and on Flyte) will lay out the common pitfalls of making data tooling decisions in a fragmented world. They will explore the factors and design patterns that data teams should know to future-proof their projects by ensuring scalability, ease of use and interoperability.

Ketan and Yee will give practical tips on making sense of the various data formats available in the Python ecosystem, including dataframes, tensors and file formats, and on handling complex and evolving datasets. Some of the dilemmas that data teams often face are:
- Deciding between pandas, Polars and Modin for data manipulation, and determining the best way to interact with specific data formats.
- Optimizing data access (streaming vs. batch) to maximize training and GPU efficiency.
- Selecting the right data frameworks and tools for both large and small data volumes.
- Making sure that data is valid and high quality and doesn’t degrade throughout workflow operations.
- Optimizing overall performance by picking the right frameworks and ensuring interoperability between frameworks such as Spark, Ray, Databricks and Flyte to leverage distributed computing resources.

Attendees will leave with a better understanding of these trade-offs and with practical approaches that have worked for many large companies that have faced these problems.


Prior Knowledge Expected

No previous knowledge expected

Ketan Umare is CEO and co-founder of Union.ai, a pioneering technology company that empowers organizations to achieve reliable, reproducible and cost-effective machine learning and data orchestration through Union Cloud, a managed version of the Flyte platform. Union.ai, the leading contributor to Flyte, was founded by the engineers who created this Kubernetes-native workflow automation platform (Ketan led that team). Trusted by companies such as Lyft, Spotify, GoJek, LinkedIn, Toyota, Intel, Wolt and Freenome, Flyte streamlines the data science and machine learning journey from ideation to production.
Before Union.ai, Ketan held senior roles at several organizations, including Amazon, Oracle and Lyft. In his spare time, he enjoys being with his two daughters and exploring the outdoors.

Hi. I'm Yee, a software engineer at Union.ai and one of the co-creators of Flyte (flyte.org), a popular orchestration platform geared towards ML and data science. My interests are in infrastructure and distributed systems, particularly as they relate to the ever-growing ML field.