PyData Seattle 2023

Geo-Unleashed: How Apache Sedona is Revolutionizing Geospatial Data Analysis
04-28, 15:00–15:45 (America/Los_Angeles), Kodiak Theatre

Apache Sedona is a cluster computing system designed to revolutionize the way we process large-scale spatial data. By extending existing systems such as Apache Spark and Apache Flink, Sedona provides a comprehensive set of out-of-the-box distributed Spatial Datasets and Spatial SQL that enable efficient loading, processing, and analysis of massive amounts of spatial data across multiple machines. Because it handles big data at scale, Sedona has the potential to transform industries such as transportation, logistics, and geolocation-based services. In 2020, Sedona entered the Apache Incubator, and in 2022 it was recognized as a top-level project of the Apache Software Foundation, further solidifying its position as a leading technology in the big data space. Apache Sedona receives over 800K downloads per month and ranks among the top 1% most downloaded Python packages on PyPI.
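Distributed spatial processing of the kind described above relies in part on spatial partitioning: geometries that are close together are grouped so that each worker compares only nearby candidates instead of every pair. The sketch below is not Sedona's API. It is a hypothetical, single-process Python illustration of that general idea for a point distance join, assuming a uniform grid whose cell size equals the join radius (a simplification chosen for clarity). The names `partition`, `distance_join`, and `brute_force_join` are illustrative only.

```python
import math
from collections import defaultdict
from itertools import product

Point = tuple[float, float]
Cell = tuple[int, int]


def cell_of(p: Point, cell_size: float) -> Cell:
    """Map a point to the integer grid cell that contains it."""
    return (math.floor(p[0] / cell_size), math.floor(p[1] / cell_size))


def partition(points: list[Point], cell_size: float) -> dict[Cell, list[int]]:
    """Group point indices by grid cell.

    Hypothetical stand-in for the spatial partitions a cluster would
    spread across workers; here everything stays in one process.
    """
    cells: dict[Cell, list[int]] = defaultdict(list)
    for i, p in enumerate(points):
        cells[cell_of(p, cell_size)].append(i)
    return cells


def distance_join(left: list[Point], right: list[Point], radius: float) -> list[tuple[int, int]]:
    """Return sorted (i, j) pairs with dist(left[i], right[j]) <= radius.

    With cell_size == radius, any match for a point lies in its own cell or
    one of the 8 neighbouring cells, so only those cells are scanned.
    """
    if radius <= 0:
        raise ValueError("radius must be positive")
    cells = partition(right, radius)
    pairs = []
    for i, p in enumerate(left):
        cx, cy = cell_of(p, radius)
        for dx, dy in product((-1, 0, 1), repeat=2):
            # .get() avoids inserting empty cells into the defaultdict
            for j in cells.get((cx + dx, cy + dy), ()):
                if math.dist(p, right[j]) <= radius:
                    pairs.append((i, j))
    return sorted(pairs)


def brute_force_join(left: list[Point], right: list[Point], radius: float) -> list[tuple[int, int]]:
    """Reference answer: compare every pair (quadratic work)."""
    return sorted(
        (i, j)
        for i, p in enumerate(left)
        for j, q in enumerate(right)
        if math.dist(p, q) <= radius
    )


if __name__ == "__main__":
    left = [(0.0, 0.0), (5.0, 5.0), (9.5, 0.2)]
    right = [(0.5, 0.5), (5.2, 4.9), (20.0, 20.0), (10.0, 0.0)]
    print(distance_join(left, right, radius=1.0))  # [(0, 0), (1, 1), (2, 3)]
```

Setting the cell size equal to the radius guarantees the partitioned join returns exactly the brute-force result while skipping pairs in distant cells; real systems typically use adaptive partitioners rather than a fixed grid so that skewed data stays balanced across machines.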

In this presentation, we will delve into the key features of Apache Sedona and demonstrate its capabilities for handling large-scale spatial data. We will also highlight recent developments in Apache Sedona and how they have further improved the system's performance and scalability. Finally, we will present examples of how Sedona has been used in industries such as transportation, logistics, and geolocation-based services to gain insights and improve decision-making.

Overall, this presentation will give you a comprehensive understanding of Apache Sedona, its capabilities, and how it can help you unlock the full potential of your spatial data.


Prior Knowledge Expected

No previous knowledge expected

Jia Yu is a co-founder of Wherobots Inc., a venture-backed company that helps businesses derive insights from spatiotemporal data, where he leads the engineering team. He is currently on leave from his position as a tenure-track Assistant Professor of Computer Science at Washington State University. He obtained his Ph.D. in Computer Science from Arizona State University in summer 2020 under the guidance of Mohamed Sarwat. His research focuses on large-scale database systems and geospatial data management; in particular, he has worked on distributed geospatial data management systems, database indexing, and geospatial data visualization. Jia’s research has appeared in top database and GIS conferences and journals, including SIGMOD, VLDB, ICDE, SIGSPATIAL, and the VLDB Journal. He is the main contributor to several open-source research projects, such as Apache Sedona, a cluster computing framework for processing big spatial data, which receives 800,000 downloads per month and has users and contributors from major companies.