PyData Seattle 2023

Data Mapping for Data Exploration
04-27, 11:45–12:30 (America/Los_Angeles), Rainier

As embeddings and vector databases become ever more popular, we need to develop new tools for exploratory data analysis. One such approach is interactive data maps: 2D map-style representations of the data, combined with rich interactivity that can link back to the source data. We'll look at the open source tools available for building interactive data maps and work through an example use case.


As embeddings and vector databases become ever more popular, we need to develop new tools for exploratory data analysis. One approach is to use dimension reduction tools like UMAP to create visualizable representations of the data. While this can be useful, it can be hard to quickly extract important information from the result. ThisNotThat (TNT) is an open source library, built on top of Panel and Bokeh, designed for displaying rich interactive data maps, with features including automatically generated map annotations, search tools, linked data viewers, and interactive data labelling tools. In this talk we'll introduce the concept of data maps and why they are going to be increasingly useful, and then, through an example use case, look at how TNT can make it easy to build rich interactive experiences for exploring your data.


Prior Knowledge Expected

No previous knowledge expected

Leland McInnes is a senior researcher at the Tutte Institute for Mathematics and Computing. He works to develop open source tools for data science, including UMAP for dimension reduction, HDBSCAN for clustering, and PyNNDescent for approximate nearest neighbor search.