PyData Seattle 2023

Aditya Lahiri

I am currently finishing up my Master's degree at UC San Diego. I have worked with a great bunch of collaborators from UCSD, Stanford, IBM Research and Purdue University! Prior to this, I worked at American Express, AI Labs for two years as a Research Engineer. I completed my undergraduate studies in Computer Science at BITS Pilani in the beautiful state of Goa.

  • Explaining Explainable AI tools: Issues, Pitfalls and Cautionary tales
Akshay Bahadur
  • Graduate Student at Carnegie Mellon University.

  • Applied research engineer with 4 years of work experience in Machine Learning/ Big Data and a proven track record of developing large-scale data systems, including implementation of Machine Learning at Scale solutions in the E-Commerce & CyberSecurity industries.

  • Developed an Indian Sign Language and Recognition System (ISLAR) for spreading awareness and helping the deaf and mute community in India. This effort was featured in multiple publications, blogs, and newsletters, and was covered by Google in a YouTube video.

  • Delivered 50+ keynotes/sessions/demonstrations covering various topics on Machine Learning.

  • Areas of Expertise: Information Retrieval, Product Ranking, Real-time Data platforms, Computer Vision, Natural Language Understanding, Big Data Systems.

  • Indian Sign Language Recognition(ISLAR)
Alan Descoins

Alan Descoins is an AI and technology leader with over thirteen years of professional services experience working for companies worldwide, mainly focused in the US and Silicon Valley.

Establishing technological strategies for a variety of clients, Alan is an expert at leading teams that solve business problems by applying machine learning techniques. Encouraging and working closely with his teams at Tryolabs, he is constantly in contact with high-profile leaders in different industries, helping them first discover opportunities for the use of AI that can positively impact their business and then effectively execute on those. His collaborations often result in significant cost savings through automation or an increase in revenue attributed to more intelligent decisions powered by data.

Alan has hands-on experience in building machine learning and deep learning models in the areas of natural language processing, computer vision, and predictive analytics. He has worked on a wide range of problems such as process automation, product recommendations, price optimization, predictive maintenance, and video analytics.

Additionally, Alan has spoken at multiple talks, keynotes, and workshops on AI-related topics across the world, and holds a BSc in Computer Science from the Universidad de la República (Uruguay).

  • Panel: The living nature of data: exploring the Lifecycle and Management of Data at Scale
Alexander Shirkov

Alexander Shirkov is a Senior SDE at Amazon AI. He is the co-author and maintainer of the open-source AutoML framework AutoGluon. Starting as a personal competition ML toolkit in 2018, Alexander continually expanded the capabilities of AutoGluon and joined Amazon AI in 2020 to work full time on advancing the state-of-the-art in AutoML.

  • Leveraging Text, Images, and the Kitchen Sink to solve complex ML problems in a few lines of code with AutoGluon
Andreas C Mueller

Andreas Mueller is a Principal Research SDE at Microsoft, where he works on the interface of the Data Science ecosystem and cloud infrastructure as a member of the Gray Systems Lab. He previously held positions as Associate Research Scientist at the Columbia Data Science Institute and as a Research Engineer at the NYU Center for Data Science. He is one of the core developers of the scikit-learn machine learning library, a member of the scikit-learn technical committee, and the author of the book "Introduction to Machine Learning with Python".

  • Automated Machine Learning & Tuning with FLAML
Anindya Saha

Anindya Saha is a Staff Machine Learning Platform Engineer at Lyft, focusing on distributed computing solutions for machine learning and data engineering. He led and implemented Spark on Kubernetes support on the ML platform for feature engineering at scale with ephemeral Spark clusters on k8s. He is currently working on enabling scalable distributed model training on the ML platform.

  • Being well informed: Building a ML Model Observability pipeline
Anthony Holten

Anthony Holten is a Senior Software Engineer at Interos, Inc. building supply chain software that calculates and tracks risk profiles for hundreds of millions of companies worldwide. Previously, as a Data Engineer at Deloitte, Anthony empowered government clients’ internal policy analysis through natural language processing. He is a published photographer whose formal education is in International Relations by way of Washington, DC and Beijing, China.

  • Fugue: Porting Existing Python and Pandas Code to Spark, Dask, and Ray
Bernease Herman

Bernease Herman is a data scientist at WhyLabs and a research scientist at the University of Washington eScience Institute. At WhyLabs, she is building model and data monitoring solutions using approximate statistics techniques. Her academic research focuses on evaluation metrics and interpretable ML, with a specialty in synthetic data and societal implications. She is a PhD student at the University of Washington and holds a Bachelor’s degree in mathematics and statistics from the University of Michigan.

  • Monitoring in the era of Generative AI, LLMs, and embeddings – why truly scalable approaches matter
Carl Kadie

Ph.D. in CS and Machine Learning. Retired Microsoft & Microsoft Research. Volunteer, open-source projects related to ML, Genomics, Python, and Rust.

  • A Perfect, Infinite-Precision, Game Physics in Python
  • Nine Rules for Writing Python Extensions in Rust
Chengyin Eng

Chengyin Eng is a Senior Data Scientist on the Machine Learning Practice team at Databricks. She is experienced in developing end-to-end scalable machine learning solutions for cross-functional clients and works with product and engineering teams to define MLOps best practices. She also teaches ML in production and deep learning courses. She has spoken at the Open Data Science Conference, Data and AI Summit, Women in Data Science, and other events. Outside of work, she enjoys connecting with friends, watching crime mystery films, and trying out new food recipes.

  • Scaling data workloads using the best of both worlds: pandas and Spark
Chi Wang

Chi Wang is a principal researcher in Microsoft Research at Redmond. He has worked on automated machine learning, machine learning for systems, scalable solutions for data science and data analytics, and knowledge mining from text data and graph data (with a SIGKDD Data Science/Data Mining PhD Dissertation Award). Chi is the creator of FLAML, a fast open-source library for AutoML & tuning used widely inside and outside Microsoft.

  • Automated Machine Learning & Tuning with FLAML
Darren Vengroff
  • Introduction to Working with U.S. Census Data in Python
David Aronchick

David is CEO of Expanso and co-director of Bacalhau, the distributed computing framework that is changing the way people interact with data and machine learning models.

Previously, he led Open Source Machine Learning Strategy at Azure, product management for Kubernetes on behalf of Google, launched Google Kubernetes Engine, and co-founded the Kubeflow project and the SAME project. He has also worked at Amazon, Chef and co-founded three startups.

When not spending too much time in service of electrons, he can be found on a mountain (on skis), traveling the world (via restaurants) or participating in kid activities, of which there are a lot more than he remembers from when he was that age.

  • Panel: The living nature of data: exploring the Lifecycle and Management of Data at Scale
David Qiu

SDE II @ AWS. Formerly studied physical chemistry at UIUC.

  • Jupyter AI — Bringing Generative AI to Jupyter
Eduardo Apolinario

Eduardo is an engineering manager at Union.ai where he leads the Open-source team. He's also one of the maintainers of Flyte.

He has more than 10 years of experience working on distributed systems, infrastructure, machine learning, up and down the stack.

  • Flyte: Robust and End-to-End Cloud Native Machine Learning & Data Processing Platform
Eloisa Elias Tran

Eloisa is a Data Scientist and Tech Community Organizer of PyData by NumFOCUS, PyLadies and Women Techmakers Seattle. Founder of Women in Data Science conferences in the Seattle area. As an active member in the tech community, Eloisa collaborates with nonprofit tech organizations and enterprises to promote diversity and inclusion programs to support women in the field.

Six Sigma certified, with 8+ years of practical experience applying statistical analysis and models for improving KPIs at Fortune 500 companies. Eloisa has expertise in enterprise customer negotiation involving multi-million dollar projects for the manufacturing industry; her portfolio includes clients such as Fiat, Chrysler and Volkswagen. @eloeliasds | linkedin.com/in/eloeliasds

  • The Continuous Improvement Journey: How Data Science Complements the Six Sigma Methodology in Manufacturing
  • Diversity Panel: Allyship is a journey, not a destination
Eugene Ciurana

Eugene Ciurana is the CTO of Triple (https://www.tripleup.com/), the leading provider of next-gen CLO technology to the largest US and European banks, and to some of the largest content and ad networks in the US. Prior to Triple, Eugene was the senior director of knowledge discovery and representation at Meltwater US1, managing science and engineering teams in San Francisco, London, Stockholm, and Budapest; he was CEO and founder of Cosmify, Inc. “one of the last pure AI company acquisitions in Silicon Valley” in 2017 and known as “Palantir in a box.” Before that he was the CTO of Summly, the most successful automated text summarization company in Silicon Valley history, Sr VP of technology at Badoo/Bumble, director of systems integration at LeapFrog Enterprises, and chief architect at Walmart.com Global. Eugene can be reached on the Libera and OFTC IRC networks (#vim, #python, #java, #awk, #wikimedia, #tor) under the /nick pr3d4t0r. Twitter: @ciurana

  • You Want to Buy This - Particle Swarm Classification for Next-Gen Recommendation Engines
Fabiana Clemente

Fabiana Clemente is the co-founder and CDO of YData, combining Data Understanding, Causality, and Privacy as her main fields of work and research, with the mission to make data actionable for organizations.

Passionate about data, Fabiana has vast experience leading data science teams in startups and multinational companies.

Host of the “When Machine Learning meets privacy” podcast, a guest speaker at Datacast and Privacy Please, and a previous WebSummit speaker, she was recently awarded “Founder of the Year” by the South Europe Startup Awards.

  • The Importance of Synthetic Data in Data-Centric AI
  • Panel: The living nature of data: exploring the Lifecycle and Management of Data at Scale
Federico Garza Ramirez

Fede is CTO and co-founder of Nixtla. They have over five years of experience deploying machine learning models in production, and have worked for large financial institutions in Mexico. An economist and mathematician by training, their passion lies at the intersection of building usable, scalable and open source machine learning products. Speaker at different PyCons.

  • Quantifying Uncertainty in Time Series Forecasting with Conformal Prediction
Florian Jacta

Florian JACTA - Customer Success Manager Taipy

  • Specialist in Taipy, a low-code open source Python package enabling any Python developer to easily develop a production-ready AI application. Handles pre-sales and after-sales functions for the package.
  • Data Scientist for Groupe Les Mousquetaires (Intermarche) and ATOS.
  • Developed several Predictive Models as part of strategic AI projects.
  • Master in Applied Mathematics from INSA, Major in Data Science and Mathematical Optimization.

+33 6 51788731

  • How to build stunning Data Science Web applications in Python with Taipy
Franz Kiraly

Franz Kiraly is a research scientist with interest in open source toolbox design, model quality assurance, and time series related modelling tasks. He is also the founder and core developer of the sktime package, and a core developer of the skbase package.

  • skbase - a workbench for creating scikit-learn like parametric objects and libraries
Gil Forsyth
  • Ibis: Because SQL is everywhere but you don't want to use it
Hamel Husain

Hamel is an entrepreneur-in-residence at fast.ai, where he is building new software development tools like nbdev. Prior to fast.ai, Hamel was a machine learning engineer at companies like Airbnb, GitHub, and DataRobot, and held other related roles in management consulting. You can find more about Hamel on his personal site.

  • Panel: “Building a Stronger Open Source Python Data Community: Trends, Gaps, and Collaborative Contributions”
Han Wang

Han Wang is the tech lead of Lyft Machine Learning Platform, focusing on distributed computing solutions. Before joining Lyft, he worked at Microsoft, Hudson River Trading, Amazon and Quantlab. Han is the creator of the Fugue project, aiming at democratizing distributed computing and machine learning.

  • How to incrementally scale existing workflows on Spark, Dask or Ray?
Holden Karau

Holden is a transgender Canadian open source developer with a focus on Apache Spark, Airflow, Kubeflow, Ray, Dask and related “big data” tools. She is the co-author of Learning Spark, High Performance Spark, Scaling Python with Ray, and Kubeflow for Machine Learning. She is a committer and PMC member on Apache Spark. She was tricked into the world of big data while trying to improve search and recommendation systems and has long since forgotten her original goal. She has worked at Amazon, Apple, and Google and is now working at Netflix.

  • Keynote: Distributed Computing 4 Kids -- with Spark (and guest appearances from Ray and Dask)
Hyukjin Kwon

Hyukjin is a tech lead on the PySpark team and a Staff Software Engineer at Databricks. He is also an Apache Spark PMC member and committer, working on many different areas in Apache Spark such as PySpark, Spark SQL, SparkR, etc.

He is the top contributor to Apache Spark, one of the top contributors to the pandas API on Spark (Koalas), and the maintainer of multiple open source projects such as Py4J. He mainly focuses on development, helping discussions, and reviewing many features and changes in Apache Spark.

  • Scaling data workloads using the best of both worlds: pandas and Spark
Ilya Katsov

Ilya Katsov is a VP of Technology at Grid Dynamics, a global consulting company that specializes in emerging technology. Ilya works on innovative data science and AI solutions for large enterprises. Prior to joining Grid Dynamics, Ilya worked at Intel Research on wireless communication technologies. He is the author of “Introduction to Algorithmic Marketing: Artificial Intelligence for Marketing Operations” (2017) and “The Theory and Practice of Enterprise AI” (2022).

  • Scaling MLOps to support dozens of analytics teams
J.J. Allaire

J.J. Allaire is the founder of RStudio (now Posit) and the creator of the RStudio IDE. J.J. is an author of several packages in the R Markdown publishing ecosystem including rmarkdown, flexdashboard, learnr, and distill, and also worked extensively on the R interfaces to Python and TensorFlow. J.J. is now leading the Quarto project, which is a new Jupyter-based scientific and technical publishing system.

  • Publishing Jupyter Notebooks with Quarto
Jim Dowling

Jim Dowling is CEO of Hopsworks and an Associate Professor at KTH Royal Institute of Technology. He is a developer of the open-source Hopsworks platform, a horizontally scalable data platform for machine learning that includes the industry’s first Feature Store.

  • Build a production ML system with only Python on free serverless services
Jia Yu

Jia Yu is a co-founder of Wherobots Inc., a venture-backed company for helping businesses to drive insights from spatiotemporal data, and leads the engineering team at Wherobots. He is currently on leave of absence from his role of Tenure-Track Assistant Professor of Computer Science at Washington State University. He obtained his Ph.D. in Computer Science from Arizona State University in Summer 2020 under the guidance of Mohamed Sarwat. His research focuses on large-scale database systems and geospatial data management. In particular, he worked on distributed geospatial data management systems, database indexing, and geospatial data visualization. Jia’s research outcomes have appeared in the most prestigious database / GIS conferences and journals, including SIGMOD, VLDB, ICDE, SIGSPATIAL and VLDB Journal. He is the main contributor of several open-sourced research projects such as Apache Sedona (incubating), a cluster computing framework for processing big spatial data, which receives 800,000 downloads per month and has users / contributors from major companies.

  • Geo-Unleashed: How Apache Sedona is Revolutionizing Geospatial Data Analysis
Jim Hibbard

Jim Hibbard is a Developer Advocate at Databricks. Prior to that, he worked at Seattle Children’s Hospital where he developed frameworks and methods for integrating medical records with multi-omics datasets to improve care. He is currently working on improving machine learning infrastructure and model management as part of the extended MLflow team.

  • Building Reliable, Open Lakehouses with Delta Lake
Joe Cheng

Joe Cheng is the CTO and first employee at Posit, PBC (formerly known as RStudio), where he helped create the RStudio IDE and Shiny web framework, along with countless complementary tools and packages.

  • Shiny: Data-centric web applications in Python
Jon Mease

I care deeply about the future of the open data science technology ecosystem. I'm a contributor to a variety of open source visualization and data science projects, the creator of VegaFusion, and a visualization engineer at Hex Technologies.

I hold a Master’s Degree in Computer Science from Johns Hopkins University, and Bachelor's degrees in Mathematics and Physics from Millersville University.

  • Scaling Altair visualizations with VegaFusion
Jonathan Bechtel

Jonathan is a data scientist at DSML Research where he oversees internal data science operations and outreach efforts by delivering seminars and public talks on the latest topics in Data Science and Machine Learning. In the past he's worked as a consultant for General Assembly, the NYPD, Amber Capital and Advent International to help them productionize their data and develop their internal analytics capabilities. He's the author of the KerasBeats deep learning package and has helped contribute to sktime and tensorflowjs. He has an MS in Analytics from Georgia Tech and resides in the NYC area. His particular passion is time series modeling since he believes it's the most practical way to align innovations in data science with business interests.

  • skbase - a workbench for creating scikit-learn like parametric objects and libraries
Juanita Gomez

Juanita Gomez is a passionate programmer, mathematician, and open source advocate, and a former developer of Spyder IDE at Quansight. She has a BS in Pure Mathematics from Pontificia Universidad Javeriana in Colombia and is currently pursuing a PhD in Computer Science at UC Santa Cruz. She is a community manager for the Scientific Python project, a community effort to better coordinate and support scientific Python libraries.

  • Panel: “Building a Stronger Open Source Python Data Community: Trends, Gaps, and Collaborative Contributions”
Jules S. Damji

Jules S. Damji is a lead developer advocate at Anyscale Inc, an MLflow contributor, and co-author of Learning Spark, 2nd Edition. He is a hands-on developer with over 25 years of experience and has worked at leading companies, such as Sun Microsystems, Netscape, @Home, Opsware/LoudCloud, VeriSign, ProQuest, Hortonworks, and Databricks, building large-scale distributed systems. He holds a B.Sc and M.Sc in computer science (from Oregon State University and Cal State, Chico respectively), and an MA in political advocacy and communication (from Johns Hopkins University).

  • Introduction to Ray for distributed and machine learning applications in Python
Jun Liu

Jun Liu is the science tech lead of Lyft Rider App, focusing on developing large-scale machine learning solutions for recommendations and purchasing. Prior to joining Lyft, Jun received her Ph.D. in Applied Mathematics from Michigan State University.

  • How to incrementally scale existing workflows on Spark, Dask or Ray?
Kamil Kaczmarek

Kamil is a technical training lead at Anyscale, where he builds technical training and educational resources for the broader Ray and AI community. Prior to joining Anyscale he co-founded Neptune.ai and worked with AI models and MLOps processes at an AI consultancy company. Kamil holds an M.Sc. in Cognitive Science and a B.Sc. in Computer Science.

  • Emerging Open Source Tech Stack for Large Language Models (LLMs) with Ray AI Runtime
Katrina Riehl

Dr. Katrina Riehl is President of the Board of Directors at NumFOCUS, Head of the Streamlit Data Team at Snowflake, and Adjunct Lecturer at Georgetown University. For almost two decades, Katrina has worked extensively in the fields of scientific computing, machine learning, data mining, and visualization. Most notably, she has helped lead data science efforts at the University of Texas Austin Applied Research Laboratory, Apple, HomeAway (now, Vrbo), and Cloudflare.

  • Keynote: Scientific Computing and the Gateway to Open Source
  • Panel: “Building a Stronger Open Source Python Data Community: Trends, Gaps, and Collaborative Contributions”
Ketan Umare

Ketan Umare is CEO and co-founder of Union.ai -- a pioneering technology company that empowers organizations to achieve reliable, reproducible, and cost-effective machine learning and data orchestration through Union Cloud, a managed version of the powerful Flyte platform. As the leading contributor to Flyte, Union.ai was founded by the engineers who created this groundbreaking, Kubernetes-native workflow automation platform (Ketan was the lead for this team). Trusted by industry giants like Lyft, Spotify, GoJek, LinkedIn, Toyota, Intel, Wolt, Freenome, etc., Flyte streamlines the data science and machine learning journey from ideation to production.
Prior to Union, Ketan held senior roles at multiple organizations including Amazon, Oracle and Lyft. In his spare time he enjoys spending time with his 2 daughters and exploring the outdoors.

  • The Python Data Ecosystem: Navigating a fragmented landscape.
Kevin Kho

Kevin Kho is a maintainer for the Fugue project, an abstraction layer for distributed computing. Previously, he was an Open Source Community Engineer at Prefect, a workflow orchestration management system. Before working on data tooling, he was a data scientist for 4 years.

  • Fugue: Porting Existing Python and Pandas Code to Spark, Dask, and Ray
Koushik Krishnan

Hi! I'm Koushik, I am a site reliability engineer and I love Python, databases and boring on-call rotations. Outside of slurping up anything and everything related to Python, I like to go disc golfing and go hiking around the Pacific Northwest. I also love meeting people at Python conferences so don't be a stranger!

  • Notebooks as Serverless Functions
Krishi Sharma

Krishi Sharma is a software developer at KUNGFU.AI where she builds software applications that power machine learning models and deliver data for a broad range of services. As a former data scientist and machine learning engineer, she is passionate about building tools that ease the infrastructure dependencies and reduce potential technical debt around handling data. She helped build and maintains an internal Python tool, Potluck, which lets machine learning engineers bootstrap a containerized, production-ready application with data pipelining templates so that her team can focus on the data and metrics without squandering too much time finagling with deployment and software.

  • Trust Fall: Hidden Gems in MLFlow that Improve Experiment Reproducibility
  • Plant a Touch-Me-Not: Train Models Without Anyone Touching Your Data with Flower
Leland McInnes

Leland McInnes is a senior researcher at the Tutte Institute for Mathematics and Computing. He works to develop open source tools for data science including UMAP for dimension reduction, HDBSCAN for clustering, and PyNNDescent for approximate nearest neighbor search.

  • Data Mapping for Data Exploration
Leo Anthias

Leo is the co-founder and CEO of Datapane, an open-source framework for creating data apps using Python and Jupyter.

  • Replacing Proprietary SaaS with Open-Source: Building a Marketing Analytics Web App with Python
Li Jiang

Li Jiang is a senior software engineer at Microsoft China, where he works on data science and AI/ML. He has experience in developing and deploying automated machine learning, distributed deep learning/machine learning, industry AI solutions, and recommendation systems. He holds two PhD degrees from Beijing Normal University and University Toulouse III, where he conducted research on swarm intelligence.

  • Automated Machine Learning & Tuning with FLAML
Lucas Durand

Lucas Durand (he/him/his) is the Head of Data Science Engineering at TD Securities and the Product Owner for TDS Notebooks, the TD Securities "Data Platform as a Service". Lucas has been with TD for upwards of 7 years as a Quant, Software Engineer, and Data Scientist.

Lucas holds a Master of Science in Theoretical Physics from York University as well as an Honours Bachelor of Science from the University of Toronto. He is a passionate teacher, avid musician, and big advocate for Python as a first-class language in banking.

  • Building an Interactive Network Graph to Understand Communities
Madison Swain-Bowden

Madison is a Senior Data Engineer out of Seattle and an avid Python user. She currently works at Automattic on the Openverse team, and has worked at Ookla (Speedtest.net), the Allen Institute for Cell Science, and the Broad Institute. In her spare time she can be found baking, building digital tools to help those battling oppression, contributing to open source, petting her cats, reading queer fiction, or playing video games.

  • Managing a search engine for over 600 million openly licensed media records
Mary Grace Moesta

Mary Grace Moesta is a senior data science consultant at Databricks. She's been working in the big data and data science space for several years with opportunities to collaborate across several verticals, with the majority of her work focused in the Retail and CPG space. Prior to Databricks, Mary Grace contributed to several machine learning applications, namely personalization use cases, forecasting, recommendation engines, and customer experience measures.

  • MLOps Deployment Patterns with Delta Lake and MLflow
Max Mergenthaler

Max is the CEO and Co-Founder of Nixtla, an open-source time-series research and deployment startup. He is also a seasoned entrepreneur with a proven track record as the founder of multiple technology startups. With a decade of experience in the ML industry, he has extensive expertise in building and leading international data teams. Max has also made notable contributions to the Data Science field through his co-authorship of papers on forecasting algorithms and decision theory. In addition, he is a co-maintainer of several open-source libraries in the Python ecosystem. He has been a speaker at major data conferences in different countries. Max's passion lies at the intersection of business and technology.

  • Quantifying Uncertainty in Time Series Forecasting with Conformal Prediction
Michael Byington

Michael has been a data scientist with a focus on machine learning at INGU Solutions since January 2021. He has a PhD in Chemical Engineering from the University of Houston. His thesis work focused on protein crystal nucleation precursors and image processing techniques, and it is this expertise that he now applies at INGU.

  • U-Net-style neural networks for feature identification in 1D time-series: applications in pipeline inspection, medicine, and more
Misha Desai

Product Manager for Azure Synapse Analytics working on building data science & machine learning capabilities on Apache Spark

  • Automated Machine Learning & Tuning with FLAML
Nate Stemen

Nate is a quantum software developer working to make quantum computing accessible to more people. Previously he worked as a web developer, and he has a master's degree in quantum computing from the University of Waterloo.

  • Growing the open source quantum ecosystem
Naty Clementi

Naty is an Open Source Software Engineer at Coiled, and a Dask maintainer. She frequently presents Dask tutorials online, as well as in local meetups such as Women Who Code and PyLadies. In her free time, she likes to play ultimate frisbee, go fly fishing, and play video games.

  • Open Source meets Enterprise: The right way.
Nick Erickson

Nick Erickson is a Senior Applied Scientist at Amazon AI. He obtained his master's degree in Computer Science and Engineering from the University of Minnesota Twin Cities. He is the co-author and lead developer of the open-source AutoML framework AutoGluon. Starting as a personal competition ML toolkit in 2018, Nick continually expanded the capabilities of AutoGluon and joined Amazon AI in 2019 to open-source the project and work full time on advancing the state-of-the-art in AutoML.

  • Leveraging Text, Images, and the Kitchen Sink to solve complex ML problems in a few lines of code with AutoGluon
Nidhin Pattaniyil

Machine Learning Engineer at Walmart Search

  • Building a Search Engine
Pablo Alfaro

Pablo is a renowned Machine Learning Engineer with over 15 years of experience in the energy, meteorology, and retail industries. He currently leads technical matters in forecasting and pricing initiatives at Tryolabs, skillfully driving cloud-based ML solutions for a multi-million dollar e-commerce business.
In his dual role at Tryolabs, Pablo provides expert client-facing consultancy services and leads the pricing-squad in creating cutting-edge solutions such as the Market Simulation tool, which is crucial for developing pricing-core, a custom-made pricing solution offering unparalleled value to clients.
Throughout his career, Pablo has implemented Uruguay's National Meteorological Databank and its management software, designed and implemented calculations for meteorological products derived from raw observations, and contributed to the development of Uruguay's National Power System Optimization software, SimSEE. Pablo's deeply analytical mindset and dedication to producing actionable data have made a lasting impact across industries, driving data-driven decision-making processes.

  • Untangling the complexity of demand forecasting models: building a Market Simulator
Peter Vidos

Peter is the CEO & Co-Founder of Vizzu.

His primary focus is understanding how Vizzu's innovative approach to data visualization can be put to good use. Listening to people complaining about their current hurdles with building charts and presenting them is his main obsession, next to figuring out how to help data professionals utilize the power of animation in dataviz.

Peter has been involved with digital product development for over 15 years. Earlier products/projects he worked on cover mobile app testing, online analytics, data visualization, decision support, e-learning, educational administration & social. Still, building a selfie teleport just for fun is what he likes to boast about when asked about previous experiences.

  • Hands-on intro of ipyvizzu-story - a new, open-source charting tool to build, create and share animated data stories with Python in Jupyter
Peter Wang

Peter Wang is the CEO and co-founder of Anaconda, and helped found the PyData conferences and global community. Prior to starting Anaconda, Peter worked as a software engineer in scientific computing and visualization. He has extensive experience in software design and development across a broad range of areas, including 3D graphics, geophysics, large data simulation and visualization, financial risk modeling, and medical imaging. Peter holds a BA in Physics from Cornell University.

  • Keynote: Peter Wang
Phillip Cloud

I'm fascinated by a variety of problems related to computers. I've solved hard problems in a variety of software engineering domains including digital video, Rust, systems programming, computer vision, and analytics. I'm currently helping build the future of analytics at Voltron Data.

  • Ibis: Because SQL is everywhere but you don't want to use it
Pierre Brunelle

Pierre is a co-founder and CEO of Noteable. He led Amazon’s notebook initiatives, both for internal use and for SageMaker. He also worked on many open source initiatives, including a standard for Data Quality work and an open source collaboration between Amazon and UC Berkeley to advance AI and machine learning. Pierre helped launch the first Amazon online car leasing store in Europe. At Amazon, Pierre also launched a Price Elasticity Service and pushed investments in Probabilistic Programming Frameworks. He also represented Amazon on many occasions, both to teach Machine Learning and at conferences such as NeurIPS. Pierre also writes about Time in Organization Studies. Pierre holds an MS in Building Engineering from ESTP Paris and an MRes in Decision Sciences and Risk Management from Arts et Métiers ParisTech.

  • Combining IPython with Open Source Papermill, Origami, and Genai to enhance your Jupyter Notebook experience
Qingyun Wu

Qingyun Wu is an Assistant Professor in the College of Information Science and Technology at Penn State University. She obtained her Ph.D. in Computer Science from the University of Virginia.

  • Automated Machine Learning & Tuning with FLAML
Rajeev Prabhakar

Rajeev is a Senior Software Engineer at Lyft focused on building an ML observability platform. Prior to Lyft, Rajeev spent several years building ML platforms, enabling large-scale distributed computing on k8s, and building real-time, ultra-low-latency systems.

  • Being well informed: Building a ML Model Observability pipeline
Shivay Lamba

Shivay Lamba is a software developer specializing in DevOps, Machine Learning and Full Stack Development.

He is an Open Source Enthusiast who has been part of various programs like Google Code In and Google Summer of Code as a Mentor, and has also been an MLH Fellow. He is actively involved in community work as well. He is a TensorflowJS SIG member and a Mentor in OpenMined, the CNCF Service Mesh Community, and the SODA Foundation, and has given talks at various conferences like GitHub Satellite, Voice Global, Fossasia Tech Summit, and TensorflowJS Show & Tell.

  • Building Machine Learning Microservices & MLOps using Union ML
Sophia Yang

Sophia Yang is a Senior Data Scientist and a Developer Advocate at Anaconda. She is passionate about the data science community and the Python open-source community. She is the author of multiple Python open-source libraries such as condastats, cranlogs, PyPowerUp, intake-stripe, and intake-salesforce. She serves on the Steering Committee and the Code of Conduct Committee of the Python open-source visualization system HoloViz. She also volunteers at NumFOCUS, PyData, and SciPy conferences. She holds an M.S. in Computer Science, an M.S. in Statistics, and a Ph.D. in Educational Psychology from The University of Texas at Austin.

  • Python Anytime, Anywhere with Anaconda Notebooks
Stefan Krawczyk

A hands-on leader and Silicon Valley veteran, Stefan has spent over 15 years thinking about data and machine learning systems, building product applications and infrastructure at places like Stanford, Honda Research, LinkedIn, Nextdoor, Idibon, and Stitch Fix. A regular conference speaker, Stefan has guest lectured at Stanford’s Machine Learning Systems Design course and is an author of a popular open source framework called Hamilton. Stefan is currently CEO of DAGWorks, an open source startup that is enabling Data Science teams to build and maintain each other's model pipelines without the coding nightmares.

  • Panel: “Building a Stronger Open Source Python Data Community: Trends, Gaps, and Collaborative Contributions”
Sumedh Datar

Sumedh is a Senior Machine Learning Engineer with more than 6 years of work experience in the field of Deep Learning, Machine Learning, and Software Engineering. He has a proven track record of single-handedly delivering end-to-end engineering solutions to real-world problems. He works at the intersection of engineering, research, and product, and has developed Deep Learning based products from scratch that have been used by thousands of end customers. Currently, Sumedh works in R&D on Applied Deep Learning with less data, and has several granted patents and several more applied for. Sumedh studied for a master’s degree in computer science with a focus on AI.

  • Deep Learning Model Interpretability for Computer Vision based Models
Thiagarajan Ramakrishnan

Thiagu supports Dell’s Infrastructure Customer Service Division and Global Datacenter Sales Data Science team. As a Sr. Machine Learning Engineer at Dell, Thiagu’s responsible for building the foundational infrastructure and evangelizing the best practices necessary to develop and deploy AI/ML models to Dell’s stakeholders. Prior to joining Dell, he was working with Teradata as a Sr. Software Engineer on their database and analytics team.

  • Enterprise-grade Full Stack ML Platform: why human-centricity matters?
Ties de Kok

I am an Assistant Professor at the University of Washington, Foster School of Business. I specialize in combining computer science with capital markets research. My expertise is in machine learning and natural language processing. For more information, see my website and GitHub page:

https://www.tiesdekok.com/
https://github.com/TiesdeKok

  • Going beyond ChatGPT: an introduction to prompt engineering and LLMs
Timothy Chan, PhD

Timothy Chan is an experienced data science professional, currently serving as the Data Science Lead at Statsig. This cutting-edge platform provides product observability and experimentation services to top companies such as Notion, RecRoom, Univision, and Ancestry. Before joining Statsig, Timothy spent almost 5 years as a Data Scientist at Facebook (now Meta), where he was involved in projects across Facebook App and Reality Labs. Before venturing into the tech industry, Timothy worked in biotech, researching treatments for diseases such as Alzheimer’s, Multiple Sclerosis, Lupus, and Cancer. He holds a PhD in Chemistry and an MBA in Entrepreneurship.

  • Experimentation and the gold standard of data champions
Tom Drabas

Tom is a Field Engineer/Solutions Architect with Voltron Data. He has almost 20 years of experience working with data across multiple industries, ranging from airlines, through finance and banking, to high tech. Tom holds a PhD in Operations Research from UNSW. He has extensive experience presenting at international conferences (KDD, PyData Seattle, GTC). Tom is an author of 3 books and a video series on data analytics and data engines, and has authored multiple blog posts and webinars on GPU applications for big data. He also received a patent for a solution that discovers patterns in extremely high-dimensional datasets while working at Microsoft. At Voltron Data, Tom works on building bespoke solutions for solving intricate problems for customers leveraging the capabilities of the Apache Arrow data ecosystem.

  • From prototype to deployment: Increase productivity and simplify data operations in Python
Tracy Teal

Tracy Teal is the Open Source Program Director at Posit. Previously, she was a co-founder of Data Carpentry and the Executive Director of The Carpentries. She developed open source bioinformatics software as an assistant professor at Michigan State University and holds a PhD in computation and neural systems from California Institute of Technology. Tracy is involved in the open source software and reproducible research communities, including serving on advisory committees for NumFOCUS, pyOpenSci, EarthLab and carbonplan, and has been working with open source communities, developing curriculum, and teaching people how to work with data and code as a developer, instructor and project leader throughout her career.

  • It's not just code: managing an open source project
Travis Oliphant

Dr. Oliphant has a Ph.D. in Biomedical Engineering from the Mayo Clinic, and M.S. and B.S. degrees in Electrical Engineering (and Math) from Brigham Young University. Travis has worked extensively with Python for numerical and scientific programming since 1997, and was the primary developer of the NumPy package and the author of the definitive Guide to NumPy. He is also the primary founding author of the SciPy package. During his academic career, he has worked in the fields of satellite remote sensing, Magnetic Resonance Imaging (MRI), Ultrasound, elastography, and general inverse problems. He was an Assistant Professor of Electrical and Computer Engineering at Brigham Young University from 2001 to 2007 where he taught courses in probability theory, electromagnetics, inverse problems, and signal processing. In addition, he directed the BYU Biomedical Imaging Lab, and performed research on scanning impedance imaging. He has done consulting work since 1997 in laser scattering off of semiconductors, sparse matrix calculations for search engines, and mesh transformations for fluid dynamics. Dr. Oliphant served as President of Enthought from 2007 until 2011, where he oversaw the establishment of additional satellite offices in New York City, Belgium, Cambridge UK, and Mumbai, India. During this time, he worked with Fortune 50 companies such as Shell, J.P. Morgan, and Procter & Gamble in all aspects of the contractual relationship – from contracts to training to code architecture and code development. He also served on the Board of Directors for Enthought from 2008 until 2011, and during that time, formed strong connections with J.P. Morgan technical leadership staff and other industry leaders.

  • Keynote: Travis Oliphant
Trent Hauck

Trent Hauck has been using Python for going on 13 years for various data related endeavors -- in fact, he spoke at PyData 2015 about Latent Dirichlet Allocation before Transformers were around!

Now he works in biotech and develops software and consults as part of his company, WHERE TRUE Technologies (wheretrue.com).

  • Python in Bioinformatics
Vincent Gosselin
  • Since 2021, CEO & Co-Founder at Taipy, the fastest way to build a complete application in Python.
  • 2010-2021: Expert Data Scientist at IBM.
  • Headed the development of strategic AI applications with large ROI for large companies: TSMC, McDonald's, Dassault Aviation, Port Of Hong Kong, Carhartt, STMicroelectronics, etc.
  • Developed novel AI streaming models to optimize the production of wafer fabs. These have become a standard for the semicon industry (Samsung, TSMC, ST Micro, etc.).

After having implemented AI models over the years for the manufacturing, logistics, and retail sectors, I spent most of the past decade mentoring young data scientists. My quest is now to make Python developers much more successful at developing & deploying applications for business users.

  • How to build stunning Data Science Web applications in Python with Taipy
Yee Tong

Hi. I'm Yee, a software engineer at Union AI and one of the co-creators of flyte.org, a popular orchestration platform geared towards ML and data science. My interests are in infrastructure and distributed systems, particularly as they relate to the ever-growing ML field.

  • The Python Data Ecosystem: Navigating a fragmented landscape.
Ying-Jung Chen

Ying-Jung is an environmental scientist turned data scientist / machine learning engineer. She received her Ph.D. in environmental science and management at UC Santa Barbara. She was also a postdoc at the eScience Institute at the University of Washington. She started her data science / machine learning career in the greater Seattle area. She is extremely interested in applying models to business problems in different domains, such as finance, agriculture, and hydroclimate. She is also keen on learning efficient approaches to using public cloud computing services. She has also learned state-of-the-art machine learning algorithms by reading papers, attending online and in-person meetups, and joining major machine learning conferences. Finally, she was a PyData Impact Scholar in 2021. She'd like to share her learning journey in data science and sustainability with people. In her free time, she likes to go hiking and play basketball / pickleball.

  • Let’s program to fight the impacts of climate change!
Yucheng Low

Yucheng Low is the co-founder & CEO of XetHub. He is no stranger to the PyData community, having last presented at PyData 2015 Denver. He was the co-founder and Chief Architect at GraphLab, where he built the SFrame - the 1st out-of-core dataframe for Python, scaling to trillion cell dataframes on a laptop. GraphLab, which was renamed to Dato and Turi, was acquired by Apple in 2016. At Apple, after open-sourcing Turi Create, Yucheng worked on many parts of the ML platform stack ranging from storage to inference. In 2021 he left Apple and, together with a couple of colleagues, started XetHub: a data / model management service combining the best parts of S3 & Git. He has a PhD in Machine Learning from CMU where he worked on distributed ML.

  • Panel: The living nature of data: exploring the Lifecycle and Management of Data at Scale
Zander Matheson

Zander is a seasoned data engineer who has founded and currently helms Bytewax. Boasting over ten years of experience in data infrastructure and data science at top-tier tech organizations like GitHub and Heroku, Zander possesses a deep understanding of the nuances of the Python ecosystem. A true trailblazer and visionary, Zander persistently propels progress in the field and simplifies how Python developers interact with streaming data.
Outside of his professional pursuits, Zander can be found exploring the natural beauty of Santa Cruz, whether he's catching waves in the ocean or hiking through the forest.

  • Panel: “Building a Stronger Open Source Python Data Community: Trends, Gaps, and Collaborative Contributions”
Savin Goyal

Savin is the co-founder and CTO of Outerbounds - where his team is building the modern ML stack to accelerate the impact of data science. Previously, he was at Netflix, where he built and open-sourced Metaflow, a full-stack framework for data science.

  • Enterprise-grade Full Stack ML Platform: why human-centricity matters?