PyData Seattle 2023

Introduction to Working with U.S. Census Data in Python
04-26, 13:30–15:00 (America/Los_Angeles), Rainier

The United States Census Bureau publishes over 1,300 data sets via its APIs.
These are useful across a myriad of fields, including data journalism,
allocation of public and private resources, data activism, marketing, and
strategic planning across many sectors. In this tutorial, which is targeted at
both beginners and those with some experience with census data, we will
demonstrate how open-source Python tools can be used to discover, download,
analyze, and generate maps of U.S. Census data.

This tutorial will consider the full breadth and richness of data available
from the U.S. Census. We will cover not only the American Community
Survey (ACS) and similarly well-known data sets, but also a number of data
sets that are less well known but nonetheless useful in a variety of research
contexts.

Through a series of hands-on demonstrations, attendees will learn to:

  • discover data sets, some with a handful of variables and others
    with tens of thousands;
  • download demographic and economic indicators at levels ranging from the entire
    nation to individual neighborhoods;
  • plot the downloaded data on maps.

All Python tooling used in the workshop is available as open-source
software. Final versions of the notebooks used in the tutorial will also
be released as open source.


The United States Census Bureau publishes over 1,300 data sets across many vintages via its
APIs. These are useful across a myriad of fields, including data journalism,
allocation of public and private resources, data activism, marketing, and strategic
planning across many sectors. Some of these data sets, like those from the American Community Survey (ACS),
are widely used and readily available in Python via multiple open-source packages.
Others are more obscure, but extremely useful given the right tools to discover and
access them. An example is the Small Area Income and Poverty Estimates (SAIPE)
time series, which combines ACS and other data to produce low-variance estimates
of poverty rates in local areas like school districts.
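
As a rough illustration of what discovery involves, the Census API serves JSON metadata alongside its data: a catalog listing the published data sets, plus per-data-set documents describing geographies, groups, and variables. The sketch below only builds those URLs with plain string handling and makes no network calls. The helper names `catalog_url` and `metadata_url` are hypothetical, and the SAIPE path mentioned in the note afterwards is an assumption about how that time series is addressed. The open-source tools covered in the tutorial are meant to spare you from handling these documents by hand.

```python
API_ROOT = "https://api.census.gov/data"

# Metadata documents served alongside each data set.
METADATA_KINDS = ("geography", "groups", "variables")


def catalog_url():
    """Return the URL of the catalog that lists the published data sets."""
    return API_ROOT + ".json"


def metadata_url(dataset_path, kind):
    """Return the URL of one metadata document for a data set.

    dataset_path is a vintage plus data set name, such as "2021/acs/acs5",
    or a time-series path (assumed form: "timeseries/poverty/saipe").
    """
    if kind not in METADATA_KINDS:
        raise ValueError(f"kind must be one of {METADATA_KINDS}, got {kind!r}")
    # Tolerate stray leading or trailing slashes in the path.
    return f"{API_ROOT}/{dataset_path.strip('/')}/{kind}.json"
```

For example, `metadata_url("2021/acs/acs5", "variables")` points at the variable list for the 2021 ACS 5-year data, and the same call with `"timeseries/poverty/saipe"` would address SAIPE if that path assumption holds.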

The goal of this tutorial is to teach attendees how to use open-source
Python tools to discover, download, analyze, plot, and generate maps of U.S.
Census data, whether it comes from the ACS, the SAIPE, or any other data set
the Census Bureau publishes via APIs. We will give attendees ample opportunity
to discover and download data sets that specifically interest them, to
pose questions, and to get feedback on the process.
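
To make the download step concrete, here is a minimal sketch of how a request to the Census API can be assembled: a data set path, a `get` list of variables, a `for` clause naming the geography level, and an optional `in` clause restricting it. The function name `query_url` and its parameters are hypothetical and introduced only for illustration. The sketch builds a URL and does not fetch anything.

```python
from urllib.parse import urlencode

API_ROOT = "https://api.census.gov/data"


def query_url(dataset_path, variables, for_clause, in_clause=None, api_key=None):
    """Build a Census API data query URL (no network access)."""
    params = [("get", ",".join(variables)), ("for", for_clause)]
    if in_clause:
        params.append(("in", in_clause))
    if api_key:
        # An API key is optional here; pass one if you have it.
        params.append(("key", api_key))
    # Leave ",", ":" and "*" unescaped so the query stays readable.
    query = urlencode(params, safe=",:*")
    return f"{API_ROOT}/{dataset_path.strip('/')}?{query}"
```

For counties in New Jersey (state FIPS code 34), `query_url("2021/acs/acs5", ["NAME", "B01001_001E"], "county:*", "state:34")` yields a URL ending in `?get=NAME,B01001_001E&for=county:*&in=state:34`, where `B01001_001E` is the ACS total population estimate.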

We will demonstrate both tools specific to working with Census data, like
censusdis, and tools for working with geographic data, like
GeoPandas.
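
Part of what such tools do is turn raw API responses into tabular data. The Census API returns JSON as a list of rows whose first row holds the column names, with every value encoded as a string. The sketch below, using only the standard library, converts that shape into a list of records. The function name `rows_to_records` is hypothetical, and the sample payload is a made-up placeholder in the API's shape, not real estimates or real geography codes.

```python
import json


def rows_to_records(payload):
    """Convert list-of-rows JSON (header row first) into a list of dicts."""
    rows = json.loads(payload)
    if not rows:
        return []
    header, *data = rows
    return [dict(zip(header, row)) for row in data]


# Placeholder payload: names, values, and codes are invented for illustration.
SAMPLE_PAYLOAD = """
[["NAME", "B01001_001E", "state", "county"],
 ["Example County A", "1000", "00", "001"],
 ["Example County B", "2500", "00", "003"]]
"""

records = rows_to_records(SAMPLE_PAYLOAD)

# Values arrive as strings, so numeric columns need an explicit conversion.
total = sum(int(record["B01001_001E"]) for record in records)
```

In practice a pandas DataFrame is a more convenient target than a list of dicts, and joining such a table to boundary geometries is the kind of job GeoPandas is suited to.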

No census data experience is required, though we expect that even
those with some experience working with census data will learn something
about open-source tools that can help make them more productive. A basic
familiarity with Python, Jupyter notebooks, and pandas is recommended.

The high-level outline of the topics we will cover is as follows:

Introduction to the U.S. Census data model

  1. A "Hello, World" example
  2. Extended examples: data, geographies, and maps
  3. Exploring metadata

Using metadata for discovery

  1. geographies
  2. data sets
  3. groups
  4. variables

Putting it all together in an end-to-end example

  1. Identifying concentrations of child poverty in Newark, NJ

Guided hands-on exercise

  1. Discover, download, and map a data set of interest
  2. Building on the Newark example

Discussion and Q&A

  1. Open opportunity for discussion, feedback, and Q&A

We hope that by the end of this tutorial attendees will be confident
in their ability to use Python tools to discover, download, analyze,
and visualize the full collection of data the U.S. Census Bureau
publishes.


Prior Knowledge Expected

No previous knowledge expected

See also: By the end of the tutorial, attendees will be able to download and process U.S. Census data to produce maps like this one.