This is a post in my occasional humanities data curation series, which outlines humanities data curation activities using publicly-available data on grants and awards made by the National Endowment for the Humanities (NEH). A subset of the series focuses on mapping the data and creating geospatial data visualizations.

For reference, here are the other essays in the series:

Metadata: Origins and Metaphors (10 September 2024)
Encoding Reparative Description: Preliminary Thoughts (17 September 2023)
Find and replace data in the shell (11 September 2022)
Wrangling Humanities Data: An Interactive Map of NEH Awards (17 July 2022)
Wrangling Humanities Data: Using Regex to Clean a CSV (27 February 2021)
Wrangling Humanities Data: Exploratory Maps of NEH Awards by State (22 January 2021)
Wrangling Humanities Data: Cleaning and Transforming Data (19 January 2021)
Wrangling Humanities Data: Finding and Describing Data (20 December 2020)

This installment uses the geospatial dataset previously created and uses some of the visualization tools provided by the geopandas librarywhich is supported in a Python environment and Jupyter notebook. You can download a Jupyter Notebook version of this post from the GitHub repository along with all of the data discussed here. File references discussed below are included in the same neh-grant-data-project repository.

Mapping State by State

Now that I have a pretty good set of points, I wanted to visualize these against maps for the states. For example, is it possible to see all of the grants in a given state for a given time period? Could I look at different states? What about all of the continental states? For these kinds of questions, geopandas has a lot of built-in tools for filtering the clean data, as well as for outputting a few initial maps. Below, I walk through a process to develop state-by-state maps. This uses some of the filtering capacities of pandas in combination with the mapping visualization tools of geopandas. Let’s go!

Set up the environment

This activity will only use two python modules: geopandas and matplotlib. However, it also requires the clean dataset that I developed previously. Since that was saved as a geojson file, it is now reusable and serves as the basis for these examples. The data file is included in the repository as neh_1960s_grants.geojson. I am still exploring this data, so rather than looking at the entirety of the NEH data that was available, I continue working with the 1960s decade as before.

import geopandas as gpd

import matplotlib.pyplot as plt
%matplotlib inline

Start with the geojson

Having the previously created and saved geojson data, which is already cleaned and transformed to include valid POINT coordinates, now I can use the file to load in data rather than going through the cleaning and transformation process again.

gdf_neh_1960s = gpd.read_file('neh_1960s_grants.geojson', driver='GeoJSON')

gdf_neh_1960s.head()

	AppNumber	Institution	InstCity	InstState	InstPostalCode	InstCountry	CongressionalDistrict	Latitude	Longitude	YearAwarded	ProjectTitle	Program	Division	AwardOutright	ProjectDesc	ToSupport	Participants	Disciplines	geometry
0	FB-10007-68	Regents of the University of California, Berkeley	Berkeley	CA	94704-5940	USA	13	37.87029	-122.26813	1967	Title not available	Fellowships for Younger Scholars	Fellowships and Seminars	8387.0	No description	No to support statement	John Elliot [Project Director]	English	POINT (-122.26813 37.87029)
1	FB-10009-68	Pitzer College	Claremont	CA	91711-6101	USA	27	34.10373	-117.70701	1967	Title not available	Fellowships for Younger Scholars	Fellowships and Seminars	8387.0	No description	No to support statement	Steven Matthysse [Project Director]	History of Religion	POINT (-117.70701 34.10373)
2	FB-10015-68	University of California, Riverside	Riverside	CA	92521-0001	USA	41	33.97561	-117.33113	1967	Title not available	Fellowships for Younger Scholars	Fellowships and Seminars	8387.0	No description	No to support statement	John Staude [Project Director]	History, General	POINT (-117.33113 33.97561)
3	FB-10019-68	Northeastern University	Boston	MA	02115-5005	USA	7	42.33950	-71.09048	1967	Title not available	Fellowships for Younger Scholars	Fellowships and Seminars	8387.0	No description	No to support statement	Thomas Havens [Project Director]	History, General	POINT (-71.09048 42.33950)
4	FB-10023-68	University of Pennsylvania	Philadelphia	PA	19104-6205	USA	3	39.95298	-75.19276	1967	Title not available	Fellowships for Younger Scholars	Fellowships and Seminars	8387.0	No description	No to support statement	Gresham Riley [Project Director]	Psychology	POINT (-75.19276 39.95298)

gdf_neh_1960s.shape

(997, 19)

The data import worked as expected. The first four records appear correctly, and the .shape call shows 997 records, which is what it should be (the CSV had 998 rows).

Mapping State by State

Now I can use geopandas to filter the data to create different kinds of maps, and to export maps to files that can be reused in reports.

Define and use map shapes for US states

The project to clean and check data quality focused on information with single, clear geographic coordinates, aka POINTs. For more complex maps, I need to use some additional geospatial data types. Geospatial data and visualization most frequently uses 2D shapes, called Polygons or Multipolygons, which are used to represent areas on maps like states or countries. In this task to create state maps, I will need to pair the point data with the states’ shape data. Rather than going to data.gov, which is where the NEH data can be found, I used the useful US state shape data that Eric Celeste has made available, which makes it easy to get geojson and various levels of detail.

Fortunately, the US state shapes are well defined, and the Census Bureau and others make the information readily available. This section explains how to import the shapes for US states, how to display the shapes, and then how to display the points for each grant on the state shapes.

# test the states shapefile

us_states = gpd.read_file('gz_2010_us_040_00_5m.json')

us_states.head()

	GEO_ID	STATE	NAME	CENSUSAREA	geometry
0	0400000US01	01	Alabama	50645.326	MULTIPOLYGON (((-88.12466 30.28364, -88.08681 ...
1	0400000US02	02	Alaska	570640.950	MULTIPOLYGON (((-166.10574 53.98861, -166.0752...
2	0400000US04	04	Arizona	113594.084	POLYGON ((-112.53859 37.00067, -112.53454 37.0...
3	0400000US05	05	Arkansas	52035.477	POLYGON ((-94.04296 33.01922, -94.04304 33.079...
4	0400000US06	06	California	155779.220	MULTIPOLYGON (((-122.42144 37.86997, -122.4213...

type(us_states)

geopandas.geodataframe.GeoDataFrame

type(us_states['geometry'])

geopandas.geoseries.GeoSeries

I expected the data to be imported as a GeoDataFrame, which it is, and I confirmed that the geometry column is a series data. In the display of the data, also note that the geometry types are all “Polygon” or “Multipolygon”. (The latter is a separate datatype that is required when states have non-contiguous parts, like the Hawaiian islands.) Note that the NAME column has the full state name, which I will use to reference the shapes.

Next, I want to see how the shapes look using the .plot() method:

# nb: the first time I ran this, it required installation of the descartes module
us_states.plot(figsize=(20,20))

<matplotlib.axes._subplots.AxesSubplot at 0x7fa34af34b90>

png

The proportions look a bit strange with all of that whitespace on the right, but on a close inspection I can see at the far right that there are a few of the Aleutian Islands flowing across the dateline. See the small blue dots over to the right? Very small, but they are there! Other than that… pretty cool! This is what I wanted!

Now I’ll refine this a bit more to show the continental US… (with apologies, but I am leaving the display of Alaska, Hawaii, and Puerto Rico for a future project).

# for illustration, exclude Alaska & Hawaii & Puerto Rico
continental = us_states[us_states['NAME'].isin(['Alaska','Hawaii', 'Puerto Rico']) == False]

continental.plot(figsize=(30,20), color='#d1b26f')

<matplotlib.axes._subplots.AxesSubplot at 0x7fa34af62090>

png

That looks good! Note above that I used the geopandas function .isin() above to filter out any shapes that did not appear in the list using a boolean filter (“False”).

Similarly, I can “zoom in” and look at just a single state. Again using the .isin() function, but this time the opposite boolean value is set to “True”:

# display shape for a single state

hawaii = us_states[us_states['NAME'].isin(['Hawaii']) == True]

hawaii.plot(figsize=(30,20), color='#d1b26f')

<matplotlib.axes._subplots.AxesSubplot at 0x7fa34af89a10>

png

Yes, that looks like Hawaii!! And note that it’s a great example of a “multipolygon” state

Now, let’s plot the coordinates on these shapes…

Plot the points in the states

Now that I can draw the states, I want to plot the grant point data! This section begins using the matplotlib module for expanded visualization capabilities, such as the inclusion of a title and setting colors.

Map the grants in Minnesota

For example, to display the grants in one state, I can use filtering to pull grants given to the state of Minnesota, and likewise filter to display the Minnesota state shape. Note I have added the “figure” (figure) and “axis” (ax) components, which the visualization library is using to render different parts of the image. I have also set colors in the arguments provided to the .plot() functions.

# set the desired data
state = 'Minnesota'
Minnesota_1960s_grants = gdf_neh_1960s[gdf_neh_1960s['InstState'] == 'MN']

# plt to see the points on the map
fig, ax = plt.subplots(1, figsize=(30,20))
base = us_states[us_states['NAME'].isin([state]) == True].plot(ax=ax, color='blue')

# plot the positions
Minnesota_1960s_grants.plot(ax=base, color='yellow')

plt.show()

png

Woohoo! This is the first thing that really looks like a map! A clear, visually simple, graphic that combines the geographic shape information with the grant data from the NEH to show where the money went!

Map the grants in any US state

Now, let’s make this more dynamic: create options that allow for easier generation of maps for any state.

First, set map parameters to select a state (lines 2 and 3), filter the data based on the selections (line 6), draw out some basic information that can be used in the graphic (lines 8 through 32), then plot the selected data (line 34 and following). I’ve used pandas filters to draw out various information, like a starting year and ending year, to determine the number of grants that are shown in the image, to sum the dollar amounts of the awards, and to specify colors for the output; most of this information is bundled into the displayInfo dictionary. On line 42, I used python’s string format substitution methods with some text filtering (see here for a guide from RealPython), to provide some well-formatted information for the image title and legend.

# set the state info - to map other states, change the next two lines
stateAbbrev = 'MI' #use standardized 2-letter postal abbreviations
state = 'Michigan' #use full name of state written out with spaces and no diacritics

# filter the data
state_1960s_grants = gdf_neh_1960s[gdf_neh_1960s['InstState'] == stateAbbrev]

# get label info
#start year
startYear = state_1960s_grants['YearAwarded'].min()
#end year
endYear = state_1960s_grants['YearAwarded'].max()
#number of awards
numGrants = state_1960s_grants['AppNumber'].count()
#dollars awarded
totalOutright = state_1960s_grants['AwardOutright'].sum()
#map shape color
map_color = '#d1b26f'
#point color
point_color = '#2C4B85'

# create a bundle of info for display rendering
displayInfo = {
    'startYear' : startYear,
    'endYear' : endYear,
    'numGrants' : numGrants,
    'outrightDollars': totalOutright,
    'map_color' : map_color,
    'point_color' : point_color,
    'state' : state,
    'abbrev' : stateAbbrev
}

# plt to see the points on the map
fig, ax = plt.subplots(1, figsize=(30,20))
base = us_states[us_states['NAME'].isin([state]) == True].plot(ax=ax, color=map_color)

# plot the positions
state_1960s_grants.plot(ax=base, color=point_color, legend=True)

# add title and legends
plt.title('NEH Grants awarded in {0[state]}, {0[startYear]}-{0[endYear]} ({0[numGrants]} awards, ${0[outrightDollars]:,.2f})'.format(displayInfo), fontfamily=['Georgia','serif'], fontweight='bold', fontsize='x-large') # title uses some advanced string formatting to display the total $ amounts as a currency display
plt.legend(['Indicates one award (points may overlap)'])
lims = plt.axis('equal') # not sure what this is actually doing, although some states seem less 'squished'

plt.show()

png

The map looks a bit squashed top to bottom, but aside from that, this is a great start!

Map the grants in multiple states

Now, what if I want to map the grants from more than one state? One of my goals is to map the points of all the grants in the continental US. Using similar filtering functiosn to those I used above (.isin() and boolean filters), I can exclude the data that I don’t want. Note the specialized use of the ~ character here, in line 5, which reverses the filter, effectively displaying anything that is “False,” that is to say it will return all the values not in the exclude set.

(In this case, my desired visualization focused on many states, so it was easy to filter out a handful. If you are working with a smaller group of states, an inclusive filtering approach might work better.)

exclude = ['Alaska','Hawaii', 'Puerto Rico']
excludeAbbrev = ['AK','HI','PR']

# filter the data
state_grant_info = gdf_neh_1960s[~gdf_neh_1960s.InstState.isin(excludeAbbrev)]

# basic summary information for title
#number of awards
numGrants = state_grant_info['AppNumber'].count()
#dollars awarded
totalOutright = state_grant_info['AwardOutright'].sum()

#plot the map & points
fig, ax = plt.subplots(1, figsize=(30,20))

# filter the base map
base = us_states[us_states['NAME'].isin(exclude) == False].plot(ax=ax, color='#d1b26f')

# plot the positions
state_grant_info.plot(ax=base, color='#2C4B85')

plt.title('NEH Grants awarded in the continental United States, 1966-1969 ({0} awards, ${1:,.2f})'.format(numGrants, totalOutright), fontfamily=['Georgia','serif'], fontweight='bold', fontsize='x-large')
plt.show()

png

And there we have it! A map of the continental US, with the location of each NEH grant recipient of the 1960s displayed!

There are still a few more things that might be useful, including drawing in Alaska, Hawaii, and Puerto Rico, which also received grants during this time. Additionally, the NEH does make awards to other entities, including Guam, the US Virgin Islands, and American Samoa, so there could be a lot of additional tweaking necessary.

Since everyone is looking at things on the web and on their phones these days, an interactive “slippy” web map would also be nice for display. Those are tasks that I may explore in future installments. For the purposes of this demonstration, however, the map is ready!

Reference list

Credit to the examples in these tutorials and projects (as of January 2021), which were highly informative to the exploratory work outlined above:

Duong Vu, “Introduction to Geospatial Data in Python” at Datacamp, published 24 October 2018.
Lesley Cordero, “Getting Started on Geospatial Analysis with Python, GeoJSON and GeoPandas” at Twilio, published 14 August 2017.
Dani Arribas-Bel, “Mapping in Python with geopandas”, from Geographic Data Science ‘15. Provides a lot of useful information about creating more complex plots.

See this site for US state shapefile information:

Eric Celeste http://eric.clst.org/tech/usgeojson

Interesting mapping projects with historical data

Selena Qian, “Sanborn Maps Navigator”, example of creating a map interface to explore an inventory of the Sanborn maps at the Library of Congress.
USGS historical atlas explorer https://livingatlas.arcgis.com/topoexplorer/index.html
“Keweenaw Time Traveler,” an interactive historical GIS project from the Historic Environments and Spatial Analytics Lab at Michigan Technological University.

Share on

Twitter Facebook LinkedIn

Wrangling Humanities Data: Exploratory Maps of NEH Awards by State

Jesse Johnston

Mapping State by State

Set up the environment

Start with the geojson

Mapping State by State

Define and use map shapes for US states

Plot the points in the states

Map the grants in Minnesota

Map the grants in any US state

Map the grants in multiple states

Reference list

Interesting mapping projects with historical data

Share on

Comments

You May Also Enjoy

What do you mean Knowledge Environments?

Playlist: Naming is Powerful, and So is Re-Naming

Metadata: Origins and Metaphors

Encoding Reparative Description: Preliminary Thoughts