This post continues with “Wrangling Humanities Data,” which drafts a data curation project using publicly available grant data provided by the National Endowment for the Humanities (NEH). This installment takes the geospatial dataset created previously and explores some of the visualization tools provided by the geopandas library.

As before, this process uses the geopandas data library, which is supported in a Python environment and Jupyter notebook.

As in the previous post, you can also download a Jupyter Notebook version of this post from the GitHub repository along with all of the data discussed here. File references discussed below are included in the same neh-grant-data-project repository.

# Mapping State by State

Now that I have a pretty good set of points, I wanted to visualize these against maps for the states. For example, is it possible to see all of the grants in a given state for a given time period? Could I look at different states? What about all of the continental states? For these kinds of questions, geopandas has a lot of built-in tools for filtering the clean data, as well as for outputting a few initial maps. Below, I walk through a process to develop state-by-state maps. This uses some of the filtering capacities of pandas in combination with the mapping visualization tools of geopandas. Let’s go!

## Set up the environment

This activity uses only two Python modules: geopandas and matplotlib. However, it also requires the clean dataset that I developed previously. Since that was saved as a geojson file, it is now reusable and serves as the basis for these examples. The data file is included in the repository as neh_1960s_grants.geojson. I am still exploring this data, so rather than looking at the entirety of the available NEH data, I continue working with the 1960s decade as before.

import geopandas as gpd

import matplotlib.pyplot as plt
%matplotlib inline


Because the previously saved geojson data is already cleaned and transformed to include valid POINT coordinates, I can load it directly from the file rather than repeating the cleaning and transformation process.

gdf_neh_1960s = gpd.read_file('neh_1960s_grants.geojson', driver='GeoJSON')

gdf_neh_1960s.head()

| | AppNumber | Institution | InstCity | InstState | InstPostalCode | InstCountry | CongressionalDistrict | Latitude | Longitude | YearAwarded | ProjectTitle | Program | Division | AwardOutright | ProjectDesc | ToSupport | Participants | Disciplines | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | FB-10007-68 | Regents of the University of California, Berkeley | Berkeley | CA | 94704-5940 | USA | 13 | 37.87029 | -122.26813 | 1967 | Title not available | Fellowships for Younger Scholars | Fellowships and Seminars | 8387.0 | No description | No to support statement | John Elliot [Project Director] | English | POINT (-122.26813 37.87029) |
| 1 | FB-10009-68 | Pitzer College | Claremont | CA | 91711-6101 | USA | 27 | 34.10373 | -117.70701 | 1967 | Title not available | Fellowships for Younger Scholars | Fellowships and Seminars | 8387.0 | No description | No to support statement | Steven Matthysse [Project Director] | History of Religion | POINT (-117.70701 34.10373) |
| 2 | FB-10015-68 | University of California, Riverside | Riverside | CA | 92521-0001 | USA | 41 | 33.97561 | -117.33113 | 1967 | Title not available | Fellowships for Younger Scholars | Fellowships and Seminars | 8387.0 | No description | No to support statement | John Staude [Project Director] | History, General | POINT (-117.33113 33.97561) |
| 3 | FB-10019-68 | Northeastern University | Boston | MA | 02115-5005 | USA | 7 | 42.33950 | -71.09048 | 1967 | Title not available | Fellowships for Younger Scholars | Fellowships and Seminars | 8387.0 | No description | No to support statement | Thomas Havens [Project Director] | History, General | POINT (-71.09048 42.33950) |
| 4 | FB-10023-68 | University of Pennsylvania | Philadelphia | PA | 19104-6205 | USA | 3 | 39.95298 | -75.19276 | 1967 | Title not available | Fellowships for Younger Scholars | Fellowships and Seminars | 8387.0 | No description | No to support statement | Gresham Riley [Project Director] | Psychology | POINT (-75.19276 39.95298) |

gdf_neh_1960s.shape

(997, 19)


The data import worked as expected. The first five records appear correctly, and the .shape call shows 997 records, which is what it should be (the CSV had 998 rows).

## Mapping State by State

Now I can use geopandas to filter the data to create different kinds of maps, and to export maps to files that can be reused in reports.

### Define and use map shapes for US states

The project to clean and check data quality focused on information with single, clear geographic coordinates, aka POINTs. More complex maps need some additional geospatial data types. Geospatial data and visualization frequently use 2D shapes, called Polygons or Multipolygons, to represent areas on maps such as states or countries. To create state maps, I need to pair the point data with the states’ shape data. Rather than going to data.gov, which is where the NEH data can be found, I used the US state shape data that Eric Celeste has made available, which offers geojson files at various levels of detail.

Fortunately, the US state shapes are well defined, and the Census Bureau and others make the information readily available. This section explains how to import the shapes for US states, how to display the shapes, and then how to display the points for each grant on the state shapes.

# test the states shapefile


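| | GEO_ID | STATE | NAME | LSAD | CENSUSAREA | geometry |
|---|---|---|---|---|---|---|
| 0 | 0400000US01 | 01 | Alabama | | 50645.326 | MULTIPOLYGON (((-88.12466 30.28364, -88.08681 ... |
| 2 | 0400000US04 | 04 | Arizona | | 113594.084 | POLYGON ((-112.53859 37.00067, -112.53454 37.0... |
| 3 | 0400000US05 | 05 | Arkansas | | 52035.477 | POLYGON ((-94.04296 33.01922, -94.04304 33.079... |
| 4 | 0400000US06 | 06 | California | | 155779.220 | MULTIPOLYGON (((-122.42144 37.86997, -122.4213... |

type(us_states)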

geopandas.geodataframe.GeoDataFrame

type(us_states['geometry'])

geopandas.geoseries.GeoSeries


I expected the data to be imported as a GeoDataFrame, which it is, and I confirmed that the geometry column is a GeoSeries. In the display of the data, also note that the geometry types are all “Polygon” or “Multipolygon”. (The latter is a separate datatype that is required when states have non-contiguous parts, like the Hawaiian islands.) Note that the NAME column has the full state name, which I will use to reference the shapes.
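
To make the Polygon/Multipolygon distinction a little more concrete, here is a minimal sketch that uses only Python’s built-in json module. The two shapes are invented toy rectangles, not real state boundaries, and the names toy_geojson and count_parts are hypothetical helpers I made up for illustration.

```python
import json

# Toy GeoJSON with invented shapes: one single-part area and one two-part area.
toy_geojson = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "properties": {"NAME": "One-piece state"},
     "geometry": {"type": "Polygon",
                  "coordinates": [[[0, 0], [4, 0], [4, 3], [0, 3], [0, 0]]]}},
    {"type": "Feature",
     "properties": {"NAME": "Island state"},
     "geometry": {"type": "MultiPolygon",
                  "coordinates": [
                    [[[10, 0], [11, 0], [11, 1], [10, 1], [10, 0]]],
                    [[[13, 0], [14, 0], [14, 1], [13, 1], [13, 0]]]
                  ]}}
  ]
}
"""


def count_parts(geometry):
    """Return how many separate polygons a GeoJSON geometry contains."""
    if geometry["type"] == "Polygon":
        return 1
    if geometry["type"] == "MultiPolygon":
        # Each top-level entry in "coordinates" is one polygon.
        return len(geometry["coordinates"])
    raise ValueError("unsupported geometry type: " + geometry["type"])


parts_by_name = {
    feature["properties"]["NAME"]: count_parts(feature["geometry"])
    for feature in json.loads(toy_geojson)["features"]
}
print(parts_by_name)
```

A state made of several islands is stored the same way as the second feature: one record whose geometry holds a list of polygons.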

Next, I want to see how the shapes look using the .plot() method:

# nb: the first time I ran this, it required installation of the descartes module
us_states.plot(figsize=(20,20))

<matplotlib.axes._subplots.AxesSubplot at 0x7fa34af34b90>


The proportions look a bit strange with all of that whitespace on the right, but on a close inspection I can see at the far right that there are a few of the Aleutian Islands flowing across the dateline. See the small blue dots over to the right? Very small, but they are there! Other than that… pretty cool! This is what I wanted!
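
The stretched plot comes down to simple arithmetic on longitudes. Here is a small sketch in plain Python; the longitudes are rough, illustrative values that I have assumed (they are not read from the shapefile), and lon_extent is a hypothetical helper name.

```python
# Approximate longitudes in degrees (negative = west, positive = east).
# These are rough, illustrative values, not taken from the shapefile.
approx_longitudes = {
    "eastern Maine": -67.0,
    "Honolulu": -157.9,
    "Attu Island (western Aleutians)": 173.0,
}


def lon_extent(longitudes):
    """Return (min, max, width) for a collection of longitudes."""
    values = list(longitudes)
    lo, hi = min(values), max(values)
    return lo, hi, hi - lo


# Every point: the x-axis has to stretch to positive longitudes for Attu.
all_extent = lon_extent(approx_longitudes.values())

# Only the negative (western) longitudes: a much narrower span.
western_only = lon_extent(v for v in approx_longitudes.values() if v < 0)

print(all_extent)
print(western_only)
```

If the wide view is unwanted, one option is to narrow the x-axis after plotting, for example with matplotlib’s ax.set_xlim(), although that is not what I do in this post.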

Now I’ll refine this a bit more to show the continental US… (with apologies, but I am leaving the display of Alaska, Hawaii, and Puerto Rico for a future project).

# for illustration, exclude Alaska & Hawaii & Puerto Rico
continental = us_states[us_states['NAME'].isin(['Alaska','Hawaii', 'Puerto Rico']) == False]

continental.plot(figsize=(30,20), color='#d1b26f')

<matplotlib.axes._subplots.AxesSubplot at 0x7fa34af62090>


That looks good! Note that I used the pandas method .isin() to build a boolean filter; comparing the result to “False” keeps only the shapes whose names do not appear in the list.

Similarly, I can “zoom in” and look at just a single state, again using .isin(), but this time comparing the result to “True”:

# display shape for a single state

hawaii = us_states[us_states['NAME'].isin(['Hawaii']) == True]

hawaii.plot(figsize=(30,20), color='#d1b26f')

<matplotlib.axes._subplots.AxesSubplot at 0x7fa34af89a10>


Yes, that looks like Hawaii!! And note that it’s a great example of a “multipolygon” state :smile:

Now, let’s plot the coordinates on these shapes…

## Plot the points in the states

Now that I can draw the states, I want to plot the grant point data! This section begins using the matplotlib module for expanded visualization capabilities, such as the inclusion of a title and setting colors.

### Map the grants in Minnesota

For example, to display the grants in one state, I can filter the grant data to those awarded in Minnesota, and likewise filter the shapes to display only Minnesota. Note that I have added the “figure” (fig) and “axis” (ax) components, which matplotlib uses to render different parts of the image. I have also set colors in the arguments provided to the .plot() calls.

# set the desired data
state = 'Minnesota'
Minnesota_1960s_grants = gdf_neh_1960s[gdf_neh_1960s['InstState'] == 'MN']

# plt to see the points on the map
fig, ax = plt.subplots(1, figsize=(30,20))
base = us_states[us_states['NAME'].isin([state]) == True].plot(ax=ax, color='blue')

# plot the positions
Minnesota_1960s_grants.plot(ax=base, color='yellow')

plt.show()


Woohoo! This is the first thing that really looks like a map! A clear, visually simple graphic that combines the geographic shape information with the grant data from the NEH to show where the money went!

### Map the grants in any US state

Now, let’s make this more dynamic: create options that allow for easier generation of maps for any state.

First, set map parameters to select a state (lines 2 and 3), filter the data based on the selections (line 6), draw out some basic information that can be used in the graphic (lines 8 through 32), then plot the selected data (line 34 and following). I’ve used pandas filters and aggregations to find the starting and ending years, to count the grants shown in the image, and to sum the dollar amounts of the awards; together with the colors for the output, most of this information is bundled into the displayInfo dictionary. In the plt.title() call, I used Python’s string format substitution methods (RealPython has a guide) to provide some well-formatted information for the image title and legend.
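
The format substitution used for the title can be tried on its own, without any map. This is a minimal sketch: the numbers in display_info are hypothetical stand-ins, not values from the NEH data.

```python
# Hypothetical values standing in for the numbers computed from the grant data.
display_info = {
    "state": "Michigan",
    "startYear": 1967,
    "endYear": 1969,
    "numGrants": 12,
    "outrightDollars": 123456.5,
}

# {0[key]} looks up a key in the first positional argument;
# :,.2f adds thousands separators and two decimal places.
title = (
    "NEH Grants awarded in {0[state]}, {0[startYear]}-{0[endYear]} "
    "({0[numGrants]} awards, ${0[outrightDollars]:,.2f})"
).format(display_info)

print(title)
```

Passing the whole dictionary as one argument keeps the call short, at the cost of a slightly busier template string.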

# set the state info - to map other states, change the next two lines
stateAbbrev = 'MI' #use standardized 2-letter postal abbreviations
state = 'Michigan' #use full name of state written out with spaces and no diacritics

# filter the data
state_1960s_grants = gdf_neh_1960s[gdf_neh_1960s['InstState'] == stateAbbrev]

# get label info
#start year
startYear = state_1960s_grants['YearAwarded'].min()
#end year
endYear = state_1960s_grants['YearAwarded'].max()
#number of awards
numGrants = state_1960s_grants['AppNumber'].count()
#dollars awarded
totalOutright = state_1960s_grants['AwardOutright'].sum()
#map shape color
map_color = '#d1b26f'
#point color
point_color = '#2C4B85'

# create a bundle of info for display rendering
displayInfo = {
'startYear' : startYear,
'endYear' : endYear,
'numGrants' : numGrants,
'outrightDollars': totalOutright,
'map_color' : map_color,
'point_color' : point_color,
'state' : state,
'abbrev' : stateAbbrev
}

# plt to see the points on the map
fig, ax = plt.subplots(1, figsize=(30,20))
base = us_states[us_states['NAME'].isin([state]) == True].plot(ax=ax, color=map_color)

# plot the positions
state_1960s_grants.plot(ax=base, color=point_color, legend=True)

plt.title('NEH Grants awarded in {0[state]}, {0[startYear]}-{0[endYear]} ({0[numGrants]} awards, ${0[outrightDollars]:,.2f})'.format(displayInfo), fontfamily=['Georgia','serif'], fontweight='bold', fontsize='x-large') # title uses some advanced string formatting to display the total$ amounts as a currency display
plt.legend(['Indicates one award (points may overlap)'])
lims = plt.axis('equal') # not sure what this is actually doing, although some states seem less 'squished'

plt.show()


The map looks a bit squashed top to bottom, but aside from that, this is a great start!
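
As far as I can tell, the squashing is about aspect ratio. plt.axis('equal') asks matplotlib to draw one unit of x at the same length as one unit of y, but away from the equator a degree of longitude covers less ground than a degree of latitude. A common rule of thumb is to scale by 1 / cos(latitude). The sketch below only computes that ratio; latlon_aspect is a hypothetical helper name, and the mid-latitude for Michigan is a rough value I have assumed.

```python
import math


def latlon_aspect(mid_latitude_deg):
    """Approximate y/x aspect ratio for a plain latitude/longitude plot,
    so that a degree of longitude is drawn narrower than a degree of
    latitude, as it is on the ground."""
    return 1.0 / math.cos(math.radians(mid_latitude_deg))


# Assumed rough mid-latitude for Michigan; adjust for other states.
michigan_mid_lat = 44.5
aspect = latlon_aspect(michigan_mid_lat)
print(round(aspect, 2))

# With a matplotlib Axes named `ax` (as in the plotting code above),
# the ratio could be applied with: ax.set_aspect(aspect)
```

If you try this, use it in place of the plt.axis('equal') call, not alongside it, since the two settings compete. My understanding is that newer geopandas releases apply a similar correction by default for latitude/longitude data, so results may differ by version.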

### Map the grants in multiple states

Now, what if I want to map the grants from more than one state? One of my goals is to map the points of all the grants in the continental US. Using filtering functions similar to those I used above (.isin() and boolean filters), I can exclude the data that I don’t want. Note the use of the ~ operator in line 5, which inverts the boolean filter: rows where .isin() is “True” are dropped, so the result contains only the grants whose state is not in the exclude list.

(In this case, my desired visualization focused on many states, so it was easy to filter out a handful. If you are working with a smaller group of states, an inclusive filtering approach might work better.)
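
Here is a minimal sketch of both approaches on a tiny, made-up table. It uses pandas only; the rows are invented, and the names grants, kept, and upper_midwest are hypothetical. The column names match the grant data.

```python
import pandas as pd

# Made-up rows using the same column names as the grant data.
grants = pd.DataFrame({
    "AppNumber": ["A-1", "A-2", "A-3", "A-4"],
    "InstState": ["CA", "HI", "MN", "AK"],
    "AwardOutright": [1000.0, 2500.0, 500.0, 750.0],
})

exclude_abbrev = ["AK", "HI", "PR"]
in_excluded = grants["InstState"].isin(exclude_abbrev)  # boolean Series

# Exclusive filter: ~ flips True/False, keeping rows NOT in the list.
kept = grants[~in_excluded]

# Inclusive filter: name the states to keep instead.
upper_midwest = grants[grants["InstState"].isin(["MN", "WI"])]

print(kept["InstState"].tolist())
print(kept["AwardOutright"].sum())
print(upper_midwest["AppNumber"].tolist())
```

The same pattern applies to a GeoDataFrame, since it supports the usual pandas filtering.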

exclude = ['Alaska','Hawaii', 'Puerto Rico']
excludeAbbrev = ['AK','HI','PR']

# filter the data
state_grant_info = gdf_neh_1960s[~gdf_neh_1960s.InstState.isin(excludeAbbrev)]

# basic summary information for title
#number of awards
numGrants = state_grant_info['AppNumber'].count()
#dollars awarded
totalOutright = state_grant_info['AwardOutright'].sum()

#plot the map & points
fig, ax = plt.subplots(1, figsize=(30,20))

# filter the base map
base = us_states[us_states['NAME'].isin(exclude) == False].plot(ax=ax, color='#d1b26f')

# plot the positions
state_grant_info.plot(ax=base, color='#2C4B85')

# raw string so the \$ escape reaches matplotlib unchanged
plt.title(r'NEH Grants awarded in the continental United States, 1966-1969 ({0} awards, \${1:,.2f})'.format(numGrants, totalOutright), fontfamily=['Georgia','serif'], fontweight='bold', fontsize='x-large')
plt.show()


And there we have it! A map of the continental US, with the location of each NEH grant recipient of the 1960s displayed!

There are still a few more things that might be useful, including drawing in Alaska, Hawaii, and Puerto Rico, which also received grants during this time. Additionally, the NEH does make awards to other entities, including Guam, the US Virgin Islands, and American Samoa, so there could be a lot of additional tweaking necessary.

Since everyone is looking at things on the web and on their phones these days, an interactive “slippy” web map would also be nice for display. Those are tasks that I may explore in future installments. For the purposes of this demonstration, however, the map is ready!

## Reference list

Credit to the examples in these tutorials and projects (as of January 2021), which were highly informative to the exploratory work outlined above:

See this site for US state shapefile information:

• Eric Celeste http://eric.clst.org/tech/usgeojson

### Interesting mapping projects with historical data

• Selena Qian, “Sanborn Maps Navigator”, example of creating a map interface to explore an inventory of the Sanborn maps at the Library of Congress.
• USGS historical atlas explorer https://livingatlas.arcgis.com/topoexplorer/index.html
• “Keweenaw Time Traveler,” an interactive historical GIS project from the Historic Environments and Spatial Analytics Lab at Michigan Technological University.
