Because these collections are extensive and spread across multiple repositories at the University, making changes is not a simple task of updating a few database records; it requires institutional coordination and changes implemented in multiple systems by multiple people. To address this challenge, I worked with the team to develop tools for understanding the metadata across different repositories and for identifying pathways to address the problems the project has identified. We quickly realized that bulk editing and analysis approaches would usefully complement the work. By analyzing the collections as a whole, we hoped to develop understandings that could inform or confirm our analyses, particularly the insights and recommendations arising from the analysis of harmful terminology. Specifically, we hoped to develop tools for analyzing, in aggregate, the hundreds of descriptive records identified during our collections survey. While much work in the rest of the project has focused on collection inventory and analysis, we also hoped to create methods that could assist collections managers in analyzing and understanding collection metadata, with the ultimate goal of creating reusable, code-based tools that could assist our project and others in changing or updating collection descriptions.
The project team identified more than two hundred finding aids describing materials at the University of Michigan related to the Philippines. These documents provided the basis of a dataset for analysis. The finding aids came from three repositories on campus: the Bentley Historical Library, the Special Collections Research Center, and the Clements Library. All provided data in Encoded Archival Description (EAD), a text-based markup format expressed in eXtensible Markup Language (XML), a standard data format frequently used to share metadata. The analysis of the data was undertaken by graduate student Ella Li (School of Information) and me. We worked together to develop a series of analysis tools, aiming to produce useful data visualizations and to begin creating tools that could be used or repurposed to make changes to the descriptions. We developed analysis modules using the Python programming language and widely used analysis and visualization tools. The resulting code is available, and can be reused or repurposed, through a series of interactive code examples now available on GitHub.
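To give a sense of the aggregate approach, here is a minimal sketch, not the project's actual code, of how Python's standard library can tally terms across the descriptive fields of a folder of EAD files. The directory name and the term list are hypothetical placeholders:

```python
# A minimal sketch for tallying terms across EAD finding aids in aggregate.
# The directory name and term list below are hypothetical placeholders.
from collections import Counter
from pathlib import Path
import re
import xml.etree.ElementTree as ET

EAD_DIR = Path("finding_aids")    # folder of EAD XML files (placeholder)
TERMS = ["native", "primitive"]   # placeholder terminology list

def descriptive_text(xml_path):
    """Yield the text of common descriptive elements, ignoring namespaces."""
    for el in ET.parse(xml_path).iter():
        tag = el.tag.rsplit("}", 1)[-1]   # strip any '{namespace}' prefix
        if tag in ("unittitle", "abstract", "scopecontent"):
            yield " ".join(el.itertext())

counts = Counter()
for ead_file in sorted(EAD_DIR.glob("*.xml")):
    for text in descriptive_text(ead_file):
        for term in TERMS:
            counts[term] += len(re.findall(rf"\b{re.escape(term)}\b", text, re.I))

for term, total in counts.most_common():
    print(f"{term}: {total} occurrence(s) across the corpus")
```

A tally like this is only a starting point, but it shows how a few lines of code can read hundreds of records at once rather than one finding aid at a time.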
While many recent attempts to control library collection priorities have focused on the control of social and cultural narratives, there is another threat to library collections: the refusal to admit that a library should ever get rid of a book! Let’s call it “book fever.” This is the sort of thing that possesses a woman in a small Ontario city to get more and more (and more!) books even though she has no space to store them. As the CBC reported in January 2022, she sounds like a bookseller gone obsessive: she owns a book store but now “has a barn and two other farmhouses on the same property full of donated” books. That might not be a problem were there some use for these books, yet “The books are essentially in deep storage, inaccessible to customers and unsearchable by the bookstore’s three-person staff since most of them aren’t catalogued.” She calls it a “tsunami of books” and says she has “no place to store them.” Yet it appears she can’t stop accepting books from the local Salvation Army, which receives hundreds of books a week.
While clearly these books are of no interest to their previous owners, there doesn’t seem to be any indication of demand elsewhere. Is this like food waste, where inefficient distribution and overproduction lead to thousands of pounds of waste every year yet many people still don’t have enough food? You see, books are things, but ideas aren’t quite the same as things: there is a zero-sum aspect to things - you either have a thing or you don’t, and if you give it to someone else, they have the thing - ideas, on the other hand, can be shared but simultaneously kept. I suspect that what these book hoarders want is to share knowledge, but they’re confusing that with the keeping of books.
A few librarians slid into this conversation on twitter. Mary Cavanagh pointed out that while one of Ranganathan’s laws may be “to every book its reader,” that doesn’t mean mindless hoarding; instead, recycling and deaccessioning unused books is “ethical, necessary, and practical.”
While 1st Ranganathan rule may be, ‘to every book a reader,’ 2nd rule of librarianship is that recycling old unwanted books is ethical, necessary, and practical. To the bin. https://t.co/lbrKbyCRs7
— Mary Cavanagh (@mfcavanagh) January 16, 2022
Another, David Kemper, tweeted that such hoarding is a “disservice” and one respondent pointed out that some of these books came from public libraries where presumably people “won’t even borrow [them] literally for free.”
Agreed. I see some public library stickers on one book there, so there are books that people won't even borrow literally for free. I challenge anyone to detect even one book that is worth reading in those pictures.
— Ryan Deschamps (@RyanDeschamps) January 19, 2022
Another library perspective noted “Books are not inherently sacred. . . . Weeding collections is a good thing.”
Apparently it needs to be said again. Books are not inherently sacred. Books have a lifespan. Recycling books is not a bad thing. Weeding collections is a good thing. Books are not inherently sacred. Stop acting like they are. https://t.co/QQ5Mo4cjLo
— Natalie (@InkyLibrarian) January 16, 2022
Presumably, should someone show up at these aforementioned barns and want to take one of the books, the hoarder should be happy, right? But it doesn’t seem like anyone is stampeding to the place. Captain Obvious on twitter suggested a free warehouse, which was duly laughed off the stage.
LOL! OK, OK, I give up, librarians. It's Monday morning and I have no time to fight for the idea of a free warehouse of cast-off books. I get it: you know all about this problem, etc., and you don't have money to solve it, etc.
— Captain Obvious (@lsanger) January 17, 2022
I don't have time to engage further. I'm done.
(By the way, did you mean a public library, where we share the costs of maintaining the collection so more of us can use the things and, more importantly, find what we’re looking for, not some random castoffs?)
The problem is real. In December, Karen Heller wrote about Fran Lebowitz’s collection of books in The Washington Post. Heller noted Wonder Books, a warehouse of 6 million books in Frederick, Maryland, that continues to receive 300,000 volumes every month. The books sit in literal piles and bins. Like the Canadian book tsunami, these Americans were “drowning in old books.” It may be that nobody likes to throw away a book, but there remain a lot of books out there that aren’t very good. Huge runs of hardbacks are published and immediately put on sale in big box stores, where most will go unread. And there are a lot of books that might benefit from reconsideration. The Awful Library Books site, with the tagline “Hoarding is not collection development!”, has many to choose from. For example, unless yours is a collection devoted to the history of book organization and cataloging (there are such collections!), your high school library should not retain a copy of the ALA Rules for Filing Catalog Cards from 1942, unless you’re preparing your kids for a career in WWII-era libraries. It might be a good reference point for the students in my “Introduction to the Organization of Information” class who need to know about the history of the profession, but probably not for many others. Plenty of institutions do have a copy and the resources to keep it; it’s not doing anyone any good lost in a barn!
Book hoarders may have overlooked Ranganathan’s first law, which is: books are for use.
Whether you call it tsundoku, bibliomania, or just plain book fever, the activity of collecting and storing books is clearly a human passion. But when it gets too focused on the possession of books qua books, we have lost focus on the reason that we make libraries: to promote knowledge, learning, and community engagement.
A previous libraries playlist, unfortunately missed, could’ve focused on the “cheese slice bookmark” tweets, which were in high circulation around January 2020 when the University of Liverpool Library tweeted a photo of a packaged cheese slice found stuffed between the pages of a returned book. It was covered by Know Your Meme, though it’s now under review (presumably for being too “niche”?).
Exciting to see this out! It was a privilege to work with @archivalflip and @CLIRgrants on the program assessment for Amplifying Unheard Voices! There's a lot for funders here, & there's lots of advice for cultural heritage grantseekers too! https://t.co/awm9s8GEI9 pic.twitter.com/R7TBlVAmnY
— Jesse Johnston 💻🖋😻🐕🎓🌈🐝☮️ (@jesseajohnston) February 16, 2023
In their description of the report, CLIR writes:
This report summarizes a yearlong program assessment of “Amplifying Unheard Voices,” a major revision of CLIR’s Digitizing Hidden Collections grant program. The revision sought to expand the reach and appeal of the program to a broader range of institutions, including independent and community organizations, and to emphasize the digitization of historical materials that tell the stories of groups underrepresented in the digital historical record. Significant changes were made to the application structure, new applicant support resources were created, eligibility was expanded to Canada, and new thematic emphases and program values were added. The assessment was based on a series of qualitative data-gathering activities that included stakeholder groups and staff. Through surveys and interviews of applicants, inquirers, proposal reviewers, and staff, the authors provide a holistic view of the program, offer a series of recommendations, and identify areas for further attention.
The full report and accompanying data are available from CLIR as a pdf.
“We see further potential to increase equity in funding programs and representation of community stories in the digital historical record.”
Digitizing Hidden Collections (DHC) is a major funding program that has supported the digitization of unique historical collections since 2015. Grants are administered and awarded by CLIR, with funding from The Andrew W. Mellon Foundation. In 2020, CLIR worked with Mellon to adapt the program so it could better serve organizations that seek grants less frequently and emphasize the digitization of historical collections that tell the stories of groups underrepresented in the digital historical record. The report summarizes my year-long work with Ricky Punzalan to assess the resulting program, “Amplifying Unheard Voices.” Through the 2021 program revision, CLIR aimed to expand the reach and appeal of the program to a broader range of institutions, including independent and community-based organizations. The revision implemented significant changes to the application structure, created new applicant support resources, expanded eligibility to Canadian applicants, and added new thematic emphases and stated program values.
Enthusiasm for the program is high. The changes in the 2021 DHC:AUV iteration were warmly received by many potential applicants, including organizations that are not frequent grant seekers for collections-related activities as well as many organizations that had previously applied to DHC. The revised program was recognized as a critical funding resource, unique in its newly articulated support for collections digitization in conjunction with social justice priorities. These interests are clearly expressed in the program values and benefit the preservation of and access to more representative digital collections and records. CLIR’s resource materials for applicants were praised highly for their clarity, comprehensiveness, and approachability, and for being readily usable and accessible. The expanded membership of the review panel represented expertise well suited to evaluating the new group of applicants and proposals received. Overall, the program’s accessibility, the appeal of a call for proposals emphasizing underrepresented perspectives in collections, and the continuing support for digitization were welcomed and well received. Even among those interested in the program who elected not to submit applications, more than half hoped to submit in future competitions if given the option.
Alongside these positive elements, we identified areas in which the program would benefit from further attention as it moves ahead:
We conclude the assessment with optimism about the program’s possibilities but also with an awareness of the significant work required to maintain and improve such funding programs. We note the high enthusiasm for increased support of community-based memory initiatives that will diversify the historical record and make that record more digitally available. At the same time, the assessment reveals challenges of funding digitization projects in cultural heritage: the significant time required for design, implementation, and management of multiyear programs; the limitations of project grants; and the challenges of making incremental yet responsive changes within a longstanding program.
The project revealed enthusiasm for and potential of the future of DHC:AUV, but more broadly, we see further potential to increase equity in funding programs and representation of community stories in the digital historical record.
This site’s posts share a boilerplate introduction kept in a template file (`toc_mapping_humanities_data.md`). Jekyll provides a capability that allows me to include this boilerplate text so that it appears on any page of the site where I put the code to reference it. I had referenced the template in a handful of posts, but I soon realized that “mapping humanities” wasn’t really correct; the main point, in fact, was humanities data curation, so I renamed the template `intro_humanities_data_curation.md`. Now, I needed to update each post where I had included the template with the new template name.
It would be possible to make these replacements by hand, but it seemed quicker to ask the computer to do this for me. I also thought that it might be useful to show how to use the command shell to make this update in a quick and consistent way. The basic steps would be:

1. Create a pattern that matches the include line to be changed.
2. Find all of the posts that contain that line.
3. Edit each file to replace the old template name with the new one.
Each of these steps can be done with a command shell tool: the first using a regular expression, the second using grep and find, and the third using a stream editor like sed, which can apply the regular expression to make the changes.
Each of the files where I used the introductory template included the following line:
{% include toc_mapping_humanities_data.md %}
This line told the Jekyll site generator to insert the template at this point. To include the new template, I needed to replace the above with the following line:
{% include intro_humanities_data_curation.md %}
To do this, I created the following regular expression, which matches the portion of the line that I wanted to change. Using grouping, I could tell the command which parts to keep and which to replace:
(include )[a-z_]*(\.md)
Now, `grep` and `sed` come into the picture. I used `grep` to identify all of the lines where the include command occurs:
grep 'include [a-z_]*.md' _posts/202[012]-[01]*.md
To make the replacement, I used the `sed` command. `sed` is a “stream editor” designed to replace matches in its input based on pattern matching such as regular expressions.
To test `sed`, for example, we can pipe in a specific input with a command like this one:
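echo '{% include toc_mapping_humanities_data.md %}' | sed -E 's/(include )[a-z_]*(\.md)/\1intro_humanities_data_curation\2/'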
This is useful for testing the regular expression patterns, too. The `-E` option specifies extended regular expressions, which are necessary for features like group matching. In this case, the parentheses in the first pattern set groups, which are then referenced by number, starting with 1, in the replacement string (i.e., `\1` and `\2`).
So the sed command may be:
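sed -E 's/(include ).*(\.md)/\1intro_humanities_data_curation\2/' _posts/202[012]-[01]*.md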
This will print the entire modified stream (the file contents) to the terminal window or display. If you review the contents, you can confirm the change has worked. Now the changes need to be written back into the files themselves, if you want to make the actual edits.
To do this, we can use the `find` command to search for the specific files, then run the `sed` command on each of them:
find _posts -type f -name '202[012]-[01]*.md' -exec sed -i '' -E 's/(include ).*(\.md)/\1intro_humanities_data_curation\2/' {} \;
The above illustrates some of the power of the `find` command. Here it searches for each of the markdown files whose name matches the given pattern, then the `-exec` option runs the `sed` command with the regular expression created above on each file. The `-i ''` option tells `sed` to edit the files in place (the empty string, required on MacOS, means no backup files are kept).
Aside: This `find` construction uses what looks like a regular expression to select certain files, but this is in fact a different sort of pattern matching. While similar to regex, it is an example of “filename expansion,” also known as “globbing,” an approach to pattern matching that resembles regex but is specific to file paths (see Globbing). To search with a true regular expression instead, `find` offers the `-regex` option. While a more limited pattern-matching option, globbing can do a lot. For example, to find posts from months before August (01 through 07), try the `find` command: `find _posts -type f -name '202[012]-0[1234567]-*'`
So to sum up: using this one-line command, developed piece by piece, I quickly updated all of the include statements in all of the posts on the site.
For reference, here are the other essays in the series:
Encoding Reparative Description: Preliminary Thoughts (17 September 2023)
Find and replace data in the shell (11 September 2022)
Wrangling Humanities Data: An Interactive Map of NEH Awards (17 July 2022)
Wrangling Humanities Data: Using Regex to Clean a CSV (27 February 2021)
Wrangling Humanities Data: Exploratory Maps of NEH Awards by State (22 January 2021)
Wrangling Humanities Data: Cleaning and Transforming Data (19 January 2021)
Wrangling Humanities Data: Finding and Describing Data (20 December 2020)
This installment uses the geospatial dataset previously created and describes how to display the data in an interactive map on the web. As in the previous post, you can also download a version of this post from the GitHub repository along with all of the data discussed here. File references discussed below are included in the same neh-grant-data-project repository.
Although my previous essays have explored various data-related topics, this post continues a theme of mapping humanities data. Previous installments walked through the process of preserving, transforming, and visualizing this data, which is a list of grants awarded by the NEH during the 1960s (the agency’s first five years). This post demonstrates how to create interactive, web-friendly maps more familiar to everyday users. The goal is to plot the grant information from the 1960s on a map background that allows zooming in and out, repositioning, and the option to click on each point to get information about the referenced grant.
Rather than building the map from scratch (as previously), this demonstration uses widely used, open, and pre-existing code libraries to generate the map with the desired features. The primary tool is Leaflet.js, a javascript library that will knit together the map tiles and the grant data.
Leaflet provides a library of tools, written in javascript, that help to display geospatial information via a web browser. By combining this library, which builds on common frameworks and code approaches to web publishing, with the geospatial grant data, we can create, display, and share the map. The elements that power the map include:

- an html page that loads the Leaflet library and provides a container for the map
- a css file that styles the page and the map container
- a javascript file that builds the map, loads the data, and creates the popups
- the geojson grant data created in the previous installment
- a basemap of tiles drawn from the OpenStreetMap project
A similar process is outlined in a clear and approachable way by Kim Pham in “Web Mapping with Python and Leaflet,” Programming Historian 6 (2017), doi:10.46430/phen0070. Pham’s tutorial starts by setting up all of the map elements in one file (with a different dataset), then splits them out into three files as below. If you want to see how this would look in one file, refer to the above tutorial.
Because the html and css elements of this step are the shortest, we will go through those first. Below is a walkthrough description of each file.
The first file is an html file (`basic-map-neh-1960s-leaflet.html`).
In the `<head>` section, this file calls the javascript and css files that are required for operating the Leaflet library. The header also pulls in a custom css file for our map (see the next file walkthrough).
In the `<body>` section, the html includes only one empty div with an `id="map"` attribute. This tag is all we need: it gives leaflet a place in the page on which to hook the full map.
Finally, just before the closing `</html>` tag, the file references the javascript file that we will use to create the map (see the subsequent file walkthrough).
<!DOCTYPE html>
<html>
<head>
<link rel="stylesheet" href="https://unpkg.com/leaflet@1.8.0/dist/leaflet.css" integrity="sha512-hoalWLoI8r4UszCkZ5kL8vayOGVae1oxXe/2A4AO6J9+580uKHDO3JdHb7NzwwzK5xr/Fs0W40kiNHxM9vyTtQ==" crossorigin="" />
<script src="https://unpkg.com/leaflet@1.8.0/dist/leaflet.js" integrity="sha512-BB3hKbKWOc9Ez/TAwyWxNXeoV9c1v6FIeYiBieIWkpLjauysF18NzgR1MBNBXf8/KABdlkX68nAhlwcDFLGPCQ==" crossorigin=""></script>
<script src="https://code.jquery.com/jquery-2.1.4.min.js"></script>
<link rel="stylesheet" href="basic-map-neh-1960s-leaflet.css" />
</head>
<body>
<div id="map"></div>
</body>
<script src="basic-map-neh-1960s-leaflet.js"></script>
</html>
Next is a short css file (`basic-map-neh-1960s-leaflet.css`). This file provides styling information to the browser about how to display the map. Most important is the block for the `map` id, which tells the browser where to display the leaflet map.
body {
margin: 0;
padding: 0;
}
#map {
position: absolute;
top: 0;
bottom: 0;
width: 100%;
height: 750px;
}
The file that pulls this all together is the javascript, which calls the leaflet functions to create the map, loads the grant data, and creates individual markers for each grant point (`basic-map-neh-1960s-leaflet.js`).
The file opens by calling a function via `window.onload`. This means that each time the window is loaded (or reloaded), the browser will execute the instructions to draw (or redraw) the map.
Next, we create a basemap variable (`var basemap`), which provides information about the underlying map layer. In this case, we draw the tiles from the OpenStreetMap project and provide attribution.
Then, the `$.getJSON` command loads the geojson data (created previously).
Using leaflet’s functions (recognizably prepended with `L.`), the file gives instructions for parsing each geojson element into points. The majority of this section is a series of filters that provide information for displaying the text (e.g., how to display the integers as US dollars) or correcting missing information (such as unlisted Institution fields). At the end of this block, in the `layer.bindPopup` statement, a formatted string creates the text of the popup for each grant on the map.
Finally, leaflet is instructed to draw the map. The view is set (zoom level and the latitude and longitude to center the view). And, the basemap and geojson are added as layers to the interactive map.
The next section explains how to use python locally to view (serve) these files and explore the interactive map as it could appear if published to the web.
window.onload = function () {
var basemap = L.tileLayer('http://{s}.tile.osm.org/{z}/{x}/{y}.png', {
attribution: '© <a href="http://osm.org/copyright">OpenStreetMap</a> contributors'
});
// retrieve the geojson data
$.getJSON("neh_1960s_grants.geojson", function(data) {
// set popups for each point
var geojson = L.geoJson(data, {
onEachFeature: function (feature, layer) {
// set content formatting to format and correct missing information for popups
if ( feature.properties.Institution == null ) {
feature.properties.Institution = 'an unaffiliated, independent scholar'
}
if ( feature.properties.Participants == null ) {
feature.properties.Participants = 'unlisted'
}
if ( feature.properties.ProjectTitle == null ) {
feature.properties.ProjectTitle = 'unlisted'
}
let dollarUS = Intl.NumberFormat('en-US', {
style: 'currency',
currency: 'USD',
})
// create the popups
layer.bindPopup(
`<p>In ${ feature.properties.YearAwarded }, ${ feature.properties.Institution } (in ${ feature.properties.InstCity }, ${ feature.properties.InstState }) was awarded ${ dollarUS.format(feature.properties.AwardOutright) } for <a href="https://securegrants.neh.gov/publicquery/main.aspx?f=1&gn=${ feature.properties.AppNumber }">NEH project number ${ feature.properties.AppNumber }</a>.<br /><br /><strong>Project Title:</strong> ${ feature.properties.ProjectTitle }<br /><strong>Project participants:</strong> ${ feature.properties.Participants }<br /><strong>NEH Program:</strong> ${ feature.properties.Program }<br /><strong>NEH Division:</strong> ${ feature.properties.Division }</p>`
);
}
});
// set up the map, set viewport
var map = L.map('map')
.setView([37.90, -94.66], 4); //continental US view
basemap.addTo(map);
geojson.addTo(map);
});
};
After the files are created, we can use the python HTTP server to display them locally and see how they might appear on the live web. For this, use the python3 `http.server` module. (Note: this should only be used locally for testing, not in a production environment.)
To display (and serve) the files with python, open a command shell, navigate to the location where the files are stored, then run:

python3 -m http.server

(With legacy python 2, the equivalent command was python -m SimpleHTTPServer.)
You can specify a server port if you like (for example, python3 -m http.server 8080), or you can use the default. When the server starts, you will see something like this displayed in the shell:

Serving HTTP on 0.0.0.0 port 8000 ...
Note the port number. Now, open a web browser. In the browser’s location bar, enter localhost (or 127.0.0.1) followed by the port number, rather than a usual URL. For example, if the port is 8000 as above, you would request either of the following addresses in the browser:

localhost:8000/

or 127.0.0.1:8000
If the files are working correctly, you should see something like a map dotted with blue markers representing the grant data.
It’s an exciting opportunity, and I’m looking forward to joining the work of advancing the innovative UMSI curriculum, which has already pioneered digital skills development for early-career librarians, archivists, and other information professionals. While I have some concerns about returning to academia in general—the inherent elitism and classism, as well as the overheated (and under-acknowledged) prestige economy don’t align well with my personal values—the opportunity to join UMSI specifically presents a chance to facilitate training for archivists and digital curators.
I have a longtime connection to the University of Michigan. It is my alma mater, and notwithstanding the shameful and disappointing sexual misconduct (to put it diplomatically) in the news recently, I remain optimistic that the university as a social institution still has potential to make the world better through education and knowledge production. I hope to be part of the effort to move the U and its culture in ethical, egalitarian, and inclusive directions.
I have a great regard for many of the positive social and cultural values that I see in the state universities of the Midwest, including educational breadth, increasing access to knowledge for more people, and sharing critical thought, culture, and knowledge through teaching. Aside from the problematic settler colonialism behind the Land-Grant College Acts, the fundamental focus of the state university on service to all people in the state appeals to me. That said, the University of Michigan in fact predates the federal land-grant legislation, having been envisioned through the 1817 Treaty of the Rapids with the Council of the Three Fires, which stipulated that the university should educate the descendants of the land givers, the Anishinaabeg and Wyandot. Although the institution has not yet realized that potential (though land acknowledgments have become more frequent and respectful), I believe that we can build a more ethical institution that stewards and answers the call to serve, educate, and inform the community of all people in Michigan.
I remain an idealist at heart and will be working to serve the values that align with mine, which are very clearly at the heart of UMSI’s work. The school’s current mission is to “create and share knowledge so that people will use information—with technology—to build a better world.” Among UMSI’s list of core values, I hope I can particularly support:
Teaching in a program like the one at UMSI is a chance to create a digital archives and curation curriculum that allows students to build on the digital and technological foundations that UMSI has invested in through its curriculum over the last decade, and to apply those skills to work in archives, digital curation, libraries, and other cultural heritage spaces. This type of curriculum development is something to which I have already contributed in my teaching at the University of Maryland, where I began building a curriculum of curation skills and tools for digital collections, and in my work at the Library of Congress, where I began a Library Carpentry series. I look forward to expanding those learning opportunities in this new role at UMSI.
Most grant and fellowship proposals are evaluated in a peer review process. Generally, program staff conduct an initial review. This review covers technical questions and may confirm that you are eligible according to program guidelines, that you follow budget rules, and other aspects that are “matters of fact.” A second audience that may read your application is oversight advisors, such as board members of a foundation, federal oversight boards, or others interested in assuring that funded projects advance the funder’s mission and programs. (If you’re applying to a competition that does not use a peer review process, try to find out from the funder how proposals will be evaluated.)
The most crucial audience to consider as an applicant is that of the peer review step. These readers are generally knowledgeable about particular disciplines, project methods or research approaches, or other areas that require specialist review. It is in this step that your proposal will be evaluated most critically to ensure that it is making significant contributions to the program area.
Federal funders frequently seek peer reviewers. The opportunity to volunteer as a reviewer offers a great way to get involved as a subject specialist. Each agency has a different process and may be looking for slightly different information. Here is a selected list of resources to help you or your colleagues indicate interest:
To get an idea of the general types of readers a particular funder may recruit, look at annual reports or other publications from the funder. Most applications to federal agencies are evaluated by federal advisory panels, which may be listed in agency reports, the Federal Register, or in the federal advisory committee database.
Knowing more about who will read your application is a great way to be actively involved in the funding process and to raise your understanding of the research landscape.
A giant container ship the length of four football pitches has become wedged across Egypt’s Suez Canal, blocking one of the world’s busiest trade routes.
At first, it seemed like just another inconvenience that comes up from day to day on the news, but as it became clear that the ship would be stuck for a while and we learned that it was potentially disrupting 10% of global trade, this situation received more and more attention. As photos emerged, the story caught fire and became a social media sensation and a great Internet meme. In an early example of remixing, twitter user @jdgtranen recaptioned the photo with the BBC logo, combining it with the lyrics from WAP:
when Cardi B said “ I want you to park that Big Mack Truck right in this little garage”....the Suez Canal FELT that 😤😤 pic.twitter.com/lz8xvYT7qW
— joshua gutterman tranen (@jdgtranen) March 26, 2021
Memes don’t always attract the attention of library twitter! But this one did, as soon as another photo started to get attention. It turns out the engineers didn’t really have a great high-tech solution for this problem; in fact, they were sending earth-moving machines to remove sand from the bow of the ship. Well, even a giant backhoe, digger, or dump truck is dwarfed by a ship the length of four football fields that can carry 20,000 truck-sized containers! The photo that caught attention is this one, of the lone digger:
Managed to dig out good part of the bulbous thingy. It's still stuck. #Evergiven #SuezCanal #Suez pic.twitter.com/zbeD59LA6V
— Guy With The Digger At Suez Canal (@SuezDiggerGuy) March 25, 2021
(Yes, there’s at least one account tweeting from the perspective of the digger @SuezDiggerGuy.)
The tiny digger, working away at its task of removing enough sand to free the freighter’s fifty-foot draft from the mud, became a great metaphor for various aspects of library work… here are a few notables from the past day.
An image meme from @vwyeth shows the paradigm: huge problems (here it’s the ship denoted with the text “Structural Problem”) confronted by tiny, possibly ineffective or mismatched, solutions (in this case, suggesting that one person making one small change won’t change “the system”):
Ok I like this one even better. pic.twitter.com/fRCRlTT3WO
— Vanessa Wyeth (@vwyeth) March 25, 2021
Or this, shared by @SgWingo, suggesting the hollowness of naive advice in the face of the overwhelming pandemic:
— Sarah Wingo (@SgWingo) March 26, 2021
There are a few that suggest library work, perhaps symbolized by the never-ending, often-thankless, and frequently-invisible work of maintenance, might be like the digger. Here @sonicstacey offers this picture as a metaphor for the inadequate budgets that memory institutions often make available for digital preservation:
not sure if I did this right pic.twitter.com/znJNoFYkSt
— 🖥 terminal boredom 💾 (@sonicstacey) March 26, 2021
Or, as @remembrancermx points out, maybe the freighter is the backlog problem. Archives in particular face many challenges in funding all the work needed to process collections, including weeding irrelevant materials, physical stabilization and preservation, creating usable information that describes the materials and helps users find them, and then the costs of caring for the materials and serving them to users over a long period of time. That takes a lot of time, people, infrastructure, and resources, yet many archives are short on staff and resources. Perhaps the problem is inadequate staffing?! Have you ever felt like the lone digger if you’re a “lone arranger”?
Did I do it right? pic.twitter.com/sHKQJBYJA1
— Yonah (@remembrancermx) March 25, 2021
There’s a second layer of memes embedded here, as many users tweet the image with something like “hope i did this right…” encouraging others to respond with other images, encouragement, or different takes.
For more takes on this meme, check out the “Suez Canal Jam” on Know Your Meme.
For reference, here are the other essays in the series:
Encoding Reparative Description: Preliminary Thoughts (17 September 2023)
Find and replace data in the shell (11 September 2022)
Wrangling Humanities Data: An Interactive Map of NEH Awards (17 July 2022)
Wrangling Humanities Data: Using Regex to Clean a CSV (27 February 2021)
Wrangling Humanities Data: Exploratory Maps of NEH Awards by State (22 January 2021)
Wrangling Humanities Data: Cleaning and Transforming Data (19 January 2021)
Wrangling Humanities Data: Finding and Describing Data (20 December 2020)
Have you heard of regular expressions and wondered how to make use of them? This post is for someone who has asked that question. It assumes a basic understanding of “regex” and shows how to use a full-featured text editor to clean up plain-text data. The data in question comes from a larger project, which pulls bibliographic data from a major citation database in CSV form, transforms the data and extracts certain elements (DOIs of publications), then feeds the information into Zotero to create a shared bibliography. At some point in summer 2019, the CSV files began to include new fields that contained line breaks and non-text characters, which broke my data workflow. A process I could previously run directly on the output from the database now required cleaning up the CSV’s formatting errors first. At first, this was not too onerous - a few lines to delete - but after a month, it grew to hundreds of lines in a CSV with thousands of lines. I needed a way to quickly search for the error patterns and fix as many problems as possible in a batch. I decided to explore regex as a solution. Around the same time, I taught a workshop that included an overview of regular expressions, and someone asked for a “real world” use case that could illustrate how to implement regex in a functional way. This is that use case.
The use case comes from an ongoing project prompted by the Covid-19 pandemic. Only a couple of months in, the pandemic had already brought about major reorientations in health and bioscience research. This first affected researchers in epidemiology and the health sciences, who immediately began to work on new vaccines, but it quickly grew to include research into the effectiveness of public health measures and into the direct impact as well as cascading effects of the disease on particular communities, among many other areas. At large research universities, active research into the novel coronavirus immediately ramped up, and many existing research projects and labs were reoriented to investigate the virus and the disease. This resulted in a flood of medical, scientific, and social research publications. At the University of Michigan, I began a project to track these publications starting in April 2020. As with so many things involving the Covid pandemic, our work has been responsive, quickly adapting to the situation; over the last ten months, we have honed the process of identifying citations, identified multiple ways to present the list of publications, and reworked the workflow for gathering and processing the data.
In this post, I will explain how I’ve been using advanced text editors and pattern-matching routines to parse and clean the data we’re gathering. Specifically, I will demonstrate how I use Visual Studio Code (aka VSCode) to clear up some data quality issues, using multi-cursor editing and regular expression strings to identify patterns for correction. If you are looking for a text editor, the wikipedia comparison of text editors is a good place to start; in the past, I have used TextWrangler, Sublime, Brackets, and Atom, but at present VSCode is an excellent option. In the future, I plan to add another post or two explaining the data workflow in more detail, since this is only one of many steps. The outcome of the workflow is the list of publications included in a publicly available bibliography at https://myumi.ch/3qnOG (via Zotero).
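To give a sense of where the cleaned file goes next, here is a minimal sketch of the DOI-extraction step in Python. This is not the project’s actual code: it assumes the CSV has already been cleaned, and the filenames are hypothetical placeholders (as discussed below, the DOIs sit in the third column of the export).

```python
# A minimal sketch of the DOI-extraction step, assuming a cleaned CSV.
# The filenames here are hypothetical placeholders.
import csv

with open("citations_clean.csv", newline="", encoding="utf-8") as src, \
        open("dois.txt", "w", encoding="utf-8") as out:
    reader = csv.reader(src)
    next(reader, None)  # skip the header row
    for row in reader:
        # the DOI sits in the third column of this export
        if len(row) >= 3 and row[2].strip():
            out.write(row[2].strip() + "\n")
```

A list of DOIs like this can then be fed to Zotero’s “Add Item by Identifier” feature to build out the bibliography.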
Okay, let’s get into text editors and regex!
Displayed above is the CSV file as it appears in VSCode, which I discuss in more detail below. Note that the rows are easy to distinguish, and the individual fields noted in the header row are color-coded, which makes the data somewhat easier to read in the lower rows. Various extensions can be added to VSCode to aid in processing CSV files; they are available in the VSCode extension “marketplace” (note that most of the CSV extensions are free). In the illustration above, I am using the “Rainbow CSV” extension to make the “cells” more visible to a human reader - it gives each field in the line a different color, which is much easier to see than a tiny comma.
One of the first things I noted when viewing the file is that the first line is not one that is necessary for my project (it provides information about the file and the specific search string that was used to generate the result), and the second line contains the field headers.
So, I deleted the first line. In VSCode, you can select the line with the mouse, then delete it. Or, if you want to use the keyboard, position the cursor on the line you want to delete, then type `Ctrl+X` (`Cmd+X` on MacOS) to cut the whole line.
VSCode is full of handy keyboard shortcuts. If you use shortcuts frequently, there are lists of useful ones, like this one from the VSCode developers for Windows users, or this one for Mac users. Print one out and keep it beside your desk until you learn them all! Or see the many lists generated by other shortcut users, like this list from Deepak Gupta, for additional useful keyboard commands in VSCode. You can even create new shortcuts, called “key bindings,” within VSCode using built-in features.
Regular expressions are a useful set of pattern-searching techniques that allow you to find very specific patterns within text. For example, have you ever wanted to find things under both their British and American spellings - say, all of the places where the word digitize appears, knowing it might be spelled digitize or digitise? Regular expressions can help! If your search tool supports regular expressions, you could input the string `digiti[sz]e`, and it would match either spelling. Regular expression syntax is complicated and can be quite powerful, but I am only going to go into a few specific search expressions in this post. If you’re interested in learning more about regular expressions, or “regex” as they are often called, check out the introduction from Library Carpentry, or search for one of the many cheatsheets available online, such as this regex cheat sheet from MIT.
Many advanced text editors support the use of regular expressions for advanced searches. In VSCode, you can bring up the find window by typing `Ctrl+F` (`Cmd+F` on MacOS) or by opening the `Edit` pull-down menu and selecting `Find`. The find-and-replace console appears at the upper right corner of the VSCode window, and you can activate (or deactivate) regular expressions in searches by selecting the `.*` button at the right end of the search input prompt.
If you start to see patterns that go beyond matching a word or phrase, you may want to consider using regular expressions. For example, are there many lines that begin with blank spaces or letters when they should begin with numbers? Or are there some lines that may begin with letters, but only if they are followed by a line that begins with a number? If you have identified patterns like these, regex may be a tool you want to use. Regex also offers more advanced usage that supports selecting and changing certain patterns. (I will not discuss regex replacement features here, but you can learn more about that functionality in Bohyun Kim’s post at the ACRL TechConnect Blog.)
In this post, I am using basic regex to identify certain patterns that caused problems in a CSV document I received. When I opened the file, I noticed that many lines included unescaped commas, non-alphanumeric characters, tabs, or other strange formatting that made the CSV invalid. While fixing the CSV and preserving all of the data would require more refined regex work, I only needed to remove the errant lines while preserving the lines that were correct, plus the first three columns (I need a list of the DOI entries, which are in the third column). Here’s how I used regular expressions and VSCode to do that:
The lines that were not “broken” all began with a number; lines that did not begin with a number were what made the file an invalid CSV. To identify these lines, I searched for any line beginning with an upper- or lower-case letter that was not followed by a line beginning with a number:
^[A-Za-z].*\n(?!^[0-9])
Using VSCode, I selected each of these matches with the “multi-cursor” option. To select all of the matching lines, type `Shift+Ctrl+L` (or `Shift+Cmd+L` on MacOS). I made sure the cursor was at the beginning of each of these lines, then deleted the selected text to remove the unwanted lines using `Ctrl+X` (`Cmd+X` on MacOS).
Some lines were completely blank, or appeared to be. The following expression matches all lines that are empty or contain only whitespace characters. Once selected, again delete the selection in batch using the multiple-select method above.
^\s*$
Reviewing the remaining lines, I noticed that many were not blank, but they did begin with spaces or tab characters. The previous pattern did not match them since they were not blank lines. Most of the tabs (though not all) had been converted to sequences of 6 or 8 spaces. To catch this case, I created a pattern to look for 6 spaces at the beginning of the line (this also catches the cases that have 8 spaces), plus any characters following, up to a line break (`\n`). Then, using a negative lookahead (the parentheses at the end of the pattern), the pattern checks whether the following line begins with a numeral; if it does, the line is skipped and not matched. The reason for skipping such lines is to reduce the possibility of deleting needed information and fields.
^[\s]{6}.*\n(?!^[0-9])
This is how regexper visualizes the match.
Instead of using the delete-line method to remove the material, as before, I took a different approach this time. This pattern selects the entire line, so the multiple select grabs the entirety of the undesired content. To remove it, use the multiple select (`Shift+Ctrl+L` or `Shift+Cmd+L`), then hit the delete key. Bye bye!
There were still some lines that looked blank but turned out to contain literal tab characters. Regex allows you to look for these with `\t`, so I searched for any cases where the character occurred at the beginning of the line and paired this with the lookahead to avoid matching any lines followed by a line of needed content.
^[\t].*\n(?![0-9])
Then select, delete, and buh bye!
At this point, a few lines remained with “odd characters” at the beginning, which turned out to be bullet point characters. After a few tries, I realized serendipitously that the following regex will select these lines:
^.\n(?![0-9])
This selects any single-character line that is not followed by a line beginning with a number. Then, select those remaining and delete! This time I used the remove-line method (`Ctrl+X`) again.
Now most of the blank or extra lines are gone, but there remain lines that begin in the middle of a cell. With the other lines gone, I can identify these by matching anything that doesn’t have a number at the beginning. Then, to prevent the deletion of content beyond the cell (delimited by a double quotation mark), the pattern stops when it finds a double quotation mark.
^[^0-9][^"]+
This is a good pattern for error-checking CSVs (if your lines begin with a numerical index): it matches lines not beginning with a digit, then matches up to a double quotation mark. This selects the text to delete, and it catches most of the stray abstracts that do not have multiple sentences.
There are still a few lines that need help. I identified these by searching for lines that don’t begin with a digit:
^[^0-9]
At this point, there are only a few (for February’s CSV there were only 3!), and these can be corrected one by one.
Finally, I use the CSVLint extension for VSCode, which checks for errors. There were a few lines missing a cell, which means that some of these patterns selected and/or deleted too much. These lines still had the most important information - the DOI - so I fixed each one by appending a comma to the end of the line, adding the missing empty cell.
Regular expressions are fussy and esoteric. It is not always clear why certain patterns match (obviously the computer knows, but it can take a while to decrypt what is happening even if you are the person who wrote the expression), and the patterns often don’t do quite what you think they will (note that I still had about 6 lines to fix by hand because of uncaught errors). That said, there is something fun in thinking through how the pattern will work, and whether it is going to match exactly the content that you want.
Perhaps that joy of finding and matching patterns is what some people find appealing about the game of “regex golf.” That’s basically a riddle game: given two groups of strings (say, the titles of Star Trek and Star Wars movies), you try to create a pattern that matches all of the items in one list but none of the items in the other. I’m not sure I would play the game, but after working through some “real world” examples, I can see the appeal (but also the frustration) in that work. To end, here is an XKCD about regex golf (I had to read the explanation):
I am not likely to choose this approach to cleaning up a CSV file in the future. Instead, I would use a tool specialized for CSV as a format. Nonetheless, this was a good way to practice regular expressions.
If you are interested in working with regular expressions, you may find these resources, including a few helpful validators and visualizers, of use: