Digital Humanities


King's College London, 3rd - 6th July 2010


Historical Interpretation through Multiple Markup: The Case of Horatio Nelson Taft's Diary, 1861-62


Garfinkel, Susan
Library of Congress, Washington DC, USA

Heckscher, Jurretta Jordan
Library of Congress, Washington DC, USA


Now fifteen years old, the still-growing American Memory Web site at the Library of Congress offers more than 130 separate and diverse multimedia collections, comprising more than a million digitized library items, to a vast base of virtual patrons across the Internet. Yet despite radical changes in the ways that people have come to use the World Wide Web, the fundamental conception of American Memory and its collections remains much as envisaged fifteen years ago. What happens next to such established but largely static digital resources? Aside from implementing obvious upgrades such as higher-resolution image scans, cleaned-up OCR, fleshed-out metadata, or faceted search capabilities, how might cultural repositories more fundamentally enhance their existing online content? Can we fully (re)imagine a second iteration, a next generation, for already digitized historical materials?

Our demonstration project works to consider this question for a single American Memory collection, purposely setting aside issues of scale or interoperability in favor of exploring effective and compelling ways to convey to users the particular character of specific historical resources. Starting with the unpublished manuscript diary of Horatio Nelson Taft (1861-62), already digitized as textual transcription with page images, we have chosen to explore the implications of creating several alternate and explicitly interpretive frameworks by applying multiple XML markup to this bounded but compellingly dense and historically significant text.

American Memory patrons bring unusually diverse research needs to the same body of historical materials: typical users include everyone from elementary school students and hobbyists to lawyers, librarians, college professors and members of Congress. Our project’s concerns stem from this diversity as well as from our own role as scholars of American culture working with historical materials in a library setting. Traditional library practice treats sources as analytically separate from their interpretation, even while the most basic interventions of librarianship are fundamentally interpretive. Historians, by contrast, well recognize their own work with sources as interpretive, but still tend to view textual sources as fully fixed in meaning.

Textual analysis is an underdeveloped tool in American historical study, and history as a discipline has lagged behind literature in imagining computing as a threshold for innovative research. Historians’ traditional use of computing has trended quantitative rather than qualitative, while recent innovations emphasize creating tools to assist the mechanics of research or scholarly interaction rather than transforming them. Explorations of how computing might fundamentally change the practices of history or the outcomes of historical interpretation are still needed.

Against this background the Taft Diary project takes on several related goals: (1) to visibly demonstrate a historical text’s accessibility to multiple simultaneous interrogations enacted through digital scholarship; (2) to more fully explore the multidimensionality of a significant historical text; (3) to introduce the methods and questions of digital humanists more centrally into a library context (making them better known to library practitioners and more widely available to diverse library audiences); and (4) to meld the questions and methods of historians with the advances in digital textual scholarship arising among literary scholars. With our project we hope to establish a model for text-based scholarship — literary and ethnographic as well as historical — that foregrounds the individual user’s interpretative needs while also conveying that interpretation to the broad variety of the diary's potential users.

The artifact and its text

The Horatio Nelson Taft Diary commands wide historical interest. When it was introduced to the public on the Library’s Web site in 2001, it offered the first new information about the events of Abraham Lincoln’s death to come to light in half a century. In fact the diary emerges from the confluence of multiple transformative historical developments: not only the American Civil War and Lincoln’s presidency, but profound long-term changes including the rise of the American middle class; the nation’s industrial, technological, and consumer revolutions; the decisive expansion in the size and influence of the federal government; and the maturation of Washington, DC, as a complex urban community of national and international importance.

An exceptional historical record on many levels, the Taft Diary is also unusually well suited to an experiment in simultaneous multidimensional interpretive markup. Its contents are rich in the overlap of ordinary life and significant events, people, and places, thereby appealing both to specialized historians and to a broad general audience. Its limited size offers definite boundaries to the range, though not the depth, of interpretive markup. Moreover, the formulaic pattern underlying most of its daily entries invites ready comparison while ensuring an organizing degree of structural regularity.

Figure 1: Taft Diary vol. 1, entries for Sept. 19-21, 1861; transcription of Sept. 20 entry

Taft’s three-volume manuscript diary consists of daily entries from January 1, 1861, through April 11, 1862 (the end of vol. 1), and less frequent entries through May 30, 1865. The current phase of our project deals only with vol. 1, which contains in total 466 entries. Because Taft used a printed blank diary book for vol. 1, each 1861 entry is eleven handwritten lines long, or typically ninety to one hundred words (fig. 1). (The volume’s 1862 entries, written into pages intended for back matter, vary more in length.) Beyond the consistency of the entries’ length, certain content elements recur so frequently as to be almost formulaic: information about the weather, illnesses within and beyond the family circle, political and military news about the Civil War, and the people and locations that populate Taft’s daily life.
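The entry count for vol. 1 can be cross-checked against the stated date range: one entry per day from January 1, 1861, through April 11, 1862, inclusive, comes to 466. A minimal sketch of that arithmetic follows; the variable names are ours, chosen only for illustration, and the calculation assumes exactly one entry per calendar day.

```python
from datetime import date

# Date range of vol. 1 as stated in the text.
first_entry = date(1861, 1, 1)
last_entry = date(1862, 4, 11)

# Inclusive day count, assuming exactly one entry per calendar day.
entry_count = (last_entry - first_entry).days + 1

print(entry_count)  # 466
```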


In selecting markup schemas, we considered and discarded a number of interpretive typologies, including Library of Congress Subject Headings (LCSH) and the Library of Congress Classification Outline, because they turn out to lack a consistent level of granularity across cultural and historical domains. Seeking to demonstrate applicability and variety, for phase one we selected three schemas for their breadth as well as depth, their flexibility, and their inherent relevance to the content of our text:

  1. Text Encoding Initiative (TEI),
  2. Outline of Cultural Materials (OCM),
  3. Scholar-defined “webs of significance” such as:
    1. Taft’s recurring concerns
      1. weather
      2. places
      3. illness
    2. networks
      1. of social relationships
      2. of news transmission

These three schemas together provide for theoretical as well as methodological diversity, representing a variety of approaches to the analysis of historical texts. TEI is a widely used standard for the representation of texts in digital form, with well-developed editorial standards and an established community of users. The OCM is anthropological, deriving from attempts to develop comprehensive categorizations of human cultural phenomena. Scholar-defined “webs of significance” follow from our own close readings of the text. In the future we anticipate exploring additional markup typologies based in the Thesaurus for Graphic Materials, GIS, and data visualizations, so as to demonstrate the diversity and versatility of interpretation that simultaneous multiple markup can sustain.

Our baseline text for markup is the transcription of Taft’s vol. 1, which was originally provided in SGML mapped to the American Memory DTD. Our own markup reverses normalizations in transcription in favor of a diplomatic version, and strips out the older SGML tags.
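As a rough illustration of the tag-stripping step only, the sketch below removes anything shaped like a tag while leaving the text and its line breaks untouched. It is not the project's actual tooling. The sample fragment and its tag names are invented for illustration and are not taken from the American Memory DTD or from the diary; a pattern this simple would also not handle SGML entities or markup declarations.

```python
import re

# Hypothetical fragment: the tag names and wording are invented placeholders,
# not drawn from the American Memory DTD or from Taft's diary.
sample = '<p>Rained <hi rend="underline">all</hi> day.</p>\n<p>Cold.</p>'

# Anything between "<" and the next ">" is treated as a tag.
TAG_PATTERN = re.compile(r"<[^>]+>")


def strip_tags(text: str) -> str:
    """Remove tag-shaped spans; keep all other characters, including newlines."""
    return TAG_PATTERN.sub("", text)


plain = strip_tags(sample)
print(plain)
```

Whitespace is deliberately left alone, since a diplomatic transcription may need the original lineation preserved.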

Figure 2: Text of Taft Diary vol. 1, Sept. 20 entry showing conceptual phrase-level OCM markup

Implementation raises methodological and technical challenges. In our process we must move from theoretical issues to the development of standards for each tag category, exploring the dilemmas that intensive and self-created markup entails: for example, whether phrases or sentences should be the unit of markup with OCM tags (fig. 2). Resolving such questions requires us to confront the practical challenges of creating markup conventions originally based in theory rather than established practice. Theoretical language about multiplicity is inspiring — and even starts to get at the truth of lived experience — but in practice we must also make sound choices about how to complete the markup with reasonable consistency so that others may use and rely on it. Further, do we mark the same text multiple times, or invest ourselves in some version of a standoff markup (see the XStandoff toolkit, for example)? How do we best present multiple interpretive options to our audience? At this level, the Taft Diary project is still a work in progress.
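To make the standoff option concrete, the following sketch keeps one unmodified base text and records each interpretive layer as character-offset spans pointing into it, so that spans from different layers can overlap freely. It is a generic illustration under our own assumptions, not the XStandoff format or API. The class, function, and label names are hypothetical, the labels are placeholders rather than real OCM or TEI codes, and the sentence is invented rather than quoted from the diary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    layer: str  # interpretive framework, e.g. "tei", "ocm", "webs"
    start: int  # character offset into the base text, inclusive
    end: int    # character offset into the base text, exclusive
    label: str  # placeholder label, not a real OCM or TEI code


# Invented placeholder sentence, not a quotation from the diary.
base_text = "Rain all day. Called on the doctor about the boys."


def annotate(text: str, layer: str, phrase: str, label: str) -> Annotation:
    """Build an annotation over the first occurrence of `phrase` in `text`."""
    start = text.index(phrase)
    return Annotation(layer, start, start + len(phrase), label)


annotations = [
    annotate(base_text, "webs", "Rain all day", "weather"),
    annotate(base_text, "webs", "the doctor", "illness"),
    annotate(base_text, "ocm", "Called on the doctor", "placeholder-category"),
    annotate(base_text, "tei", "Rain all day.", "sentence"),
]


def layer_view(text: str, annots: list, layer: str) -> list:
    """Return (label, covered text) pairs for a single layer."""
    return [(a.label, text[a.start:a.end]) for a in annots if a.layer == layer]


def spans_at(offset: int, annots: list) -> list:
    """Return every annotation, from any layer, covering one character offset."""
    return [a for a in annots if a.start <= offset < a.end]


print(layer_view(base_text, annotations, "webs"))
print([a.layer for a in spans_at(base_text.index("doctor"), annotations)])
```

Because every layer points into the same string, the choice between phrase-level and sentence-level units becomes a property of individual spans rather than of the text, and a reader's view can be assembled from any subset of layers.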


We anticipate that simultaneous multiple markup will render the text dynamic, surfacing its performative and ritualized aspects; and that it will invite new attention to its literary and linguistic aspects. Markup can also foreground key aspects of worldview of which the writer himself was unaware. In these and other ways, simultaneous multiple markup becomes a new instrument of historical interpretation, reimagining analysis beyond the realm of narrative prose.

While our technical questions are not yet fully answered, and the project itself is ongoing, we seek to share important methodological and theoretical questions, and our conclusions so far, with our colleagues in digital humanities.


© 2010 Centre for Computing in the Humanities

Last Updated: 30-06-2010