Introduction

According to a study by the Pew Research Center in 2018, approximately 60 percent of all American adults believe that higher education in the USA is “heading in the wrong direction” [33]. Many of those surveyed desperately want their own children to attend college, but respondents generally acknowledged that higher education seems to have become simply a private good for individuals, satisfying their personal quests for knowledge and helping them become employable. They expressed little confidence that higher education, especially in the humanities, could contribute measurably to the public good by addressing the grand challenges—the “wicked problems”—that stand in the way of human flourishing. These problems include inequality, racism, refugees and forced migration, food scarcity, pandemics, and climate change.

Just as the perceived value of higher education has fallen, so too in the digital age have the search for truth, the ability to weigh evidence, and the process of creating knowledge. As a result, higher education simply cannot rehabilitate itself, address the grand challenges, and serve the public good without a credible and reliable digital information infrastructure. Creating such an infrastructure and making it easy to use, affordable, and widely available to help address complicated social problems is itself a grand challenge.

Partly envisioned by Vannevar Bush in his prescient 1945 Atlantic article, “As We May Think,” components of a digital infrastructure for scholarship slowly emerged in the decades following World War II [37]. Development accelerated with the founding of the World Wide Web in the early 1990s and received a major boost in the USA in 1994 with the national Digital Libraries Initiative, jointly funded by the National Science Foundation (NSF), the Defense Advanced Research Projects Agency, and the National Aeronautics and Space Administration. Commercial growth of search engine, electronic commerce, social media, and other related services filled in important elements of the infrastructure.

Meanwhile, guided in part by the so-called Atkins report in 2003 on “cyberinfrastructure” [8], NSF, the National Institutes of Health, the Alfred P. Sloan Foundation, and other funders invested in solutions to address a variety of storage and computational needs of scholars in the sciences and social sciences. Similarly, the report of the American Council of Learned Societies on Our Cultural Commonwealth [4] provided a framework for public agencies, such as the National Endowment for the Humanities and the Institute of Museum and Library Services, and private foundations, such as The Andrew Mellon Foundation, where I served as program officer from 1999 to 2019, to help seed and catalyze additional infrastructure developments tailored to scholarship in the humanities and the humanistic social sciences. Australia, the European Union and individual countries in Europe, as well as nations in Asia, South America, and Africa have also made significant contributions to the development of a viable digital infrastructure for advanced research and teaching.

In this article, I take the perspective of a former funder and advance the thesis that these investments, combined with those of scholars, technologists, librarians, archivists, and their institutions, have resulted in a digital infrastructure in the humanities that is now capable of supporting end-to-end research workflows. By “digital infrastructure,” I follow the Atkins report and Our Cultural Commonwealth and mean in this essay to denote the collection of standards, software, digital content, and expertise that directly supports scholarly research [4, 8]. Because “infrastructure” is a relative term, the scholarly infrastructure discussed here in turn depends on deeper layers of support. At one level, there are platforms of various kinds for digital search and messaging; other levels include networking and storage protocols and technology. These other layers of digital infrastructure are not directly addressed in this essay.

To help illustrate what has so far been achieved, I refer in the next section to key developments in the epigraphy and paleography of the premodern period. I draw primarily on work in classical studies, which is the focus of this issue, but I also highlight related work in the adjacent disciplines of Egyptology, ancient Near East studies, and medieval studies. In doing so, I am not suggesting that these various fields are more advanced in their digital capabilities than other disciplines in the humanities. Such a conclusion would require detailed comparisons that are beyond the scope of this essay. On the other hand, I am also not declaring “mission accomplished.” Even without detailed comparisons, it is safe to say that the capabilities of the infrastructure remain unevenly distributed within and across disciplines, institutions, and regions. Overcoming these “digital divides” in inclusive and equitable ways is imperative and an ongoing challenge [13, 19, 57, 84, 93]. Moreover, the components, including the links between steps in the workflow that I outline below, are generally far from user-friendly and seamless in operation. Because further refinements and additional capacities are still much needed, I conclude in the final section with a discussion of key priorities for future work.

Premodern studies and the humanities research workflow

In 1967, James McDonough, the director of the Office of Humanistic Research at Saint Joseph’s College in Philadelphia, took stock of the role of computers in classical studies [80]. One of a set of academic fields focused on the premodern period, which had ended by approximately 1500 CE, classical studies is the branch of the humanities that seeks to understand the culture and society of the ancient Mediterranean world from the Bronze Age (approximately 3000 BCE) until Late Antiquity (ending about 600 CE). McDonough noted the pioneering work of Father Roberto Busa in Italy. Conceived in 1946 and well underway by 1967, Busa’s project aimed to produce a computer-generated concordance of all the works of Saint Thomas Aquinas. Other computer-based initiatives that McDonough mentioned in his inventory included statistical studies of Plato’s dialogues and variant manuscripts of the Greek New Testament, the indexing of The Corpus of Latin Inscriptions, and analyses of metrical word types in the Iliad, Latin hexameter in the works of Vergil, and the syntax of Cicero’s letters. Because of the encouraging results of these and other works, McDonough observed that the International Congress of Classical Studies had already decided to devote its entire meeting in 1969 to the topic of computers and the classics. He then concluded his review with a rousing call to all classicists to appear at the International Congress “with specific plans for international cooperation in computer studies of the entire body of classical literature.”

McDonough and his fellow meeting attendees soon realized the magnitude of the task he had outlined. Subsequent reviews of the field by Solomon in 1993 [102], Hardwick in 2000 [61], and Babeu in 2011 [10] have documented considerable progress toward McDonough’s goal, but his early call to arms affirms a fundamental tenet that underlies research not only in classical studies, but also in most disciplines in the humanities and the humanistic social sciences. That axiom is that the evidence contained in primary sources—in McDonough’s words, “the entire body of classical literature”—is the fuel for scholarship and the growth of knowledge about human culture and society.

In his NEH-sponsored Jefferson lecture in 2019, Father Columba Stewart further affirmed and amplified this tenet. Stewart is the director of the Hill Museum and Manuscript Library (HMML) at Saint John’s University in Minnesota. HMML has amassed one of the largest collections in the world of digital copies of endangered manuscripts from Europe, the Middle East, Africa, and Asia. “The opening word of Saint Benedict’s Rule,” Stewart observed in his lecture, “is, appropriately, obsculta, ‘listen’.” To learn, it is necessary to listen carefully and with humility, especially to those, past and present, whose voices go unheard. Referring to local communities with endangered manuscript heritage, Father Columba went on: “Our team at HMML has worked with them to ensure that their deposits of wisdom, their libraries of handwritten texts, the voices of their past, can join the global conversations of the digital era. And we do it side-by-side, as equals” [104].

The path to knowledge may begin with these “deposits of wisdom,” but there is much more to the research process. There is a rich body of work by Unsworth [113], Borgman [29], Palmer [88], Hughes [21, 63], Antonijevic [5, 6], Almas [3], their colleagues, and many others who have observed and modeled information-seeking behavior and scholarly practice. Figure 1 draws on this work and offers a schematic of a basic set of functions that comprises a generalized research workflow whereby scholars and others build knowledge in the humanities disciplines. In this representation, they begin by collecting relevant primary and other sources. Then they organize and catalog them; transcribe and translate them as necessary; identify key entities within the sources; analyze and interpret the accumulated materials; and publish the findings. This process applies to research that may result either in a scholarly edition with an essay and critical apparatus or a synthetic work of scholarship such as a journal article, a monograph, or other kind of work.

Fig. 1

“A generalized humanities research workflow.” It contains six circles to represent the elements of the workflow. They are labeled from left to right: Collect, Catalog, Transcribe/Translate, Identify, Analyze/Interpret, and Publish. The Collect and Catalog circles are enclosed in a dotted box, which is meant to depict the functions typically associated with libraries.

Archives and libraries have long played a critical, supportive role in this research process, mainly by focusing on the “collect” and “catalog” functions, as highlighted by the dotted line in Fig. 1. Serving as intermediaries and partners in the knowledge-building process, they collect and aggregate sources, describe them, and provide a catalog that helps researchers find and gain access to relevant source materials for the collecting and cataloging work that they each undertake in their own individual or collaborative group projects. How well or poorly libraries and archives assist in the research process depends, at least in part, on the extent and coherence of the sources they have collected, the detail of the cataloging, and the nature of the repository and discovery systems by which they provide access to the items in the collection. In an increasingly digital world, a further gauge is how adroitly and reliably libraries and archives have created or adapted their “collect” and “catalog” functions for sources that are born digital or digitized.

This representation of the research workflow in the humanities is highly simplified and idealized. For other purposes, one might narrow the focus and emphasize a subset of these scholarly functions. Alternatively, one might well enumerate a broader set of functions, recognizing that each of those that I have identified represents a bundle of related and sometimes overlapping activities that merits fuller analysis and explication. Moreover, I fully acknowledge that scholarly workflow is messier and more complex in practice than I have represented it. The creation of knowledge rarely proceeds in a step-by-step, linear order. Instead, the process is more often recursive and branching as scholars discover key sources that they previously missed, realize that their analyses are incomplete or faulty, or uncover relationships that lead them in new directions.

Although the ordered set of functions I have identified may not be sufficient to represent all aspects of scholarly work in the humanities, it does comprise key components of a generalized research workflow and thereby serves a heuristic purpose in this essay. It provides a framework for systematically analyzing the digital developments of the last several decades. Where many scholars, administrators, funders, and members of the public have tended to see only a jumble of disparate, individual digital projects, reference to this set of functions can help reveal intensive programs within and across scholarly disciplines to extend and expand the knowledge creation process through the embrace of digital tools and content.

With considerable help from technical experts, including those in libraries and archives, and with funding from institutions as well as public and private funders, researchers have been thoroughly reengineering their workflows to accommodate the digital environment and to build what the European Science Foundation calls “research infrastructures in the digital humanities” [51]. Figure 2 seeks to represent this transformation with a focus on classical and other premodern studies. The second line emphasizes that sustainable change requires participants to engage in the difficult political and social processes of agreeing to general standards and best practices in each of the functional areas. Classical scholars have adopted digitization and other standard practices that have largely been developed elsewhere, but they have also crafted some, such as EpiDoc [40, 50] and the Canonical Text Services [25, 100, 109, 110], on their own to address the specific nature of textual, epigraphic and other sources in the field. The third line in Fig. 2 highlights examples of the products and tools with which researchers in these fields have been implementing these standards and practices and creating operational digital workflows. The features of some of these products, such as Perseus [44], span several of the functional categories. However, at the risk of giving them short shrift, I have discussed these products in the categories that reflect what I consider to be their predominant functional features, thereby leaving room for me to review a wider range of content and tools and to illustrate the range and depth of digital development. Let us now examine each of these “digitized” functions in turn.

Fig. 2

“A “digitized” humanities research workflow.” It contains three lines. The first line depicts workflow functions in six circles. They are labeled from left to right: Collect, Catalog, Transcribe/Translate, Identify, Analyze/Interpret, and Publish. In six boxes, the second line depicts sets of digital standards applied to each of the six workflow functions. From left to right, the standards are listed as Digitization/Resolution/Color Balance; FRBR/Canonical Text Service; TEI XML/EpiDoc/OCR; EAC/SNAC; IIIF/W3C Web Annotation/NLP; and EPUB. In six boxes, the third line depicts digital applications for implementing each of the six workflow functions. From left to right, the applications are listed as Tropy/Omeka/Perseus/EAGLE; papyri.info/Perseus Catalog; OCRopus/Son of Suda Online; DPRR/Pelagios/Godot; Mirador/Virtual Worlds/Treebanks; STOA/Perseids/Digital Latin Library.

Collect

Digital research in premodern studies mainly relies on manuscript texts, inscriptions, archaeological remains, and other primary sources that scholars, libraries, and others have collected for use by making a digital copy of either the original items, photographs of those items, or in many cases, previously published versions of them. One factor that has facilitated this digital collection process is that the original, premodern sources and many authoritative publications of them are free of copyright protection. Scholars in modern and contemporary history and literature, and other fields that rely on the use of sources that are encumbered by copyright restrictions, find the process of building digital collections more challenging.

Digitization typically begins with an image copy of the source. Digital imaging follows standard procedures that have become increasingly sophisticated and focused on producing images that reproduce relevant features of the original at the highest possible quality. These procedures cover all elements of the process: camera resolution, lighting, storage equipment, color management, file naming and other metadata requirements, viewing software, and quality control checklists [62].

For certain materials that are difficult to read, scholars have applied specialized imaging procedures. For example, Bruce Zuckerman and his team in the West Semitic Research Project used “raking” light for digitizing inscriptions on stone, and multispectral imaging for works on parchment and papyri [64]. Multispectral imaging also revealed the contents of the Archimedes Palimpsest [48]. Other investigators, such as Peter Der Manuelian and his Digital Giza colleagues, have used reflectance transformation imaging to capture inscriptions under different lighting conditions. They have also used photogrammetry, QuickTime Virtual Reality, and other scanning techniques to represent inscriptions on statues and sarcophagi, as well as on interior and exterior walls of tombs and other structures, in a three-dimensional context [76, 77]. And in the Digital Restoration Initiative, Brent Seales and his colleagues are developing imaging techniques using X-ray-based micro-computed tomography to recover the text on papyrus rolls carbonized in Herculaneum by the eruption of Mount Vesuvius [89].

With powerful cameras standard in their mobile phones, individuals now also have in their pockets the ability to create high-quality personal collections of manuscripts, inscriptions, and other objects critical to their research. The Center for History and New Media (CHNM) at George Mason University has developed a desktop tool, called Tropy, for individuals to store and manage these collections. It has also developed another relatively lightweight tool, called Omeka, for collaborative research collections [45]. Omeka is deployed widely, and Zuckerman’s West Semitic Research Project used it to create Inscriptifact, its collection of digitized inscriptions [64]. Other larger-scale collections supporting epigraphy include the Cuneiform Digital Library Initiative [112] and EAGLE [87]. EAGLE is a consortium of 19 partners from 12 different European countries designed to provide access to many collections of ancient Greek and Latin inscriptions through a single Web portal. For textual materials, the venerable Thesaurus Linguae Graecae (TLG), established by Ted Brunner in 1972, contains a digitized collection of Greek literature dating from Homer in the eighth century BCE to the fall of Byzantium in 1453 [34, 108]. In 1990, the Perseus Project began work on a digital library designed “to complement the textual focus of the TLG” [81].

Catalog

In addition to collecting their sources, scholars need to organize and catalog them so that they can find and use them effectively in their work. In some cases, the catalog process precedes and drives the digital collection process. To describe the sources and indicate the areas of relevance with an appropriate set of tags, they might use personal reference managers like Zotero, another CHNM application [45], or similar commercial products. Alternatively, they and librarians working with them might use cataloging tools based on international cataloging standards such as the Dublin Core Metadata Element Set, which comprises 15 key properties for describing resources [7]. A variety of repository applications, including Omeka, support the Dublin Core. Because the Dublin Core conforms to the World Wide Web Consortium’s (W3C) Resource Description Framework (RDF), catalogs of classical works that adhere to the standard can technically interoperate within the so-called semantic web of linked data. That is, when scholars, librarians, and others identify entities such as concepts or names of people and places in catalogs with uniform resource names (URNs), they can use standard web protocols to connect or “link” them together [86].
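To make the idea of a Dublin Core catalog record as linked data more concrete, the following minimal Python sketch serializes a small record as RDF statements in the N-Triples syntax. The Dublin Core namespace is the published one; the record itself, the function names, and every example.org identifier are hypothetical placeholders chosen only for illustration, not entries from any actual catalog.

```python
# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC = "http://purl.org/dc/elements/1.1/"


def literal(value):
    """Quote a plain string as an N-Triples literal."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def dc_triples(subject_uri, record):
    """Turn a flat Dublin Core record into a list of N-Triples statements.

    Values that look like web identifiers are emitted as links to other
    resources; everything else is emitted as a literal string.
    """
    statements = []
    for element, value in record.items():
        is_link = value.startswith(("http://", "https://"))
        obj = f"<{value}>" if is_link else literal(value)
        statements.append(f"<{subject_uri}> <{DC}{element}> {obj} .")
    return statements


# Hypothetical catalog entry: all example.org identifiers are placeholders.
record = {
    "title": "Aeneid",
    "creator": "https://example.org/people/vergil",
    "language": "la",
    "type": "Text",
}
triples = dc_triples("https://example.org/works/aeneid", record)
print("\n".join(triples))
```

The point of the sketch is the "creator" statement: because its value is an identifier rather than a string, any other catalog that uses the same identifier for the same person is, in principle, linked to this record.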

While these cataloging approaches may be sufficient for some research projects, primary sources in classical and other premodern studies, and indeed in many humanities disciplines, tend to be complex bibliographically and not always easy to catalog. The main difficulty is that sources often exist in multiple versions, either as a feature of their original production or because they have been copied and disseminated over time. In their cataloging, researchers must trace and account for the provenance and reliability of the versions they are using. Because digitization produces yet additional versions in a medium where it is easy for copies to proliferate, cataloging them is both more complicated and more essential.

A small but important field of classical studies is papyrology, which focuses on the social and cultural documentation about the ancient Mediterranean that survives on papyri. Papyrologists have devised one solution to the multiple versions problem. The University of Heidelberg in Germany manages the HGV, Heidelberger Gesamtverzeichnis der griechischen Papyrusurkunden Ägyptens, the primary database of detailed bibliographic descriptions of Greek and Latin documentary (but not literary) papyri texts. In the early 2000s, scholars in the field agreed to merge the HGV with two other key databases: the Duke Databank of Documentary Papyri, which was created at Duke University and contains online transcriptions of essentially the same set of papyri, and APIS, the Advanced Papyrological Information System, which was established at Columbia University and includes digital images of papyri and related metadata. By uniting bibliographic information with images and transcriptions, the resulting database, known as papyri.info, provides a powerful tool for using this important corpus of primary sources [14, 17, 39].
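The general pattern of uniting descriptions, transcriptions, and images of the same objects can be sketched in a few lines of Python. This is only an illustration of joining records on a shared identifier; it does not reproduce how papyri.info is actually built, and the identifiers, field names, and values are all invented.

```python
def merge_by_identifier(**sources):
    """Group the fields from each named source under a shared identifier."""
    merged = {}
    for source_name, records in sources.items():
        for papyrus_id, fields in records.items():
            merged.setdefault(papyrus_id, {})[source_name] = fields
    return merged


# Invented records: the identifiers and field values are placeholders.
descriptions = {
    "papyrus-1": {"date": "second century CE"},
    "papyrus-2": {"date": "third century CE"},
}
transcriptions = {"papyrus-1": {"text": "[transcribed Greek text]"}}
images = {"papyrus-1": {"image": "https://example.org/images/papyrus-1.jpg"}}

combined = merge_by_identifier(
    description=descriptions, transcription=transcriptions, image=images
)
print(sorted(combined["papyrus-1"]))
print(sorted(combined["papyrus-2"]))
```

A record that appears in only one source, such as the second papyrus here, is kept rather than dropped, so the merged view also shows where transcriptions or images are still missing.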

More general solutions to the bibliographic issues associated with multiple versions are based on a framework that the International Federation of Library Associations devised in 1998. The Functional Requirements for Bibliographic Records (FRBR) distinguishes four key entities: A work is “a distinct intellectual or artistic creation;” an expression is “the intellectual or artistic realization of a work;” a manifestation is “the physical embodiment of an expression of a work;” and an item is “a single exemplar of a manifestation.” In other words, “a work is realized through an expression, an expression is embodied in a manifestation, and a manifestation is exemplified by an item” [67]. To make these distinctions concrete in relation to the multiple versions of sources used in classical studies, Alison Babeu offers this helpful example: “Vergil’s Aeneid is considered a work, Robert Fitzgerald’s original English translation is viewed as one expression of that work, printings by different publishers of the same Fitzgerald translation are different manifestations of that expression, and my individual copy of one of those printings is an item” [9].
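The four FRBR entities and the relations among them can be expressed as a small data model. The Python sketch below encodes Babeu’s Aeneid example; the class and attribute names are my own shorthand rather than terms prescribed by FRBR, and the publisher labels are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """A single exemplar of a manifestation."""
    holder: str


@dataclass
class Manifestation:
    """The physical embodiment of an expression."""
    publisher: str
    items: list = field(default_factory=list)


@dataclass
class Expression:
    """The intellectual or artistic realization of a work."""
    description: str
    manifestations: list = field(default_factory=list)


@dataclass
class Work:
    """A distinct intellectual or artistic creation."""
    title: str
    creator: str
    expressions: list = field(default_factory=list)

    def all_items(self):
        """Walk work -> expression -> manifestation -> item."""
        return [
            item
            for expression in self.expressions
            for manifestation in expression.manifestations
            for item in manifestation.items
        ]


# Placeholder publishers; only the structure follows the quoted example.
my_copy = Item(holder="an individual reader")
printing_a = Manifestation(publisher="Publisher A", items=[my_copy])
printing_b = Manifestation(publisher="Publisher B")
translation = Expression(
    description="Robert Fitzgerald's English translation",
    manifestations=[printing_a, printing_b],
)
aeneid = Work(title="Aeneid", creator="Vergil", expressions=[translation])

print(len(aeneid.all_items()))
```

Modeling each level separately is what allows a catalog to say that two printings carry the same translation, or that two translations realize the same work, instead of treating every volume as an unrelated record.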

In a detailed series of essays, Babeu has described how she and staff of the Perseus Digital Library have reshaped the Perseus Catalog [9, 11, 12]. With determined effort combined with systematic research and experimentation over approximately 15 years, they have “FeRBeRized” the catalog. Instead of maintaining a record structure that focuses on the “item-in-hand” of traditional cataloging rules, they have deployed standard uniform resource names (URNs) of the web to identify FRBR entities—work, expression, manifestation, and item—and make them accessible as linked data as part of the complex, multiple version universe of digitized primary sources in classical studies. As Babeu observes, classical scholars also rely on other kinds of defined entities, including chapters, sentences, or phrases, that appear within works but across different expressions, manifestations, and items. As mentioned earlier, classical scholars have devised a specialized architecture and set of Canonical Text Services to mark these additional entities with URNs and make them digitally accessible [25, 100, 109, 110].
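A Canonical Text Services URN packs this hierarchy into a single string of colon-separated components: a namespace, a work component whose dot-separated levels run from text group to work to version, and an optional passage reference. The Python sketch below splits such a URN into its parts. It is a simplified reader rather than a full implementation of the CTS specification (it ignores the exemplar level and does not interpret ranges or subreferences), and the example URN, which I believe follows the pattern used for the Iliad in the Perseus collections, should be treated as illustrative.

```python
from typing import NamedTuple, Optional


class CtsUrn(NamedTuple):
    namespace: str
    textgroup: str
    work: Optional[str]
    version: Optional[str]
    passage: Optional[str]


def parse_cts_urn(urn):
    """Split a CTS URN into namespace, work hierarchy, and passage."""
    parts = urn.split(":")
    if len(parts) not in (4, 5) or parts[0] != "urn" or parts[1] != "cts":
        raise ValueError(f"not a CTS URN: {urn!r}")
    namespace, work_component = parts[2], parts[3]
    # The passage component is optional and may be left empty.
    passage = parts[4] if len(parts) == 5 and parts[4] else None
    # Pad the hierarchy so that missing levels come back as None.
    levels = work_component.split(".")
    levels += [None] * (3 - len(levels))
    textgroup, work, version = levels[:3]
    return CtsUrn(namespace, textgroup, work, version, passage)


# Illustrative URN: a passage range within one version of a work.
example = parse_cts_urn("urn:cts:greekLit:tlg0012.tlg001.perseus-grc2:1.1-1.10")
print(example)
```

Because the passage reference is independent of the version, the same citation can be resolved against any edition or translation that shares the work identifier, which is what makes chapters, lines, and phrases addressable across expressions and manifestations.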

Transcribe/translate

For textual sources that they have collected and cataloged, and for which they have only a photographic or digital image, scholars generally also need a digital transcription. Such transcriptions activate the texts for further computer-based analysis and use in the research process, including comparison with other texts. Depending on the linguistic scope of the research effort, a digital translation of the sources may also be necessary.

Developers of papyri.info created Son of Suda On Line (SoSOL), an editing tool to facilitate the digital transcription of papyrological texts [17]. Other resources for classical studies, such as the Perseus Digital Library, have also adopted SoSOL as a transcription and general editing tool [2]. For encoding the texts, these tools support the eXtensible Markup Language (XML) as well as EpiDoc, the subset of XML encoding guidelines that classical scholars have specifically designed for epigraphy [40, 50].
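To give a sense of what such encoding makes possible, the Python sketch below reads a small EpiDoc-style fragment and renders it line by line, placing text restored by the editor in square brackets as in conventional print editions. The fragment is invented for illustration and uses only a handful of elements (`ab`, `lb`, and `supplied`) in the TEI namespace; real EpiDoc files are considerably richer, and the function shown is a hypothetical helper, not part of SoSOL or any EpiDoc toolkit.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Invented two-line inscription; <supplied> marks text restored by the editor.
SAMPLE = """\
<div xmlns="http://www.tei-c.org/ns/1.0" type="edition" xml:lang="la">
  <ab>
    <lb n="1"/>D<supplied reason="lost">is</supplied> Manibus
    <lb n="2"/>sacrum
  </ab>
</div>
"""


def leiden_lines(xml_text):
    """Return {line number: text}, with restored text in square brackets."""
    root = ET.fromstring(xml_text)
    lines = {}
    current = None  # number of the line currently being filled

    def emit(text):
        if current is not None and text:
            lines[current] += text

    def walk(node):
        nonlocal current
        tag = node.tag.replace(TEI, "")
        if tag == "lb":  # a line-beginning marker starts a new line
            current = node.get("n")
            lines[current] = ""
        if tag == "supplied":
            emit("[")
        emit(node.text)
        for child in node:
            walk(child)
            emit(child.tail)  # text that follows the child element
        if tag == "supplied":
            emit("]")

    walk(root)
    # Collapse the whitespace that comes from XML indentation.
    return {n: " ".join(text.split()) for n, text in lines.items()}


print(leiden_lines(SAMPLE))
```

The same encoded file could just as easily be rendered without the brackets, searched for restored passages only, or exported for linguistic analysis, which is the practical advantage of recording editorial judgments as markup rather than as typography.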

For small sets of texts in lesser-known languages, digital transcription may inevitably be a solitary, manual process for the scholars interested in those texts. One alternative may be to hire a commercial transcription service. Such services typically offer to key texts into a computer twice. The service then compares the two copies. If the copies are identical, then the transcriptions are assumed to be accurate; if they differ, an error has occurred and requires a correction. The Perseus Digital Library generated the digital transcriptions of almost all of its original Greek texts using this method of double keying [94].
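The comparison step in double keying is easy to sketch with Python’s standard `difflib` module. The function below is a hypothetical helper that reports every place where two independently keyed copies disagree at the level of words; the sample text is for illustration, and commercial services doubtless use their own tooling.

```python
import difflib


def double_key_discrepancies(first, second):
    """List (word position, reading in first, reading in second) mismatches."""
    a, b = first.split(), second.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (i1, " ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]


keying_1 = "arma virumque cano Troiae qui primus ab oris"
keying_2 = "arma virumque cano Troiae qui prinus ab oris"

for position, left, right in double_key_discrepancies(keying_1, keying_2):
    print(f"word {position}: {left!r} vs {right!r}")
```

An empty result means only that the two typists agree, not that the text is correct: the method rests on the assumption that two people are unlikely to make the same mistake in the same place.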

A more promising technique for digital transcription is optical character recognition (OCR), a set of image processing techniques for segmenting printed or handwritten texts down to the character level, identifying the character in its linguistic context, and reconstituting the text in machine-readable format. Recent improvements in machine learning have made it possible to train OCR engines, such as OCRopus, against a known, accurate transcription to improve the accuracy of their transcriptions of additional texts in certain languages and orthographies. Although OCR is not (and probably never will be) fully accurate, its main advantage is that it requires much less manual labor, and is therefore much cheaper, to identify and correct OCR errors than to transcribe a text manually from scratch. Bruce Robertson has usefully summarized recent applications of OCR in classical studies, indicating that one project has “generated 52,938,168 editable words of ancient texts, of which 10,237,171 are manually verified” [94]. Meanwhile, David Smith and Ryan Cordell have identified a research agenda intended to yield significantly better results from the application of OCR to a wide range of texts in historical languages, including Arabic, that are important to classical and other premodern studies [99].
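One common way to quantify how far OCR output departs from a known, accurate transcription is the character error rate: the number of single-character insertions, deletions, and substitutions needed to turn the OCR output into the reference, divided by the length of the reference. The Python sketch below computes it with a standard edit-distance calculation. It is offered as a general illustration of the measure; the source does not state which metric the projects cited here used, and the sample strings are invented.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two strings, computed row by row."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(
                min(
                    previous[j] + 1,         # delete a reference character
                    current[j - 1] + 1,      # insert a hypothesis character
                    previous[j - 1] + cost,  # match or substitute
                )
            )
        previous = current
    return previous[-1]


def character_error_rate(reference, hypothesis):
    """Edit distance normalized by the length of the reference text."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


verified = "arma virumque cano"
ocr_output = "arma virumquc cano"  # invented OCR slip: "c" read for "e"
print(f"{character_error_rate(verified, ocr_output):.3f}")
```

Tracking this rate on a held-out sample of verified text is one way to judge whether retraining an engine has actually improved its output for a given language or typeface.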

Translation of ancient texts from early versions of Latin, Greek, and Arabic to languages in modern use is necessary to engage contemporary audiences in the lessons of premodern studies. Machine translation technologies are not yet advanced enough to be of much use in this process. However, carefully structured crowd-sourced translation projects might help increase the corpora of accurately translated ancient texts. For example, at Tufts University, Marie-Claire Beaulieu has reported success in organizing students in her advanced Latin class into teams that work together under her supervision to translate previously untranslated texts. Not only does this kind of supervised crowdsourcing increase the corpora, but it also has substantial pedagogical value in both teaching advanced language skills and giving students the pride of authorship in newly produced works of original scholarship [2].

Identify

Another key task in the humanities research workflow is for researchers to identify named entities within the primary sources they are using. Such entities include people and organizations as well as places and certain calendrical designations. For sources that they have transcribed into machine-readable formats, researchers in classical studies may be able to use advanced software from the natural language processing branch of computer science to assist in extracting these named entities [23]. However, because people, organizations, places, and historical periods are typically known by various formal and familiar names, disambiguating named entities from one another using available evidence is an essential, and often manual, part of the research task.

Archivists have recently developed a standard XML schema, the Encoded Archival Context (EAC), for digitally representing people, families, and corporate bodies once researchers have unambiguously identified them [103]. An international cooperative of archives, libraries, museums, and other cultural heritage organizations has also created Social Networks and Archival Context, or SNAC, where users can search aggregated EAC-based entities and browse biographical information about them [101]. Both EAC and SNAC build on prosopography, a long research tradition in the humanities. Prosopographies are biographical dictionaries that identify groups of actors in their historical context. Scholars in classical studies have created a variety of these dictionaries and have begun converting them to digital form and developing them further online. One of these is the Digital Prosopography of the Roman Republic [32]. In addition, in the Standards for Networking Ancient Prosopographies project, scholars have begun to create an aggregation of these online dictionaries and other similar resources, to which they can link to identify and disambiguate persons and person-like entities in their sources [26, 27].

Another longstanding research tradition in the humanities is the creation of gazetteers, or geographical dictionaries. These are useful because the same name can refer to different places, different names can identify the same place, and names and regional boundaries often vary over time. Moreover, references to places rarely conform to the standard geometries used today to mark geographic boundaries, but they do often provide early and useful evidence of cities, towns, landmarks, and other spatially localized phenomena. For places in the ancient world, the Pelagios project has shown the usefulness and viability of collecting location references digitally across projects, based on various gazetteers of toponyms, and providing unique uniform resource identifiers (URIs) for them. For example, https://pleiades.stoa.org/places/59672 is a unique digital identifier for Alexandria Eschate in modern Tajikistan. It serves as the basis for digitally aggregating other references to that site and for ensuring that it is not confused with the more famous Alexandria in Egypt [69, 98].
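The role that such identifiers play in disambiguation can be illustrated with a toy gazetteer in Python. Only the Pleiades identifier for Alexandria Eschate comes from the text above; the second identifier is an example.org placeholder, and the name variants, the `modern_country` field, and the function names are assumptions made for the sake of the sketch.

```python
GAZETTEER = {
    "https://pleiades.stoa.org/places/59672": {
        "names": {"Alexandria Eschate", "Alexandria"},
        "modern_country": "Tajikistan",
    },
    # Placeholder identifier, not a real gazetteer entry.
    "https://example.org/places/alexandria-egypt": {
        "names": {"Alexandria", "Alexandreia"},
        "modern_country": "Egypt",
    },
}


def candidates(toponym):
    """Every identifier whose entry lists the toponym as a name."""
    return sorted(
        uri for uri, entry in GAZETTEER.items() if toponym in entry["names"]
    )


def resolve(toponym, modern_country=None):
    """Return a single identifier, or None if the name stays ambiguous."""
    matches = [
        uri
        for uri in candidates(toponym)
        if modern_country is None
        or GAZETTEER[uri]["modern_country"] == modern_country
    ]
    return matches[0] if len(matches) == 1 else None


print(candidates("Alexandria"))
print(resolve("Alexandria"))
print(resolve("Alexandria", modern_country="Tajikistan"))
```

The bare name matches two places and so cannot be resolved on its own; additional evidence from the source narrows it to one identifier, and it is that identifier, not the name, that other projects can safely link to.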

In working with primary sources from and about the ancient world, researchers wrestle not only with the thorny problems of identifying people and places, but also with situating in time both the documents and the events to which they refer. In the simplest case, dates are easy to identify if the documents refer to a standard calendar system, such as the Julian calendar, which Julius Caesar established for the Roman world in 45 BCE. In other cases, dates are more difficult to establish because local calendaring practices can vary widely. For example, both before and after the adoption of the Julian calendar in the ancient western world, the documentary evidence indicates that some localities tracked time by reference to solar or lunar calendars, or a combination of the two, while others referred to the life span of a notable official or to the time elapsed since a momentous event. The creation of digital tools to assist in documenting and analyzing intersecting and overlapping chronological systems has lagged behind the development of online prosopographies and gazetteers. To help address this gap, researchers from Heidelberg University, Katholieke Universiteit Leuven and King’s College London have begun creating a tool called the Graph of Dated Objects and Texts, or GODOT [58].

Analyze/interpret

Scholars and academic institutions generally cultivate the impression that the “real work” of scholarship and the measure of its quality rest in the tasks of analyzing and interpreting the evidence at hand. However, as the previous sections indicate, these tasks depend on the quality and extent of significant prior scholarly work: collecting relevant primary and other sources; organizing and cataloging them; transcribing and translating them as necessary; and identifying key entities within the sources. The divisions of labor within classical studies or any other field of study may mean that researchers complete these other foundational tasks themselves or rely, at least in part, on the scholarly work of librarians, archivists, or others so that they can proceed to analyze and interpret their sources against the questions that have spurred their investigations.

Unlike the “how” questions that often motivate researchers in the sciences and engineering, researchers in the humanities generally focus on “why” questions. They seek to explicate and account for what the distinguished medievalist Stephen Nichols, following the philosopher Richard Rorty, calls the “critical intelligence” that underlies the imaginative, inventive, affective, ethical, political, and religious dimensions of human culture and society [83]. As researchers in classical studies and other fields increasingly apply digital techniques to these “why” questions, their work may qualify as “data analysis.” However, the term “data” applies not in the sense that information is quantitative, as it often is in the sciences and engineering, but rather in the sense that it is, like the quantitative information generated in scientific laboratories, primary source evidence for further investigation [49].

The primary sources for study in the humanities vary in type, including visual, spatial, and textual materials, and researchers generally tailor their analytical and interpretive techniques to the types of evidence they are using. In some cases, the digital nature of the source does not require the application of a specific digital technique. The collections of the variant versions of illustrated Roman de la Rose manuscripts, on which Nichols has focused much of his research, or of the Iliad and Odyssey in the Homer Multitext project [46] provide exemplary models of how extensive digitization can facilitate even traditional forms of scholarship. Simply having digital copies at hand often means that scholars can travel to consult physical copies only when absolutely necessary and instead can concentrate on comparison and analysis to interpret the significance of various works of human expression.

On the other hand, because the availability of sources in digital form offers more than a simple convenience, researchers have begun to adapt traditional analytical practices and take advantage of new digital affordances. For visual materials like statues and artifacts that may carry inscriptions, it is increasingly common for researchers to work with three-dimensional digital images and models and to use standard tools to view them online, rotate them, and zoom in on specific features for comparison in ways that would not be possible physically in the field or in a museum setting [76, 92]. In addition, the development of the International Image Interoperability Framework (IIIF) [68] has helped advance the analysis of visual materials. Relying in part on the World Wide Web Consortium’s Annotation Data Model [118], IIIF now defines a set of protocols that permits researchers to request digital images from a wide (and increasing) number of repositories that have adopted the protocols, and to examine, compare, and annotate them using conforming viewers, such as Mirador [114]. The Vatican Library has recently adopted IIIF for its online collections, and its curators have amply demonstrated the analytical power of using IIIF tools in a series of essays called “thematic pathways.” One essay focuses on the illustrations in the Vatican’s collections of medieval manuscripts of classical Latin texts. These illustrations, as well as those in related collections elsewhere, are worth comparative study not only because of their intrinsic value as art objects but also because, according to the author, they offer critical evidence of the reception of Latin texts in medieval times: “the iconographic study of these manuscripts also reveals that the images…were related mostly to the reading and occasional interpretation of the text by a specific reader, or by a particular scholar…” [36].
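To give a sense of what an IIIF image request looks like in practice, the IIIF Image API addresses an image, or a region of it, through a URL of the form {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}. The following Python sketch assembles such URLs. The server address and manuscript identifier are hypothetical placeholders, not actual Vatican Library endpoints, and the "max" size keyword assumes a server that implements version 3 of the API (some older servers expect "full" instead).

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region="full", size="max",
                   rotation=0, quality="default", fmt="jpg"):
    """Assemble an IIIF Image API request URL from its path segments."""
    # Percent-encode the identifier so that characters such as "/" inside it
    # are not mistaken for separators between path segments.
    encoded_id = quote(identifier, safe="")
    return (f"{base.rstrip('/')}/{encoded_id}/"
            f"{region}/{size}/{rotation}/{quality}.{fmt}")


BASE = "https://iiif.example.org/image"  # hypothetical image server
PAGE = "ms-0001/f12r"                    # hypothetical manuscript page

# The whole page at the largest size the server offers.
whole_page = iiif_image_url(BASE, PAGE)

# Only the central area of the page: a region that starts 25% in from the
# left and top edges and spans 50% of the width and height.
detail = iiif_image_url(BASE, PAGE, region="pct:25,25,50,50")

print(whole_page)
print(detail)
```

Because every conforming repository answers the same URL pattern, a viewer can place details from manuscripts held in different libraries side by side without any repository-specific code.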

Digital tools and techniques have also enhanced the spatial analysis of primary sources in classical studies. For example, several scholars have recently deployed geographic information systems and Google Earth in their analysis of the Histories of Herodotus. The visualizations they created are not meant to demonstrate the obvious, but important, point that Herodotus’ conceptions of space differ from modern cartographic models. Rather, as the scholars argue, the use of these digital tools helps to describe the points of difference more precisely than would be otherwise possible and thereby serves to illuminate more clearly the spatial relationships central to the narrative of the ancient Histories [16]. Other tools that researchers have used for geospatial analysis include software for architectural modeling and virtual reality. Perhaps the most well-known efforts are those to reconstruct the ancient Roman Forum [47, 56, 70]. Among other scholarly benefits, these reconstructions make it possible to study monuments and their inscriptions in a larger, spatial context, rather than in isolation, as well as to test theories about their likely role in funerals and other ceremonies designed to inspire and mobilize the public [52, 71, 105].

Even better articulated in classical studies than digital tools for visual and spatial research are those for textual analysis, which are specific to the historical languages in the primary source documents but draw heavily from the advanced computational fields of corpus linguistics and natural language processing. At the core of these tools is a variety of Greek and Latin text databases, called “treebanks.” Developed painstakingly over many years and including the Index Thomisticus Treebank, which is based on the pioneering work of Father Busa, these databases contain machine-readable annotations about the morphology of each word as well as the syntactic structure of each sentence in each corpus [41, 75, 90]. Given the extensive linguistic information now assembled in these databases, researchers have begun assembling the necessary tools for systematic text mining and semantic study [35] and started undertaking a growing number of studies tracing topics, themes, and phrases through the corpora [31]. One of the more recent of these studies analyzes the intertextual dynamics by which a poetic phrase in Vergil’s Aeneid became viral and was used to varying effect by later Roman Greek, medieval Christian, and even early modern English authors [42].

Publish

The publication of peer-reviewed research results marks the culmination of academic work in most fields of scholarship, including classical studies and others in the humanities. A common form of publication is the journal article, and most academic journals have by now adopted digital formats as their primary mode of distribution, even if they also still circulate a printed version. Classical studies boast one of the first journals in the humanities to publish only digitally. “In a field where reviews were often so long delayed that they appeared after the book had slouched off to the remainder tables,” the Bryn Mawr Classical Review appeared in 1990 and was designed to deliver timely reviews of new work electronically [85]. It is still operating.

Although the journal article may suffice as the primary form of scholarly communications in many fields, for most scholars in classical studies and other disciplines in the humanities, an extended argument in the form of a monograph is the gold standard. The promise of a digital as opposed to a printed monograph is that the magic of the web would make it possible for the reader to engage directly not only the author’s reasoning but also the underlying primary source evidence adduced to support it. This promise has proved elusive but now seems within reach with growing acceptance of a general XML-based electronic publishing standard called EPUB [117].

For example, the California Classical Studies series has produced seven online monographs since 2013 in subfields such as classical archaeology, papyrology, epigraphy, and textual studies using the EPUB standard [111]. In addition, university presses have recently started to build on the EPUB standard and expand their digital publishing capabilities. Stanford has begun publishing what it calls “interactive scholarly works,” including Elaine Sullivan’s digital monograph on the ancient Egyptian necropolis of Saqqara, which incorporates interactive three-dimensional visualizations [106]. Michigan’s Fulcrum platform offers a standard mechanism for managing online monographic source materials, and Minnesota’s Manifold system supports the iterative scholarly monograph, the argument of which evolves in response to reader commentary and new evidence [79, 116]. Classical and other premodern studies will surely benefit from these and other university press programs for the publication of digital monographs.

The critical edition is a third form of publication that is essential to researchers in classical studies. Critical editions are reliable, authoritative presentations of primary source evidence. A critical apparatus comprising a detailed essay and an extensive series of notes typically accompanies the source materials, explaining their significance, identifying variant expressions and manifestations, articulating editorial choices, and defining difficult or unfamiliar phrases as well as references to people, places, and related works [54]. A documentary edition is an authoritative compilation and transcription of a set of letters, manuscripts, inscriptions, or other documentation, usually of historical value. A literary edition is a type of documentary edition that presents a literary text or related set of texts. The question of how to conceive and construct a critical edition digitally has attracted considerable attention [30, 53, 97], but the Homer Multitext Project [46] is a working example of a digital literary edition, while papyri.info facilitates the production of digital editions of historical documents [14, 17, 39].

Founded in 1997, the Stoa Publishing Consortium published a variety of digital editions such as Suda On Line, a translation of a Byzantine Greek encyclopedia of classical learning originally created in the tenth century [43, 74]. The technical environment for Suda On Line inspired the development of a second-generation set of tools, called Son of Suda On Line, which supports the creation and publishing of digital editions in papyri.info and the Perseus Project [2, 17]. More recently, the Society for Classical Studies, the Medieval Academy of America, and the Renaissance Society of America have collaborated to create the Digital Latin Library, which has begun to publish and curate online critical editions of Latin texts [38, 65, 66].

With the emergence of a functional digital workflow, researchers in classical studies and other humanities disciplines have strived in these ways to produce digital analogs of the article, monograph, and critical edition, which mark the traditional end points of research efforts and represent the badges of success that scholars usually seek. However, it has not escaped the most digitally savvy researchers that these three kinds of publication are not the only and may not even be the most important research outcomes for the digital future of scholarly communications [24]. One scholar has observed that the digital research workflow has created a “distributed architecture” for publishing [54]; others have noted that it has contributed an “increasing diversity and complexity of content” to the scholarly record [72]. In other words, the digital processes described in the previous sections—collection, cataloging, transcribing and translation, identifying, and analyzing and interpreting—each lead to the creation of or contribution to key scholarly products, often in the form of specialized databases. These works make public and effectively publish knowledge that is of interest in and of itself to audiences beyond the researchers who compile it.

These new, digital forms of what might be called “intermediate” publications resemble the emerging processes of data publication in some sciences and social sciences, and they have depended on the rise of important new divisions of labor in the humanities research enterprise. Much like the application of machinery in the pin factory famously described in Adam Smith’s Wealth of Nations, the application of digital technology at each stage of the workflow has helped simplify and subdivide research activity, creating new points of entry (and requiring as yet undeveloped systems of credit) for students, faculty, and educated members of the general public to participate and generate useful research results without necessarily committing to the traditional final research product of an article, monograph, or digital edition. Important new kinds of specialist roles, such as data curators and scholarly communications librarians, have also emerged to support the process. In addition, the new divisions of labor have resulted in new organizational alignments, such as the growing interest of research libraries in their potential role as publishers, and the related emergence of the Library Publishing Coalition, which seeks to publish and maintain not only books and journals but also the outputs of scholarly projects ranging from digitized copies of primary sources to biographical databases [73].

Having used developments in classical and other premodern studies to illustrate the emergence of a digital infrastructure supporting research in the humanities, I now conclude this essay with a brief consideration of the priorities that could help shape the future progress of this infrastructure.

Priorities for future work

There is an apocryphal story about a young scholar in the humanities at one of the Oxbridge colleges long before digital media had become so important. She invented a new genre of print publication to present her work but was worried that her innovation would be too controversial. She therefore spent considerable time scouring the archives at the university and elsewhere to ensure that it was fully consistent with departmental and disciplinary practice. Fully assured at last, she took the idea to her senior colleagues. The presentation was elaborate and thorough, and she made sure to explain how she had fully searched the records of the last 500 years of scholarly communications in print and had found nothing seriously inconsistent with her proposal. One of the college dons interrupted her at this point, lifting his head wearily, and observed: “But you would agree, would you not, that the last 500 years have been somewhat exceptional?”

Echoing the curt skepticism of this college don, a Yale professor recently favorably compared the so-called public humanities to what he decried as the “mania for digital scholarship” [82]. This kind of offhand dismissal of digital work is so common in the academy that it warrants more systematic analysis than is possible here. Such an account would have to acknowledge both the profound disagreements among scholars about the types of scholarly work they value and the complex mix of economic and political incentives that administrators and funders use to help drive research priorities. In addition, there has certainly been no shortage of hyperbole about the merits of the digital humanities and these claims deserve criticism. Even the survey that I have provided in the previous section may be more optimistic than is appropriate in depicting how far digital scholarship has matured since the pioneering days of Father Busa.

However, suffice it here to observe that the objection of the Yale professor ignores the plain evidence that scholars in many disciplines and at many institutions have embraced both the public humanities and digital scholarship. Engaging publics beyond the academy in the grand intellectual challenges of the day is not at all at odds with efforts to retool the academy, enabling it to add contemporary digital media to the traditional set of tools it uses to communicate about those subjects both internally among scholars and with those publics. Indeed, the digital infrastructure in the form of content, tools, and human skills that has emerged is now demonstrably able to support serious, peer-reviewed, well-regarded scholarship that contributes to our understanding of the human condition. This accomplishment is especially notable at a time when the growing press of digital media and the threat of disinformation require an academy and a citizenry fluent in digital tools and content and well able to distinguish their use in creditable, evidence-based inquiry from their use by digital trolls to promulgate lies.

More can and needs to be done to strengthen and extend this digital research infrastructure and to ensure that it is inclusive in its reach and supports inquiry from a wide variety of perspectives and traditions. Rather than rushing to assemble all-encompassing, “big digital” platforms for the humanities, scholars have instead urged a step-by-step approach that pays close attention to functional requirements within specific disciplines [107]. There have been several attempts to define these needs in the field of classical studies [3, 24], and the other contributors to this issue amply illustrate the efforts to meet many of these needs. Here, in the context of the research workflow I have outlined, I emphasize three broad priorities that might inform future work and eventually lead to more general, cross-disciplinary solutions.

Improve interoperability

Given the sustained development over the last thirty years, the creation of new digital tools for classical studies and related fields does not hold the same urgency as it once did. However, significant gaps remain in the infrastructure that call for prudent investment in certain new or refined components. Perhaps most important are those features that facilitate interoperability of steps within the disciplinary workflow as well as of outputs from the workflow with external research processes.

As we have seen, the outputs of one step in the research workflow provide inputs to subsequent steps. For example, digitized source texts feed transcription applications such as OCR engines, and transcribed texts serve as input to both named entity recognition software and text mining tools. Similarly, research products intended for wide distribution must adhere to standard formats in broad use across disciplines or by the general public. Because the infrastructure of content and tools supporting the research workflow is relatively new, many incompatibilities exist, and attention is needed to reconcile them.

Fortunately, developers who have fashioned the digital workflow in classical and other premodern studies have strived for open data in the form of primary sources that are readily available at little or no cost to individual researchers. Relative freedom from copyright encumbrances has helped in this effort, as have institutions and their libraries able to cover the costs of access for their constituents, but vigilance is required to guard against profit-seeking commercial entities that may take control of the sources and charge exorbitant prices that would limit access and constrain research.

As we have seen, developers have also followed both linked data standards for cataloging and named entity databases and web-based protocols for text markup and content transmission. Continued observance of these open design principles will help ensure interoperability as researchers bring additional content and tools into the workflow. Where strict adherence has not been possible or cannot be achieved, or if commercial interests begin to introduce proprietary workflow tools, developers may need to create specialized conversion tools for input and export.

Accommodate expanded usage

As priority shifts from new development to the maintenance and care of the content and tools of new digitally enabled research workflows, another challenge is to ensure that the infrastructure can reliably accommodate a growing base of users with diverse research interests. The most likely source of growth in usage is the undergraduate classroom. Researchers in the humanities often test new ideas in their lectures and seminars and ask their students to explore these ideas in course assignments. Faculty in classical studies are now bringing elements of the digital workflow into the classroom, pointing to existing digital sources, identifying new sources for digitization, and asking students to transcribe, translate, and analyze them [18, 28]. Nothing focuses the mind of resource providers on how to ensure access to content and tools, improve interfaces for use at scale, and harden them against error and abuse like this prospect of a regular and growing number of students who must complete online assignments on time to succeed in their classes.

As the response to increased demand from undergraduate researchers helps make access to digital sources and tools in classical and other premodern studies more reliable, the result is that use of the infrastructure becomes more attractive, building additional demand from other kinds of researchers, and creating a virtuous cycle of growth, improvement, and still further growth. Who are these additional researchers likely to be? First, in the face of claims that research in classical studies has supported racial inequities, both actively and implicitly, people of color are using digital and social media to upend established hierarchies within the field and to pursue lines of research that challenge these traditional research priorities [91]. These efforts rely, at least in part, on pointing to works by scholars of color that clearly demonstrate how classical studies speak to concerns of those who are underrepresented in the field and society in general. For example, Eric Ashley Hairston examines in detail the role that classical studies played in the lives of four key black Americans: Phillis Wheatley, Frederick Douglass, Anna Julia Cooper, and W.E.B. Du Bois [59]. Similarly, Danielle Allen, a specialist in Athenian democracy, provides a close reading of the Declaration of Independence and argues that equality, a much-contested concept in American racial politics, is central to that founding document and to a thriving American democracy [1].

Another source of demand on the emerging digital research infrastructure is likely to come from those pursuing broad, eclectic lines of research. Consider how scholarly attention to social and cultural interactions along the Silk Road from ancient times to the present has created a rich comparative framework for the study of Europe, Asia, the Middle East, and Africa [20, 55, 60]. The questions that arise within this framework about the means and objects of cultural transmission now require a generation of scholars in classical studies and other fields who are capable of and interested in examining (or reexamining) texts, inscriptions, and other primary sources about previously underappreciated or neglected communities during different eras in all of these regions. Demand from diverse sets of researchers like these with different interests from the digital pioneers will not only test the reliability and the flexibility of the digital workflow and its underlying infrastructure of content and tools, but also help expose and correct biases and other limitations that have been built into them.

Ensure sustainability

Given a basic, working digital infrastructure, the academic community must also raise the visibility and profile of efforts to maintain the tools and services, and of the maintainers needed to keep the components in good working order [96]. Under the rubric of “sustainability,” much has been written about what is required to maintain digital content and services and keep them in good order [15, 22, 78, 95, 115]. Here, I highlight two key factors: the organizational and financial.

First, as Cayless has observed, academic institutions now sponsor many of the digital content and software services on which researchers in classical studies depend [39]. Organized in the shelter of a department or center, the maintainers of these services enjoy the stability that comes with being embedded in a larger institution. They share office space and rely on the college’s or university’s physical plant, legal, human resource, payroll, and information technology offices. However, with these substantial and perhaps subsidized benefits also come certain risks. For example, if institutional priorities do not align with disciplinary needs, then the services may be unable to grow in scale or make other required changes. Because these embedded service organizations are typically small, they are also exposed to what, in modern management parlance, is called a “key person risk,” in which the sudden departure or loss of a member of the team could jeopardize the entire operation. The future of these digital services therefore depends on disciplinary leaders and the service directors remaining alert. They must maintain an up-to-date inventory of staff responsibilities as part of an ongoing succession planning process. They must also carefully monitor the relationship of the service operation with the host institution and regularly evaluate the relative costs and benefits of keeping the affiliation compared to finding another sponsor or establishing an independent organization.

Second, to assess costs properly and sustain the ongoing operations of the digital research infrastructure, service providers in classical studies (and in other disciplines), as well as their sponsors and funders, must exercise a robust set of financial controls. For example, if a sponsoring university or college subsidizes rent or payroll costs and does not charge the service provider directly, then the provider may need to impute these costs so that they are fully recognized in its accounts and do not remain invisible. In addition, the service provider must acknowledge the key distinction between capital and maintenance costs. Creating new technology is analogous to the capital costs of constructing a new building. The difference is that buildings can last decades, while the lifespan of technology is much shorter, often less than five years. Sustaining technology infrastructure for decades is thus a problem of covering ongoing maintenance costs punctuated by regular injections of capital to upgrade the technology with new or improved features or to rebuild it entirely. To address this problem, digital service providers in classical studies and other disciplines in the humanities must not only fully recognize their regular maintenance costs but also adopt a budgeting methodology to record and forecast their capital costs. The time has passed when funds would flow at the promise of building something new and shiny. Without substantial effort to marry the creative imagination with hard-headed economic discipline, the emerging infrastructure and the research it supports will be at risk of failure.
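The arithmetic behind this distinction can be made concrete with a small Python sketch. All figures below are invented for illustration and do not describe any actual service; the sketch simply assumes a fixed annual maintenance cost, a fixed imputed value for subsidized rent and payroll, and a full rebuild every five years.

```python
def forecast_costs(years, annual_maintenance, imputed_subsidy,
                   rebuild_cost, rebuild_every):
    """Return a year-by-year schedule of maintenance and capital costs."""
    schedule = []
    for year in range(1, years + 1):
        # Maintenance recurs every year and includes costs the host
        # institution absorbs, imputed here so they do not stay invisible.
        maintenance = annual_maintenance + imputed_subsidy
        # Capital is injected only in the years when the technology is rebuilt.
        capital = rebuild_cost if year % rebuild_every == 0 else 0
        schedule.append(
            {"year": year, "maintenance": maintenance, "capital": capital}
        )
    return schedule


def total_cost(schedule):
    """Sum maintenance and capital costs across the whole schedule."""
    return sum(row["maintenance"] + row["capital"] for row in schedule)


# Hypothetical figures: ten years of operation, 100,000 in direct annual
# maintenance, 20,000 in imputed subsidies, and a 300,000 rebuild every 5 years.
plan = forecast_costs(
    years=10,
    annual_maintenance=100_000,
    imputed_subsidy=20_000,
    rebuild_cost=300_000,
    rebuild_every=5,
)

for row in plan:
    print(row["year"], row["maintenance"], row["capital"])
print("Ten-year total:", total_cost(plan))
```

Under these assumed figures, the two rebuilds account for a third of the ten-year total, which is the portion that a budget recording only routine maintenance would fail to forecast.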

Conclusion

In this essay, I have described the development over the last 30 years in the fields of classical and other premodern studies that has resulted in the emergence of the infrastructure necessary to support an end-to-end digital research workflow in the humanities. The workflow covers standard research functions, which include collecting primary sources; cataloging, transcribing, and translating them as necessary; identifying key entities within the sources; analyzing and interpreting the accumulated materials; and publishing the research findings. Although the development is well advanced, more work is necessary that includes ensuring interoperability within steps of the workflow and with external research processes, accommodating expanded usage, and ensuring the sustainability of the underlying infrastructure of tools and content. The vigorous development of this infrastructure and the promise of continued growth are now inextricably part of the larger story of field-building in the humanities. The question is whether these last 30 years of development will ultimately prove to establish the digital infrastructure as a natural part of the research apparatus needed to critique received wisdom, build new knowledge, and thereby help address society’s pressing grand challenges. Or will this period merely prove to be “somewhat exceptional?”