1 Written Works

Written expression is the most common form of recording and communicating the human experience of life on earth. The two volumes of this special issue describe scholarly efforts related to written and inscribed works, from the earliest markings on stone and cave walls to the brilliantly crafted and elaborately illustrated manuscripts created by scribes in the centuries prior to the print era. Epigraphy and paleography scholars study these written works and, in so doing, provide valuable evidence-based information to long-established fields in the humanities, social sciences, and the arts. Many ancient works have been damaged or partially destroyed, or are in extreme states of deterioration. While museums and libraries maintain numerous collections with descriptive information, a larger number of works are scattered around the world, undocumented, improperly referenced, lacking provenance information and held in poorly maintained physical environments. Works held in private collections are rarely digitized. Changes of ownership are frequent. Richly illustrated manuscripts are regularly bought and sold at auctions to anonymous buyers as investments, works of art or commercial merchandise destined to be disassembled with individual pages and illustrations removed and sold.

Epigraphy and paleography, along with contemporary research in almost every scholarly discipline, are becoming based on data-intensive, multi-stage workflows drawing on diverse and distributed datasets. Digital libraries research is central to advancing data-intensive approaches by providing access, tools and services. In return, digital libraries research has benefitted from unique challenges posed by the sheer magnitude and remarkable diversity of the subject matter.

It is now feasible, both technologically and economically, to create extremely accurate digital facsimiles of handwritten artifacts. In addition to creating high-resolution digital images of the work in its current state, new imaging technologies together with computational methods have proven effective for digitally reconstructing, remediating, and restoring ancient inscriptions on the many types of media upon which they were scrawled, engraved, etched, embossed, painted or penned. These efforts have rendered many legible and coherent for the first time while leaving the original in place, undamaged, intact and unaltered. Some accomplishments have been dramatic, such as the digital unrolling and deciphering of scrolls turned to charcoal by flash combustion and recovering text invisible to the human eye by penetrating layers of media to reveal overwritten text.

2 Epigraphy and Paleography

Epigraphy will be taken here as that research and scholarship concerned with inscriptions on durable materials until the widespread use of the codex,Footnote 1 an early form of the book. Epigraphic research involves identifying and deciphering the content, establishing the origins and context of creation and, as far as possible, determining the use and purpose. In short, the goal is to build an authentic, readable and coherent work.

Paleography is taken here as research that focuses on hand-written manuscripts in the form of codices and other types of long-form documents that pre-date the print era. Areas of study include examining text, script, substrate characteristics and techniques relevant to the creation of the physical artifact. Transcription of the text and other markings into a host language is a primary goal. Like epigraphic scholars, paleographic scholars are concerned with the origins and circumstances that motivated the creation of the work.

The boundaries of both are indistinct, overlap and are characterized by many specialist areas in keeping with the remarkable diversity of types of artifacts, inscriptive genres, writing techniques, orthographies, individual letter forms and other markings.Footnote 2 Codicology is closely related to paleography and there are differing views of the relationships and differences.Footnote 3 It is generally accepted that visual aesthetics express meaning as well.

Both areas deal with manuscripts that may exist in many different editions, have multiple copies with slight differences and uneven translations. A single text may be the work of several authors some of whom may be fictitious. Manuscripts may refer to outside sources that are lost, unknown, unrecognizable or to internal references that no longer can be located. Ancient versions of languages were fluid. Word order, grammars and orthographies changed with time and place. A nuanced orthographic change might lead to a different understanding. The Homer Multitext ProjectFootnote 4 is one project that is used by language specialists to explore these changes. Two papers in this Issue discuss ongoing work based on the Multitext Project resources.

The scope and reach of epigraphic and paleographic research are rapidly expanding with the creation of digital collections and development of specialized tools and management practices. Yet the scale and diversity are daunting. Web-accessible collections number in the thousands, individual volumes and artifacts number in the millions, and there are a seemingly uncountable number of scripts, letter forms and unique but meaningful markings. Collections contain works in hundreds of language variants expressed using a large number of alphabets, symbol systems, pictographs and other communicative forms. Collections are widely dispersed, often constructed to meet local needs, contain an unorganized mix of material and may not conform to even the most basic standards necessary to facilitate aggregation and cross-repository search and retrieval. In addition, a great number of manuscripts and artifacts held in the small private collections of individuals, churches, monasteries and libraries have not yet been digitized.

Digital libraries principles and practices have proven to be invaluable by guiding efforts to establish linking properties to collections. Creating high-resolution page images and adding rich annotations and standardized metadata are first steps. The long-term goal is conversion of scanned page images to machine-readable text. In almost every case to date, this is accomplished by manual transcription. Accuracy is highly dependent on the knowledge of the transcriber. This being the case, “crowdsourcing” is not a viable option. Artificial intelligence and machine learning software platforms are being tested, with measurable success. This topic will be explored in a later section of the paper.

3 Aggregation and Access

Over the past several decades, digital libraries researchers, working with content providers, technologists and other stakeholders have developed standards for data representation, description and linkage strategies for distributed repositories. The degree of functionality across collections depends on the types and nature of linkages. Many established digital libraries support work across distributed sites, as well as tools to assist scholars throughout the scholarly workflow.

However, for paleographic and epigraphic repositories, there is much to be done to enable enhanced access and use. Although individual repositories may support basic scholarly tasks, working across multiple repositories often involves accessing each, one by one.

Digitization of manuscripts is frequently done by using basic scanning methods or photography to create page images. Enhancements using studio lighting techniques and computational photography are not uncommon. These methods are less expensive than laser scanning and multispectral illumination used in projects that have more financial support and in-house expertise.

Browsing and searching of collections are frequently enabled by interface design features unique to individual sites. Many sites have adopted a quasi-Boolean approach of using filters for search, retrieval and compilation of similar entities. Numerous sites include basic image processing options for users to enhance and manipulate text and illustrations. Incorporating tools such as the Mirador ViewerFootnote 5 provides for image markup and workspaces for importing data and resources from other sites. Mirador is optimized for repositories that are International Image Interoperability Framework (IIIF) compliant, as described below. One example of a collection that incorporates the Mirador Viewer is the Virtual Manuscript Library of Switzerland.Footnote 6

EuropeanaFootnote 7 is a web portal and digital library of cultural heritage for many European Union funded projects, with links to various other large-scale repositories. Most relevant to this special issue is the EAGLE Project.Footnote 8 EAGLE has created numerous resources for using Europeana's collection of more than 500,000 ancient manuscripts. It has also developed a multi-lingual collection of Greco-Roman inscriptions with annotations and peer-reviewed contributed translations. Of particular value to users is the EAGLE Wikibase for using content and managing workflows. The project has also incorporated a personal workspace in which users can save queries, annotate and save results and do similarity searches on uploaded images. A mobile app can identify, with some degree of accuracy, inscriptions photographed with a mobile phone.

Ideally, all heavily used sites would include a workspace for scholars to perform basic tasks of search and retrieval, sorts, grouping, selection of subsets and other tasks to minimize repetitive downloading and data transfer between platforms. While tasks can be expedited somewhat using web resources, many scholars resort to the more common alternative—to download content to their workstation for detailed analyses. Since data sets can be very large, this can be an impediment. It also assumes that sites have open data policies and not all do.

EpiDoc is an international, collaborative effort for encoding scholarly editions of ancient documents in XML, as an extension of the Text Encoding Initiative (TEI).Footnote 9 Work on developing EpiDoc began in the mid-1990s, and its guidelines have since been widely adopted. There is an extensive list of projects conforming to EpiDoc standards.Footnote 10 There are also instructions, tools, style sheets and other resources created by and for the EpiDoc community. Tasks such as search and retrieval of EpiDoc-compliant documents can be accomplished with a high degree of precision, eliminating the tedious trial and error often required with other text retrieval methods.
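To illustrate why structured encoding supports precise retrieval, the following is a minimal sketch in Python. It assumes a small hand-made fragment written in the EpiDoc style (the elements `lb`, `supplied` and `gap` are used in EpiDoc editions); the sample text, the helper names and the simplified bracket rendering are illustrative assumptions and are not taken from any project's tooling.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
LINE_BREAK = "\x00"  # internal sentinel standing in for each <lb/>

# Hypothetical fragment in the EpiDoc style; not copied from a published edition.
SAMPLE = """
<div xmlns="http://www.tei-c.org/ns/1.0" type="edition" xml:lang="la">
  <ab>
    <lb n="1"/>Dis <supplied reason="lost">Manibus</supplied>
    <lb n="2"/>sacrum <gap reason="lost" extent="unknown" unit="character"/>
  </ab>
</div>
"""

def leiden_text(elem):
    """Flatten an element to text, marking restorations and gaps with brackets."""
    tag = elem.tag.replace(TEI, "")
    if tag == "lb":
        body = LINE_BREAK
    elif tag == "gap":
        body = "[---]"  # simplified rendering of a gap of unknown extent
    else:
        body = (elem.text or "") + "".join(leiden_text(child) for child in elem)
        if tag == "supplied":
            body = "[" + body + "]"  # text restored by the editor
    return body + (elem.tail or "")

def edition_lines(xml_string):
    """Return the edition as a list of lines with whitespace normalized."""
    flat = leiden_text(ET.fromstring(xml_string))
    return [" ".join(part.split()) for part in flat.split(LINE_BREAK) if part.strip()]

def supplied_words(xml_string):
    """Retrieve only the text that an editor has restored."""
    root = ET.fromstring(xml_string)
    return [el.text for el in root.iter(TEI + "supplied")]

print(edition_lines(SAMPLE))   # ['Dis [Manibus]', 'sacrum [---]']
print(supplied_words(SAMPLE))  # ['Manibus']
```

Because restorations are marked up explicitly, a query can separate what survives on the stone from what an editor has supplied, a distinction that a plain-text search cannot make.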

Images in the form of photos, illustrations, maps, charts and other visual images are often an essential component within documents. The International Image Interoperability Framework (IIIF) is a set of standards that supports cross-repository search and retrieval to other IIIF-compliant repositories and allows researchers to perform a wide variety of workflow tasks. It is a foundational requirement for interoperable repositories containing visual content.Footnote 11 The six IIIF APIsFootnote 12 are continually being implemented into global repositories.
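As an illustration of what IIIF compliance means in practice, the sketch below builds image request addresses following the URI pattern of the IIIF Image API, in which region, size, rotation, quality and format are expressed as path segments. The server address and the manuscript identifier are hypothetical, and the default values assume version 3 of the Image API.

```python
from urllib.parse import quote

# Hypothetical IIIF image server; substitute the base URL of a real repository.
BASE = "https://iiif.example.org/image"

def iiif_image_url(base, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Build a URL of the form
    {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    """
    # Identifiers may contain slashes, which must be percent-encoded.
    ident = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{ident}/{region}/{size}/{rotation}/{quality}.{fmt}"

# The whole page at the largest size the server offers.
full_page = iiif_image_url(BASE, "ms-001/f12r")

# A detail: an 800 x 400 pixel region starting at (1000, 500), scaled to 400 pixels wide.
detail = iiif_image_url(BASE, "ms-001/f12r", region="1000,500,800,400", size="400,")

print(full_page)
print(detail)
```

Since every compliant server answers the same request pattern, a viewer or script written for one repository can, in principle, fetch page details from any other without site-specific code.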

Web portals serve as catalogues to link users to a large number of collections, exhibitions, tools and discussion forums. The Digital Classicist is a hub for scholars, students, professionals and others interested in the digital humanities and application of computational methods to the study of the ancient world. As well as cataloging projects, tools and other resources, it also is a hub for discussions that play an important role in bringing communities of like-minded scholars together.Footnote 13

Many collections are dedicated to specialty areas. As examples, the Cuneiform Digital Library InitiativeFootnote 14 is a full-featured site of approximately 500,000 images and texts for Assyriologists. Papyri.infoFootnote 15 aggregates papyrological documents from a number of papyrological collections as part of the Integrating Digital Papyrology project.

One of the most full-featured and important digital libraries for scholars in the classics and beyond is the Perseus Digital Library.Footnote 16 A resource under continuous development for more than three decades, the Perseus collections and tools bring a wealth of information and services to meet the needs of scholars, students and members of the general public. The Perseus Digital Library incorporates the Scaife ViewerFootnote 17 in which more than 2000 works in Greek and Latin can be read both in the original language and translated versions. Perseus content is base-line data for scholarly works in the classics. It is also cited in best-selling novels and non-fiction. The Editor-In-Chief and developer of Perseus is Gregory Crane from Tufts University, USA. He and his co-authors have two papers in this Special Issue.

It is widely recognized that adding relational properties conforming to semantic web and linked data protocols (RDF) to collections results in a quantum leap in functionality and interoperability. Semantic descriptions describe and connect data entities by revealing relationships in ways not possible by searches based on annotations, metadata or even complete transcriptions. Linked data, even in the most basic form, can transform and expedite data discovery, gathering, analysis and presentation stages of the scholarly workflow. Geographic and spatial data more broadly are seen as opportune for linked data applications. One project under development for nearly ten years aimed at promoting linked open data within the humanities communities is the Graph of Ancient World Data. Geospatial information will be discussed in the following section.
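The difference between keyword search and relational description can be sketched with a toy example. The code below models statements as subject, predicate, object triples, the basic unit of RDF, using plain Python tuples rather than an RDF library. All identifiers with the `ex:` prefix are invented for illustration.

```python
# Hypothetical statements; the "ex:" identifiers do not refer to real records.
TRIPLES = {
    ("ex:inscription/17", "ex:foundAt", "ex:place/ostia"),
    ("ex:inscription/42", "ex:foundAt", "ex:place/pompeii"),
    ("ex:manuscript/3", "ex:mentions", "ex:place/ostia"),
    ("ex:place/ostia", "ex:withinRegion", "ex:region/latium"),
    ("ex:place/pompeii", "ex:withinRegion", "ex:region/campania"),
}

def match(triples, s=None, p=None, o=None):
    """Return all triples matching a pattern; None acts as a wildcard."""
    return sorted(t for t in triples
                  if (s is None or t[0] == s)
                  and (p is None or t[1] == p)
                  and (o is None or t[2] == o))

def items_in_region(triples, region):
    """Follow two links: region -> places -> items found at or mentioning them."""
    places = {s for s, _, _ in match(triples, p="ex:withinRegion", o=region)}
    return sorted(s for s, p, o in triples
                  if o in places and p in ("ex:foundAt", "ex:mentions"))

print(items_in_region(TRIPLES, "ex:region/latium"))
# ['ex:inscription/17', 'ex:manuscript/3']
```

Neither item mentions the region in its own description; the connection emerges only by following relationships, which is the kind of discovery that annotation or full-text search alone does not provide.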

However, the prospects for widespread incorporation of such features in the near term are dim. At this time, ongoing work is focused on building web-accessible digital collections that conform to established principles and formats. Additional emphasis is placed on transcription, creating multi-lingual descriptive metadata and encouraging open access to repositories. A great deal has been accomplished in that regard, which speaks to the dedication and efforts of a relatively small community of scholars, library and information scientists and technologists working for the good of all.

Nevertheless, it is enticing to consider the possibilities. As repositories of linked data are created, automated linking among individual repositories begins and integrated networks of repositories are created. New compound data objects can be added that support complex operations and at the same time document a history of use. For the scholar, each level of added functionality and relational properties not only makes common tasks easier and faster, but also brings new methodologies and powerful analytics to a research project resulting in novel and qualitatively different approaches. The benefits of these capabilities cannot be overstated. The impact is such that research that may have taken years to accomplish only a few decades ago can now be completed in a matter of months.

4 Geographic Considerations

Geographic information sciences and systems (GIS) have long been a part of research in many domains, but more recently these have demonstrated considerable value in studying the ancient world. New forms of digital geographic information combined with other spatial considerations have proven to benefit the works of epigraphic and paleographic scholars.

Modern gazetteers contain a wealth of descriptive information and help to resolve uncertainties associated with recorded place names and the location of ancient sites. The information is critical to the spatial analyses needed in epigraphic and paleographic study. Pleiades,Footnote 18 a long-standing project, is a community-built resource that provides a large corpus of historic content, gazetteers and related geographic information about the ancient world in various forms. Key concepts that organize material are places, locations, names and connections. Content is structured and described in ways to facilitate and enhance scholarly workflows by enabling advanced search, visualizations and linkages to other sites. The project encourages users to contribute and participate by adding and improving geographic information about the ancient world. Pleiades serves as a primary resource for a sizeable number of mapping and other projects. The Ancient World Mapping Center uses Pleiades data in an interactive digital atlas application for creating custom maps.Footnote 19 The University of Pittsburgh World History CenterFootnote 20 is developing the World History Gazetteer, a gazetteer and software platform for connecting collections of place names. The project was the 2021 Digital Humanities Awards winner for Best DH Tool or Suite of Tools. An article in this Issue describes the project and the importance of place as a starting point for more sophisticated software capabilities.Footnote 21
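How a gazetteer helps resolve an ambiguous place name can be sketched as follows. The record structure loosely mirrors the concepts of places, names, locations and connections, but the field names, the identifiers and the nearest-candidate rule are assumptions made for illustration and do not describe the Pleiades data model or API. Coordinates are approximate.

```python
from dataclasses import dataclass, field
from math import asin, cos, radians, sin, sqrt

@dataclass
class Place:
    place_id: str    # hypothetical identifier, not a real gazetteer ID
    names: list      # attested or modern name variants
    location: tuple  # (latitude, longitude), approximate
    connections: list = field(default_factory=list)  # ids of related places

GAZETTEER = [
    Place("demo-1", ["Neapolis", "Naples"], (40.85, 14.27)),
    Place("demo-2", ["Neapolis", "Kavala"], (40.94, 24.41), connections=["demo-3"]),
    Place("demo-3", ["Philippi"], (41.01, 24.29)),
]

def distance_km(a, b):
    """Great-circle distance between two (lat, lon) points (haversine formula)."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))

def candidates(gazetteer, name):
    """All places that carry the given name, ignoring case."""
    wanted = name.casefold()
    return [p for p in gazetteer if any(n.casefold() == wanted for n in p.names)]

def resolve(gazetteer, name, context_point):
    """Pick the candidate closest to a point known from the document's context."""
    found = candidates(gazetteer, name)
    if not found:
        return None
    return min(found, key=lambda p: distance_km(p.location, context_point))

# A text associated with Philippi that mentions "Neapolis".
best = resolve(GAZETTEER, "Neapolis", (41.01, 24.29))
print(best.place_id)  # demo-2
```

Proximity is only one possible heuristic; a fuller treatment would also weigh attested date ranges and the connections recorded between places.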

The Pelagios NetworkFootnote 22 is a collaborative project aimed at understanding the ancient world and its material culture in geographic terms as well as textual description. To create and maintain geographic connections, Pelagios draws on the community expertise of humanities scholars working with geographic data. The primary gazetteer used by participating projects is that provided by Pleiades. A primary goal of Pelagios, and the “glue” that holds its many activities together, is the focus on building and implementing linked data resources across geographic datasets associated with the ancient world. Pelagios groups its work into several primary activities related to spatial analysis and semantic linking: the Annotation activity focuses on semantic geo-annotations for visual data entities; the Gazetteer activity aims to link external gazetteers by creating uniform place reference specifications; the Registry activity is concerned with registry services and the discovery of linked data collections for places; the Visualization activity focuses on the development of innovative tools for geospatial analysis. A partner project that has gained considerable praise is the Recogito initiative.Footnote 23, Footnote 24 Recogito is an open-source platform with an extensive set of software tools for collections building, collaborative work and semantic linking.

Far from being static, the ancient world, as studied today, is increasingly viewed as fluid and complex in many ways: names, places and perceived locations changed frequently. Terrestrial maps and gazetteers of the day, if they existed, were simplistic, likely inaccurate and frequently revised. The uncertainties about what might be over the horizon were likely a constant concern for inhabitants.

In contrast, sophisticated maps of the planets, stars and their movements were referenced in cuneiform tablets, in hieroglyphics and refined over the centuries to high levels of precision by Greek and Roman scholars. Celestial navigation for voyages, placement and alignment of monuments, public buildings and places of worship were frequently based on bearings relative to the sun and stars. Archeoastronomy is a growing area of study in this regard.Footnote 25

5 Digital Restoration, Remediation and Recovery

Advances in information and computing technologies over the past 20 years have made it possible and economical not only to capture high-resolution digital images of written artifacts in their current condition, but also to recover text from damaged and deteriorated works. This can be done using non-intrusive methods such that physical works are unharmed, left intact and unaltered. In many cases, the digital representation will prove to be of greater scholarly value than the original physical artifact by revealing textual markings that have become invisible or hidden with the passage of time.

Physical deterioration of written works is inevitable. Written works are prone to deterioration from the beginning and over the course of centuries depending upon the types of media, the chemistry of writing fluids, storage environments and other external factors. The result in almost every case is to distort or obscure the original work. Text fades, bleeds and blends with adjacent characters. Pages are subject to discoloration, mold and mildew. Bindings become brittle and crack, pottery breaks,Footnote 26 stone erodes and metals oxidize. In some cases, text and markings fade to such a degree that they become invisible. This “hidden text” has been shown to have information critical to understanding. Such is the case not only with the original text, but also with marginalia, scholia, edits and other markings added at later dates.

A number of new technologies have been developed to recover text. Of these, multispectral imaging has proven effective in many cases. The Lazarus Project based at the University of Rochester uses a multispectral camera system. Manuscripts are photographed at discrete wavelengths of light, which penetrate to different depths of the substrate to reveal text at different levels. The multiple images captured are then combined, or “stacked,” and processed using statistical algorithms to complete the recovery and reveal hidden text and images.Footnote 27 The clarity of the processed images is such that new scholarship can be done. The Recogito platform of Pelagios is used to host, share and annotate images.
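The principle behind combining images taken at different wavelengths can be shown with a deliberately simple sketch. It standardizes two bands and subtracts one from the other, so that ink visible in both bands cancels while ink visible in only one stands out. This band subtraction is a stand-in chosen for brevity and is not the processing used by any named project; real pipelines apply more elaborate statistics, such as principal component analysis, across many bands. The pixel values are invented.

```python
from statistics import mean, pstdev

def standardize(band):
    """Rescale a band to zero mean and unit variance."""
    mu, sigma = mean(band), pstdev(band)
    return [(v - mu) / sigma for v in band]

def difference_image(revealing_band, reference_band):
    """Subtract the band in which faded ink shows from the band in which it does not."""
    rev = standardize(revealing_band)
    ref = standardize(reference_band)
    return [r - v for v, r in zip(rev, ref)]

def candidate_pixels(diff, threshold=0.0):
    """Indices whose difference exceeds the threshold, i.e. likely faded ink."""
    return [i for i, d in enumerate(diff) if d > threshold]

# One row of six pixels (invented reflectance values, 1.0 = bright parchment).
# Pixel 3 carries dark overwriting visible in both bands.
# Pixels 1 and 4 carry faded ink that shows only in the ultraviolet band.
ultraviolet = [0.9, 0.5, 0.9, 0.1, 0.5, 0.9]
infrared = [0.9, 0.9, 0.9, 0.1, 0.9, 0.9]

diff = difference_image(ultraviolet, infrared)
print(candidate_pixels(diff))  # [1, 4]
```

In this toy row the overwriting cancels out because it is equally dark in both bands, leaving only the faded ink above the threshold.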

Palimpsests are manuscripts whose pages were erased so that the writing surface could be reused for new writings. Generally, faint traces of the earlier writings remained. Such was the case of a tenth century palimpsest containing a scribal copy of an Archimedes treatise, the text of which was later erased and overwritten. Multispectral imaging techniques proved effective in recovering some overwritten text, but the heavily damaged pages required special x-ray scanning provided by a laboratory at Stanford.Footnote 28,Footnote 29

More common is the planar distortion of pages, which interferes with scanning and readability. One such distortion is “cockling” (bulges, creases, warping), which can affect an entire manuscript. The Venetus A manuscript, a 10th-century Byzantine manuscript of the Iliad and the one on which modern texts are primarily based, had deteriorated in this way. Image capture using flatbed scanners or photographs would result in 2-D images with significant distortion of the text. Three-dimensional scanning combined with virtual flattening algorithms was developed by the EDUCE Laboratory at the University of KentuckyFootnote 30 to create a digital facsimile of the original 645-page manuscript and prepare it for transcription by classicists at the Harvard Center for Hellenic Studies.Footnote 31

Building on this technique, sophisticated sets of algorithms were used to digitally “unroll” scrolls, even heavily damaged ones. A recent project by the EDUCE Laboratory succeeded in “virtual unwrapping” and deciphering parts of the En-Gedi scroll, one of several recovered from a 600 CE site. The parchment scroll had been burned and carbonized by fire. Merely touching the scroll caused disintegration. Recovering the text required non-invasive digitization using x-ray-based micro-computed tomography. The project received extensive media coverage in both scholarly publications and public interest programs.Footnote 32, Footnote 33

The richly illustrated "Illuminated Manuscripts" are particularly prone to changes. A scribe's palette might contain more than a dozen different pigments. Numerous technologies are effective in analyzing the chemical structures of the pigments used in illustrations. Examples include multi-sensor hyperspectral imaging and Raman spectroscopy. Remediation of severely damaged artifacts can require a multidisciplinary effort that involves not only scholars with detailed knowledge of the text, but also participation from researchers in computer vision and computer graphics, materials scientists, chemists, physicists and others.

Developing and including remediated works in digital collections present special requirements. Codices, scrolls, illustrations and computer enhanced artifacts in general are complex data objects that have a “digital history.” It is important to capture and make available steps taken and intermediate stages in the remediation process as well as the final product.Footnote 34

6 Automated Transcription

A valuable addition to the study of paleography is new research on “artificial paleography”: the application of artificial intelligence and machine learning technologies for optical character recognition (OCR) and handwritten text recognition (HTR) to ancient written works. Automated transcription would relieve scholars of the tedious tasks that must be done before analysis and interpretation can begin by giving them the means to search, retrieve and otherwise access content in the same fashion as a modern text file. It might also lead, eventually, to discovery of related manuscripts across distributed collections.

Optical character recognition software for many languages has been available for decades. Software for recognition and conversion of cursive scripts and alphabets with connected scripts, such as Arabic and Indic scripts, is still an active area of research. For ancient manuscripts the tasks are more challenging. Medieval manuscripts have lettering systems that vary widely across time periods, regions, languages and scribal styles, and they contain a large number of unique marks. Machine learning uses training data to infer the rules to be used in recognition and transcription. This approach has proven accurate in a number of cases, but less successful for script variants. As a result, a different set of training data is necessary for each script variant and for each alphabet in which a manuscript is written.Footnote 35

Examples of ongoing work in this area include the "Research Environment for Ancient Documents" (READ), an open source web-based platform that provides a number of tools for converting images of orthographic units for transcription. It also has been used to link host languages with translations. Originally developed for Indic languages, it has since been updated for testing on others, including Latin epigraphic texts.Footnote 36, Footnote 37 A related project is the EU funded Transkribus project, which has developed a wide range of AI software for converting images of many handwritten texts, including Medieval Latin scripts, to machine readable text. The web-based platform is noted for its ease of use and scope of applications. It reports a character error rate, which shows the percentage of transcription errors made by the neural network software.Footnote 38
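A character error rate is conventionally computed as the edit distance between the automatic transcription and a trusted reference, divided by the length of the reference. The sketch below follows that common definition; it is an assumption that any particular platform computes its figure in exactly this way, and the sample strings are invented.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a character from a
                            curr[j - 1] + 1,      # insert a character into a
                            prev[j - 1] + cost))  # substitute, or match at no cost
        prev = curr
    return prev[-1]

def character_error_rate(reference, hypothesis):
    """Edit distance normalized by the length of the reference transcription."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)

reference = "arma virumque cano"    # transcription checked by a specialist
hypothesis = "arma uirumque cano"   # hypothetical machine output
cer = character_error_rate(reference, hypothesis)
print(f"{cer:.1%}")  # 5.6%
```

The example also shows a limit of the measure: reading "u" for "v" counts as an error even though, in many Latin scripts, the two are the same letter form, so the normalization rules applied before comparison matter as much as the arithmetic.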

Kraken is text recognition software designed primarily for connected scripts. The starting point is the generation of high-quality scans; the number needed depends on script type and features. Transcription is accomplished locally on HTML platforms. Transcription is limited to “diplomatic” formatted pages. There are numerous sites that give instructions on how to use Kraken and other semi-automatic transcription programs. In general, it is not a one-step process and may involve multiple stages in which several different transcription software programs are used for preprocessing or application of multiple transcripts.Footnote 39 eScriptorium builds on Kraken in the sense that it provides an integrated set of tools for transcription, annotation, translation and other tasks for working with historic documents including ancient manuscripts.Footnote 40 The source code for these and other handwritten text recognition programs is freely available on GitHub.

7 Scholarly Infrastructures

Computing, information and communication technologies, together with digital content, have become an essential part of contemporary research in almost every scholarly discipline. The concepts and definition of “infrastructures” based on these have changed numerous times in the past several decades. Initially viewed in terms of computation, networks and software, scholarly infrastructures are now seen as including and perhaps primarily based on digital content and suites of services that deliver data in a meaningful form.Footnote 41

Digital libraries research has helped to identify core principles and best practices for creating, managing, linking and using large-scale data resources. A fundamental belief is that digital information has greatest value when it becomes part of a dynamic, growing and globally linked knowledge infrastructure—one that continually expands in scale, functionality and reach.Footnote 42

Yet insight into how best to approach infrastructures development remains elusive. Data originates from many sources containing heterogeneous data objects organized in many ways, delivered via diverse platforms by equally diverse users from all over the globe. It can be modified with ease, replicated and disseminated across continents with just a few keystrokes. As a result, there is a proliferation of data entities that have no permanent home, constantly circulating in the global networked environment, and used without knowledge of origins or integrity. The versatile and pliable nature of digital information that makes it so valuable for transformative scholarly work also presents serious challenges for data preservation, archiving, curation and stewardship. The task to exploit and manage the inherent unpredictability and instability of dataflows becomes one of managing complexity.Footnote 43

It is becoming apparent that different types of information infrastructures serve the needs of scholars and researchers in different subject areas while at the same time maintaining fundamental and strong linkages to each other. It is also becoming apparent that at certain stages of development, infrastructures become self-organizing and self-generative as user input catalyzes further refinement and growth. When data resources and applications are seen as valuable to a broader community, they quickly become part of the scholarly infrastructures that support the workflows of domain research.

The continuing shift to transparency in internet accessible resources and increasingly open scholarly practices in the past two decades has reconfigured the workings of the scholarly enterprise. Beginning with the successes of the Jericho-like call for open access to research publications, open datasets, open-source software and services quickly followed. A movement toward open science, and more broadly, open scholarship, is now taking hold in many disciplines.Footnote 44 The potential positive impact of transparency of scholarly work and the scholarly enterprise is such that it is widely endorsed by members of the academic community as well as by multinational organizations including UNESCOFootnote 45 and the OECD.Footnote 46

Scholarly workflows across many domains have become more complex, involving multiple stages, numerous data sets, many computational operations and a variety of presentation modalities. It is critical that there be a detailed record of workflow steps and resources used at every stage so that results can be verified and data resources repurposed and reused. Workflow management and documentation tools can be integrated into a research undertaking without placing a heavy burden on the scholar. Project Jupyter has proven invaluable in this regard. The web-based platform allows users to configure and manage projects and the resources they use, based on a “notebook” concept. Examples of notebooks constructed for a variety of applications are available on multiple web sites.Footnote 47

For the scholar, finding and using all that scholarly infrastructures offer may present difficult challenges. To assist scholars in finding and using content, tools and other resources, many university libraries have created digital scholarship laboratories staffed by data scientists and other technology and content specialists. The need for advanced expertise to create and extend the usability of data objects and tools has prompted university departments of computing and information sciences to add course tracks and majors for the data sciences.

8 Provenance

8.1 Provenance of Physical Artifacts

Prior to the creation of digital repositories, the provenance of a hand-written work could be very difficult to establish. Scholars, frequently working alone, could scour reference works and consult colleagues and experts to help trace the journey of the artifact from the time and place of its creation to the present. But gaps (lacunae) in the story were frequently encountered. Manuscripts that contained explicit geographic references might help to establish dates, places and the diffusion of copies. Differences in physical materials, styles, methods of construction and writing practices provided clues as well. It was not uncommon for papyrus scrolls to have tags attached giving the name of the author, subject and other important metadata. However, these were routinely lost.

Other circumstances intervened as well. As mentioned earlier in this article, manuscripts were seen as works of art or marketable merchandise. The richly illustrated "Illuminated Manuscripts" were greatly prized by collectors, with acquisition activity beginning in the nineteenth century and increasing into modern times. One result was the removal of pages, the cutting of illustrations from pages, or often simply the disassembly of the manuscript entirely. Yet another issue was the destruction of artifacts or manuscripts, both intentionallyFootnote 48 and accidentally.

8.2 Provenance of Digital Assets

Digital representations, once created, help to establish the provenance of their physical counterparts. Recovered text and imagery may have immediate value by adding missing links to an artifact’s history. In many cases a ripple effect is generated: establishing provenance for one artifact contributes to establishing the lineage of others, and the cycle repeats and grows as a larger number of scholars contribute.

Provenance of digital representations may prove difficult to establish if care is not taken. Data in its original form as created, or “raw data,” generated by digital instruments must be changed to useful forms. Other scholars use these new forms as the starting point for their work. A central and important task is to trace newer forms back to the original datasets. Computations and transformations are not necessarily reversible. Recovery of source data from computed data may not be possible without a full record of prior actions taken. Establishing data provenance becomes something akin to tracing a daisy chain of data history. Therefore, it is important to anticipate future use at the earliest stages of the data lifecycle, so that at any point in a project the source data can be traced and referenced. This suggests an imperative for scholarly workflows to be fully documented. Unless this is done, data curation and stewardship tasks may become problematic and others will encounter difficulties in reusing these data.
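The idea of tracing a derived dataset back through a chain of recorded steps can be illustrated with a minimal sketch. It assumes that each dataset is identified by a SHA-256 digest of its content and that every transformation is logged together with the identifiers of its inputs. The names used here (`ProvenanceLog`, `add_raw`, `add_derived`, `sources`) are hypothetical and chosen only for illustration; they do not describe any particular repository system or the practice of any project discussed in this issue.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


def content_id(data: bytes) -> str:
    """Identify a dataset by the SHA-256 digest of its content."""
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class Step:
    """One recorded action (hypothetical structure, for illustration)."""
    operation: str               # description of what was done
    input_ids: Tuple[str, ...]   # ids of the data consumed; empty for raw data


class ProvenanceLog:
    """Maps each dataset id to the step that produced it."""

    def __init__(self) -> None:
        self._steps: Dict[str, Step] = {}

    def add_raw(self, data: bytes, description: str) -> str:
        """Register data as captured by an instrument, with no inputs."""
        data_id = content_id(data)
        # The first record for a given content digest is kept.
        self._steps.setdefault(data_id, Step(description, ()))
        return data_id

    def add_derived(self, data: bytes, operation: str,
                    input_ids: List[str]) -> str:
        """Register data computed from previously registered inputs."""
        if not input_ids:
            raise ValueError("a derived dataset needs at least one input")
        missing = [i for i in input_ids if i not in self._steps]
        if missing:
            raise KeyError(f"unregistered inputs: {missing}")
        data_id = content_id(data)
        self._steps.setdefault(data_id, Step(operation, tuple(input_ids)))
        return data_id

    def sources(self, data_id: str) -> List[str]:
        """Walk the chain backwards and return the raw dataset ids."""
        found: Set[str] = set()
        seen: Set[str] = set()
        stack = [data_id]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            step = self._steps[current]
            if step.input_ids:
                stack.extend(step.input_ids)
            else:
                found.add(current)
        return sorted(found)


# Illustrative use with placeholder byte strings standing in for real data.
log = ProvenanceLog()
scan = log.add_raw(b"multispectral scan bytes", "raw capture of a folio")
enhanced = log.add_derived(b"enhanced image bytes",
                           "contrast enhancement", [scan])
text = log.add_derived(b"transcribed text", "transcription", [enhanced])
print(log.sources(text) == [scan])
```

Because the transformations themselves need not be reversible, the sketch never tries to recompute source data from a derived form. It relies entirely on the record of prior actions, which is why the record has to be started when the raw data are first created.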

9 Conclusion

Digital libraries, epigraphy and paleography have natural connections that, taken together, play an important role in building evidence bases for many academic areas of study. The earliest handwritten works serve as a starting point for expanding our knowledge of the human experience. Remediation and recovery of text from damaged artifacts have rendered many writings legible for the first time. More collections that meet basic requirements for linking are being created. While epigraphy and paleography have long been part of the group of studies identified as “auxiliary sciences,” their status and value continue to grow as data and resources accumulate. Archaeology has not been discussed in this Special Issue, but it began integrating computing and information technologies into its research and data management activities nearly fifty years ago.Footnote 49 Although the papers in this Special Issue are focused on the Greco-Roman classical period, many of the topics are relevant to studying written works from other cultures and time periods. The scope and scale of ancient manuscripts, documents and other written works are difficult to comprehend.Footnote 50 In addition to the millions of works from medieval Europe and parts of the Middle East there are many others. A large portion of the Cairo Genizah collection of several hundred thousand medieval Jewish documents and fragments written in Hebrew, Arabic, Aramaic and other languages has now been digitized and is accessible at several sites.Footnote 51 The libraries of Timbuktu contain nearly 500,000 documentsFootnote 52 and the newly discovered manuscripts from the Buddhist caves near Dunhuang, China, contain hundreds of thousands of documents in a vast variety of languages and scripts.Footnote 53 Hopefully, future issues of the International Journal on Digital Libraries will contain papers discussing collections such as these and many others as well.