1 Written Works

Written expression is the most common form of recording and communicating the human experience of life on earth. This special issue will be concerned with written and inscribed works, from the earliest markings on stone and cave walls to the brilliantly crafted and elaborately illustrated manuscripts created by scribes in the centuries prior to the print era. Epigraphy and paleography are disciplines that study writings that pre-date the print era. By doing so, they provide valuable evidence-based information to long-established fields in the humanities, social sciences and the arts. Many ancient works have been damaged or partially destroyed, or are in extreme states of deterioration. While museums and libraries maintain numerous collections with descriptive information, a larger number of works are scattered around the world, undocumented, improperly referenced, lacking provenance information and held in poorly maintained physical environments. For those held in private collections, digitization is rarely done. Changes of ownership are frequent. Richly illustrated manuscripts are regularly sold at auction to anonymous buyers as investments or works of art, or are disassembled so that individual pages and illustrations can be removed and sold. And perhaps most importantly, a great number are unknown and inaccessible to scholars for whom these might be of great value.

Digital libraries research and development have had a transformative effect on these two areas along with those disciplines that draw upon them. The advantages of digital versions of inscribed materials are manifest. In the global internet environment, geographically distributed collections can be linked, and content replicated, disseminated, shared and repurposed. Open access to digital versions of invaluable works is now possible for those physical collections for which direct access was difficult, restricted or prohibited.

Epigraphy and paleography research, along with contemporary research in almost every scholarly discipline, is increasingly based on computationally intensive, multi-stage workflows using diverse datasets. Digital libraries research has continually contributed to epigraphic and paleographic studies by providing tools and services that support this transition. In return, digital libraries research has benefitted from unique challenges posed by the sheer magnitude and remarkable diversity of inscribed works created over a timeframe of nearly five millennia. Development of digital libraries practices and infrastructures is further prompted by appeals from an international community of scholars for a continuous stream of evidence-based information and resources to advance their work. The period of time between the availability of new resources and user uptake is very short and quickly followed by requests for yet more. Dynamic, iterative cycles of new technologies, data resources and user demands continue to expand in number and type. When a particular development is seen to be of value to a broader community, it quickly becomes part of scholarly infrastructures.

It is now feasible, both technologically and economically, to create extremely accurate digital facsimiles of a wide variety of inscribed works. In addition to creating high-resolution digital images of the work in its current state, new technologies together with computational methods have proven effective for digitally reconstructing, remediating and restoring ancient inscribed works on the many media upon which they were scrawled, engraved, etched, embossed, painted or penned. These efforts have rendered many artifacts legible and coherent for the first time while leaving the original in place, undamaged, intact and unaltered. Some accomplishments have been dramatic, such as the digital unrolling and deciphering of scrolls turned to charcoal by flash combustion, exposing text invisible to the human eye and penetrating layers of media to reveal overwritten text.

2 Epigraphy and Paleography

Epigraphy will be taken here as that research and scholarship concerned with inscriptions on durable materials until the widespread use of the codex, an early form of the book.Footnote 1 Epigraphic research involves identifying and deciphering the content, establishing the origins and context of creation and, as best can be done, determining the use and purpose. In short, the goal is to build a scholarly description that is an authentic, readable and coherent work.

Paleography is taken here as research that focuses on handwritten texts in the form of codices and other types of long-form documents, including papyrus scrolls, created prior to the print era. Areas of study include examining text, script and substrate characteristics and techniques relevant to how the physical written artifact was created. Transcription and frequently translation of the text and other marking into a host language may be a primary goal. Like epigraphic scholars, paleographic scholars are concerned with the origins and circumstances that motivated the creation of the work.

The boundaries of both are indistinct and overlapping, and both are characterized by many specialist areas in keeping with the remarkable diversity of types of artifacts, inscriptive genres, writing techniques, orthographies, individual lettering forms and other markings.Footnote 2 Codicology is closely related to paleography, and there are differing views of the relationships and differences.Footnote 3

Both areas deal with ancient works that may exist in many different editions and have multiple copies with slight differences and uneven translations. A single text may be the work of several authors, some of whom may be fictitious. Texts may refer to outside sources that are lost, unknown or unrecognizable, or contain internal references that can no longer be found. Ancient versions of languages were fluid. Word order, grammars and orthographies changed with time and place. A nuanced orthographic change might lead to a different understanding. The Homer Multitext ProjectFootnote 4 is used by language specialists to explore these changes.

The meaning and significance of the written content, both in the original form and in copies made, can then be studied by classics scholars, historians and others from the humanities and social sciences. There is considerable subject matter that requires transdisciplinary collaboration. It is also common for those who specialize in ancient documents to be engaged in these latter stages of study.

The scope and reach of epigraphic and paleographic research, always broad, is rapidly expanding with the addition of digital collections and related resources. Web-accessible cataloged collections number in the thousands, individual volumes and artifacts number in the millions, and letterforms and unique but meaningful markings are seemingly uncountable. Collections are widely dispersed, often constructed to meet local needs and may not conform to even the most basic standards necessary to facilitate discovery and aggregation.

While current digital libraries principles and practices can be invaluable in creating repositories in the form of images and accompanying annotation and metadata, transcription of large corpora of manuscripts requires automated transcription systems. Artificial intelligence and machine learning software platforms are being tested, with measurable success. This topic will be explored in a later section of the paper.

3 Aggregation, Repositories and Digital Libraries

Over the past several decades, digital libraries researchers working with content providers, technologists and other stakeholders have developed standards for data representation, description and linkage strategies for distributed repositories. The degree of functionality across collections depends on the types and nature of linkages. Although basic interoperability for most repositories of epigraphic and paleographic content is still based primarily on descriptive, non-standard metadata, annotations and hyperlinks, more mature repositories and digital libraries have features that assist scholars in finding and compiling material. New tools to assist epigraphers and paleographers in cross-repository exploration, image collection, manipulation and comparison are continually under development.

Digitization of manuscripts is frequently done using flatbed scanners or photography to create page images. Enhancements using studio lighting techniques and computational photography are not uncommon. These methods are less expensive than laser scanning and multispectral illumination used in projects that have more financial support and in-house expertise. Creating high-fidelity images from legacy collections of analog slides or microfilm copies often proves to be difficult. Numerous sites include basic image processing options for users. More developed sites incorporate tools such as the Mirador Viewer,Footnote 5 a full-featured open-source software platform that provides for image annotation, workspaces for retrieval and comparison of images from geographically distributed repositories, APIs and multi-lingual support. It is optimized for repositories that are International Image Interoperability Framework (IIIF) compliant. One example of a collection that incorporates the Mirador Viewer is the Virtual Manuscript Library of Switzerland.Footnote 6

Browsing and searching of collections is frequently enabled by interface design features unique to individual sites. Many sites have adopted a quasi-Boolean approach of using filters for search, retrieval and compilation of similar entities. Europeana,Footnote 7 a web portal and digital library of cultural heritage, is the host for many European Union-funded projects. Most relevant to this special issue is the EAGLE Project.Footnote 8 EAGLE has created numerous resources for using Europeana's collection of more than 500,000 ancient manuscripts. It has also developed a multi-lingual collection of Greco-Roman inscriptions with annotations and peer-reviewed contributed translations. Of particular interest to users is the EAGLE Wikibase for using content and managing workflows.Footnote 9

EpiDoc is an international, collaborative effort that provides guidelines and tools for encoding digital scholarly editions of ancient documents. EpiDoc is an extension of the Text Encoding Initiative and uses XML for encoding. It can address not only transcriptions and editorial annotations of texts themselves, but also aspects of the media on which the texts appear. As a result, search and retrieval of documents can be accomplished with a high degree of precision, eliminating the tedious trial and error often required with text retrieval methods.Footnote 10
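To make the idea of precise, markup-based retrieval concrete, the following is a minimal sketch that parses a small EpiDoc-style fragment with Python's standard library. The fragment is invented for illustration and is not taken from any real inscription; the element names (`lb`, `expan`, `abbr`, `ex`, `supplied`, `gap`) follow common TEI/EpiDoc usage, and the `summarize` helper is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# TEI namespace used by EpiDoc documents.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented sample edition (assumption: illustrative only, not a real text).
SAMPLE = """<div xmlns="http://www.tei-c.org/ns/1.0" type="edition">
  <ab>
    <lb n="1"/>Dis <expan><abbr>Man</abbr><ex>ibus</ex></expan>
    <lb n="2"/><supplied reason="lost">sacrum</supplied>
    <lb n="3"/><gap reason="lost" quantity="3" unit="character"/>
  </ab>
</div>"""


def summarize(xml_text):
    """Collect editorial features that plain-text search could not distinguish."""
    root = ET.fromstring(xml_text)
    return {
        # Number of inscribed lines, marked by line-break elements.
        "lines": len(root.findall(".//tei:lb", TEI_NS)),
        # Text restored by the editor where the support is damaged.
        "restored": [el.text for el in root.findall(".//tei:supplied", TEI_NS)],
        # Abbreviations paired with their editorial expansions.
        "expansions": [
            (
                e.findtext("tei:abbr", namespaces=TEI_NS),
                e.findtext("tei:ex", namespaces=TEI_NS),
            )
            for e in root.findall(".//tei:expan", TEI_NS)
        ],
        # Attributes describing each lacuna.
        "gaps": [dict(g.attrib) for g in root.findall(".//tei:gap", TEI_NS)],
    }


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

Because restorations, abbreviations and lacunae are tagged separately, a query can, for example, exclude editorially restored text from a word search, which is not possible with an unmarked transcription.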

Images in the form of photos, illustrations, maps, charts and other visual material are often an essential component of documents, and searching them is an important task. The International Image Interoperability Framework (IIIF) is a set of standards that support search and retrieval across IIIF-compliant repositories and allow researchers to perform a wide variety of workflow tasks. It is a foundational requirement for highly functional repositories containing visual content.Footnote 11 The six IIIF APIsFootnote 12 are continually being implemented in repositories around the world.
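As an illustration of how a IIIF-compliant repository exposes images, the sketch below builds request URLs following the IIIF Image API pattern `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`. The server address and identifier are hypothetical placeholders, the function names are illustrative, and the default size keyword `max` assumes version 3 of the Image API.

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Build an Image API URL; the identifier is percent-encoded as one path segment."""
    encoded = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{encoded}/{region}/{size}/{rotation}/{quality}.{fmt}"


def iiif_info_url(base, identifier):
    """URL of the info.json document that describes an image's capabilities."""
    return f"{base.rstrip('/')}/{quote(identifier, safe='')}/info.json"


if __name__ == "__main__":
    # Hypothetical server and identifier, for illustration only.
    base = "https://iiif.example.org/image"
    # Request an 800x600 pixel detail starting at x=100, y=200, scaled to 400 px wide.
    print(iiif_image_url(base, "ms-001/f12r", region="100,200,800,600", size="400,"))
    print(iiif_info_url(base, "ms-001/f12r"))
```

Because every compliant server answers the same URL pattern, a viewer can request the same detail of pages held in different repositories and display them side by side without downloading full images.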

Web portals link users to a large number of collections, exhibitions, tools and discussion forums. There are numerous international efforts to catalog and describe online collections and to develop more detailed standards for integrating collections, tools and resources. The Digital Classicist is a hub for scholars, students, professionals and others interested in the digital humanities and application of computational methods to the study of the ancient world. As well as cataloging projects, tools and other resources, it also is a hub for discussions that play an important role in bringing communities of like-minded scholars together.Footnote 13

Many collections are dedicated to specialty areas of which there are many. As examples, The Cuneiform Digital Library InitiativeFootnote 14 is a full-featured site of approximately 500,000 images and text for Assyriologists. Papyri.infoFootnote 15 aggregates papyrological documents from a number of papyrological collections as part of the Integrating Digital Papyrology project.

Ideally, sites would also include a workspace for scholars to perform basic tasks of search and retrieval, sorts, grouping, selection of subsets and other tasks to minimize repetitive downloading and data transfer between platforms. While tasks can be expedited somewhat using web resources, many scholars resort to the more common alternative: downloading content to their workstation for detailed analyses. Since datasets can be very large, this can be an impediment. It also assumes that sites have open data policies, and not all do. One open-access site that has incorporated a user Personal Workspace is the Europeana Eagle Project.Footnote 16 Users can save queries, annotate and save results and do similarity searches on uploaded images. The mobile app can identify, with some degree of accuracy, inscriptions photographed with a mobile phone.

It is widely recognized that adding relational properties conforming to semantic web and linked data protocols (RDF) to collections results in a quantum leap in functionality and interoperability. Semantic descriptions describe and connect data entities by revealing relationships in ways not possible by search based on metadata. Linked data, even in the most basic form, can transform and expedite data discovery, gathering, analysis and presentation stages of the scholarly workflow. Geographic and spatial data more broadly are seen as opportune for linked data applications. One project under development for nearly 10 years aimed at promoting linked open data within the humanities communities is the Graph of Ancient World Data. Geospatial information will be discussed in the following section.
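The difference between metadata search and linked data can be sketched with a toy set of subject-predicate-object statements. The identifiers and predicate names below are hypothetical (an `ex:` prefix stands in for real URIs), and a plain Python set stands in for an RDF store; this is an illustration of the idea, not any project's data model.

```python
# Each statement is a (subject, predicate, object) triple.
# All identifiers are invented for illustration.
TRIPLES = {
    ("ex:inscription/1", "ex:foundAt", "ex:place/ostia"),
    ("ex:inscription/2", "ex:foundAt", "ex:place/ostia"),
    ("ex:manuscript/7", "ex:mentions", "ex:place/ostia"),
    ("ex:manuscript/7", "ex:heldBy", "ex:library/a"),
    ("ex:place/ostia", "ex:label", "Ostia"),
}


def match(subject=None, predicate=None, obj=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return sorted(
        (s, p, o)
        for (s, p, o) in TRIPLES
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    )


def linked_to(place):
    """Everything that points at a place, regardless of how it is related."""
    return sorted(s for (s, _p, o) in TRIPLES if o == place)


if __name__ == "__main__":
    # Inscriptions and manuscripts from different collections meet at one shared place.
    print(linked_to("ex:place/ostia"))
    print(match(predicate="ex:foundAt"))
```

When separate repositories use the same identifier for a place, person or text, a single query of this kind gathers related material that separate, site-specific metadata searches would return only piecemeal.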

However, the prospects for widespread incorporation of such features in the near term are dim. At this time, ongoing work is focused on building web-accessible digital collections that conform to established principles and formats. Additional emphasis is placed on transcription of catalogs, creating multi-lingual descriptive metadata and encouraging open access to repositories. A great deal has been accomplished in that regard, and it speaks to the dedication and efforts of a relatively small community of scholars, library and information scientists and technologists working for the good of all.

Nevertheless, it is enticing to consider the possibilities. As repositories of linked data are created, automated linking among individual repositories begins and integrated networks of repositories are created. New compound data objects can be added that support complex operations and at the same time document a history of use. These might also link to remote aural, visual and multimedia content, streaming media, social media, analytical tools and even systems of real-time remote sensors. For the scholar, each level of added functionality not only makes common tasks easier and faster, but also brings new methodologies and powerful analytics to a research project resulting in novel and qualitatively different approaches. The benefits of these capabilities cannot be overstated. The impact is such that research that would have taken years to accomplish only a few decades ago can now be completed in a matter of months.

One of the most full-featured and important digital libraries for scholars in the classics and beyond is the Perseus Digital Library. A resource under continuous development for more than three decades, the Perseus collections and tools bring a wealth of information and services to meet the needs of scholars, students and members of the general public. The Perseus Digital Library incorporates the Scaife ViewerFootnote 17 in which more than 2000 works in Greek and Latin can be read both in the original language and in translated versions. Perseus content is baseline data for scholarly works in the classics. It is also cited in best-selling novels and non-fiction. The Editor-In-Chief and developer of Perseus is Gregory Crane from Tufts University, USA. He and co-authors have two papers in this Special Issue.

Contemporary studies of epigraphic and paleographic sources are providing new insights into socio-cultural histories of peoples and places. New generations of Natural Language Processing tools are becoming widely available to support such work. One example is the Classical Languages Toolkit,Footnote 18 which is applicable to multiple ancient languages. It is a part of the CROSSREADSFootnote 19 project funded by the European Research Council (ERC).

4 Geographic Considerations

Geographic information sciences and systems (GIS) have long been a part of research in many domains, but more recently, these have demonstrated considerable value in studying the ancient world. New forms of digital geographic information combined with other spatial considerations have proven to benefit the works of epigraphic and paleographic scholars.

Modern gazetteers contain a wealth of descriptive information and help to disambiguate uncertainties associated with recorded place names and location of ancient sites. The information is critical to the spatial analyses needed in epigraphic and paleographic study. Pleiades,Footnote 20 a long-standing project, is a community-built resource that provides a large corpus of historic content, gazetteers and related geographic information about the ancient world in various forms. Key concepts that organize material are places, locations, names and connections. Content is structured and described in ways to facilitate and enhance scholarly workflows by enabling advanced search, visualizations and linkages to other sites. It encourages users to contribute and participate by adding and improving geographic information about the ancient world. Pleiades serves as a primary resource for a sizeable number of mapping and other projects. The Ancient World Mapping Center uses Pleiades data in an interactive digital atlas application for creating custom maps.Footnote 21

The Pelagios NetworkFootnote 22 is a collaborative project aimed at understanding the ancient world and its material culture in geographic terms as well as textual description. To create and maintain geographic connections, Pelagios draws on the community expertise of humanities scholars working with geographic data. The primary gazetteer used by participating projects is that provided by Pleiades. A primary goal of Pelagios and the “glue” that holds its many activities together is the focus on building and implementing linked data resources across geographic datasets associated with the ancient world. Pelagios groups its work into several primary activities related to spatial analysis and semantic linking: the Annotation activity focuses on semantic geo-annotations for visual data entities; the Gazetteer activity aims to link external gazetteers by creating uniform place reference specifications; the Registry activity is concerned with registry services and the discovery of linked data collections for places; and the Visualization activity focuses on the development of innovative tools for geospatial analysis. A partner project that has gained considerable praise is the Recogito initiative.Footnote 23, Footnote 24 Recogito is an open-source platform with an extensive set of software tools for collections building, collaborative work and semantic linking.

Terrestrial maps of the day were inaccurate and changed frequently. Far from being static, the ancient world, as studied today, is increasingly viewed as fluid and complex in many ways: names, places and perceived locations changed frequently. Maps and gazetteers, if they existed, were simplistic and likely inaccurate. Uncertainty about what might lie over the horizon was likely a constant concern for inhabitants.

In contrast, sophisticated maps of the planets, stars and their movements were referenced in cuneiform tablets and in hieroglyphics, and were refined over the centuries to high levels of precision by Greek and Roman scholars. Voyages relied on celestial navigation, and monuments, temples and places of worship were frequently placed and aligned based on bearings relative to the sun and stars. Archeoastronomy is a growing area of study in this regard.Footnote 25

5 Digital Restoration, Remediation and Recovery

Advances in information and computing technologies over the past 20 years have made it possible and economical not only to capture high-resolution digital images of written artifacts in their current condition, but also to recover text from damaged and deteriorated works. This can be done using non-intrusive methods such that physical works are unharmed, left intact and unaltered. In many cases, the digital representation will prove to be of greater scholarly value than the original physical artifact by revealing textual markings that have become invisible or hidden with the passage of time.

Physical deterioration of written works is inevitable. Written works are prone to deterioration from the beginning, and the process may proceed over the course of centuries depending upon the types of media, the chemistry of writing fluids, storage environments and other external factors. The result in almost every case is to distort or obscure the original work. Text fades, bleeds and blends with adjacent characters. Pages are subject to discoloration, mold and mildew. Bindings become brittle and crack, pottery breaks,Footnote 26 stone erodes and metals oxidize. In some cases, text and markings fade to such a degree that they become invisible. This “hidden text” has been shown to contain information critical to understanding, as is the case with marginalia, scholia, edits and other markings added at later dates.

A number of new technologies have been developed to recover text. Of these, multispectral imaging has proven effective in many cases. The Lazarus Project based at the University of Rochester uses a multispectral camera system. Manuscripts are photographed at discrete wavelengths of light, which penetrate to different depths of the substrate to reveal text at different levels. The multiple images captured are then combined, or “stacked,” and processed using statistical algorithms to complete the recovery and reveal hidden text and images.Footnote 27 The clarity of the processed images is such that new scholarship can be done. The Recogito platform of Pelagios is used to host, share and annotate images.
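The idea of combining images taken at different wavelengths can be sketched with a deliberately simplified stand-in: standardizing two bands and subtracting one from the other, so that features present in both bands (such as a stain) are suppressed and features visible in only one band (such as faded ink) stand out. This is not the Lazarus Project's pipeline, which the text describes only as using statistical algorithms; the pixel values, band names and function names below are invented for illustration.

```python
from statistics import mean, pstdev


def standardize(band):
    """Rescale a 2-D band to zero mean and unit standard deviation."""
    flat = [v for row in band for v in row]
    mu, sigma = mean(flat), pstdev(flat)
    if sigma == 0:
        return [[0.0 for _ in row] for row in band]
    return [[(v - mu) / sigma for v in row] for row in band]


def band_difference(band_a, band_b):
    """Per-pixel difference of two standardized bands of equal shape."""
    a, b = standardize(band_a), standardize(band_b)
    return [[x - y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def strongest_pixel(image):
    """Return (row, column) of the largest value in a 2-D image."""
    best = max((v, r, c) for r, row in enumerate(image) for c, v in enumerate(row))
    return best[1], best[2]


# Toy 3x3 captures (assumption: invented values). A stain at (0, 0) is dark in
# both bands; faded ink at (1, 1) is dark only in the second band.
VISIBLE = [[120, 200, 200], [200, 200, 200], [200, 200, 200]]
ULTRAVIOLET = [[110, 190, 190], [190, 130, 190], [190, 190, 190]]

if __name__ == "__main__":
    diff = band_difference(VISIBLE, ULTRAVIOLET)
    print(strongest_pixel(diff))
```

On the toy data the stain largely cancels out, while the pixel carrying ink that is visible only in the second band has the largest difference. Real captures involve many more bands and far larger images, for which multivariate statistical methods are better suited than a two-band difference.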

Palimpsests were a special type of manuscript made to be reused. Pages could be erased to allow for new writings. Generally, faint traces of earlier writings remained. Such was the case of a tenth century palimpsest containing a scribal copy of an Archimedes treatise. Later, the text was erased and overwritten. Multispectral imaging techniques proved effective in recovering some overwritten text, but the heavily damaged pages required special X-ray scanning provided by a laboratory at Stanford.Footnote 28,Footnote 29

More common is planarity distortion of pages, which interferes with scanning and readability. One such distortion is “cockling”: bulges, creases, warping and so forth that can affect an entire manuscript. The Venetus A manuscript, a tenth century Byzantine manuscript of the Iliad and the one on which modern texts are primarily based, had deteriorated in this way. Image capture using flatbed scanners or photographs would result in 2-D images with significant distortion of the text. Three-dimensional scanning combined with virtual flattening algorithms was developed by the EDUCE Laboratory at the University of KentuckyFootnote 30 to create a digital facsimile of the original 645-page manuscript and prepare it for transcription by classicists at the Harvard Center for Hellenic Studies.Footnote 31

Building on this technique, a sophisticated set of algorithms was used to digitally “unroll” scrolls, even heavily damaged ones. A recent project by the EDUCE Laboratory succeeded in “virtual unwrapping” and deciphering parts of the En-Gedi scroll, one of several recovered from a 600 CE site. The parchment scroll had been burned and carbonized in the fire that destroyed the site. Merely touching the scroll caused disintegration. Recovering the text required non-invasive digitization using X-ray-based micro-computed tomography. The project received extensive media coverage in both scholarly publications and public interest programs.Footnote 32

The richly illustrated "Illuminated Manuscripts" are particularly prone to changes. A scribe's palette might contain more than a dozen different pigments. Numerous technologies are effective in analyzing the chemical structures of the pigments used in illustrations. Examples are multi-sensor hyperspectral imaging and Raman spectroscopy. Often, the main difficulty lies in the logistical obstacles associated with the transport of rare and precious manuscripts to laboratories equipped with instruments for analysis. Insurance, transport risks and the uncertain skills of laboratory staff in handling manuscripts are primary concerns. Remediation can require a multidisciplinary effort and involve participation from researchers in computer vision and computer graphics, materials scientists, chemists, physicists and others.

Developing and including remediated works in digital collections presents special requirements. Codices, scrolls, illustrations and computer-enhanced artifacts in general are compound data objects that have a “digital history.” It is important to capture and make available steps taken and intermediate stages in the remediation process as well as the final product.Footnote 33
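One way to picture such a “digital history” is as a compound object that keeps the original capture together with an ordered log of every processing step and its intermediate output. The sketch below is a hypothetical data structure for illustration; the class names, field names and file names are assumptions, not a schema used by any of the projects mentioned.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProcessingStep:
    operation: str    # what was done, e.g. "virtual flattening"
    tool: str         # software or instrument used
    input_file: str   # file the step started from
    output_file: str  # intermediate or final result


@dataclass
class RemediatedObject:
    identifier: str
    original_capture: str
    steps: List[ProcessingStep] = field(default_factory=list)

    def apply(self, operation: str, tool: str, output_file: str) -> str:
        """Record a step whose input is the most recent output (or the original)."""
        source = self.steps[-1].output_file if self.steps else self.original_capture
        self.steps.append(ProcessingStep(operation, tool, source, output_file))
        return output_file

    def lineage(self) -> List[str]:
        """Files in order, from the original capture to the final product."""
        return [self.original_capture] + [s.output_file for s in self.steps]


if __name__ == "__main__":
    # Hypothetical identifiers and file names.
    page = RemediatedObject("ms-demo-f12r", "f12r_raw.tif")
    page.apply("virtual flattening", "tool-a", "f12r_flat.tif")
    page.apply("contrast enhancement", "tool-b", "f12r_enhanced.tif")
    print(page.lineage())
```

Keeping each intermediate file addressable lets a later scholar return to any stage and check whether a reading depends on the artifact itself or on a particular enhancement.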

6 Automated Transcription

A valuable addition to the study of paleography is new research on “artificial paleography”: the application of artificial intelligence and machine learning technologies for optical character recognition (OCR) and handwritten text recognition (HTR) for ancient written works. Automated transcription of ancient manuscripts would relieve scholars of the tedious tasks that must be done before analysis and interpretation can begin by giving them the means to search, retrieve and otherwise access content in the same fashion as a modern text file. It might also lead, eventually, to discovery of related manuscripts across distributed collections.

Optical character recognition software for many languages has been available for decades. Software for recognition and conversion of cursive scripts and alphabets with connected scripts, such as Arabic and Indic script, is still an active area of research. For ancient manuscripts, the tasks are yet more challenging. Medieval manuscripts have unique forms of lettering systems that vary widely across time periods, regions, languages (many of which are no longer in use), scribal styles and a large number of unique marks. Machine learning uses training data to infer the rules to be used in recognition and transcription. This approach has proven accurate in a number of cases, but less successful for script variants. As a result, different sets of training data for each variant of scripts and for each alphabet in which the manuscript is written are necessary.Footnote 34

Examples of ongoing work in this area include the "Research Environment for Ancient Documents (READ)," an open-source web-based platform that provides a number of tools for converting images of orthographic units for transcription. It also has been used to link host language with translations. Originally developed for Indic languages, it has since been updated for testing on others, including Latin epigraphic texts.Footnote 35,Footnote 36 A related project is the EU-funded Transkribus project that has developed a wide range of AI software for converting images of many handwritten texts, including Medieval Latin scripts, to machine readable text. The web-based platform is noted for its ease of use and scope of applications. A character error rate shows the percentage of transcription errors made by the neural network software.Footnote 37
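A character error rate is commonly computed as the edit distance between the automatic transcription and a trusted reference, divided by the length of the reference. The sketch below follows that common definition; whether any particular platform computes the figure in exactly this way is an assumption, and the function names are illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by reference length; reference must be non-empty."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    reference = "arma virumque cano"   # trusted transcription
    hypothesis = "arma uirumque cano"  # machine output with one wrong letter
    print(f"{character_error_rate(reference, hypothesis):.1%}")
```

Here one substituted character in an 18-character reference gives a rate of 1/18, or about 5.6 percent. Note that the measure treats a legitimate orthographic variant (u for v) as an error unless the reference and the output are normalized in the same way beforehand.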

Kraken is text recognition software designed primarily for connected scripts. The starting point is the generation of high-quality scans; the total number needed is related to script type and features. Transcription is accomplished locally on HTML platforms. Transcription is limited to “diplomatic” formatted pages. There are numerous sites that give instructions on how to use Kraken and other semi-automatic transcription programs. In general, transcription is not a one-step process and may involve multiple stages in which several different transcription programs are used, along with preprocessing or the application of multiple transcripts.Footnote 38 eScriptorium builds on Kraken in the sense that it provides an integrated set of tools for transcription, annotation, translation and other tasks for working with historic documents including ancient manuscripts.Footnote 39 The source code for these and other handwritten text recognition programs is freely available on GitHub.

7 Scholarly Infrastructures

Computing, information and communication technologies together with digital content have become an essential part of contemporary research in almost every scholarly discipline. The concepts and definition of “infrastructures” based on these have changed numerous times in the past several decades. Initially viewed in terms of computation, networks and software, scholarly infrastructures are now seen as including and perhaps primarily based on digital content and suites of services that deliver data in a meaningful form.Footnote 40

Digital libraries research has helped to identify core principles and best practices for creating, managing, linking and using large-scale data resources. A fundamental belief is that digital information has greatest value when it becomes part of a dynamic, growing and globally linked knowledge infrastructure—one that continually expands in scale, functionality and reach.Footnote 41

Yet insight into how best to approach infrastructures development remains elusive. Data originate from many sources containing heterogeneous data objects organized in many ways, and are delivered via diverse platforms and used by equally diverse users from all over the globe. Data can be modified with ease, and replicated and disseminated across continents with just a few keystrokes. As a result, there is a proliferation of data entities that have no permanent home, constantly circulating in the global networked environment and being used without knowledge of origins or integrity. The versatile and pliable nature of digital information that makes it so valuable for transformative scholarly work also presents serious challenges for data preservation, archiving, curation and stewardship. The task of exploiting and managing the inherent unpredictability and instability of dataflows becomes one of managing complexity.Footnote 42

It is becoming apparent that different types of information infrastructures serve the needs of scholars and researchers in different subject areas while at the same time maintaining fundamental and strong linkages to each other. It is also becoming apparent that at certain stages of development, infrastructures become self-organizing and self-generative as user input catalyzes further refinement and growth. When data resources and applications are seen to be of value to a broader community, they quickly become part of the scholarly infrastructures that support the workflows of domain research.

The continuing shift to transparency in internet accessible resources and increasingly open scholarly practices in the past two decades has reconfigured the workings of the scholarly enterprise. Beginning with the successes of the Jericho-like call for open access to research publications, open datasets, open-source software and services quickly followed. A movement toward open science, and more broadly, open scholarship is now taking hold in many disciplines.Footnote 43 The potential positive impacts of transparency of scholarly work and the scholarly enterprise are such that it is widely endorsed by members of the academic community as well as multinational organizations including UNESCOFootnote 44 and the OECD.Footnote 45

Scholarly workflows across many domains have become more complex, involving multiple stages, numerous datasets, many computational operations and presentation modalities. It is critical that there be a detailed record with sufficient documentation that the veracity of results can be established and data resources repurposed and reused. Workflow management and documentation can be integrated into scholarly research to some degree. Project Jupyter has proven invaluable in this regard. The web-based platform allows users to configure and manage projects and the resources they use, based on a “notebook” concept. Examples of notebooks constructed for a variety of applications are available on multiple websites.Footnote 46

For the scholar, finding and using all that scholarly infrastructures offer may present difficult challenges. To assist scholars in finding and using content, tools and other resources, many university libraries have created digital scholarship laboratories staffed by data scientists and other technology and content specialists. The need for advanced expertise to create and extend the usability of data objects and tools has prompted university departments of computing and information sciences to add course tracks. Data sciences, information cultures, platforms for collaboration and data stewardship are just a few examples of these.

8 Provenance

8.1 Provenance of Physical Artifacts

Prior to the creation of digital repositories, establishing the provenance of a handwritten work could prove very difficult. Scholars, frequently working alone, could scour reference works to help in tracing the journey of the artifact from its time, place and source of origin to the present, as well as consulting colleagues and experts exploring similar works. But there were often gaps (lacunae) in the story. Manuscripts that contain geographic references help to establish dates, places and diffusion of copies. Many different materials were used in the making of codices, and these varied according to time period and location. Analysis of these materials also provides valuable information. It was not uncommon for papyrus scrolls to have tags attached giving the name of the author, subject and other important metadata. However, these were routinely lost.

Other circumstances intervened as well. Manuscript pages and illustrations were seen as works of art. The richly illustrated "Illuminated Manuscripts" were greatly prized by collectors, with acquisition activity beginning in the nineteenth century and increasing into modern times. One result was the removal of pages, the cutting of illustrations from the page or, often, the disassembly of the manuscript entirely. Yet another issue was simply the destruction of artifacts or manuscripts, both intentionallyFootnote 47 and accidentally.

8.2 Provenance of Digital Assets

As noted above, digital representations, once created, help to establish the provenance of their physical counterparts. Recovered text and imagery may have immediate value by adding missing links to an artifact’s history. In many cases, a ripple effect is generated whereby establishing provenance for one artifact contributes to establishing the lineage of others, and the cycle repeats and grows as a larger number of scholars contribute.

Establishing the provenance of digital representations can encounter obstacles at many points. Data in their original form, or “raw data,” as generated by digital instruments must be transformed into usable forms. Other scholars use these new forms as the starting point for their work. A central and important task is to trace newer forms back to the original dataset. Computations and transformations are not necessarily reversible; reverting to source datasets may not be possible without a full record of prior actions taken. Establishing data provenance becomes something akin to tracing a daisy chain of data history. Therefore, it is important that efforts be made at the earliest stages of the data lifecycle to anticipate future use, so that at any point in a project the source data can be traced and referenced. This suggests an imperative for scholarly workflows to be fully documented. Unless this is done, data curation and stewardship tasks may become problematic, and others will encounter difficulties in reusing these data.
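The "daisy chain" of data history can be illustrated with a minimal sketch. It is an illustration only, not a method described in this Special Issue: the class name ProvenanceLog, its methods and the sample inscription text are all hypothetical, and content hashing with SHA-256 is an assumed way of identifying each form of the data. Every transformation is logged with the hash of its input and its output, so a derived form can be traced back to the registered source even when the transformation itself cannot be reversed.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, List


def digest(data: bytes) -> str:
    """Identify a data object by the SHA-256 hash of its content."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class ProvenanceLog:
    """Hypothetical record of every step applied to a dataset."""
    steps: List[dict] = field(default_factory=list)

    def register_source(self, name: str, data: bytes) -> bytes:
        # A source has no input: it is the start of the chain.
        self.steps.append({"step": name, "input": None, "output": digest(data)})
        return data

    def apply(self, name: str, func: Callable[[bytes], bytes], data: bytes) -> bytes:
        # Record input and output hashes alongside the step name.
        result = func(data)
        self.steps.append(
            {"step": name, "input": digest(data), "output": digest(result)}
        )
        return result

    def trace(self, data: bytes) -> List[str]:
        """Return step names from the given form back to the source, newest first."""
        by_output = {s["output"]: s for s in self.steps}
        current = digest(data)
        seen = set()  # guards against loops if a step leaves the data unchanged
        path = []
        while current in by_output and current not in seen:
            seen.add(current)
            step = by_output[current]
            path.append(step["step"])
            current = step["input"]
        return path


# Illustrative usage with made-up sample text.
log = ProvenanceLog()
raw = log.register_source("raw transcription", b"  IMP CAESAR  ")
trimmed = log.apply("strip whitespace", bytes.strip, raw)
lowered = log.apply("lowercase", bytes.lower, trimmed)
print(log.trace(lowered))
```

Lowercasing is not reversible, yet the log still links the final form to the raw transcription. Data that were never logged return an empty trace, which is the situation the paragraph above warns against.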

9 Conclusion

Digital libraries research and state-of-the-art computational approaches applied to developing, organizing and providing access to data resources and the means to use them are central to studying the human experience. Digital libraries research and practices, epigraphy and paleography have a natural affinity and play an important role in building the evidence base for accomplishing this. Records and artifacts from the distant past provide the starting point for documenting the human record. While epigraphy and paleography have long been part of the group identified as auxiliary sciences, their status and value continue to grow in concert with evolving scholarly infrastructures. Archaeology has not been discussed in this Special Issue, although it began integrating computing and information technologies into its research and data management activities nearly 50 years ago.

Although the papers in this Special Issue are focused on the Greco-Roman classical period, much of the substance and many of the technologies discussed apply to studying other cultures as well. It is hoped that future issues of the International Journal on Digital Libraries will contain papers discussing ancient written works from other parts of the world.