Special Issue: Epigraphy and Paleography: Bringing Records from the Distant Past to the Present

This special issue brings together three areas of research and scholarly work that would have shown few obvious relationships three decades ago. Digital libraries research, practices and infrastructures have transformed the study of ancient inscriptions by providing organizing principles for collection building, defining interoperability requirements and supporting innovative user tools and services. Yet linking collections and their contents to support advanced scholarly work in epigraphy and paleography tests the limits of current digital libraries applications. This is due, in part, to the magnitude and heterogeneity of works created over a period of more than five millennia. The diversity is remarkable, ranging from the types of artifacts and the methods used in their production to the singularity of the individual marks they contain. Conversion of analog collections to digital repositories is well underway, but most often not in a way that meets the basic requirements of scholarly workflows. This is beginning to change. In addition to efforts to develop complex data objects, linking strategies and repository aggregation, imaging technologies and computational approaches are newly being used to recognize, enhance, recover and restore writings. Most recently, leading-edge artificial intelligence methods are being applied to the automated transcription of handwritten text into machine-readable forms. The articles in this special issue give examples of each.


Written Works
Written expression is the most common form of recording and communicating the human experience of life on earth. This special issue is concerned with written and inscribed works, from the earliest markings on stone and cave walls to the brilliantly crafted and elaborately illustrated manuscripts created by scribes in the centuries before the print era. Epigraphy and paleography are the disciplines that study writings pre-dating the print era. In doing so, they provide valuable evidence-based information to long-established fields in the humanities, social sciences and the arts. Many ancient works have been damaged, partially destroyed or left in extreme states of deterioration. While museums and libraries maintain numerous collections with descriptive information, a larger number of works are scattered around the world: undocumented, improperly referenced, lacking provenance information and held in poorly maintained physical environments. For those held in private collections, digitization is rarely done. Changes of ownership are frequent. Richly illustrated manuscripts are regularly bought and sold at auction to anonymous buyers as investments or works of art, or are disassembled so that individual pages and illustrations can be removed and sold. And perhaps most importantly, a great number are unknown and inaccessible to scholars for whom they might be of great value.
Digital libraries research and development have had a transformative effect on these two areas along with those disciplines that draw upon them. The advantages of digital versions of inscribed materials are manifest. In the global internet environment, geographically distributed collections can be linked, and content replicated, disseminated, shared and repurposed. Open access to digital versions of invaluable works is now possible for those physical collections for which direct access was difficult, restricted or prohibited.
Epigraphy and paleography research, along with contemporary research in almost every scholarly discipline, is increasingly based on computationally intensive, multi-stage workflows using diverse datasets. Digital libraries research has continually contributed to epigraphic and paleographic studies by providing tools and services that support this transition. In return, digital libraries research has benefitted from the unique challenges posed by the sheer magnitude and remarkable diversity of inscribed works created over a timeframe of nearly five millennia. The development of digital libraries practices and infrastructures is further prompted by appeals from an international community of scholars for a continuous stream of evidence-based information and resources to advance their work. The period between the availability of new resources and user uptake is very short and is quickly followed by requests for yet more. Dynamic, iterative cycles of new technologies, data resources and user demands continue to expand in number and type. When a particular development is seen to be of value to a broader community, it quickly becomes part of scholarly infrastructures.
It is now feasible, both technologically and economically, to create extremely accurate digital facsimiles of a wide variety of inscribed works. In addition to creating high-resolution digital images of the work in its current state, new technologies together with computational methods have proven effective for digitally reconstructing, remediating and restoring ancient inscribed works on the many media upon which they were scrawled, engraved, etched, embossed, painted or penned. These efforts have rendered many artifacts legible and coherent for the first time while leaving the original in place, undamaged, intact and unaltered. Some accomplishments have been dramatic, such as the digital unrolling and deciphering of scrolls turned to charcoal by flash combustion, exposing text invisible to the human eye and penetrating layers of media to reveal overwritten text.

Epigraphy and Paleography
Epigraphy will be taken here as research and scholarship concerned with inscriptions on durable materials up to the widespread use of the codex, an early form of the book. 1 Epigraphic research involves identifying and deciphering the content, establishing the origins and context of creation and determining, as best as can be done, the use and purpose. In short, the goal is to build a scholarly description that presents an authentic, readable and coherent work.
Paleography is taken here as research that focuses on handwritten texts in the form of codices and other types of long-form documents, including papyrus scrolls, created prior to the print era. Areas of study include examining text, script and substrate characteristics and the techniques by which the physical written artifact was created. Transcription, and frequently translation, of the text and other markings into a host language may be a primary goal. Like epigraphic scholars, paleographic scholars are concerned with the origins and circumstances that motivated the creation of the work.
The boundaries of both disciplines are indistinct and overlapping, and each is characterized by many specialist areas, in keeping with the remarkable diversity of artifact types, inscriptive genres, writing techniques, orthographies, individual lettering forms and other markings. 2 Codicology is closely related to paleography, and there are differing views on the relationships and differences between them. 3 Both areas deal with ancient works that may exist in many different editions, with multiple copies showing slight differences and uneven translations. A single text may be the work of several authors, some of whom may be fictitious. Texts may refer to outside sources that are lost, unknown or unrecognizable, or to internal references that can no longer be found. Ancient versions of languages were fluid: word order, grammars and orthographies changed with time and place, and a nuanced orthographic change might lead to a different understanding. The Homer Multitext Project 4 is used by language specialists to explore such changes.
The meaning and significance of the written content, both in its original form and in later copies, can then be studied by classics scholars, historians and others from the humanities and social sciences. Much of the subject matter requires transdisciplinary collaboration, and those who specialize in ancient documents are commonly engaged in these later stages of study as well.
The scope and reach of epigraphic and paleographic research, always broad, is rapidly expanding with the addition of digital collections and related resources. Web-accessible cataloged collections number in the thousands, individual volumes and artifacts number in the millions, and letterforms and unique but meaningful markings form a seemingly uncountable number. Collections are widely dispersed, often constructed to meet local needs, and may not conform to even the most basic standards necessary to facilitate discovery and aggregation.
While current digital libraries principles and practices can be invaluable in creating repositories in the form of images and accompanying annotation and metadata, transcription of large corpora of manuscripts requires automated transcription systems. Artificial intelligence and machine learning software platforms are being tested, with measurable success. This topic will be explored in a later section of the paper.

Aggregation, Repositories and Digital Libraries
Over the past several decades, digital libraries researchers working with content providers, technologists and other stakeholders have developed standards for data representation, description and linkage strategies for distributed repositories. The degree of functionality across collections depends on the types and nature of the linkages. Although basic interoperability for most repositories of epigraphic and paleographic content is still based primarily on descriptive, non-standard metadata, annotations and hyperlinks, more mature repositories and digital libraries have features that assist scholars in finding and compiling material. New tools to assist epigraphers and paleographers in cross-repository exploration, image collection, manipulation and comparison are continually under development. Digitization of manuscripts is frequently done using flatbed scanners or photography to create page images. Enhancements using studio lighting techniques and computational photography are not uncommon. These methods are less expensive than the laser scanning and multispectral illumination used in projects that have more financial support and in-house expertise. Creating high-fidelity images from legacy collections of analog slides or microfilm copies often proves difficult. Numerous sites include basic image processing options for users. More developed sites incorporate tools such as the Mirador Viewer, 5 a full-featured open-source software platform that provides image annotation, workspaces for retrieval and comparison of images from geographically distributed repositories, APIs and multi-lingual support. It is optimized for repositories that are International Image Interoperability Framework (IIIF) compliant. One example of a collection that incorporates the Mirador Viewer is the Virtual Manuscript Library of Switzerland. 6 Browsing and searching of collections is frequently enabled by interface design features unique to individual sites.
Many sites have adopted a quasi-Boolean approach, using filters for search, retrieval and compilation of similar entities. Europeana, 7 a web portal and digital library of cultural heritage, is the host for many European Union-funded projects. Most relevant to this special issue is the EAGLE Project. 8 EAGLE has created numerous resources for using Europeana's collection of more than 500,000 ancient manuscripts. It has also developed a multi-lingual collection of Greco-Roman inscriptions with annotations and peer-reviewed contributed translations. Of particular interest to users is the EAGLE Wikibase for using content and managing workflows. 9 EpiDoc is an international, collaborative effort that provides guidelines and tools for encoding digital scholarly editions of ancient documents. EpiDoc is an extension of the Text Encoding Initiative (TEI) and encodes documents in XML. It can address not only transcriptions and editorial annotations of the texts themselves, but also aspects of the media on which the texts appear. As a result, search and retrieval of documents can be accomplished with a high degree of precision, eliminating the tedious trial and error often required with text retrieval methods. 10 Search of images in the form of photos, illustrations, maps, charts and other visual material is often an essential component of working with documents. The International Image Interoperability Framework (IIIF) is a set of standards that supports cross-repository search and retrieval across IIIF-compliant repositories and allows researchers to perform a wide variety of workflow tasks. It is a foundational requirement for highly functional repositories containing visual content. 11 The six IIIF APIs 12 are continually being implemented in repositories worldwide. 5 https://projectmirador.org/.
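To illustrate the precision such encoding affords, the following Python sketch queries a deliberately simplified, hypothetical TEI-style fragment (not a conformant EpiDoc edition); note how the transcription and facts about the writing support can be retrieved independently:

```python
import xml.etree.ElementTree as ET

# A heavily simplified, hypothetical fragment in the spirit of an
# EpiDoc (TEI) edition: a transcription plus metadata about the support.
EPIDOC_FRAGMENT = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><sourceDesc><msDesc><physDesc><objectDesc>
    <supportDesc><support>marble stele</support></supportDesc>
  </objectDesc></physDesc></msDesc></sourceDesc></fileDesc></teiHeader>
  <text><body>
    <div type="edition">
      <ab>Dis Manibus sacrum</ab>
    </div>
  </body></text>
</TEI>"""

TEI = {"tei": "http://www.tei-c.org/ns/1.0"}
root = ET.fromstring(EPIDOC_FRAGMENT)

# Query the edition text and the writing support separately: the markup
# distinguishes the text itself from facts about its medium.
edition = root.find(".//tei:div[@type='edition']/tei:ab", TEI).text
support = root.find(".//tei:support", TEI).text
print(edition)  # Dis Manibus sacrum
print(support)  # marble stele
```

Because each datum lives in its own element, a query can target "editions on marble" directly instead of keyword-matching free text.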
Web portals link users to a large number of collections, exhibitions, tools and discussion forums. There are numerous international efforts to catalog and describe online collections and to develop more detailed standards for integrating collections, tools and resources. The Digital Classicist is a hub for scholars, students, professionals and others interested in the digital humanities and the application of computational methods to the study of the ancient world. As well as cataloging projects, tools and other resources, it hosts discussions that play an important role in bringing communities of like-minded scholars together. 13 Many collections are dedicated to one of the many specialty areas. As examples, the Cuneiform Digital Library Initiative 14 is a full-featured site of approximately 500,000 images and texts for Assyriologists, and Papyri.info 15 aggregates documents from a number of papyrological collections as part of the Integrating Digital Papyrology project. 9 The EAGLE MediaWiki was the winner of the 2016 Digital Humanities Suite of Tools award (http://www.eagle-network.eu/wiki/index.php/Main_Page). 10 https://sourceforge.net/p/epidoc/wiki/Home/. 11 The France and England Project: Medieval Manuscripts between 700 and 1200, curated by the British Library, brings together very high-resolution images of illuminated manuscripts, together with links to events, publications, social networks, blogs, etc. The images are IIIF compliant. 12 a) Image delivery; b) presentation; c) authentication; d) content search; e) change discovery and f) content state. 13 The Digital Classicist (https://www.digitalclassicist.org/). 14 https://cdli.mpiwg-berlin.mpg.de/. 15 https://papyri.info/.
Ideally, sites would also include a workspace for scholars to perform basic tasks of search and retrieval, sorting, grouping, selection of subsets and other operations that minimize repetitive downloading and data transfer between platforms. While such tasks can be expedited somewhat using web resources, many scholars resort to the more common alternative: downloading content to their workstations for detailed analysis. Since datasets can be very large, this can be an impediment. It also assumes that sites have open data policies, and not all do. One open-access site that has incorporated a user Personal Workspace is the Europeana Eagle Project. 16 Users can save queries, annotate and save results and run similarity searches on uploaded images. The mobile app can identify, with some degree of accuracy, inscriptions photographed with a mobile phone.
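The faceted filtering and grouping such a workspace would support can be sketched as follows (the catalogue records and field names are purely illustrative, not drawn from any real repository):

```python
from itertools import groupby

# Hypothetical catalogue records; fields are illustrative only.
records = [
    {"id": "ms-001", "language": "Latin", "century": 10, "material": "parchment"},
    {"id": "ms-002", "language": "Greek", "century": 10, "material": "parchment"},
    {"id": "ms-003", "language": "Greek", "century": 12, "material": "papyrus"},
    {"id": "ms-004", "language": "Latin", "century": 12, "material": "parchment"},
]

def apply_filters(items, **facets):
    """Keep records matching every facet (an AND of field=value filters)."""
    return [r for r in items if all(r.get(k) == v for k, v in facets.items())]

# Filter, then group the result subset: the kind of workspace operation
# that otherwise forces repeated downloads and local processing.
greek = apply_filters(records, language="Greek")
by_century = {c: [r["id"] for r in g]
              for c, g in groupby(sorted(greek, key=lambda r: r["century"]),
                                  key=lambda r: r["century"])}
print(by_century)  # {10: ['ms-002'], 12: ['ms-003']}
```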
It is widely recognized that adding relational properties conforming to semantic web and linked data protocols (RDF) to collections results in a quantum leap in functionality and interoperability. Semantic descriptions connect data entities by expressing relationships in ways not possible with metadata-based search. Linked data, even in its most basic form, can transform and expedite the discovery, gathering, analysis and presentation stages of the scholarly workflow. Geographic data, and spatial data more broadly, are seen as opportune for linked data applications. One project, under development for nearly 10 years, aimed at promoting linked open data within the humanities communities is the Graph of Ancient World Data. Geospatial information is discussed in the following section.
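A minimal sketch of the idea, with illustrative identifiers rather than real repository data, shows the kind of traversal that metadata keyword search cannot do: following typed relationships from a manuscript to a gazetteer place and on to that place's label.

```python
# Toy triple store: (subject, predicate, object) statements linking a
# hypothetical manuscript record to a gazetteer-style place entry.
# Prefixes mimic common vocabularies; the identifiers are illustrative.
triples = [
    ("ex:ms-044", "dc:title", "Fragment of a land survey"),
    ("ex:ms-044", "dcterms:spatial", "pleiades:579885"),
    ("pleiades:579885", "rdfs:label", "Athens"),
    ("ex:ms-044", "dc:creator", "ex:scribe-7"),
]

def objects(subject, predicate):
    """Follow an edge: all objects for a given subject/predicate pair."""
    return [o for s, p, o in triples if s == subject and p == predicate]

# Hop from the manuscript to its place, then to the place's label.
place = objects("ex:ms-044", "dcterms:spatial")[0]
print(objects(place, "rdfs:label"))  # ['Athens']
```

Real deployments would use RDF serializations and SPARQL rather than in-memory tuples, but the graph-traversal principle is the same.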
However, the prospects for widespread incorporation of such features in the near term are dim. At this time, ongoing work is focused on building web-accessible digital collections that conform to established principles and formats. Additional emphasis is placed on transcription of catalogs, creating multi-lingual descriptive metadata and encouraging open access to repositories. A great deal has been accomplished in this regard, and it speaks to the dedication and efforts of a relatively small community of scholars, library and information scientists and technologists working for the good of all.
Nevertheless, it is enticing to consider the possibilities. As repositories of linked data are created, automated linking among individual repositories begins and integrated networks of repositories emerge. New compound data objects can be added that support complex operations while documenting a history of use. These might also link to remote aural, visual and multimedia content, streaming media, social media, analytical tools and even systems of real-time remote sensors. For the scholar, each level of added functionality not only makes common tasks easier and faster, but also brings new methodologies and powerful analytics to a research project, resulting in novel and qualitatively different approaches. The benefits of these capabilities cannot be overstated. The impact is such that research that would have taken years to accomplish only a few decades ago can now be completed in a matter of months. 16 https://www.eagle-network.eu/resources/search-inscriptions/.
One of the most full-featured and important digital libraries for scholars in the classics and beyond is the Perseus Digital Library. A resource under continuous development for more than three decades, Perseus brings a wealth of information, tools and services to scholars, students and members of the general public. The Perseus Digital Library incorporates the Scaife Viewer, 17 in which more than 2000 works in Greek and Latin can be read both in the original language and in translation. Perseus content is baseline data for scholarly works in the classics. It is also cited in best-selling novels and non-fiction. The Editor-in-Chief and developer of Perseus is Gregory Crane of Tufts University, USA. He and his coauthors have two papers in this special issue.
Contemporary studies of epigraphic and paleographic sources are providing new insights into the socio-cultural histories of peoples and places. New generations of Natural Language Processing tools are becoming widely available to support such work. One example is the Classical Language Toolkit, 18 applicable to multiple ancient languages. It is part of the CROSSREADS 19 project funded by the European Research Council (ERC).

Geographic Considerations
Geographic information sciences and systems (GIS) have long been a part of research in many domains, but more recently they have demonstrated considerable value in studying the ancient world. New forms of digital geographic information, combined with other spatial considerations, have proven to benefit the work of epigraphic and paleographic scholars.
Modern gazetteers contain a wealth of descriptive information and help to disambiguate uncertainties associated with recorded place names and the locations of ancient sites. This information is critical to the spatial analyses needed in epigraphic and paleographic study. Pleiades, 20 a long-standing, community-built resource, provides a large corpus of historic content, gazetteers and related geographic information about the ancient world in various forms. The key concepts that organize its material are places, locations, names and connections. Content is structured and described in ways that facilitate and enhance scholarly workflows by enabling advanced search, visualizations and linkages to other sites. Users are encouraged to contribute and participate by adding and improving geographic information about the ancient world. Pleiades serves as a primary resource for a sizeable number of mapping and other projects. The Ancient World Mapping Center uses Pleiades data in an interactive digital atlas application for creating custom maps. 21 The Pelagios Network 22 is a collaborative project aimed at understanding the ancient world and its material culture in geographic terms as well as textual description. To create and maintain geographic connections, Pelagios draws on the community expertise of humanities scholars working with geographic data. The primary gazetteer used by participating projects is that provided by Pleiades. A primary goal of Pelagios, and the "glue" that holds its many activities together, is the focus on building and implementing linked data resources across geographic datasets associated with the ancient world. 17 https://scaife.perseus.org/.
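The disambiguation role of a gazetteer can be sketched as follows; the entries, coordinates and naive squared-degree distance measure are all illustrative assumptions, not Pleiades data or methodology:

```python
# Toy gazetteer: one attested ancient name, several candidate places.
GAZETTEER = {
    "Alexandria": [
        {"id": "place-1", "region": "Egypt", "lat": 31.20, "lon": 29.92},
        {"id": "place-2", "region": "Asia Minor", "lat": 36.58, "lon": 36.17},
    ],
}

def disambiguate(name, near_lat, near_lon):
    """Pick the candidate closest to a known context location
    (naive squared-degree distance; real systems weigh much more evidence)."""
    candidates = GAZETTEER.get(name, [])
    return min(candidates,
               key=lambda p: (p["lat"] - near_lat) ** 2 + (p["lon"] - near_lon) ** 2,
               default=None)

# An inscription found near the Nile delta most plausibly names
# the Egyptian Alexandria.
match = disambiguate("Alexandria", 30.0, 31.0)
print(match["id"])  # place-1
```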
Pelagios groups its work into several primary activities related to spatial analysis and semantic linking: the Annotation activity focuses on semantic geo-annotations for visual data entities; the Gazetteer activity aims to link external gazetteers by creating uniform place reference specifications; the Registry activity is concerned with registry services and the discovery of linked data collections for places; and the Visualization activity focuses on the development of innovative tools for geospatial analysis. A partner project that has gained considerable praise is the Recogito initiative. 23,24 Recogito is an open-source platform with an extensive set of software tools for collections building, collaborative work and semantic linking.
Far from being static, the ancient world as studied today is increasingly viewed as fluid and complex in many ways: names, places and perceived locations changed frequently. Terrestrial maps and gazetteers of the day, where they existed at all, were simplistic, likely inaccurate and frequently revised. For inhabitants, uncertainty about what might lie over the horizon was likely a constant concern.
In contrast, sophisticated maps of the planets, stars and their movements were referenced in cuneiform tablets and in hieroglyphics, and were refined over the centuries to high levels of precision by Greek and Roman scholars. Celestial navigation guided voyages, and monuments, temples and places of worship were frequently placed and aligned according to bearings relative to the sun and stars. Archeoastronomy is a growing area of study in this regard. 25 21 http://awmc.unc.edu/wordpress/.

Digital Restoration, Remediation and Recovery
Advances in information and computing technologies over the past 20 years have made it possible and economical not only to capture high-resolution digital images of written artifacts in their current condition, but also to recover text from damaged and deteriorated works. This can be done using nonintrusive methods such that physical works are unharmed, left intact and unaltered. In many cases, the digital representation will prove to be of greater scholarly value than the original physical artifact by revealing textual markings that have become invisible or hidden with the passage of time.
Physical deterioration of written works is inevitable. Works are prone to deterioration from the beginning, and it may proceed over the course of centuries depending upon the type of media, the chemistry of writing fluids, storage environments and other external factors. The result, in almost every case, is to distort or obscure the original work. Text fades, bleeds and blends with adjacent characters. Pages are subject to discoloration, mold and mildew. Bindings become brittle and crack, pottery breaks, 26 stone erodes and metals oxidize. In some cases, text and markings fade to such a degree that they become invisible. This "hidden text" has been shown to hold information critical to understanding, as is the case with marginalia, scholia, edits and other markings added at later dates.
A number of new technologies have been developed to recover text. Of these, multispectral imaging has proven effective in many cases. The Lazarus Project, based at the University of Rochester, uses a multispectral camera system. Manuscripts are photographed at discrete wavelengths of light, which penetrate to different depths of the substrate to reveal text at different levels. The multiple images captured are then combined, or "stacked," and processed using statistical algorithms to complete the recovery and reveal hidden text and images. 27 The clarity of the processed images is such that new scholarship can be done. The Recogito platform of Pelagios is used to host, share and annotate the images.
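The "stacking" step can be illustrated with a toy sketch. The band values and the variance-weighting scheme below are illustrative assumptions chosen to show the principle, not the Lazarus Project's actual algorithms: bands with more contrast (where ink is visible) are weighted more heavily in the combined image.

```python
from statistics import pvariance

# Toy multispectral stack: three "wavelength bands" of the same 2x3 patch.
# Values are illustrative intensities; the ink stroke is only visible
# (i.e., shows contrast) in the third, infrared-like band.
bands = [
    [[0.9, 0.9, 0.9], [0.9, 0.9, 0.9]],   # visible light: text washed out
    [[0.8, 0.8, 0.8], [0.8, 0.8, 0.8]],   # UV: also flat
    [[0.9, 0.1, 0.9], [0.9, 0.1, 0.9]],   # IR: ink stroke visible
]

def stack(bands):
    """Variance-weighted combination: higher-contrast bands count more."""
    weights = [pvariance([px for row in b for px in row]) for b in bands]
    total = sum(weights) or 1.0
    rows, cols = len(bands[0]), len(bands[0][0])
    return [[sum(w * b[r][c] for w, b in zip(weights, bands)) / total
             for c in range(cols)]
            for r in range(rows)]

result = stack(bands)
# The flat bands contribute zero weight; the ink stroke survives.
print([round(v, 2) for v in result[0]])  # [0.9, 0.1, 0.9]
```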
Palimpsests were a special type of manuscript made to be reused. Pages could be erased to allow for new writings.
Generally, faint traces of earlier writings remained. Such was the case with a tenth-century palimpsest containing a scribal copy of an Archimedes treatise whose text was later erased and overwritten. Multispectral imaging techniques proved effective in recovering some of the overwritten text, but the heavily damaged pages required special X-ray scanning provided by a laboratory at Stanford. 28,29 More common is planarity distortion of pages, which interferes with scanning and readability. One such distortion is "cockling": bulges, creases, warping and so forth that can affect an entire manuscript. The Venetus A, a tenth-century Byzantine manuscript of the Iliad and the one on which modern texts are primarily based, had deteriorated in this regard. Image capture using flatbed scanners or photography would have resulted in 2-D images with significant distortion of the text. Three-dimensional scanning combined with virtual flattening algorithms was developed by the EDUCE Laboratory at the University of Kentucky 30 to create a digital facsimile of the original 645-page manuscript and prepare it for transcription by classicists at the Harvard Center for Hellenic Studies. 31 Building on this technique, a sophisticated set of algorithms was used to digitally "unroll" scrolls, even heavily damaged ones. A recent project by the EDUCE Laboratory succeeded in the "virtual unwrapping" and deciphering of parts of the En-Gedi scroll, one of several recovered from a site dating to about 600 CE. The parchment scroll had been burned and carbonized in an ancient fire; merely touching it caused disintegration. Recovering the text required non-invasive digitization and X-ray-based micro-computed tomography. The project received extensive media coverage in both scholarly publications and public interest programs. 32 The richly illustrated "illuminated manuscripts" are particularly prone to such changes.
A scribe's palette might contain more than a dozen different pigments. Numerous technologies are effective in analyzing the chemical structures of the pigments used in illustrations; examples are multi-sensor hyperspectral imaging and Raman spectroscopy. Often, the chief obstacles are logistical ones associated with transporting rare and precious manuscripts to laboratories equipped with the instruments for analysis, including insurance and transport risks. 28 https://www.archimedespalimpsest.net/. 29 https://thewalters.org/news/lost-and-found-the-secrets-of-archimedes/. 30 The Digital Restoration Initiative laboratory at the University of Kentucky has developed pioneering methods for recovering writings from severely deteriorated artifacts. The paper by Brent Seales, Director of the Laboratory, and co-authors describes state-of-the-art techniques for these tasks. 31 Robot Scans Ancient Manuscript in 3-D.
Developing and including remediated works in digital collections presents special requirements. Codices, scrolls, illustrations and computer-enhanced artifacts in general are compound data objects that have a "digital history." It is important to capture and make available steps taken and intermediate stages in the remediation process as well as the final product. 33

Automated Transcription
A valuable addition to the study of paleography is new research on "artificial paleography"-the application of artificial intelligence and machine learning technologies for optical character recognition (OCR) and handwritten text recognition (HTR) for ancient written works. Automated transcription of ancient manuscripts would relieve scholars of the tedious tasks that must be done before analysis and interpretation can begin by giving them the means to search, retrieve and otherwise access content in the same fashion as a modern text file. It might also lead, eventually, to discovery of related manuscripts across distributed collections.
Optical character recognition software for many languages has been available for decades. Software for the recognition and conversion of cursive and connected scripts, such as Arabic and Indic scripts, is still an active area of research. For ancient manuscripts, the tasks are yet more challenging. Medieval manuscripts have unique lettering systems that vary widely across time periods, regions, languages (many of which are no longer in use), scribal styles and a large number of unique marks. Machine learning uses training data to infer the rules to be used in recognition and transcription. This approach has proven accurate in a number of cases, but less successful for script variants. As a result, different sets of training data are necessary for each script variant and for each alphabet in which a manuscript is written. 34 Examples of ongoing work in this area include the Research Environment for Ancient Documents (READ), an open-source web-based platform that provides a number of tools for converting images of orthographic units for transcription. It has also been used to link a host language with translations. Originally developed for Indic languages, it has since been updated for testing on others, including Latin epigraphic texts. 35,36 A related project is the EU-funded Transkribus project, which has developed a wide range of AI software for converting images of handwritten texts, including Medieval Latin scripts, to machine-readable text. The web-based platform is noted for its ease of use and scope of applications. A character error rate shows the percentage of transcription errors made by the neural network software. 37 Kraken is text recognition software designed primarily for connected scripts. The starting point is the generation of high-quality scans; the total number needed is related to script type and features. 33 iiif-discuss@googlegroups.com. 34 Project Muse https://muse.jhu.edu/pub/1/article/853521.
Transcription is accomplished locally on HTML platforms, and output is limited to "diplomatic" formatted pages. Numerous sites give instructions on how to use Kraken and other semi-automatic transcription programs. In general, transcription is not a one-step process and may involve multiple stages in which several different transcription programs are used, along with preprocessing or the application of multiple transcripts. 38 eScriptorium builds on Kraken in the sense that it provides an integrated set of tools for transcription, annotation, translation and other tasks for working with historic documents, including ancient manuscripts. 39 The source code for these and other handwritten text recognition software packages is freely available on GitHub.
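The character error rate reported for such systems is conventionally the Levenshtein edit distance between a reference transcription and the machine's hypothesis, normalized by the reference length. A minimal sketch:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance via dynamic programming (one row at a time)."""
    prev = list(range(len(hyp) + 1))
    for i, rc in enumerate(ref, 1):
        curr = [i]
        for j, hc in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (rc != hc)))   # substitution
        prev = curr
    return prev[-1]

def cer(reference, hypothesis):
    """Character error rate: edits needed, normalized by reference length."""
    return edit_distance(reference, hypothesis) / len(reference)

# A model reading "dominus" as "dominvs" makes one substitution in
# seven characters, giving a CER of about 14%.
print(round(cer("dominus", "dominvs"), 3))  # 0.143
```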

Scholarly Infrastructures
Computing, information and communication technologies, together with digital content, have become an essential part of contemporary research in almost every scholarly discipline. The concepts and definition of "infrastructures" based on these have changed numerous times in the past several decades. Initially viewed in terms of computation, networks and software, scholarly infrastructures are now seen as including, and perhaps primarily based on, digital content and suites of services that deliver data in meaningful form. 40 Digital libraries research has helped to identify core principles and best practices for creating, managing, linking and using large-scale data resources. A fundamental belief is that digital information has greatest value when it becomes part of a dynamic, growing and globally linked knowledge infrastructure, one that continually expands in scale, functionality and reach. 35 https://github.com/readsoftware/read. 36 https://pric.unive.it/projects/read-latin/home. 37 https://readcoop.eu/transkribus/public-models/. 38 https://kraken.re/2.0.0/training.html. 39 https://escripta.hypotheses.org/423. 40 The idea of developing an "Information Infrastructure" was put forward in the Federal High-Performance Computing and Communications program (HPCC) in early 1993, with funding requested as part of the 1994 Supplement to the President's Budget [https://www.nitrd.gov/pubs]. The vision of a "global information infrastructure" was vague from the start and in retrospect viewed as exceptionally narrow. The purpose was to support scientific research in the national interest and to expand efforts to build a National Research and Education Network, soon to become known as the "Internet." 1993 proved to be an auspicious year: the World Wide Web platform was placed into the public domain; the first web browser, Mosaic, was developed; and the NSFNET internet "backbone" was upgraded to a T3 capacity of 45 Mb/s.
41 Yet insight into how best to approach infrastructures development remains elusive. Data originate from many sources containing heterogeneous data objects organized in many ways, delivered via diverse platforms by equally diverse users from all over the globe. It can be modified with ease, replicated and disseminated across continents with just a few keystrokes. As a result, there is a proliferation of data entities that have no permanent home, constantly circulating in the global networked environment and being used without knowledge of origins or integrity. The versatile and pliable nature of digital information that make it so valuable for transformative scholarly work also presents serious challenges for data preservation, archiving, curation and stewardship. The task to exploit and manage the inherent unpredictability and instability of dataflows becomes one of the managing complexities. 42 It is becoming apparent that different types of information infrastructures serve the needs of scholars and researchers in different subject areas while at the same time maintaining fundamental and strong linkages to each other. It is also becoming apparent that at certain stages of development, infrastructures become self-organizing and self-generative as user input catalyzes further refinement and growth. When data resources and applications are seen of value to a broader community, they quickly become part of the scholarly infrastructures that support the workflows of domain research.
Open scholarship is now taking hold in many disciplines. 43 The potential positive impacts of transparency in scholarly work and the scholarly enterprise are such that it is widely endorsed by members of the academic community as well as by multinational organizations including UNESCO 44 and the OECD. 45 Scholarly workflows across many domains have become more complex, involving multiple stages, numerous datasets, many computational operations and varied presentation modalities. It is critical that there be a detailed record with sufficient documentation that the veracity of results can be established and data resources repurposed and reused. Workflow management and documentation can be integrated into scholarly research to some degree. Project Jupyter has proven invaluable in this regard: the web-based platform allows users to configure and manage projects and the resources they use based on a "notebook" concept. Example notebooks constructed for a variety of applications are available on multiple websites. 46 For the scholar, finding and using all that scholarly infrastructures offer may present difficult challenges. To assist scholars in finding and using content, tools and other resources, many university libraries have created digital scholarship laboratories staffed by data scientists and other technology and content specialists. The need for advanced expertise to create and extend the usability of data objects and tools has also prompted university departments of computing and information sciences to add course tracks in, for example, data science, information cultures, platforms for collaboration and data stewardship.
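One minimal way to capture such a workflow record, independent of any particular platform, is to log each step together with a fingerprint of its inputs, so that results can later be traced back to the exact data used. The sketch below uses only the Python standard library; the step names and fields are illustrative assumptions, not a published schema.

```python
import hashlib
import json

def record_step(log, name, inputs, params):
    """Append one workflow step to the record, fingerprinting its
    inputs so the exact data used can later be identified."""
    canonical = json.dumps(inputs, sort_keys=True).encode("utf-8")
    log.append({
        "step": name,                                   # what was done
        "inputs_sha256": hashlib.sha256(canonical).hexdigest(),
        "params": params,                               # how it was done
    })
    return log

# Hypothetical two-step workflow record (file names are illustrative).
log = []
record_step(log, "normalize", {"file": "corpus.csv"}, {"lowercase": True})
record_step(log, "transcribe", {"file": "page1.png"}, {"model": "demo"})
print(json.dumps(log, indent=2))
```

A record of this shape can be kept alongside a Jupyter notebook or exported with a dataset, giving later users the documentation needed to establish veracity and to reuse the data.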

Provenance of Physical Artifacts
Prior to the creation of digital repositories, establishing the provenance of a handwritten work could prove very difficult. Scholars, frequently working alone, could scour reference works to help trace the journey of an artifact from its time, place and source to the present, as well as consult colleagues and experts exploring similar works. But there were often gaps (lacunae) in the story. Manuscripts that contain geographic references help to establish dates, places and the diffusion of copies. Many different materials were used in the making of codices, and these varied according to time period and location; analysis of these also provides valuable information. It was not uncommon for papyrus scrolls to have tags attached giving the name of the author, the subject and other important metadata. However, these were routinely lost.

43 Some disciplines have already embraced this over the past several decades, most notably physics and astronomy. One reason is that many researchers rely on unique, large-scale, very expensive devices used in experiments; examples are the CERN particle accelerator and the Webb Space Telescope. Vast amounts of data are collected for a single experiment. The CERN accelerator that detected evidence of the presence of the Higgs boson produced petabytes of data each second.
Other circumstances intervened as well. Manuscript pages and illustrations were seen as works of art. Richly illustrated "illuminated manuscripts" were greatly prized by collectors, with acquisition activity beginning in the nineteenth century and increasing into modern times. One result was the removal of pages, the cutting of illustrations from pages, or often simply the disassembly of the manuscript entirely. Yet another issue was the outright destruction of artifacts or manuscripts, both intentional 47 and accidental.

Provenance of Digital Assets
As noted above, digital representations, once created, help to establish the provenance of their physical counterparts. Recovered text and imagery may have immediate value by adding missing links to an artifact's history. In many cases, a ripple effect is generated whereby establishing provenance for one work contributes to establishing the lineage of others, and the cycle repeats and grows as more scholars contribute.
Establishing the provenance of digital representations can encounter obstacles at many points. Data in its original form as created, or "raw data," generated by digital instruments must be changed into useful forms. Other scholars use these new forms as the starting point for their work. A central and important task is to trace newer forms back to the original dataset. Computations and transformations are not necessarily reversible; reverting to source datasets may not be possible without a full record of prior actions taken. Establishing data provenance becomes something akin to tracing a daisy chain of data history. It is therefore important, from the earliest stages of the data lifecycle, to anticipate future use so that at any point in a project the source data can be traced and referenced. This suggests an imperative for scholarly workflows to be fully documented. Unless this is done, data curation and stewardship tasks may become problematic, and others will encounter difficulties in reusing these data.
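The "daisy chain" idea can be made concrete by having each derived dataset record a cryptographic fingerprint of the dataset it was computed from, so that lineage can be walked back to the raw data even when the transformations themselves are irreversible. The following is a minimal stdlib-only sketch; the class and function names are illustrative assumptions, not an established provenance standard.

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Content hash identifying one version of a dataset."""
    return hashlib.sha256(data).hexdigest()

class Derived:
    """A dataset that records the fingerprint of its parent,
    forming a 'daisy chain' back to the raw source."""
    def __init__(self, data: bytes, parent=None):
        self.data = data
        self.parent = parent
        self.parent_hash = fingerprint(parent.data) if parent else None

def lineage(ds):
    """Walk the chain from a derived dataset back to the raw data,
    returning fingerprints newest first, raw source last."""
    chain = []
    while ds is not None:
        chain.append(fingerprint(ds.data))
        ds = ds.parent
    return chain

# Hypothetical three-stage history: raw scan -> cleaned text -> transcript.
raw = Derived(b"raw scan bytes")
cleaned = Derived(b"cleaned text", parent=raw)
transcribed = Derived(b"final transcript", parent=cleaned)
print(lineage(transcribed))
```

Because each link stores only a hash of its parent, later users can verify that a cited source dataset is byte-for-byte the one actually used, which is exactly the check that becomes impossible when prior actions go unrecorded.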

Conclusion
Digital libraries research and state-of-the-art computational approaches, applied to developing, organizing and providing access to data resources and the means to use them, are central to studying the human experience. Digital libraries research and practice, epigraphy and paleography have a natural affinity and play an important role in building the evidence base for accomplishing this. Records and artifacts from the distant past provide the starting point for documenting the human record. While epigraphy and paleography have long been counted among the auxiliary sciences, their status and value continue to grow in concert with evolving scholarly infrastructures. Archaeology has not been discussed in this Special Issue, but it began integrating computing and information technologies into its research and data management activities nearly 50 years ago.
Although the papers in this Special Issue focus on the Greco-Roman classical period, much of the substance and many of the technologies discussed apply to the study of other cultures as well. Hopefully, future issues of the International Journal on Digital Libraries will contain papers discussing ancient written works from other parts of the world.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.