1 Introduction: littera scripta manet (“the written word remains”)

Over the centuries writing has migrated from the painted surface of stone walls to the grooves of inscriptions to the strokes of quill, pen, and ink. From its inception, written text has necessitated the combination of language with a material substrate, marrying the abstract nature of words to the materiality of the earth [1]. Given its durability, stone served as an obvious first and enduring choice. While written language is approximately 5000 years old [2]—an age that, in the timeline of human experience, seems quite ancient—5000 years represents only a brief moment in the geological formation of the earth’s extruded foundational layer of bedrock, which itself can last hundreds of millions or even billions of years [3]. Carving text into stone suggests permanence; it takes millennia to both form and efface rock.

While we still inscribe stone (tombstones, monuments) and we continue to write on skin (tattoos abound), the seeming abandonment of our quotidian method of writing on paper in favor of electronic digital composition has progressed at a breathtaking pace. In less than two decades, e-mail has replaced handwritten letters; e-docs have replaced typeset manuscripts; even hastily penned familial notes posted on the fridge have given way to text messages sent via mobile phone [4]. Computers have made it possible to generate and save texts at lightning speed and then eradicate them just as quickly. The digital revolution has freed the written word from its need for a material conduit.

It is an interesting irony, therefore, that the quick and ephemeral work of computer technology offers a new approach to the stony, staid study of the “carved in stone” epigraph. Texts inscribed in stone or metal actually outlast the presumed “backup” longevity of the modern computer’s memory system [5], yet the power and speed of digital processing expand the boundaries of epigraphic study beyond what could have been envisioned just a few decades ago. This essay is about the technical approaches to the materiality of inscribed text and how the established physical methods of study and conservation are steadily and rapidly making way for the innovative methods and abstract representations of digital technologies.

A review of past, present, and future approaches to the study and analysis of epigraphic texts provides a distinctive vantage point from which to consider the convergence of written language and computer technology and the resulting emergence of the digital library. Our work in the digitization and digital restoration of various artifacts [6] has by no means been central to the development of epigraphic analysis as a whole, but we share origins in the digital revolution of the 1990s and the subsequent emergence of the digital library. Our work on digital methods and heritage objects, leading to developments in Heritage Science, thus intersects with epigraphic methods and scholarship. At this nexus we find instructive stories and personal recollections, and in what follows we emphasize some important trends and share a few anecdotes from our experience in digitizing and digitally restoring objects.

In particular, three key trends have conspired to usher in a new era of analysis and conservation, not just with epigraphic texts, but with material artifacts of all kinds:

  1. Digitization, or the advent of algorithms based on digital representations together with computation.

  2. Elemental analysis, the precision and capability of which have accelerated in recent years to provide an unprecedented understanding of the material composition of objects over an astounding range of scales.

  3. Big data and the space of associated machine learning approaches, which have broken through and stand to transform aspects of our understanding long thought to be settled. Undoubtedly a product of the first two innovations, this new lens of massive data collections is engineered and structured so that signals long thought to have vanished—or indeed, never known to exist [7]—can be rescued from the brink of destruction and nursed back into significance, or brought to light for the first time.

In this essay, we discuss the trends and advances related to the aforementioned technical currents, identifying today’s technologies and the capabilities they enable. More than just “technology for technology’s sake,” digitization, elemental analysis, and machine learning—when applied in tandem with each other to epigraphic texts—stand to reinvigorate scholarly exploration and advance the goals of conservation and nondestructive analysis. As solid and staid as text in rock may seem to be, those words still disappear, if not from wind and rain and the geological forces that make rivers run and mountains move, then from human hands, human conflict, and human neglect. The time is now to protect and preserve by using technology to its fullest extent in the service of scholarship and a reduced risk of loss.

2 The analog prologue

Significant scholarly interest in epigraphy developed during the European Renaissance of the fifteenth century and, of course, relied on analog methods. At its inception, written language was married to a physical container, making writing one of the earliest forms of analog recording. Artists and scribes hand-copied what they saw, read, and felt onto the earliest known writing supports: clay tablets, stone, papyrus, parchment, paper, and even bronze plates.

The rubbing is one of the earliest known facsimile techniques, readily adopted for copying epigraphy because of the relief of the incised text [8]. Epigraphic texts and textures have long been accurately transferred from carved substrate to written material using this method. To create a rubbing, material such as papyrus, parchment, paper, or even cloth is draped over the epigraphy, and charcoal, chalk, or another material is used to shade the pressed material, creating dark and light areas relative to the pits and bumps of the underlying surface. Rubbings captured a more exact reproduction than transcription; and unlike photography, which would come later and required both a physical apparatus (the pinhole camera and ultimately the optics needed for image formation) and a chemical breakthrough (photosensitive materials), rubbings relied on simple, readily available materials to produce a 2D replica image.

Probably the most effective facsimile method for epigraphy, which became widely used in Europe with scholarly expeditions to Egypt (e.g., Richard Lepsius, 1842–45), is the paper squeeze [9]. The concept of mold-making has been around for thousands of years, going back to the Bronze Age, when fighters relied on stone molds to create spear tips [10] and artists used molds to make figurines [11]. But the envisioning of an epigraphic surface as a mold to which a laminate of paper can be made to conform—which is what constitutes a “squeeze”—cannot be traced back more than a few hundred years.

While epigraphic scholarship was well underway, the introduction of squeeze techniques brought a new dimension to the work, literally. The squeeze creates a reverse 3D image of the epigraphic surface at a 1:1 scale, and as such represents an incredibly powerful facsimile of an object that is inherently 3D. The first time we encountered authentic squeezes was at the Ashmolean Museum in Oxford, England, while organizing a planned digital scan of the Parian Marble. The inverse image, peeled from the hard surface of the marble, created a surprising mirrored “abstraction” that factored out the shiny, flecked appearance of the stone and emphasized instead the shape of the incisions. After encountering difficulties in digitizing the Parian Marble due to the reflectance properties of the stone (see Sect. 3.2), we realized that squeezes offer another, and sometimes even better, pathway to creating 3D digital models of inscriptions.

The interest in epigraphic scholarship and its associated techniques for analog facsimile accelerated in the mid-to-late nineteenth century due to the advent of photography, the growth of classical epigraphy, and the explosion of Egyptology through the work of pioneers such as James Henry Breasted (1865–1935) [12]. These events pushed existing analog methods further as they began to incorporate photography and photographic reproduction.

Although likely seen in their early days as technical wizardry, photographic facsimiles did not solve all the problems of epigraphic reproduction, due in large part to two serious constraints: the need for perfect lighting and the sheer dimensionality of incised material. A single 2D photograph could not capture features like depth and surface orientation. Just as the rubbing could be misleading, its degrees of shading dependent on many different variables, so too did photography run up against its limitations.

While this trifecta of approaches—squeezes, rubbings, and photographs—creates a powerful suite of analog representations, a gap nonetheless persists. None provides a true sampling of the original material properties: the way the surface reflects light, the chemistry of the materials, the weight and density of the substrate, for example. The photograph captures only rudimentary reflectance information, which is highly dependent on the incident lighting available. The rubbing represents even less of the surface properties in any direct or measurable way. The squeeze eliminates all but the dimensionality: the 3D papier-mâché-like material is white or light gray and shows only the reverse topography of the epigraphic surface.

Elemental analysis added an ability to understand materiality, helping the epigrapher to glean evidence about the composite origins of objects. Such information, which goes beyond the text alone, could assist with both geographical and chronological attribution. Initially conducted through inspection of small samples removed for study, materials analysis became a more mature process with the development of X-ray fluorescence (XRF) techniques. XRF emerged in the 50 years following the discovery of X-rays in 1895 by the German physicist Wilhelm Conrad Roentgen. By 1913, scientists had discovered that, when stimulated, an element emits characteristic X-rays whose frequencies are systematically related to the atomic number of that element, a relationship now known as Moseley’s law. By 1928, XRF techniques based on this phenomenon were being used to perform quantitative analyses of materials, and by the 1950s commercial detectors were available, turning XRF into a practical technique for elemental analysis [13].
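
For reference, Moseley’s law relates the frequency ν of a characteristic X-ray line to the atomic number Z of the emitting element:

$$\sqrt{\nu} = k_1 (Z - k_2)$$

where k_1 and k_2 are constants for a given spectral series (k_2 ≈ 1 for the Kα lines). Measuring the frequencies, or equivalently the energies, of the emitted lines therefore identifies which elements are present in the material.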

With this analog evidence for epigraphic material—a suite of rubbings, photographs, squeezes, and chemical/geological analysis of the material itself—scholarly work advanced and even thrived over a long period of time. The publication of scholarly works captured it all in essential print-based form, often obscuring the internals of scholarly decision-making as well as the technical apparatus used to arrive at specific findings.

3 The digital era

For millennia these epigraphy-specific analog methods, whether individually or combined with each other, formed the entirety of a scholar's ability to represent, transport, publish, and study epigraphic material. The methods produced such a weak facsimile, however, that very often the scholar still needed to be in the presence of the actual material to conduct a satisfactory study. Nonetheless, this analog era set the stage for a very real “digital renaissance,” ignited by the interest in and funding for digital libraries. We benefited from the NSF’s support of the Digital Library Initiatives [14, 15], working with others toward new methods by taking advantage of funding opportunities. This era saw a change from “humanities computing” to “digital humanities,” with an associated acceleration of work in many fields, including that of epigraphic analysis [16].

3.1 Digital photography

The advent of high-resolution digital photography offers innumerable advantages—not unique to epigraphic applications—over analog. But specifically for epigraphy, digital photography eliminates the time-consuming and imprecise cycle of lighting/film development/analysis required to optimize the readability of epigraphic material. Incisions are often non-planar and sometimes specular, creating a uniquely challenging set-up for film cameras. In addition, many epigraphic materials exist in situ in environments where lighting control is not straightforward.

With digital photography and controllable lighting, however, large numbers of images under all kinds of lighting conditions can be captured on a short time scale, reviewed and re-captured as necessary, and then transported anywhere, anytime, to be viewed and studied by anyone. Also, the electronic capture provides instant, born-digital images ready for computational/algorithmic enhancement, a task that is multiple steps away in film-based imaging. Associated algorithms for storing, processing, balancing, sharpening, aligning, and otherwise improving the quality of those images through computational means open up entirely new pathways for analysis.

Independent of how images are captured, the lighting required to create an image of an epigraphic text has always been tricky. The 3D relief creates shadows that can be misleading in the imagery, and the surface characteristics of stone—its response to incident light—have always been a challenge to capture faithfully and accurately. In early analog days, the lighting situation was primitive, with flashes and harsh lighting that proved difficult to control. But just as digital photography changed the landscape for image capture, the digital revolution also produced new lighting systems controlled with precision through the very computational systems managing the camera sensors and the massive data they produce. LED lighting now provides narrow-bandwidth, precisely controllable illumination that works together with the digital camera to overcome the pitfalls of the analog systems of the past.

3.2 3D structured light and photogrammetry

With superior controlled incident light and high-resolution digital photography, other techniques quickly emerged. Digital photography made the viewing of epigraphic texts nimble and mobile, yet a vital component of the in-person experience remained out of reach: 3D. Photography detaches the text from its grooved and chipped container, but the importance of acquiring metric 3D information about an incised text cannot be overstated. When discussing his attempt to capture shape detail by lighting an object from the surrounding four corners, the renowned, innovative epigrapher James Henry Breasted bemoaned that “Even a group of eight such negatives would not record all that the wall discloses to the eye of the trained epigrapher … for a badly weathered inscription on stone contains much which is visible to the trained and experienced eye, but which nevertheless is too faint and confused to be recorded photographically” [12]. While the analog squeeze was revolutionary for its 1:1 scale capture of an epigraphic text, it lacked the color detail that photography could provide. Today’s digital 3D models are much more versatile. They admit visualization possibilities, such as lighting modifications and textured renderings, that enhance the view of surface structures, thus enabling scholars to tease out shapes and features that squeezes may well capture but that are nonetheless difficult to elicit precisely.

Advancements in computer vision, applied and diffused across many different fields of study, eventually enabled field-ready 3D acquisition capabilities. Building on the remote sensing and analog measurement systems of the twentieth century, pioneering computer vision work was fueled by the Silicon Valley boom of the 1980s and 1990s. Ideas translated from specialized systems—built for manufacturing applications and for landscape survey and mapping—into the framework of the emerging digital camera, light source, and computational environments. A new focus on camera calibration, feature detection and matching, and large-scale optimization techniques to incorporate evidence from many captured images brought close-range 3D reconstruction to the desktop “personal” computer.

This advancement made practical 3D reconstruction a reality, and by the 1990s, cameras and computer hardware had evolved to the point that 3D imaging was possible using field-deployable structured light systems and digital photogrammetry tools. In structured light systems, a pattern of light, such as alternating stripes, is cast onto an object, and one or more cameras placed at known angles capture the way the object’s shape distorts this pattern. Precise measurements of these distortions are used to calculate the 3D coordinates of any detail on the object’s surface. Photogrammetry, which is almost as old as photography itself, similarly relies on 2D photographs to determine 3D shape. It applies triangulation to a stack of photographs taken with cameras placed at different locations. “Lines of sight” from each of these cameras to various points on the object are calculated, and their mathematical intersections produce the 3D coordinates used to create a digital 3D model. Photometric stereo, now emerging as a practical in-the-field approach, uses controlled changes in light source positions to add very fine shape, surface, and texture detail to the coarser photogrammetric reconstruction.
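
To make the triangulation step concrete, the following minimal sketch (plain NumPy, not any particular photogrammetry package) recovers the 3D point that best agrees, in a least-squares sense, with the lines of sight from several calibrated cameras; the camera positions, viewing directions, and the function name `triangulate` are hypothetical.

```python
# Minimal sketch of the triangulation step: given "lines of sight" from several
# calibrated cameras, find the 3D point minimizing the squared distance to each ray.
import numpy as np

def triangulate(centers, directions):
    """centers: (N, 3) camera centers; directions: (N, 3) viewing directions toward one surface point."""
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for c, d in zip(centers, np.asarray(directions, dtype=float)):
        d = d / np.linalg.norm(d)
        P = np.eye(3) - np.outer(d, d)   # projector onto the plane orthogonal to the ray
        A += P
        b += P @ c
    return np.linalg.solve(A, b)

# Two hypothetical cameras looking at the point (0, 0, 5) from different positions.
centers = np.array([[-1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
directions = np.array([[1.0, 0.0, 5.0], [-1.0, 0.0, 5.0]])
print(triangulate(centers, directions))   # ~ [0, 0, 5]
```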

Early applications of these 3D reconstruction techniques to epigraphy proved quite challenging. One of our own early attempts at the Ashmolean Museum in September of 2005 ended in disappointing failure due to our underestimation of these technical challenges. Just prior to a museum upgrade that would see much of the collection put into extended storage, we were given the opportunity to digitize and analyze the upper portion of the Parian Marble, an epigraphic legend acquired by Oxford University in 1667 and eventually given to the Ashmolean. The Parian Marble is a notoriously effaced and difficult-to-read inscription, and we hoped to use 3D cues from a structured light system to improve legibility as well as support some novel approaches for visualization and rendering. But despite the previously successful work of others using similar techniques on sculpture [17], the structured light system we built from scratch in our laboratory did not perform well on the extremely specular material of the Parian Marble. In addition, the Ashmolean had commissioned the marble to be pressure-cleaned to a sparkly white in the 1980s, which not only washed away the old patina, making it even harder to apply structured light successfully, but likely abraded the actual text even further. The specular surface of the shiny marble thus defeated our reconstruction methods. Nonetheless, we remain proud to have been allowed to do the work, and we prefer to characterize our attempt not as “first to fail,” but rather as “first to try.”

3.3 Reflectance transformation imaging (RTI)

The techniques in play for solving these problems, like structured light and photogrammetry, have evolved and matured over time. Another specialized approach that employs digital images and lights is reflectance transformation imaging. RTI is a system and method for capturing, representing, and then manipulating and interacting with a model of the surface reflectance. RTI uses computer-controlled light sources in known locations, together with digital images captured as those sources are fired in sequence, to estimate a reflectance function, a radiometric model of how light arriving from different directions interacts with the surface of the substrate. Estimating this function at every point in the model gives subsequent algorithms the ability to render the model photo-realistically, in some cases improving the visibility of an object over what can be seen in the field in the presence of the real object itself. In fact, after emerging from storage to be re-installed in the Ashmolean, the Parian Marble was successfully modeled using RTI, demonstrating the technique’s power to visualize difficult epigraphic material [18].
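
As an illustration of the per-pixel estimation at the heart of RTI, the sketch below fits a low-order polynomial of the light direction to the intensities observed at each pixel, in the spirit of polynomial texture maps; the function names and the biquadratic basis are illustrative assumptions, and production RTI tools differ in their models and implementations.

```python
# Minimal sketch of per-pixel reflectance fitting: for every pixel, fit a
# biquadratic polynomial in the light direction (lu, lv) to the intensities
# observed across the captures, then relight under a new virtual light.
import numpy as np

def fit_rti(images, light_dirs):
    """images: (N, H, W) grayscale captures; light_dirs: (N, 2) unit light directions (lu, lv)."""
    lu, lv = light_dirs[:, 0], light_dirs[:, 1]
    B = np.stack([lu**2, lv**2, lu*lv, lu, lv, np.ones_like(lu)], axis=1)  # (N, 6) basis
    N, H, W = images.shape
    I = images.reshape(N, H * W)
    coeffs, *_ = np.linalg.lstsq(B, I, rcond=None)     # least-squares fit per pixel
    return coeffs.reshape(6, H, W)

def relight(coeffs, lu, lv):
    """Render the surface under a new (virtual) light direction."""
    basis = np.array([lu**2, lv**2, lu*lv, lu, lv, 1.0])
    return np.tensordot(basis, coeffs, axes=1)         # (H, W) relit image

# Tiny synthetic example: 8 captures of a 4x4 patch under random light directions.
rng = np.random.default_rng(0)
dirs = rng.normal(size=(8, 2)); dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
imgs = rng.random((8, 4, 4))
c = fit_rti(imgs, dirs)
print(relight(c, 0.3, 0.4).shape)   # (4, 4)
```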

With methods like RTI for improved lighting capture and visualization, and 3D photogrammetry for recovering the metric shape of an object, almost all the weaknesses of the analog models for epigraphic facsimile are overcome. As with squeeze techniques, surface shape is recovered. But unlike the white or gray papier-mâché, this shape includes estimates of the exact way the surface reflects light at every point. And unlike analog photography, with unpredictable lighting and difficult-to-repeat results, the management of incident light through spectral LEDs yields highly controlled, repeatable results over thousands of captured digital images.

3.4 Portable (and scanning) XRF for materials analysis

Techniques based on digital photography have given scholars an enormous boost in their ability to capture and visualize the shape and the reflectance properties of epigraphic material. However, they leave out the important aspect of materials analysis. This field has also experienced the accelerated development of noninvasive, data-rich techniques and devices. Discovered only a little over a century ago, X-ray analysis is now entering a new phase premised on digital sensors, precise control of source beam production, and unprecedented commercialization into specialized sectors such as Heritage Science.

Data about the material properties of the substrate of an epigraphic piece can now be collected and digitized noninvasively in the field at an unprecedented level of resolution and convenience. Field-ready systems allow for periodic sampling of XRF-based measurements with a hand-held unit as well as a more complete scanning of the entire surface, where every point yields a chemical signature. This so-called “scanning XRF” allows a parallel image to be captured, where the pixels of the image are the XRF estimates of the chemistry at every point. In one analysis, for example, it enabled scholars to discover residual elemental evidence of the tool actually used to create the incised text [19].
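
A simplified sketch of how such a “parallel image” can be assembled from scanning XRF data follows: the counts in an energy window around a characteristic emission line (here iron’s Kα line near 6.4 keV) are integrated at every scan position to form an elemental map. Real instruments perform calibrated peak fitting and background subtraction; the function `element_map` and the synthetic data are purely illustrative.

```python
# Build an elemental map from a grid of XRF spectra by integrating counts in an
# energy window around a characteristic line. Illustration only; real systems
# use calibrated peak fitting rather than a simple window sum.
import numpy as np

def element_map(spectra, energies_keV, line_keV, window_keV=0.15):
    """spectra: (H, W, C) counts per energy channel; energies_keV: (C,) channel energies."""
    mask = np.abs(energies_keV - line_keV) <= window_keV
    return spectra[:, :, mask].sum(axis=2)     # (H, W) map of counts near the line

# Synthetic 20x30 scan with 1024 energy channels from 0 to 20 keV.
rng = np.random.default_rng(1)
energies = np.linspace(0.0, 20.0, 1024)
scan = rng.poisson(lam=2.0, size=(20, 30, 1024)).astype(float)
fe_map = element_map(scan, energies, line_keV=6.40)   # iron K-alpha ~6.40 keV
print(fe_map.shape)                                   # (20, 30)
```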

3.5 Field work—an ongoing challenge

It should be noted that the introduction of digital techniques did not remove the need for field work with equipment capable of being set up and used on site. Epigraphic material is stunningly diverse, from Egyptian walls that measure many meters to tiny fragments in the corners of museums and libraries scattered around the world. Our own early experiences with digital systems necessitated field work in Puerto Rico, creating digital models of Taino petroglyphs standing unprotected in the open [20]. While the technology had changed, the field work itself retained many of the same challenges that had always existed: access permissions, care to keep equipment clean and functioning in all environments, and mechanisms to prevent the loss of information before it can find its way to safe repositories. Our deployed team could have learned countless lessons from the field workers of old, who—despite using analog rather than digital gear—would likely have warned us against dropping our USB cables into the muddy waters at the feet of the vertical rocks to be studied. Likewise, they would have advised us that snakes, while not indigenous to the laboratory setting, could be a factor in the jungle near Utuado. And it is likely that scholars schooled in fieldwork from earlier generations carefully guarded their rubbings and squeezes and photographs on their return trips. A rogue airport worker interested in a camera inside a checked bag can create a very large hole in the digital data from a remote site.

4 Beyond digitization: data science and the lens of big data

As we all have watched big data and the attendant artificial intelligence and machine learning algorithms come of age during the last decade, Antonio’s observation from The Tempest has never seemed more prescient: “What is past is prologue.” In the shift of technology from physical/analog facsimile to digitization and digital facsimile, we have now moved beyond digitization merely for the sake of transport and reproduction. In this new post-facsimile era—the so-called “fourth paradigm of data-intensive science” [21]—scholars can capture measurements that enable previously unimaginable analyses of epigraphic materials. Such analyses provide new avenues of data-driven answers to the three big epigraphic questions: text restoration, geographical attribution, and chronological attribution.

This shift to large-scale data collection from objects for the purpose of advanced analysis has not necessarily been straightforward, nor has it been uniformly accepted by all corners of the scholarly community. The power of facsimile, preservation, and access through electronic means was quickly accepted and adopted, especially with more than a hundred years of analog study and publication leading up to that point. But detailed and massive data collection that goes beyond what is needed for “good” digital reproductions can be a more difficult case to make, especially when the reasons for capturing the data, such as supporting a large inference system to deal with lacunae, cannot be convincingly justified with results until after the data are captured and organized. Examples exist, however, of impressive findings that came from having collected such data—citizen science applications, scanning XRF, and textual evidence at very large scales—demonstrating the discoveries that can result from the shift toward data collection and analysis.

4.1 Citizen science and crowdsourcing

Tombstones and gravesites worldwide form the largest collection of epigraphic text, and crowdsourced “citizen science” applications help aggregate the work that laypersons are able and willing to do with smartphones and apps in the field [22]. Although the quality varies, crowdsourced collection at this scale and efficiency has never before been possible. Even with less-than-perfect data, the arrival of at-scale data collections forms a platform from which more advanced techniques can operate.

Beyond citizen science applications, crowdsourced work has found its way into rigorous academic communities, leading to high-quality scholarly communication and interaction. The international community involved with Epigraphy.info [23], for example, has as its mission “to gather and enhance the many existing epigraphic efforts,” and to facilitate “digital tools, practices and methodologies for managing collections of inscriptions.” Such examples of open scholarly communication are premised on digitization and digital tools for advancing the field.

4.2 Shape and materials

One important opportunity to arise from the capture of digital, metric shape information is the foundation that massive amounts of such geometric information can provide for understanding, interpreting, and analyzing the object. The paleographic analysis of letter forms, including those coming from epigraphic material, increasingly relies on relative, absolute, and normalized measurements of those letter forms. Once an object is reconstructed in 3D, its metric information is made explicit, something that photography could provide only approximately and that squeezes captured only implicitly. Surface measurements from 3D reconstructions thus open up a rich dimensional landscape and support some surprising results.

First, surface information allows for normalization and the comparison of the features of epigraphic letter forms over a large corpus of material with varying surface properties. Second, other measurement schemes can exploit shape information to correct for distortions that surface geometry would otherwise introduce. The recovery of an accurate shape model can make paleographic measurements explicit, apply corrections for the fact that epigraphic text is fundamentally 3D in nature, and facilitate advanced methods for improving the performance of other instruments—such as scanning X-ray fluorescence (XRF)—by taking the specifics of shape into account.

One such example is the use of XRF to scan and measure the elemental composition across the surface of an incised text. As mentioned before, XRF is a powerful tool for understanding and even enhancing incised evidence that may be effaced and eroded [24]. But shape changes, especially on surfaces that are overall very far from planar, create challenges for scanning XRF systems, which normally assume a planar target in the way they operate. However, shape models derived from accumulated data can inform such a scanning XRF system either to scan the 3D surface more accurately (through a control process that maintains a uniform distance between the sensor and the surface) or to apply a correction to the collected data based on the shape characteristics and a model of how those characteristics influence the measurements [25, 26, 27].
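
The following deliberately simplified sketch shows the flavor of such a shape-informed correction: measured counts are rescaled using the local standoff distance and surface tilt taken from a 3D model, assuming an inverse-square falloff with distance and a cosine obliquity factor. The cited work uses properly calibrated physical models, so this is only an illustration of the idea, and all names and numbers are invented.

```python
# Simplified geometry-aware correction for scanning XRF: rescale measured counts
# using the local standoff distance and surface tilt from a 3D model. Assumes an
# inverse-square distance falloff and a cosine obliquity factor (illustrative only).
import numpy as np

def geometric_correction(counts, distances, normals, beam_dir, nominal_distance):
    """counts, distances: (H, W); normals: (H, W, 3) unit surface normals;
    beam_dir: (3,) unit vector pointing from the sensor toward the surface."""
    cos_tilt = np.clip(-(normals @ beam_dir), 1e-3, 1.0)    # obliquity of the surface
    range_factor = (distances / nominal_distance) ** 2      # inverse-square falloff
    return counts * range_factor / cos_tilt                 # counts "as if" flat and at nominal range

# Toy example: a 2x2 patch measured at varying standoff distances.
counts = np.array([[100.0, 90.0], [80.0, 120.0]])
dist = np.array([[10.0, 12.0], [11.0, 10.0]])
normals = np.zeros((2, 2, 3)); normals[..., 2] = -1.0        # surface facing back toward the sensor
print(geometric_correction(counts, dist, normals, np.array([0.0, 0.0, 1.0]), 10.0))
```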

4.3 Textual characterization and corpora at scale

Among the examples of applying deep learning/machine learning to the textual characterization of various epigraphic corpora, the Ithaca system [28] is a widely recognized contribution. Built on an earlier system and trained on the Packard Humanities Institute dataset of transcriptions, which contains the texts of 178,551 inscriptions, Ithaca demonstrates several important trends. First, the synergy between scholars and the system framework dramatically improved the process and its accuracy. This interactive approach, together with the summarization power of the framework to encompass so many transcriptions, points to the future of such systems. Second, the deep learning framework brings a representational model to the problem of inferring missing text (lacunae) from a summarized large corpus, an essential component that was missing until the past few years of development in artificial intelligence. The framework itself has been shown to capture subtle patterns and to generate nuanced inferences based on correctly supervised and curated data. And third, the confluence of scholarly interest, prior work to digitize and make available large-scale data sets for analysis, and the representation models from deep learning is yielding at-scale approaches to corpora that far outpace what was possible just a few years ago.
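
To give a sense of the restoration task itself (and emphatically not of Ithaca’s transformer architecture), the toy sketch below scores candidate fillings of a two-letter gap with a character-level bigram model built from a stand-in corpus; every name and string in it is hypothetical.

```python
# Toy illustration of the lacuna-restoration task (NOT Ithaca's architecture):
# score candidate fillings of a gap with a character-level bigram model.
from collections import Counter

def bigram_model(corpus):
    counts = Counter(zip(corpus, corpus[1:]))
    totals = Counter(corpus[:-1])
    return lambda a, b: (counts[(a, b)] + 1) / (totals[a] + 27)   # add-one smoothing

def score(text, prob):
    p = 1.0
    for a, b in zip(text, text[1:]):
        p *= prob(a, b)
    return p

corpus = "the people and the council decided to honour the benefactor "  # stand-in corpus
prob = bigram_model(corpus)

damaged_prefix, damaged_suffix = "the cou", "il decided"
candidates = ["nc", "rt", "xq"]                  # hypothetical restorations of a 2-letter gap
for c in candidates:
    print(c, score(damaged_prefix + c + damaged_suffix, prob))   # "nc" scores highest
```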

The digitization of material, having given way to pure digital data collection, now has a purpose beyond just facsimile and access. Using digital collections, it is now possible to build and then apply a large-scale lens of data through which we can view large corpora and pose and answer new questions. At the same time, improvements in addressing the age-old questions of textual restoration, geographic attribution, and chronological attribution will lead to new precision and conclusions that challenge prior results.

5 The frontier

While the developments in digital epigraphy and at-scale artificial intelligence approaches are certainly novel and ongoing, other innovations at the fast-moving digital frontier are advancing as well. The continued exploration of multi-modal imaging, which proved so difficult in the analog days before highly controllable and portable LED light sources and associated digital sensors were available, now encompasses even X-ray imaging and computed tomography. Artificial intelligence applications are moving beyond the core questions of textual restoration and attribution and are now incorporating “style transfer” methods capable of envisioning stylistic renderings that approximate an object’s appearance prior to its being damaged. And as these new digitization and data science pathways have developed, so, too, have structured metadata approaches. Advances in metadata creation are enabling broad interoperability, wide-ranging search and experimentation over canonical datasets, and even digital provenance that captures the complex chain of digital manipulations applied to an object, so that its scholarly study and the development of a valid interpretive context are firmly supported.

5.1 Tomography leading to virtual unwrapping

Pushing noninvasive methods to the extreme, X-ray imaging offers the ability to recover evidence of text even when damage is severe or physical constraints prevent any other type of direct analysis. One of the first examples of incised text being studied in this way was the Antikythera Device [7]. This unique, enigmatic artifact was discovered in 1901 off the coast of the Greek island of Antikythera and was long considered a mysterious and unknown object from antiquity. Thanks to noninvasive X-ray analysis in 2005, the structure and function of the device were revealed. More importantly in the context of epigraphy, the computed tomography (CT) of the device led to the discovery of a number of incised markings throughout the mechanism that were instrumental in understanding its construction and intended function. Those incised markings appeared prominently in the CT because of the nature of the text: incised in metal, creating 3D shapes that were captured by the volumetric nature of the CT.

The Antikythera Device is likely not considered by the epigraphic community to be a typical incised or carved text, but its analysis using noninvasive X-ray imaging represents an important inflection point. Other uses of micro-CT in the field at the time included the study of cuneiform tablets, again with incised markings, which enabled the analysis of interior writing that could not be seen without damage to the outer “envelope” of the objects [29].

The trend from straightforward, conventional imaging and visualization toward more advanced processing took a big step forward in the work to read the Jerash amulets [30], with an early implementation of “virtual unwrapping” of the complex surface from tomography. Since the Jerash amulet was inscribed, the primary digital manipulation of the volumetric data was to unwrap it and create a flattened surface on which the incised text became apparent. This pipeline of processing (noninvasive micro-CT scanning followed by algorithmic modeling and manipulation of the resulting data) set the stage for a widely accepted approach to virtual unwrapping that may lead to many more epigraphic discoveries. A number of candidate artifacts exist, such as Native American copper plates in a mortuary context, arranged with other artifacts (engraved shells and woven textiles) in a funerary tableau [31, 32]. The degradation of early copper makes it a challenge to recover the iconography, especially when artifacts have been stacked and fused over time. We firmly believe, however, that complete virtual unwrapping, as proven by the scroll from En-Gedi, stands ready for inscribed surface manipulation as well as damaged or wrapped text [33].
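
A schematic sketch of the core step in virtual unwrapping appears below: once the writing surface has been segmented inside a CT volume as a parameterized sheet of 3D points, the volume’s intensities are sampled along that sheet to produce a flat image of the hidden surface. Real pipelines add segmentation, meshing, and distortion-minimizing flattening; the `sample_surface` function and the toy volume are illustrative assumptions.

```python
# Schematic of the sampling step in virtual unwrapping: read the CT volume's
# intensities along a segmented, parameterized surface to form a flat image.
import numpy as np

def sample_surface(volume, surface_points):
    """volume: (Z, Y, X) CT intensities; surface_points: (H, W, 3) voxel coords (z, y, x)."""
    idx = np.rint(surface_points).astype(int)                    # nearest-voxel sampling
    z = np.clip(idx[..., 0], 0, volume.shape[0] - 1)
    y = np.clip(idx[..., 1], 0, volume.shape[1] - 1)
    x = np.clip(idx[..., 2], 0, volume.shape[2] - 1)
    return volume[z, y, x]                                        # (H, W) "unwrapped" image

# Toy volume containing a bright curved sheet, plus a surface that follows it.
vol = np.zeros((32, 32, 32))
zz = (16 + 6 * np.sin(np.linspace(0, np.pi, 32))).astype(int)
for x_i, z_i in enumerate(zz):
    vol[z_i, :, x_i] = 1.0                                        # the hidden "page"
ys, xs = np.meshgrid(np.arange(32), np.arange(32), indexing="ij")
surface = np.stack([zz[xs], ys, xs], axis=-1).astype(float)       # (32, 32, 3)
print(sample_surface(vol, surface).mean())                        # ~1.0: the page is recovered
```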

5.2 Style transfer for improved legibility

While the concept of facsimile (e.g., 3D reconstruction) and accurate modeling (digital imaging with various light sources and positions) has led to the ability to visualize certain effects—alterations in lighting, different colors, and textured models—techniques in deep learning have extended visualization into the realm of complex style transfer [34]. These advanced image rendering techniques combine the details of an image, like a photograph, with the unique colors, structures, or other chosen elements of an external image source, such as a painting by Van Gogh, that have been “learned” by a computational neural network. The result is an entirely new, born-digital representation that presents the original image in the style learned by the network, such as a photograph rendered to look like a Van Gogh painting.
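
For readers curious about the mechanics, the sketch below shows the central ingredients of the classic neural style transfer formulation: Gram matrices of feature maps summarize “style,” and a combined loss balances content fidelity against style similarity. Feature extraction from a pretrained network (e.g., VGG) is omitted, and random tensors stand in for feature maps, so this illustrates only the loss structure, not a working transfer pipeline.

```python
# Core ingredients of classic neural style transfer: Gram-matrix style terms
# plus a content term. Random tensors stand in for CNN feature maps.
import torch

def gram_matrix(features):
    """features: (C, H, W) feature maps -> (C, C) Gram matrix of channel correlations."""
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return (f @ f.T) / (c * h * w)

def transfer_loss(out_feats, content_feats, style_feats, style_weight=1e3):
    content_loss = torch.mean((out_feats - content_feats) ** 2)
    style_loss = torch.mean((gram_matrix(out_feats) - gram_matrix(style_feats)) ** 2)
    return content_loss + style_weight * style_loss

# Stand-in feature maps; in practice these come from fixed layers of a pretrained CNN.
content = torch.randn(64, 32, 32)
style = torch.randn(64, 32, 32)
output = content.clone().requires_grad_(True)
loss = transfer_loss(output, content, style)
loss.backward()                     # gradients w.r.t. the output features
print(float(loss))
```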

We have seen the power of this technique in our own virtual unwrapping work. Presenting textual scholars with black-and-white X-ray images of hidden writing is not ideal; it would be much better to present the born-digital texts as they would appear if they could be accessed (as in the case of the cuneiform tablets hidden within their clay envelopes) or as they would have looked prior to being damaged (as in the case of the En-Gedi scroll). We have thus been experimenting with neural networks that infer color and texture information from photographs of text on papyrus and parchment and can be used to render CT images that look as if the born-digital text is written on the appropriate substrate.

Similar approaches can inform the study of digitally restored epigraphic texts. Information gleaned from material analyses of the base in which the inscribed text sits, combined with both current and historical data and images regarding such properties as color variations, textures, or the types of cuts and grooves made by certain tools, for example, can be used to render a digital image of a restored or discovered incised text so that it appears brand new. This type of deep learning application creates an entirely new space for experimenting with ways to visualize the digital representations of objects and to experiment with how the visualization can improve scholarship. It also opens a Pandora’s box of provenance concerns.

5.3 Metadata and the digital provenance chain

Any discussion of scholarly work on a cultural artifact must address issues of provenance, and the proliferation of born-digital objects for study only complicates the issue. Recent events involving forged materials purchased by the Museum of the Bible, as well as the increasingly terrifying ability of anyone to create “deep fake” images and videos, remind us of this fact. As material objects are transformed into bits and bytes that can be enhanced, or that can be mined for information not explicit in the data and then used to reveal hidden objects or even create new ones, the digital provenance chain becomes arguably more important than traditional analog concerns regarding physical origins and acquisitions. It cannot be acceptable to produce final conclusions from data that have been arbitrarily (and perhaps irreversibly) manipulated to achieve results. The peer review process demands that the black box of scholarly analysis be opened and made transparent. Particularly in the case of the latest frontier of new techniques—reading interior text without opening an object, capturing data that cannot be seen with the naked eye, eliciting information from objects using devices whose accuracy must be confirmed and validated—peer review and archival standards must be supported.

Thankfully, the digital ecosystem surrounding the data of born-digital material can be accurately and completely recorded through appropriate schemas and frameworks. Structured, standardized metadata schemas—and the enforcement of populating those standards with accurate information—are today enabling large-scale databases characterized by widespread interoperability. The epigraphic community was among the first to recognize the power of informative metadata attached to scholarly interpretations: EpiDoc, a TEI-XML-based set of guidelines used to encode scholarly and educational digital editions of inscriptions, was developed in the 2000s [35].

However, metadata approaches to documenting the digital provenance of manipulations and machine learning applications are only now emerging, and their development is a matter of design and intention as the field moves forward. Yes, the technical barriers are challenging, but they are not insurmountable. For example, as soon as we saw Hebrew text appear on the screen, especially once it was confirmed to be from the Bible, we recognized the importance and the challenge of verifying the integrity of the virtually unwrapped En-Gedi scroll. It was imperative that we earn the confidence of the scholarly community in our results. As we continued to develop and refine our virtual unwrapping software pipeline and the results it produced, we remained cognizant of the need to document and track every computational step in our digital processes. In exploring ways to document the digital provenance chain for our complicated born-digital images and then disseminate that metadata in a clear, concise, and organized way, we settled on the use of the Metadata Encoding and Transmission Standard (METS) [36]. This highly flexible and comprehensive metadata “container” allows us to enumerate all of the files involved in the creation of a born-digital document, as well as to capture the entire processing pipeline from primary acquired data to presented findings. This digital pathway—a careful record that makes explicit, for analysis and review, every algorithmic step imposed upon the data, so that conclusions relying on those data are supported and can be scrutinized—effectively dismisses the “black box” in favor of digital rigor and transparency.
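
To illustrate the idea (and only the idea), the sketch below assembles a METS-style record of a hypothetical processing pipeline using Python’s standard XML tooling. The element names follow METS conventions, but the record is schematic and not schema-validated; a real document must conform to the full Library of Congress METS schema, and the file names and tool names here are invented.

```python
# Schematic METS-style provenance record: list the acquired files and each
# processing step applied to them. Not schema-validated; illustration only.
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)

def provenance_record(files, steps):
    mets = ET.Element(f"{{{METS_NS}}}mets")
    file_sec = ET.SubElement(mets, f"{{{METS_NS}}}fileSec")
    grp = ET.SubElement(file_sec, f"{{{METS_NS}}}fileGrp", USE="acquired-data")
    for fid, name in files:
        # a real METS <file> would point to the file via an <FLocat xlink:href=...> child
        ET.SubElement(grp, f"{{{METS_NS}}}file", ID=fid).text = name
    amd = ET.SubElement(mets, f"{{{METS_NS}}}amdSec")
    for i, (tool, description) in enumerate(steps, start=1):
        md = ET.SubElement(amd, f"{{{METS_NS}}}digiprovMD", ID=f"step{i}")
        wrap = ET.SubElement(md, f"{{{METS_NS}}}mdWrap", MDTYPE="OTHER")
        data = ET.SubElement(wrap, f"{{{METS_NS}}}xmlData")
        ET.SubElement(data, "tool").text = tool
        ET.SubElement(data, "description").text = description
    return ET.tostring(mets, encoding="unicode")

# Hypothetical pipeline for a virtually unwrapped object.
print(provenance_record(
    files=[("F1", "volume.tif")],
    steps=[("segmentation-v2", "surface segmentation of the CT volume"),
           ("flattening-v1", "distortion-minimizing flattening of the segmented surface")],
))
```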

This digital provenance chain is an extremely important concept that must be realized in the born-digital realm of publications and the explication of research ideas and findings. Other clear and much-needed outcomes that will result from strongly designed metadata schemas include a transformation of the publication pathway and the bundling of disparate but related scholarly work into ever more tightly woven, definitive works. These strong steps toward completely born-digital scholarly publications surrounding epigraphic material, as envisioned by scholars as early as 2009 [37], will be seen as transformational in the scholarly tradition.

The many pieces of technology needed for a born-digital publication pathway around epigraphic research are now almost in place. Canonical reference to other digital objects and scholarly work has been solved [38, 39]. Metadata frameworks as standardized and accepted ecosystems have materialized, together with associated tools for creating and manipulating data. The data representing epigraphic texts are now primarily digital; where they are not, they can be digitized and ingested as the first step in the scholarly process of preparing publications. And the distribution, access, and search facilities that scholars have used for a generation as part of the “library,” which is really now the Internet, are familiar territory.

Finally, those of us who have instigated this digital renaissance are tempted to view the digital world we have created as sets of discrete object categories—images, videos, papers, text. Yet the emerging complex digital object comprising all of these, wrapped with metadata in a way that supports its longevity and comprehensive growth, will encompass all prior scholarship and include ways to visualize, search, and make explicit all aspects necessary for rigorous research and definitive scholarship. The born-digital world will result in ever tighter and more connected sets of scholarly objects able to be searched, critiqued, and explored as the field moves forward. This convergence is achievable only through the transition from analog to digital and the maturation of a dozen key components currently materializing: durable reference to digital objects, standardized metadata schemas, interoperable protocols for information exchange, scholarly communities willing to adapt, and more. As this digital world continues to mature, the era of data science and the application of large-scale artificial intelligence techniques will push it forward, out of the necessity of peer review and replicability and the central need for scholars to continue to produce results in a way that is connected to the community and to the next generation of scholars.

6 Conclusion

Perhaps the first writing was in damp Mesopotamian clay. More ephemeral writing was done by Jesus, who wrote with his finger in the sand (John 8:6–8). People have used leaves, bark, and the walls of caves to record writing. Eventually inscriptions emerged, and over the millennia we have accumulated the few that have survived, the ones that we have discovered, and the many we have studied. The last millennium of scholarly work in analog epigraphic preservation and study presented a slow and steady parade of human innovations: the printing press, photography, and materials analysis. But nothing compares to the digital era, in which we have invented a way of representing and studying carved rocks that no one but this generation could have envisioned.

As we ponder with fondness the traditional approaches to epigraphy and the scholarly record they produced, none of us will miss the constraints: small numbers of samples, the hopelessness of limited access to artifacts, the impossibility of accurate measurement, and the uncertainty of the scholarly inferences and conclusions drawn in the face of all that. We will also be glad to see the retirement of the scholarly black box, in favor of a transparent and peer-reviewable critical and technical apparatus that grows, tightens, and becomes definitive over time.

While the past is prologue, the future as a sequel may be tempting to downplay—like most sequels—as impossibly boring compared to the transformation of the last three decades of digitization and now data science. But in this essay, we have argued that, like the worst epigraphic text constructed by the most inexperienced of scribes, we have only scratched the surface. And that surface is not the epigrapher’s stone, but the silicon of computing and data science on which the future will be built.