Introduction

Parchment, a writing support whose origins are believed to be in ancient Pergamon, represents an irreplaceable source of historical, artistic and societal information [1]. Over the centuries parchment has been the foundation for a multitude of media from illuminated Gospels to the utilitarian documents used in everyday life.

Aside from the text, the physical parchment object holds vast quantities of biological information that, although in many cases invisible to the naked eye, can be used to provide a deeper level of understanding about book production, livestock economies, handling, conservation and the historic use of the object [2,3,4]. This emerging field, which we have termed “biocodicology”, looks to unlock these biological signals to allow a greater understanding of the manuscript as a physical object.

Codicology is the study of the physical structure of the book, which promotes a better understanding of its production and subsequent history [5]. It is often referred to as “the archaeology of the book”, concerning itself with the materials (parchment, sometimes referred to as membrane or vellum, paper, pigments, inks and so on), and techniques used to make books, including their binding [6].

Biocodicology, the study of the biological information stored in manuscripts, looks to expand the field of codicology to include the biomolecular techniques of proteomics [3] and genomics [4, 7] to further develop our understanding of how manuscripts were produced and used through history and how this can help shape and inform our views of the past. This review is intended to provide a primer to this emerging field, highlighting the challenges and opportunities in conducting these novel analyses with heritage objects. While our review focuses on the application of biocodicology to parchment-based objects, for example highlighting the animal origins of the documents, many of the techniques may also be applied to paper books when targeting, for example, microbiome data or glues and surface treatments. We hope that this review will be used as a guide for conservators and curators on the possible applications of biocodicology to their collections, by illuminating the potential opportunities it offers.

The evolution of codicological analysis—from manuscript to molecule

Original biological analysis

That parchment documents house biological data is not a new observation; follicle patterns of the animals used to produce parchment and leather have likely served as a proxy for the identification of breed and species since the beginning of parchment making itself [2, 8]. This method relies heavily on the subjective experience and training of the user, and natural biological variation can often lead to misidentification (for example, many catalogued sheepskin parchments are classified as vellum). In addition, follicle patterns are not always visible and therefore cannot be used as an objective or routine method of species identification. Michael Ryder, a pioneer of the follicle pattern analysis of parchment, was able to determine different wool qualities which he believed could be linked to particular breed types. However, the analysis he performed required thin section microscopy of parchment fragments, which necessitated destructive sampling and therefore greatly limited the number and type of samples that could be analysed [9].

Looking at the biomolecular data contained within books is again not a new idea [10, 11], but early attempts to investigate the biomolecular composition of parchment proved to be harder than initially imagined. Due to the nature of the technologies at the time, destructive sampling of the documents was necessary to obtain sufficient amounts of starting material to analyse. This intrinsically limited the kind of analyses that could be adopted as routine or that were even feasible, with only a very small number of documents being chosen as proof of concept studies rather than a large scale analysis. Initially the focus lay on the retrieval of genetic material, with pioneering studies demonstrating the difficulties inherent to the methodologies of the time (contamination and a lack of sensitivity), but also highlighting the possibilities that genetic analyses could provide [1, 11,12,13,14,15]. Early proteomic analysis using mass spectrometry appeared to be more successful: the pioneering study of Toniolo et al. [16] used a high profile document, believed to be Marco Polo’s Bible, to demonstrate the importance of their technique and achieve species identification. Limitations of their extraction technique, database and technical sensitivity meant, however, that a few other animals could not be categorically excluded and, as with genetic analysis, their protocol required destructive sampling (5 mg of parchment), rendering it an intriguing but not universally adoptable analysis.

In addition to the parchment itself, there have also been studies investigating the biological microenvironment of manuscripts. There has been a long-standing interest from the conservation community in identifying potentially damaging microorganisms that inhabit these documents in order to assess the risk they pose [17]. Traditional methods involving swabbing, bacterial culture and basic DNA sequencing provide some information but are limited in that they preferentially select only the most abundant species, leading to a somewhat biased interpretation. However, these methods are now accepted by the conservation community and have been successful within their limitations [17,18,19,20].

The omics revolution

During the last decade, we have seen both a genomic and proteomic revolution, offering the technological advances necessary to more fully unlock the biomolecular data held within parchment documents and historical artefacts in general [21]. The fundamental change of methodology within ‘omics’ is that it takes a so-called ‘shotgun’ approach: rather than targeting a specific set of molecules, you extract and identify all the biomolecules present, providing a ‘biomolecular snapshot’ of the environment. This approach has the advantage of detecting unexpected elements that would not be identified in a more targeted approach, leading to new discoveries. It also gives a more representative assessment of the environment analysed and can allow for relative quantification of identified elements.
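As a minimal sketch of what relative quantification means for a shotgun dataset, the Python snippet below turns a list of identifications into proportions. The labels and counts are invented placeholders rather than data from any manuscript, and real pipelines normalise and filter in far more careful ways.

```python
from collections import Counter


def relative_abundance(assignments):
    """Return each label's share of all identified items (fractions summing to 1)."""
    counts = Counter(assignments)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


# Hypothetical identifications from one shotgun run (illustrative only).
reads = ["sheep"] * 6 + ["human"] * 3 + ["bacteria"] * 1
shares = relative_abundance(reads)
for label, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{label}: {share:.0%}")
```

Because every identified element is counted, not just a pre-selected target, the proportions reflect the whole sample as it was extracted.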

First, we will review the terminology involved. Three types of biomolecular analyses can be undertaken: genetics (DNA), proteomics (proteins) and the microbiome (microbial genetics), which all have different information to contribute. By playing to each method’s strengths, we can reveal a more complete biological picture of the document to aid in its study and conservation.

What is genetic analysis?

Deoxyribonucleic acid (DNA) makes up one of the four main categories of biomolecules that also include carbohydrates, lipids (fats) and proteins, and carries an organism’s genetic code. Famously found to be a helical structure, DNA holds the blueprint of who we are [22]. By extracting and sequencing DNA we can detect the species and sex of the parchment animal and by analysing the small differences in the genetic code we can determine possible breed variation (dependent on geography) and relatedness to other individuals [1, 4, 7, 11,12,13,14,15, 23,24,25]. However, we must be aware that the DNA we extract from parchment is of a variable quality and abundance compared to DNA sampled from living individuals [4, 26]. This will inevitably affect the resolution of the analysis and might compromise the level of detail we would like to achieve. This inherent complication has to be kept in mind when deciding to undertake a (possibly lengthy and costly) genetic analysis of historical documents. A counterpoint to this is that in the last decade technological advances have exponentially increased the amount and quality of genetic data that can be recovered from degraded specimens, by replacing traditional DNA sequencing methods with so-called next generation sequencing (NGS). These new high throughput DNA sequencing techniques (HTS) are ideally suited to sequence damaged DNA from historical and archaeological sources (reviewed by Orlando et al. [26]) and thus open the future possibility of high resolution genetic analyses of many cultural heritage objects [21] (Table 1).

Table 1 Glossary of useful terminology

What is protein analysis?

Proteins make up another of the four main categories of biomolecules. Proteins are composed of chains of molecules called amino acids of which there are 20 different types that naturally occur in the body. The sequence of these specific amino acids is what gives proteins their primary structure, and the way this long chain folds up on itself (helices, sheets, etc.) confers the protein’s secondary and tertiary structure [22]. Proteins are the main functional building blocks of life. They have very varied functions including structural (collagen), enzymatic (trypsin), transport/storage (haemoglobin), immunological (antibodies) and messenger functions (hormones). Proteins seem to have much more robust survival rates than DNA, with some of the earliest proteins identified dated to over 3.8 million years [27]. By looking at the profile (or fingerprint) of one particular protein using mass spectrometry we can identify the species of animal it came from. This basic form of protein analysis is called peptide mass fingerprinting (PMF) and is the basis for the ZooMS [28] technique later adapted into eZooMS [3] for non-invasive use on parchment. The study of all the proteins present in a sample is known as proteomics and when applied to historic or ancient proteins it is called palaeoproteomics [29,30,31]. Much like DNA, minor changes in the sequence of proteins (primary structure) can allow us to discriminate between different species, allowing species identification [32, 33].
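The idea behind peptide mass fingerprinting can be sketched in a few lines of Python: observed peak masses are compared against a table of species-diagnostic marker masses, and the species whose markers are best represented is reported. This is only an illustration of the matching principle, not the published ZooMS or eZooMS workflow. The species labels, marker masses, tolerance and the `match_species` helper are all hypothetical placeholders.

```python
def match_species(observed_peaks, marker_table, tolerance=0.5):
    """Score each species by the fraction of its marker masses found in the spectrum."""
    scores = {}
    for species, markers in marker_table.items():
        hits = sum(
            1
            for marker in markers
            if any(abs(marker - peak) <= tolerance for peak in observed_peaks)
        )
        scores[species] = hits / len(markers)
    return scores


# Placeholder marker masses (m/z); these are NOT real reference values.
MARKERS = {
    "species_A": [1100.0, 2000.0, 3000.0],
    "species_B": [1100.0, 2030.0, 3060.0],
}

# Placeholder peak list standing in for a measured spectrum.
observed = [1100.2, 2029.8, 3060.1, 1500.0]

scores = match_species(observed, MARKERS)
best = max(scores, key=scores.get)
print(best, scores)
```

In this toy case one marker is shared by both species and only the remaining two discriminate between them, which mirrors the point that minor sequence changes are what separate closely related animals.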

A major advantage of proteomic-based studies is that proteins have tissue specificity; while the DNA of every cell is identical, proteins are specific to different tissues and environments, allowing the identification of not only the species but also of the biological tissue [34, 35]. For example, it would be possible to detect the proteins present in egg white glair on the surfaces of some parchments, thus allowing for a species ID (chicken), as would DNA, but also confirming proteomically that the substance is egg white, which would not be possible through DNA analysis.

What is the microbiome?

Over the last decade there has been a dramatic shift in the way we think about the microorganisms surrounding us [36]. Since the launch of the Human Microbiome Project in 2007, microbes have taken centre stage in the study of health and disease, highlighting how dependent we are on these microscopic organisms in all aspects of our daily lives [37]. Microbiomes (the communities of microorganisms in a certain ecological niche) are present not only on humans but in the environment around us, with characteristic communities forming in different locations. Parchment documents also have their own characteristic microbiome, formed from their production, history, use and conservation [4, 17, 38]. We can think of the microbiome on the surface of these documents as a type of microbial fingerprint or signature that can provide us with additional information about the history of the object, although interpreting this information is still at an early stage [4, 38].

Biocodicology—a step by step guide

When contemplating a biocodicological analysis of a manuscript there are various aspects that can be investigated: protein analysis, genetic analysis, and visual analysis including animal dimensions, scraping and production marks and evidence of animal disease (Fig. 1).

Fig. 1 Methods of biocodicological analysis

The selection of the techniques to be employed will depend greatly on the question being posed. In all situations, it is highly recommended to start with the least invasive and costly technique and only move on to other techniques if the question is not answered, in line with ethical sampling guidance provided by The Institute of Conservation (ICON) [39]. Visual analysis should always be the first step in the process, and may in some cases be sufficient. If visual analysis is not enough, our proposed next step would be a minimal non-invasive sampling using PVC erasers to provide samples for a basic form of protein analysis called peptide mass fingerprinting (PMF) [3]. This would provide a species identification and parchment quality index. There is also the possibility of performing a more in-depth proteomic analysis to identify additional proteins from the surface of the document. For more complex questions, if a larger amount of eraser rubbings can be obtained, then DNA analysis can be used to provide information on species, sex, relatedness and microbial presence [4].
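The escalation principle described above (try the least invasive and costly technique first, and move on only if the question remains open) can be sketched as a simple loop. This is a toy illustration in Python. The ordering in `LADDER` follows the paragraph above, while the `escalate` helper and the `is_answered` callback are hypothetical names introduced here and stand in for the judgement of the conservator and researcher.

```python
# Techniques ordered from least to most invasive and costly, as described in the text.
LADDER = [
    "visual analysis",
    "peptide mass fingerprinting",
    "in-depth proteomics",
    "DNA analysis",
]


def escalate(steps, is_answered):
    """Try each step in order and stop at the first one that answers the question."""
    tried = []
    for step in steps:
        tried.append(step)
        if is_answered(step):
            break
    return tried


# Example: a species question that visual analysis alone could not settle.
tried = escalate(LADDER, lambda step: step == "peptide mass fingerprinting")
print(tried)
```

If no step answers the question, the function simply returns the full ladder, which signals that the question may need to be revisited or resolved by other means.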

Visual analysis

Methods of production

Visual analysis of parchment can be divided into two main categories:

  1. The observation of traces left from the manufacture of parchment. This starts with the method of skinning the animal, where flay cuts left by the butcher’s knife might later open up and create holes during the stretching of the skin on the frame. Some of these holes are sewn (with various types of stitches), while others are left open or later covered by patches. Many other traces come from the shaving or final treatment of the surface of the parchment, including striation marks left by the parchment maker’s knife. In most cases the surface treatment completely removes follicle patterns, so other criteria must be used to identify the type of skin employed, such as its stiffness or flexibility or the curvature of the margin of parchment folios when opened (as a reaction to the climate). Parchment thickness is also an important measure that should be registered [40, 41].

  2. The observation and measurement of the anatomical features of the animal visible in the parchment. These can help to identify the part of the body (e.g. pelvis bone or specific vertebrae) and give us an approximate estimate of size and age. Observation of the position of the spine and the belly of the animal can also help to understand how the skin was divided into sheets (bifolia), which were later organised into the gatherings (quires) of the codex. Identifying the hair and flesh sides is also crucial as it relates to different historical methods of quire construction (including pricking and lining of the folios) and can lead to a better understanding of manuscript production practices in different scriptoria. To aid these visual identifications it is recommended to observe parchment folios in different types of lighting, for example transmitted or raking light [42, 43].

Methods of construction

While parchment documents in archives take the form of flat, rolled or folded deeds, most documents from medieval libraries take the form of a codex. This means that the parchment leaves were secured together to form a three-dimensional object: the book in its binding. The parchment leaves were folded to form bifolios; then 3, 4, 5 or more bifolios were gathered to make quires. The quires were sewn together as a text block on cords or skin thongs. The construct was strengthened, and the way it opens controlled, by the use of various elements depending on the period: these could include tabs, endbands and a spine lining. It was then protected by wooden boards and a skin covering. In the Middle Ages, all materials used to make the bindings were carefully selected for their mechanical properties, depending on the role they had to play [44]. Other than the wood used for the boards, the plant fibres used for the sewing thread or fabric linings and, occasionally, metal clasps, all materials were animal based: alum-tawed skin, tanned leather, parchment, fur, silk linings, fish glue, casein glue, egg white finish, beeswax and even tendons to make thread. We study the construction of a book by looking at each of the elements of the binding: how they are put together, but also the materials that were selected for each of these elements [45]. By identifying and examining the various types of skins used to make the structure, we can learn about how the book functioned. It can also tell us what was involved in the manufacture of the book, such as where it was produced, the network of skills involved, local production methods and facilities, and how the materials were traded.

Evidence of disease

Finally, a visual inspection of parchment documents can also inform us about the health of the animals used for their production. A variety of diseases and parasites are likely to cause lesions on animal skins, leaving scars that may be recorded in the parchment. The shape, size, nature and anatomical location of these scars, when combined with an identification of the affected species, can in some cases allow the diagnosis of their origin, as different pathogens manifest in different ways on the skin. For instance, warble-fly (Hypoderma bovis) breathing holes and exit holes have long been identified in the parchment record, as the lesions they produce are unmistakable: small, circular perforations grouped along the line of the back in cattle skins. Other diseases that have not yet been investigated in parchment documents can be identified in a similar way, with ongoing work looking to identify two major medieval sheep conditions, scab and sheep-pox.

Biomolecular analysis

The most significant advance in the biomolecular analysis of manuscripts has been the development of novel sampling techniques. Because manuscripts are so highly valued, access to them for analysis is tightly restricted and any form of destructive analysis is heavily scrutinised. Until now, analyses have therefore been dominated by non-invasive imaging techniques, for example multispectral imaging, XRF and Raman spectroscopy. Although these techniques can provide valuable information, they are insufficient to address more biological questions (regarding species, sex, breed and origin of animals used) that can only be answered through DNA or protein analysis. The development of our non-invasive sampling technique [3] has allowed us access to thousands of previously unanalysed documents, all with the approval of conservators and curators.

Our sampling method is based on triboelectric extraction using conventional PVC erasers found in conservation studios. The documents can be sampled in situ, with the eraser crumbs collected and sent to a lab for subsequent analysis without need for specialist conditions or equipment. Depending on the biocodicological analysis that is required different amounts of eraser crumbs will need to be collected (Fig. 2). This process is now accepted by the majority of conservation studios as a non-invasive surface cleaning technique appropriate for parchment documents. Ultimately, the decision of where and whether to sample lies with the conservator who has the best understanding of the condition of the document and might decide in particular cases that a document is too fragile or brittle to be cleaned or sampled in this manner.

Fig. 2 Examples of eraser crumbs needed to perform different biocodicological analyses, represented in 1.5 ml microcentrifuge tubes. a eZooMS, b DNA analysis of animal and microbiome, c microbiome or proteomic analysis

When embarking for the first time on a biocodicological analysis we have to start with the question that is being asked because depending on the enquiry a different methodological route should be undertaken. This is best illustrated with some opinionated practical examples, informed by our biocodicological research:

Q. What species of animal is this particular document made from?

A. For simple species identification the best course of action is to undertake basic protein analysis (eZooMS). This method is cheap, fast and for parchment has a very high success rate (> 90%). Only a tiny amount of eraser crumbs is needed (20 μl, Fig. 2a) and only one sample per bifolio. In addition to the species ID we are also able to provide some detail of the production quality of the parchment through the PQI (Parchment Quality Index).

Q. Are these two fragments related?

A. Initially, eZooMS would be the best first step, simply to confirm that the two fragments come from the same species and have a similar PQI. The next logical step would be to undertake DNA analysis. This would require a greater amount of eraser crumbs (200 μl, Fig. 2b) and therefore the condition of the parchment would need to be assessed. By analysing the mitochondrial data obtained from the animal it would be possible to determine if the fragments belonged to the same maternal line/herd. By doing more in-depth sequencing it might be possible to determine if they belong to the same individual. Unfortunately, this comes at a price. DNA analysis is not only more labour and time intensive but also much more expensive than protein analysis (Fig. 3). Given its complexity and cost, this type of analysis cannot currently be carried out routinely, and it would therefore be in the interest of all parties (curators, conservators and scientists) to see if the question could be at least partially resolved through other means (previous documentary evidence, palaeography, etc.). However, the information that can be obtained through this kind of analysis is a valuable resource for the document, giving us the species, sex, and possibly even how the animals relate to modern breeds, in addition to microbial data and evidence of human handling [4, 7, 23]. This information is crucial for gaining better insight into how livestock has changed over time, how this may have impacted the economy as a whole, estimates of flock and herd sizes, and the kill-off patterns used depending on the agrarian economy.

Fig. 3 Overview of proteomic and genetic methods applied in biocodicology
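To give a feel for how mitochondrial data from two fragments might be compared, the Python sketch below counts the differences between two already-aligned sequences while skipping positions that are missing in either. It is a deliberately simplified illustration: the sequences are invented, the `pairwise_differences` helper is hypothetical, and a real analysis of degraded DNA would also have to account for damage patterns, contamination and coverage. A low difference count is compatible with a shared maternal line but does not prove it.

```python
def pairwise_differences(seq_a, seq_b, missing="N"):
    """Count mismatches between two aligned sequences, ignoring missing positions.

    Returns a tuple (differences, positions_compared).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    differences = 0
    compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == missing or b == missing:
            continue  # no usable data at this position in one of the fragments
        compared += 1
        if a != b:
            differences += 1
    return differences, compared


# Invented toy sequences standing in for aligned mitochondrial fragments.
fragment_1 = "ACGTNACGTA"
fragment_2 = "ACGTTACGCA"
diffs, compared = pairwise_differences(fragment_1, fragment_2)
print(f"{diffs} difference(s) across {compared} comparable positions")
```

Reporting the number of comparable positions alongside the difference count matters here, because poorly preserved fragments may overlap at only a small part of the sequence.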

Q. What is this stain?

A. The best way to approach this question is to undertake proteomic analysis. This is a step further than eZooMS, but does not require much more sample, approximately 50 μl in volume (Fig. 2c). By undertaking a complete proteomic analysis we can not only determine the species of the parchment and the PQI, but we can also identify all the additional proteins present on the surface of the parchment. Proteomics offers a tissue specificity that DNA cannot provide. The analysis will not only tell you that chicken proteins are present (as would DNA), but can also tell you that those proteins are specific to egg yolk, for example. This tissue specificity is much more revealing in the case of stains and gives vital clues as to how the document may have been used in the past. This, of course, does come at a cost as it is more expensive than basic eZooMS and takes longer, but it is significantly cheaper and quicker than DNA analysis and more informative in the case of stains.

Q. I am concerned about the condition of a document, could there be potentially damaging bacteria or fungi present?

A. Here the answer lies in the analysis of the microbiome. By using eraser crumbs we are preferentially removing surface DNA, which includes that of the bacteria and fungi that have inhabited or still inhabit the surface of the document; the amount of sequencing (data) required to get an informative answer is therefore likely to be less than for animal genetic analyses. However, although species can be detected, we still do not know how likely they are to be directly implicated in damaging the object, so further studies and analysis will need to be carried out before definitive links can be established. Pioneering studies in this area are nevertheless starting to emerge [18, 38, 46].
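The sample volumes quoted in the answers above can be collected into a small lookup that shows which analyses a given amount of eraser crumbs would support. This is a minimal sketch in Python. The three volumes are the approximate figures given in the text, the `feasible_analyses` helper is a hypothetical name, and each analysis is assumed to be considered on its own rather than several being run from the same sample. Microbiome analysis is left out because the text gives no explicit volume for it.

```python
# Approximate eraser-crumb volumes in microlitres, as quoted in the answers above.
REQUIRED_UL = {
    "eZooMS": 20,
    "proteomics": 50,
    "DNA analysis": 200,
}


def feasible_analyses(available_ul):
    """List the analyses whose quoted crumb volume fits within the collected volume."""
    return [name for name, needed in REQUIRED_UL.items() if needed <= available_ul]


# Example: roughly 60 microlitres of crumbs were collected from a folio.
print(feasible_analyses(60))
```

Volume is only one constraint. The condition of the document, as judged by the conservator, still decides whether any sample can be taken at all.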

Finally, amongst all these other decisions, we must also take into account the original substrate, as not all materials have the same biomolecular profile and it might be difficult to obtain specific results for certain materials (outlined in Table 2). We know that obtaining host DNA from tanned skins is often not possible, as the tanning process adversely affects the host nuclear DNA, although some mitochondrial DNA may survive [47, 48]; this should be taken into consideration when designing your biocodicological strategy.

Table 2 Summary of information that can be obtained from different substrates

Conclusions

The recent biomolecular revolution is changing the way we think about archaeological and historic artefacts and challenging our views on what information can be garnered from these heritage objects. Until recently the majority of biomolecular techniques required some form of destructive sampling, albeit of very small amounts. Although destructive sampling is widely accepted in the archaeological community (in part due to the long-standing use of radiocarbon dating techniques), it contravenes most conventions for manuscript curation and conservation. This has meant that only a select few documents have been subjected to this kind of analysis, with little interest in the widespread adoption of these techniques.

The price of biomolecular analysis has also been a limiting factor. Many archives and libraries have extremely limited funds with which to both conserve and archive documents and find that any form of analysis is out of their reach. Although the prices for DNA sequencing have dropped dramatically in the past decade, this analysis can easily run into hundreds if not thousands of pounds, an important factor to take into consideration when working with the limited funds available to libraries and archives.

We are now in a position to be able to address both these issues. Our development of a non-invasive sampling technique has been fundamental to this step change in how we approach biocodicology. By using a technique that was developed alongside conservators and that is widely accepted in the conservation community we have been able to broaden our access to documents from just a handful to thousands. As our triboelectric sampling technique is used in situ by the conservators themselves we remove the logistical problems of transporting precious documents. All sampling can be done in house by a conservator with no need for specialist equipment or training. An additional advantage of our triboelectric extraction, when compared to destructively taken physical samples, is that we are preferentially extracting the surface biomolecules (including surface treatment and stains, as well as the microbiome) and not overwhelming the extraction with the underlying collagen and animal DNA.

The cost of analysing these samples is decreasing year on year and is becoming much more plausible to fund. However, research funding for libraries and archives is still quite limited and usually not substantial enough to routinely include these types of analyses. By highlighting the biobank contained within parchment we hope to encourage increased funding in this area; this will not only benefit the conservation of the document but could also help to sustainably unlock the vital biological record it contains.

As with any other emerging field, there are numerous groups working on similar problems using different methods to tackle the same questions. Several groups have successfully analysed host DNA as well as the microbiome of various historical documents using NGS [7, 38, 46]. Different non-invasive methods have also been developed to recover proteins from both paper and parchment documents [49,50,51], revealing evidence of disease and substance use, which can open exciting new avenues of research.

One question that must be addressed is that of dating. This is one of the most frequent requests that researchers have and are eager to resolve. Documents can be dated through direct textual evidence (if a date of production is present) but more often these objects are dated on palaeographical details which, although incredibly helpful, are not without problems and often can only offer a date range rather than a precise date. Radiocarbon dating does provide a more precise form of dating; however, this comes at a cost and is also generally reported as a calibrated date range. It necessarily requires destructive sampling of at least 3–10 mg [52] which, as we have previously discussed, is not routinely accepted by most conservators and curators and can therefore only be used when the object is considered of such high importance that the potential results merit the destructive sampling [53,54,55,56]. We would also advocate that any sample remaining from a radiocarbon analysis be identified as the important biobank that it is and hopefully not discarded but used for other analyses. Finding an objective method of dating that does not require destructive analysis is a prime objective. Our current techniques explained above, unfortunately, cannot yet provide this; at best, we could possibly use the genetic data to provide a form of relative dating (e.g. document X is older than document Y), which does not solve the issue of precise dating. As sensitivity increases and new methods appear, there may emerge a possibility for objective discrete dating of parchments using non-invasive procedures, and this in turn would prove to be another revolution in what we know about historical manuscripts.

Biocodicology offers the possibility of interrogating manuscripts in a novel and informative way. The generation of biological data associated with parchment documents will add a further level of understanding and interpretation to these invaluable objects. As methods continue to improve we hope that biocodicology will become a highly relevant discipline in manuscript studies, contributing a different but complementary perspective to the current scholarship. We hope this somewhat opinionated review will act as a catalyst to further interactions between the heritage science community and parchment scholars, curators and conservators.