1 Introduction

Compared with its extensively investigated counterpart, ancient DNA (aDNA), ancient RNA (aRNA) is an understudied, somewhat neglected biomolecule. At the time of writing, only a handful of publications directly addressing the utility of aRNA (Rollo 1985; Fordyce et al. 2013a; Rollo et al. 1991; Smith et al. 2014a, b; Venanzi and Rollo 1990; Ng et al. 2014) have appeared in the peer-reviewed literature (Table 1), over the same 30-year period in which aDNA research grew in leaps and bounds to a current total of almost 6,000 publications. Perhaps ironically, aRNA was at the forefront of archaeogenetics in the field’s early years, but an early lack of “interesting” data and an assumption of poorer general preservation put its study on hiatus from the early 1990s until this decade. Possibly because of the additional precautions known to be essential when working with fresh RNA, such as strictly RNase-free conditions and much colder storage temperatures, aRNA was almost subconsciously dismissed as a recalcitrant and often fruitless molecule. As we will explore further, this dismissal rests on two preconceptions about aRNA, its limited usefulness and its unavailability, both of which may be incorrect.

Table 1 Summary of research publications into ancient RNA (not including reviews or critiques)

From the outset, a major goal of aDNA research teams was to explore the evolutionary process at the molecular level; aDNA was a gift in the form of an evolutionary photo album, reducing the need to infer ancestral or extinct sequences by instead providing a relatively accurate snapshot of past genomes. As a result, several recent publications have rewritten early hominid evolution (Vernot and Akey 2015; Seguin-Orlando et al. 2014; Fu et al. 2014; Meyer et al. 2012), and similar methodological work has clarified the evolutionary trajectories of numerous plants (Allaby et al. 2015; Palmer et al. 2012; Paris 2016) and animals (Evin et al. 2015; Skoglund et al. 2015; Orlando et al. 2013) through the lens of both natural and human-mediated selection (i.e., domestication syndrome). When reconstructing phylogenies or genomic mutations at this level, RNA is not necessarily useful, since none of its eukaryotic incarnations gives any more information than the genomic DNA sequences from which it originated. In fact, given that transcripts cover only expressed regions and lack introns, RNA contains substantially less useful information than DNA for this purpose (viruses and viroids with RNA-based genomes are the exception to this rule, as will be discussed later in the chapter).

Phylogenetic analysis, however, is not the only use of ancient sequence data. Increasingly, paleogenomics is attempting to unravel cellular processes, mediated by the genome, as they occurred in real time. The stresses that trigger cellular responses, the factors that lead to mutations, and a host of intricate molecular interactions are all potential areas of investigation, and ones in which aRNA may provide more accurate and powerful insights than DNA. The different classes (such as messenger, regulatory, and ribosomal) and, importantly, the amounts of RNA molecules can tell us a great deal about what was happening in the genome at the last moment before death or dormancy, and thus what environmental conditions were being experienced. The challenge here is to distinguish those different classes, through sequence or other chemical markers, to gain a truer picture.

The question of aRNA’s utility is further complicated by its availability. The general robustness of DNA in comparison with RNA is well documented and almost an extension of the biochemical dogma surrounding ancient biomolecules. Early researchers noted, possibly underestimating the gravity of their observation, that the biochemical preservation of nucleic acids in seeds is not a simple model and that the “trend may be general” for ancient biomolecules to break down into ultra-small fragments (Rollo 1985). In terms of chemical stability, RNA and DNA should be expected to show different, pattern-specific degradation rates (Willerslev et al. 2004); see Sect. 3. However, it is the release of RNases during autolytic decomposition in many tissues that promotes the general degradation of RNA over DNA (Huynen et al. 2012). This implies that near-ideal preservation conditions would be essential for anything more than trace levels of aRNA to survive. Since these early studies, aRNA extractions have generally not been attempted from tissues preserved under conditions now known to be conducive to aDNA persistence. During this time, however, a number of successful attempts at germinating archaeological seeds (Yashina et al. 2012; Sallon et al. 2008) suggested that the core RNA components involved in germination must be capable of surviving under the right conditions.

In this chapter, we will discuss older and post-hiatus research into aRNA, promising recent research following some paradigm-shifting discoveries in transcriptomics, and suggest steps toward a potential new synthesis on its utility.

2 A Brief History of aRNA Study

The number of studies that have focused primarily on aRNA is small enough that a comprehensive review can be given in this section (see Table 1). This need not have been the case, however; following the groundbreaking recovery of the first aDNA sequences from a preserved quagga in 1984 (Higuchi et al. 1984), attention almost immediately turned to other ancient biomolecules. Only a year later, aRNA was extracted from ancient cress seeds by fortuitous accident, during an attempt to extract nucleic acids indiscriminately from plant material (Rollo 1985); the RNA itself was identified by molecular hybridization. Similar results, with RNA preferentially recovered by a general nucleic acid extraction method, were subsequently obtained from maize kernels (Rollo et al. 1991), prompting disagreement over the relative proportions of depolymerized, modified DNA (Pääbo 1986; Rogan and Salvo 1990) versus unmodified RNA. The controversy stemmed from the ubiquitous presence of uracil in ancient nucleic acids, which originally formed part of the robust argument (Venanzi and Rollo 1990) for the greater survivability of RNA over DNA.

However, the presence of uracil in archaeogenomic samples is now largely attributed to a breakdown process of DNA: in the presence of water, hydrolytic deamination of cytosine produces uracil. This process occurs readily at the overhanging ends of fragmented, double-stranded DNA molecules (of which there are many in a typical aDNA sample), so uracil cannot be attributed solely to the presence of RNA. As this explanation gained empirical support, relatively early on (Pääbo 1989) and several years later in detail (Hofreiter et al. 2001), the study of aRNA became very much a secondary concern to aDNA, despite the interesting sidenote that DNase-free RNase removed the majority of nucleic acids of all types from mummified maize kernels. Again, this was presumably due to its perceived (lack of) information value and an unfortunate propensity in academic science to neglect the publication of negative results. No further explicit aRNA research papers were published for several years, the only interim mention being a revisiting of existing archaeobotanical extraction methods (Rollo et al. 1994).

The next piece of research to explicitly investigate aRNA appeared in 1999, following the detection of tomato mosaic tobamovirus genomic RNA in glacial ice cores ranging from 5,000 to 140,000 years old (Castello et al. 1999). Although the clean-lab procedures and controls common in today’s paleogenomic labs were not followed, and the sequences resembled modern strains at all strata, the study highlighted aRNA from pathogen genomes, as opposed to relatively uninformative transcriptomes, as a potentially viable and previously untapped source of ecological, genomic, and pathogenic information. An elegant hypothesis of atmospheric recycling of viruses from melted glacial water, touted as much the same process as is commonly observed with bacterial and fungal spores, was put forward to explain the somewhat concerning similarity between modern and ancient sequences. However, this hypothesis apparently failed to gain traction in the academic community, and aRNA again went largely unstudied for several years, until a similar study claiming evidence for ancient viral RNA genomes was published, detailing influenza A in Siberian lake ice (Zhang et al. 2006). The authors proposed a similar hypothesis of host- or carrier-mediated recycling resulting from freeze/thaw cycles of the lakes, but this was met with skepticism, and the research was widely dismissed as laboratory contamination (Worobey 2008) on the basis of suspiciously high similarity to modern laboratory strains of a supposedly rapidly evolving genome. At the same time, questions were raised about the earlier tobamovirus work on the same basis, and because few of the (by then quite famous) criteria for ancient DNA authenticity (Cooper and Poinar 2000) were applied in either study.

In their 2004 review of nucleic acid potential in permafrost conditions (Willerslev et al. 2004), the authors noted a general expectation of decreased RNA survival in the archaeological record, in particular with reference to fragmentation (see Sect. 3.2). Pertinently, molecular fragment size does not automatically define a molecule as “ancient.” In fact, no single criterion does; “ancient,” after all, is a subjective term, as is “historical.” Historical DNA and RNA are terms often used to describe samples that do not necessarily come from an archaeological context but from some other biological repository, such as a museum or herbarium. Since most degradation patterns, and the techniques for extracting and analyzing these materials, are identical to those of unequivocally “ancient” materials (and given a distinct lack of alternative ancient RNA!), it is not inappropriate to discuss examples of aRNA work from herbaria here. Recent research into general biomolecular breakdown suggests that most fragmentation events occur within the first few years, after which degradation plateaus and is governed by the long-term environment (Kistler et al. 2017). While older RNA has been examined from seeds, which are themselves adapted for the long-term stability of cellular machinery including nucleic acid components, it is encouraging that RNA from softer tissues can survive for at least decades after death. A 2013 study identified amplifiable RNA of peach latent mosaic viroid in 50-year-old leaf tissue (Guy 2013), with amplicon lengths within reasonable expectations based on observations of similarly aged DNA molecules. A 1997 indirect attempt to identify aRNA from similar (although older) tissue samples suggested that complete virions can remain intact, and still infectious, for a century or more (Fraile et al. 1997). While an inoculation method using herbarium lesions such as this would not meet the typical criteria for the study of ancient biomolecules (Cooper and Poinar 2000), the results and negative controls provide encouraging evidence of aRNA persistence.

Further evidence of aRNA persistence came the following year (2014), when a partial viral genome was sequenced from aRNA preserved in a permafrost environment, albeit a surprising one (Ng et al. 2014). The presence of a plant virus in caribou feces allowed insight not only into the paleoecology of northwest Canada but also into the survivability of a dogmatically “fragile” molecule in a substance replete with microbes and their associated enzymatic activity. That the RNA survived despite the enzymes of the caribou digestive tract and its microflora suggests that permafrost environments can counteract certain decompositional processes and allow RNA to survive for long periods.

The increasing power and ubiquity of high-throughput sequencing (a.k.a. “next-generation sequencing,” “NGS,” “second-generation sequencing,” and “massively parallel sequencing”) platforms developed over the past decade have revolutionized the way ancient DNA is analyzed, and readers will no doubt encounter these terms several times in this volume. Given well-preserved samples, it was inevitable that such platforms would also be applied to aRNA (Fig. 1). Indeed, the same year that saw Guy’s RT-PCR work on peach latent mosaic viroid also saw the first NGS work on ancient plant aRNA (Fordyce et al. 2013a), in which partial transcriptomes were recovered from ancient maize kernels, demonstrating the molecules’ persistence over several thousand years. The following year, the first complete aRNA genome, that of a common RNA plant virus, was sequenced using similar technology from a 750-year-old barley grain (Smith et al. 2014a). In contrast to earlier studies, the use of NGS in these two studies allowed the authenticity of the aRNA to be confirmed through high-coverage cytosine deamination patterns, a phenomenon routinely observed when using NGS on ancient DNA. Later the same year, a specific class of aRNA, short interfering RNA (siRNA), was identified in the same barley sample, where correlation between these epigenetically related siRNAs and genomic methylation patterns indicated in vivo activity (Smith et al. 2014b). The importance of certain RNA classes will be discussed later in this chapter.

Fig. 1

Cumulative number of ancient RNA research publications, not including reviews or critiques. Note the dramatic increase in output since the early 2010s, as NGS services became increasingly ubiquitous and affordable

Stripping away hypotheticals, reviews, and unverified work, the most conspicuous gap in the evidence for ancient RNA (even in the form of confirmed negative results) concerns metazoa. Until the very recent publication of RNA from Ötzi, the Tyrolean “iceman” (Keller et al. 2017), the only ancient RNA accepted as genuine had been isolated from plant tissue and was either endogenous (i.e., belonging to the organism being studied; see Sect. 3.1 for a detailed definition) or viral in nature. The reluctance to study animal aRNA likely stems from concerns about the lack of available aRNA in general, exacerbated (somewhat justifiably) by the more violent decomposition processes of animal soft tissues compared with, for example, a desiccated collection of cereal grains. However, as Willerslev et al. observed, further recovery of aRNA from the right (i.e., permafrost) contexts shows “enormous promise” and will doubtless be explored in the coming years. As the iceman RNA shows, this exploration is already beginning, again in the context of permanently frozen environments. The Ötzi publication (Keller et al. 2017) is also important for a second reason: the type of RNA sequenced. Regulatory microRNA (miRNA) can be used to identify tissues, infection, and other environmental stresses. Different tissues from Ötzi showed the expected miRNA profiles, and further adaptive qualities of ancient miRNA as drivers of environmental response are also becoming apparent (Smith et al. 2017). As we will see later, these endogenous regulatory RNAs have the potential not only to inform us about the molecular evolution of species but also to let us see those processes unfold as they occurred in the archaeological record.

3 Diagenesis

The processes involved in the degradation of DNA (also known as “diagenesis”) are becoming clearer as the events and conditions underlying them are increasingly well characterized (Kistler et al. 2017). As the reader will discover elsewhere in this book, the most apparent factors in aDNA diagenesis are migration, fragmentation, deamination, cross-linking, and enzymatic breakdown, along with numerous hypothesized factors related to hitherto unknown chemical interactions that result in the postmortem formation of “noncanonical” nucleobases, which can potentially interfere with experimental procedures. There is no reason to believe that the breakdown processes involved in aRNA diagenesis are fundamentally different from those of aDNA, although, as we will see, there are some subtle differences that should be taken into account.

As with aDNA, the evidence so far indicates that aRNA persistence is largely determined by the archaeological (“depositional”) environment, and the two molecules appear to show similar patterns under similar conditions. Typically, colder, drier conditions are more conducive to nucleic acid survival than their opposites, although in some cases one condition can vastly outweigh the others. Extreme aridity, for example, allows long-term survival of DNA and RNA even at high temperatures, as observed at hot, arid sites in southern Egypt (Smith et al. 2014a) and Arizona (Fordyce et al. 2013a). Conversely, permafrost conditions are now known to allow survival of specific classes of RNA even in mammalian tissues, where immediate freezing can arrest the harsh autolytic and microbial decomposition processes (Keller et al. 2017). While most of these examples are relatively recent and have only been possible using next-generation sequencing technologies, the latter in particular may prompt a reevaluation of the “RNA survival dogma.”

3.1 Migration and Loss

The most noticeable characteristic of ancient DNA and RNA is their availability, or, more precisely, the lack thereof. The recoverable quantity of endogenous nucleic acids per unit mass of tissue from ancient material is usually significantly depleted from its original in vivo levels, and only in exceptional cases will it approach a level comparable to modern material. There are several possible mechanisms underlying this, and the reality probably reflects all of them to at least some degree.

First, as previously mentioned, much of this shortfall in recovery can be attributed to ultrashort fragments at the small end of the theoretical fragment size distribution, which the limitations of isolation chemistry prevent us from capturing. The majority of current methods for DNA recovery rely on precipitating DNA from solution with a combination of chaotropic salt and alcohol, followed by binding to a silicon dioxide matrix, and the equivalent isolation of RNA is chemically identical (Poeckh et al. 2008). Methods have been refined over the years to recover ever smaller fragments (Dabney et al. 2013), down to less than 15 bases, although the very smallest fragments of the theoretical distribution are never recovered using current methods.

Second, it is to be expected that at least some of the original nucleic acids will have been degraded into derivatives that are no longer recognizable as nucleic acids. This type of time-dependent degradation is a complex issue but is likely to be a function of factors such as temperature, humidity, tissue type, surrounding pH, microbial activity, and even background radiation. Paleogenomicists generally find congruent evidence to support this (at least for DNA); for example, burnt grains often yield much less recoverable material than desiccated equivalents, and material from permafrost environments generally gives better results than material from tropical conditions. Specific decay processes can also influence the levels of DNA and RNA independently, particularly the often more ubiquitous presence of RNases in many tissues and microorganisms (Guy 2013). All these processes are likely, in some fashion, to be extensions of the issues discussed in this section, and also intrinsic to the problem of recovering ultrashort fragments.

Finally, the diffusion of molecules away from their source through the matrix of their depositional environment(s) greatly reduces the amount of available genetic information. Ancient DNA literature, particularly where a next-generation sequencing approach has been taken, often makes at least a passing mention of “endogenous content,” usually expressed as a percentage of total reads. “Endogenous” in this case refers to DNA sequences that likely belong to the organism being studied (unless dealing with a metagenomic assembly), and “exogenous” refers to everything else, including sequences of such low complexity that they could belong to a wide range of organisms, the studied one among them. Where significant molecular movement occurs, endogenous DNA diffuses away from its tissue of origin and is replaced by DNA from the surrounding environment, which can result in low endogenous content. The extent of this diffusion is again determined by several environmental factors, including temperature, humidity, and tissue type. For example, in a small, reasonably closed system such as a seed with an intact pericarp, one might expect greater levels of endogenous DNA and RNA than in, for example, porous bone. Quite often this is the case, although it again depends on other factors such as water percolation and temperature. Several studies have shown increased endogenous content where liquid water is largely absent, for example, in desiccated (Palmer et al. 2012) and permafrozen (Mouttham et al. 2015) environments, in contrast with results from humid environments (Pinhasi et al. 2015). When dealing with the nucleic acid content of a metagenomic assemblage (such as a soil sediment core), the genomic data generated might bear only a superficial relationship to the physical mass of identifiable macrofossil species it contains (Smith et al. 2015); however, the anaerobic conditions of truly waterlogged material, while potentially lacking in endogenous ancient DNA, often slow other diagenetic processes detailed later (Brown et al. 2015).
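As a rough illustration of how an endogenous-content figure might be computed, the sketch below counts a read as endogenous only if it mapped to the target reference with sufficient mapping quality and is not of low complexity. The input format, the function names, the mapping-quality threshold of 25, and the use of Shannon entropy (with a 1.5-bit cut-off) as a stand-in for a low-complexity filter are all illustrative assumptions, not a standard pipeline.

```python
import math
from collections import Counter


def shannon_entropy(seq):
    """Shannon entropy (bits per base) of a read; low values mean low complexity."""
    counts = Counter(seq)
    n = len(seq)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def endogenous_content(reads, min_mapq=25, min_entropy=1.5):
    """Percentage of reads counted as endogenous.

    reads: iterable of (sequence, mapped_to_target, mapq) tuples (hypothetical format).
    Thresholds are illustrative assumptions, not published cut-offs.
    """
    reads = list(reads)
    if not reads:
        return 0.0
    endogenous = sum(
        1
        for seq, mapped, mapq in reads
        # Mapped confidently AND complex enough to be organism-specific
        if mapped and mapq >= min_mapq and shannon_entropy(seq) >= min_entropy
    )
    return 100.0 * endogenous / len(reads)
```

A poly-A read that maps with high quality is still excluded here, mirroring the point that low-complexity sequences could belong to many organisms.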

It is important to note that, as with the other factors mentioned, we are forming hypotheses of RNA availability based primarily on empirical evidence from DNA as a proxy. Apart from differences in the types of chemical breakdown, the limited evidence of aRNA behavior has so far matched expectations; however, we may have to revise these expectations as more evidence becomes available.

3.2 Fragmentation

As in DNA, the primary structural support of RNA lies in its phosphate backbone. Breakage of both strands is required for a DNA molecule to fragment fully, so one might be forgiven for expecting a greater rate of fragmentation in a single-stranded molecule such as RNA. In particular, RNA’s 2′ hydroxyl (OH) group, which DNA lacks, can induce strand cleavage by attacking its own adjacent phosphodiester bond (Fordyce et al. 2013b). This reaction can be further promoted when catalyzed by certain cations (Lindahl 1967), such as calcium, which may be abundant in, for example, a skeletal assemblage. Indeed, a review of the potential for DNA and RNA survivability in permafrost conditions (Willerslev et al. 2004) succinctly outlined the expectation of a generally elevated degradation rate for RNA compared with DNA.

However, empirical data from truly ancient plant material suggest that in some circumstances the opposite may be the case (Fordyce et al. 2013a). The reason may be that, in practice, RNA is quite often not entirely single stranded. Since the principle of complementary base pairing still applies, RNA has a propensity to form secondary structure, spontaneously folding back on itself and creating de facto stretches of base pairs from sequence regions with enough complementarity to each other (Zuker et al. 1999). Secondary structure formation is thought to affect the rate of phosphodiester bond hydrolysis (Fordyce et al. 2013b), as seen in the greater persistence of strongly structured RNA types such as ribosomal RNA compared with messenger RNA transcripts (Laing and Draper 1994). In fact, it is this ability to form secondary structures that has enabled the emergence of the microRNA regulatory pathway (see later).
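To make the idea of self-complementary folding concrete, the sketch below counts the maximum number of nested base pairs a sequence can form using the simple Nussinov dynamic programme. This is a deliberate substitution: the Zuker approach cited above minimizes free energy rather than maximizing the number of pairs. The allowed pairs (Watson-Crick plus G-U wobble) and the minimum hairpin loop of three unpaired bases are assumptions chosen for illustration.

```python
def nussinov_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs (Nussinov-style dynamic programming).

    A simplified stand-in for energy-based folding: counts Watson-Crick and
    G-U wobble pairs, ignores stacking energies, and requires at least
    `min_loop` unpaired bases inside every hairpin.
    """
    allowed = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] = best pair count for the subsequence seq[i..j]
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])  # i or j left unpaired
            if (seq[i], seq[j]) in allowed:
                best = max(best, dp[i + 1][j - 1] + 1)  # i pairs with j
            for k in range(i + 1, j):  # split into two independent substructures
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best
    return dp[0][n - 1]
```

For example, `nussinov_pairs("GGGAAAUCCC")` finds the three G-C pairs of a small hairpin, whereas a sequence with no complementary regions, such as `"AAAA"`, forms none.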

While exact relative rates of degradation cannot yet be estimated for lack of data, Fordyce et al. provide a detailed review of the biochemical interactions contributing to RNA breakdown (Fordyce et al. 2013b), and Willerslev et al. also noted that the 2′ hydroxyl group of RNA should, in theory, increase its inherent lability. Expected fragmentation patterns are especially pertinent, because recent research (Kistler et al. 2017) has noted that the theoretical distribution of DNA fragment lengths, which should follow an exponential curve rising toward the smallest fragments, is not seen in recovered material. Instead, recovered fragment lengths often follow a lognormal distribution (Renaud et al. 2017), whose “missing” short-fragment end can be explained by inefficiencies in isolation protocols: ultrashort fragments have insufficient mass for salt-assisted precipitation and binding to a silica medium, the basic principles on which the vast majority of nucleic acid extractions rely.
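The effect of a recovery cut-off on an idealized exponential-type distribution can be estimated with a short calculation. The sketch assumes, purely for illustration, that every phosphodiester bond breaks independently with the same probability p, so fragment lengths follow a geometric distribution (the discrete analogue of the exponential curve above); real ancient molecules do not break this uniformly, and the function name and cut-off are hypothetical.

```python
def recovered_fractions(break_prob, cutoff):
    """Fractions of fragments and of nucleotides in fragments >= cutoff nt.

    Assumes independent, uniform bond breakage with probability break_prob,
    giving geometric fragment lengths P(L = l) = p * (1 - p)**(l - 1).
    End effects of the parent molecule are ignored (long-molecule limit).
    """
    if not 0 < break_prob < 1:
        raise ValueError("break_prob must lie strictly between 0 and 1")
    if cutoff < 1:
        raise ValueError("cutoff must be at least 1 nt")
    q = 1.0 - break_prob
    # Share of fragments long enough to recover: P(L >= c) = q**(c - 1)
    fragment_fraction = q ** (cutoff - 1)
    # Share of all nucleotides sitting in such fragments:
    # sum_{l >= c} l * P(L = l) / E[L] = q**(c - 1) * (c - (c - 1) * q)
    mass_fraction = q ** (cutoff - 1) * (cutoff - (cutoff - 1) * q)
    return fragment_fraction, mass_fraction
```

With a mean fragment length of 40 nt (`break_prob=1/40`) and a hypothetical 30 nt recovery limit, roughly half of all fragments, but over 80% of the nucleotide mass, would lie above the cut-off, which illustrates how the short-fragment end of the theoretical curve can vanish from recovered data.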

A further complicating factor for modeling aRNA decay is the variable abundance, length, and type of RNA found in a cell. A typical eukaryotic nuclear genome has a fixed size and copy number (set by ploidy, which is two for the majority of species), and mitochondrial genomes likewise have a standard size but a variable copy number. RNA molecules, however, are much smaller than their DNA counterparts and vary widely in size in vivo. A small regulatory RNA is around 18 nucleotides (nt) at its shortest, while a large transcript can be as long as 100,000 nt. The often abundant ribosomal RNAs usually fall in between, at a few thousand nt. To complicate things further, RNA copy number varies with tissue type, age, and even the organism’s immediate environment; under stress, a cell may produce more regulatory RNAs or more transcripts of a certain gene. Since the smallest physically recoverable RNA fragments are of roughly the same length as small RNAs, disentangling “real” small RNA from the breakdown products of larger transcripts is fraught with problems.

Congruent with expectations for DNA (Kistler et al. 2017), excessive strand breakage alone renders nucleic acids unusable for both recovery and analysis. While the physical mass of aRNA may not change during such a breakdown process, at least within a closed system, the resulting ultrashort (<10 nt) molecules are not recoverable by most extraction protocols. Even if they were, their reduced sequence complexity would make such fragments highly prone to false reference alignments and thus misidentification. On the other hand, cleavage of bases from the backbone (depurination) is predicted to occur at a slower rate in RNA than in DNA, potentially leaving a greater proportion of usefully sized RNA molecules competent for sequencing.
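A back-of-the-envelope calculation shows why very short fragments are so prone to false alignments. Assuming, as an idealization, a random reference with equal base frequencies (real genomes are neither random nor compositionally uniform), the expected number of exact chance matches of a k-nt fragment across both strands of a genome of G bases is about 2(G - k + 1)/4^k. The 5 Gb genome size below is a hypothetical example.

```python
def expected_chance_hits(k, genome_size):
    """Expected number of exact chance matches of a k-nt fragment.

    Idealized model: random genome, equal base frequencies, both strands
    searched, so hits = 2 * (G - k + 1) / 4**k.
    """
    if k < 1 or genome_size < k:
        raise ValueError("need 1 <= k <= genome_size")
    return 2 * (genome_size - k + 1) / 4 ** k


# Hypothetical 5 Gb reference
for k in (10, 15, 20):
    print(k, round(expected_chance_hits(k, 5_000_000_000), 3))
```

Under these assumptions a 10 nt fragment would be expected to match roughly ten thousand places by chance, a 15 nt fragment about nine, and a 20 nt fragment about 0.01, so only the longer fragments can be placed with any confidence.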

3.3 Deamination

Cytosine deamination, the loss of the amine group from a cytosine to produce uracil, is arguably one of the most thoroughly characterized and discussed lesions of ancient DNA, to the point that its presence is seen as a proxy for authenticity (Briggs et al. 2007). This lesion, at least in ancient DNA, is (probably) largely confined to exposed, single-stranded ends (“overhangs”), where the complementary strand has been broken at the phosphodiester backbone and the terminal bases are no longer held by hydrogen bonds to a partner strand. When reading these newly converted uracils, many of the polymerases used in sequencing library preparation treat them as thymine and so incorporate adenine as the complementary base. After several rounds of PCR, copies of molecules that originally contained uracil carry thymine at that position and are read by sequencers as such. During subsequent data analysis, when reads are mapped to a reference genome, the large number of sequencing reads allows deamination patterns to be characterized with the program mapDamage (Ginolhac et al. 2011). These patterns typically manifest as cytosine > uracil (read as thymine) misincorporations at the 5′ end of the molecule. At the 3′ end, we see a “mirror image” misincorporation: overhanging uracils are paired with adenine during the “strand repair” step of library construction (whereas an intact cytosine would normally pair with guanine). During PCR, these “misincorporated” adenines show up as guanine > adenine mismatches when mapped to the reference sequence.
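The per-position tallies behind such damage plots can be sketched in a few lines. The function below is hypothetical and is neither mapDamage’s interface nor its algorithm: it assumes ungapped read/reference pairs of equal length, already oriented 5′→3′ along the read, and computes the fraction of reference C positions read as T at each distance from the 5′ end, and of reference G positions read as A at each distance from the 3′ end.

```python
from collections import Counter


def damage_profile(alignments, window=5):
    """Per-position C>T (from the 5' end) and G>A (from the 3' end) frequencies.

    alignments: iterable of (read, reference) string pairs of equal length,
    ungapped and oriented 5'->3' along the read (simplifying assumptions).
    """
    ct_hits, c_total = Counter(), Counter()
    ga_hits, g_total = Counter(), Counter()
    for read, ref in alignments:
        n = len(ref)
        for i in range(min(window, n)):
            # Position i counted from the 5' end: look for C read as T
            if ref[i] == "C":
                c_total[i] += 1
                if read[i] == "T":
                    ct_hits[i] += 1
            # Position i counted from the 3' end: look for G read as A
            j = n - 1 - i
            if ref[j] == "G":
                g_total[i] += 1
                if read[j] == "A":
                    ga_hits[i] += 1
    ct = [ct_hits[i] / c_total[i] if c_total[i] else 0.0 for i in range(window)]
    ga = [ga_hits[i] / g_total[i] if g_total[i] else 0.0 for i in range(window)]
    return ct, ga
```

In genuinely ancient DNA data, both lists would be expected to be highest at position 0 and to decline toward the molecule’s interior.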

How does this play out when the target molecule is single stranded, as most RNA classes are? We know that cytosines in ancient RNA can become deaminated in the same way as in DNA, but, interestingly, the resulting damage patterns are neither random nor constant along the strand, as one would expect from an entirely single-stranded molecule. A study of the small RNA fraction of archaeological barley found distinct misincorporation patterns at both ends of the sequenced molecules, with significantly fewer in the middle region (Smith et al. 2014a). Secondary structure formation, as detailed above, may “protect” mid-sequence cytosines while leaving terminal nucleotides exposed.

An emerging phenomenon in ancient DNA research is the observation of a different type of deamination, this time of 5-methylcytosine (5mC) to thymine. The reaction has much the same chemistry as the C > U transition, tends to occur at overhangs, and shows up in sequencing data as the same C > T substitution. However, it cannot be distinguished from the deamination product of an unmethylated cytosine unless steps are taken during library building, such as removing uracils (UDG treatment) or using a polymerase that stalls at uracils (so that only sequences without uracils are reported). Where this is done, EpiPALEOMIX (Hanghøj et al. 2016), an algorithm similar to mapDamage, can be used to identify sites that were previously methylated cytosines.
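A much-simplified version of the reasoning behind methylation inference from deamination can be sketched as follows. After UDG treatment, most C > T substitutions arising from unmethylated cytosines are removed, so an excess of C > T at CpG sites (the main methylation context in mammals; plants also methylate CHG and CHH contexts) relative to other contexts hints at former methylation. The function is hypothetical and is not the EpiPALEOMIX method; it assumes ungapped, equal-length read/reference pairs, considers only reference cytosines on the read’s own strand, and treats a cytosine at the final position as non-CpG because its neighbor is unknown.

```python
def cpg_vs_noncpg_ct(alignments):
    """C>T rates at CpG versus non-CpG reference cytosines.

    alignments: iterable of (read, reference) pairs, ungapped and equal length,
    assumed to come from a UDG-treated library (illustrative assumption).
    """
    counts = {"CpG": [0, 0], "nonCpG": [0, 0]}  # [C>T hits, reference C total]
    for read, ref in alignments:
        for i, base in enumerate(ref):
            if base != "C":
                continue
            # A C followed by G on the reference is a CpG site
            is_cpg = i + 1 < len(ref) and ref[i + 1] == "G"
            context = "CpG" if is_cpg else "nonCpG"
            counts[context][1] += 1
            if read[i] == "T":
                counts[context][0] += 1
    return {k: (hits / total if total else 0.0) for k, (hits, total) in counts.items()}
```

A CpG rate well above the non-CpG rate would be consistent with former methylation, although a real analysis would also need per-site coverage and the reverse-strand G > A signal.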

Exactly how this phenomenon could be applied to ancient RNA is not yet clear, but it may prove useful. Cytosine methylation akin to that of DNA (5mC) is known to occur in certain types of noncoding RNA, such as ribosomal, regulatory, and transfer RNA (Schaefer et al. 2009), and in untranslated sections of coding mRNA transcripts (Squires et al. 2012). Using an approach similar to that employed to detect deamination of methylated cytosines in ancient DNA, these lesions could, for example, serve as a marker or proxy for RNA function in a fragmented, muddled-up dataset, and so begin to disentangle the knots described in the introduction to this chapter.

3.4 Cross-linking

Several types of molecular crosslinks are well-documented barriers to successful sequencing of ancient DNA. Cross-linking is essentially the abnormal chemical bonding of a molecule (DNA, RNA, or protein) to a nucleic acid strand, which occurs in degraded material. The most widely studied type relevant to ancient DNA, interstrand crosslinks (ICLs), form via alkylating agents between the strands of dsDNA, prevent amplification by restricting denaturation (Willerslev and Cooper 2005), and are thought to be even more limiting to data generation than fragmentation (Hansen et al. 2006). Intrastrand crosslinks, although less described in ancient DNA research, can form as bonds between different sections of the same strand (Huang and Li 2013) and likewise inhibit amplification. Intermolecular crosslinks between aDNA and proteins are also a known phenomenon in ancient DNA (Willerslev and Cooper 2005). Whether these types of crosslinks form spontaneously in ancient RNA is unknown, but induced cross-linking of RNA-RNA duplexes is a known diagnostic tool (Harris and Christian 2009).

One could speculate that RNA forms intrastrand crosslinks with itself or intermolecular crosslinks with RNA, DNA, or protein. Any of these could inhibit laboratory steps such as reverse transcription or PCR in much the same way as they do for DNA; at this point, however, data are lacking.

3.5 Enzymatic Breakdown

As we have mentioned throughout this chapter, enzymatic breakdown is a major threat to nucleic acid survival, especially for RNA. Judging by the lack of attention in the published literature, DNase activity is not considered a grave threat to aDNA survival. RNases, however, are produced in significant quantities by most organisms, eukaryotic and prokaryotic alike, to regulate transcript levels as part of normal cellular machinery. During postmortem decomposition, RNases released as tissues break down become intermixed with released RNA and degrade it. Several factors, such as desiccation (de los Rios et al. 1996) and freezing, can slow these processes, although they are believed only to reduce RNase activity rather than halt it altogether (Awano et al. 2008). Unsurprisingly, the amount of RNase posing a threat to RNA depends on tissue type, i.e., on whether the tissue is susceptible to microbial attack or whether endogenous RNases are limited as part of a reproductive strategy, as in dormant seed endosperm (Spanò et al. 2008).

Clearly this has implications for molecular bioarchaeology. As RNA studies are so limited, it is difficult to predict survival based on tissue or depositional factors. However, the presence of amplifiable RNA in mammalian soft tissues (Keller et al. 2017) and ubiquitous presence in seed endosperm (Smith et al. 2014a) suggests that the conditions most conducive for DNA survival may also inhibit RNase activity sufficiently to allow RNA recovery.

4 Perceived Information Value Compared to Ancient DNA

An unofficial dogma within molecular biology holds that RNA breaks down rapidly because of its greater lability, its exposed single strands, and its greater susceptibility to nucleases. Standard laboratory protocols recommend storing RNA (in water) at −80 °C rather than the −20 °C more conventionally used for purified DNA, nominally because of RNase activity in water-based storage media (Chomczynski and Sacchi 2006). However, the acceptance of other forms of RNA storage at higher temperatures (Chomczynski and Sacchi 2006; Fabre et al. 2014) suggests that the “inherent” stability of RNA has been at least partly conflated with laboratory-specific storage conditions, which may have contributed to ancient RNA being dismissed as unworthy of attention.

There are, however, legitimate concerns about how much useful information RNA can provide relative to DNA. A major issue relates to the types of cellular RNA mentioned previously, in particular the overabundance of ribosomal RNA, combined with fragmentation. The average mammalian genome has several hundred copies of rRNA genes, and plant genomes several thousand, although this number is highly variable (Rogers and Bendich 1987). These genes routinely give rise to an abundance of rRNA transcripts, and because the fragmented nature of aRNA (unlike intact RNA) renders any form of size selection moot, other RNA types are swamped by relatively uninformative, hyper-redundant rRNA. This in turn reduces the relative proportion of regulatory or transcriptomic sequences, which arguably form the most informative fraction of the data. Fortunately, other methods can be employed to selectively isolate RNA sequences of interest.
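Where size selection is moot, one naive way to discard hyper-redundant rRNA reads computationally is to screen them against known rRNA sequences. The sketch below flags a read as rRNA-derived when at least half of its k-mers occur in an index built from both strands of rRNA reference sequences. All function names, the default k-mer length of 15, and the 50% threshold are illustrative assumptions rather than recommended values, and reads and references are assumed to share the DNA alphabet (T, not U). Dedicated tools (e.g., SortMeRNA) perform this filtering far more rigorously.

```python
# Naive k-mer screen for hyper-redundant rRNA reads (illustrative sketch).
# Assumes reads and rRNA references use the same DNA alphabet (T, not U).

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq, k):
    """All overlapping k-mers of seq; empty if seq is shorter than k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_rrna_index(rrna_refs, k=15):
    """Collect k-mers from both strands of each rRNA reference sequence."""
    index = set()
    for ref in rrna_refs:
        ref = ref.upper()
        index |= kmers(ref, k) | kmers(reverse_complement(ref), k)
    return index


def deplete_rrna(reads, rrna_index, k=15, min_shared=0.5):
    """Split reads into (kept, discarded) by the fraction of k-mers found in the index."""
    kept, discarded = [], []
    for read in reads:
        read_kmers = kmers(read.upper(), k)
        if not read_kmers:  # shorter than k: cannot be judged, so keep it
            kept.append(read)
            continue
        shared = len(read_kmers & rrna_index) / len(read_kmers)
        (discarded if shared >= min_shared else kept).append(read)
    return kept, discarded


# Toy data with a short k so the example stays readable.
index = build_rrna_index(["ACGTACGTTAGCCATGGATCCA"], k=5)
print(deplete_rrna(["ACGTACGTTA", "GGGGGGGGGG", "ACG"], index, k=5))
```

Indexing both strands matters because, depending on the library protocol, a read may represent either the rRNA itself or its complement.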

One example is the enrichment of small RNA (see Sect. 5 for more details). Endogenously processed transcripts, small RNAs included, carry a hydroxyl (OH) group at their terminal 3′ end (Elbashir et al. 2001). Fragmented molecules, on the other hand, do not necessarily retain a 3′ OH group, especially where mechanical (or other nonenzymatic) shearing has taken place, although experimental data on this phenomenon are lacking. Newer library construction methods take advantage of this by using ligases that do not require ATP and can ligate specially modified (pre-adenylated) adapters only to fragments that carry a 3′ hydroxyl group (Tuschl et al. 2014). This allows direct targeting of either mature small RNAs or the terminal ends of other transcripts. In both cases, the original RNA type can be deduced from the sequence data itself if the organism’s transcriptome is well characterized. With this method, not only can non-hyper-redundant RNA sequences be retrieved, but their expression levels also give a direct snapshot of the in vivo processes taking place perimortem and, by extension, of the pressures the organism faced.
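Once reads have been assigned to known RNAs, comparing expression levels between samples requires normalizing for sequencing depth. A minimal sketch, assuming each read has already been labelled with the RNA it maps to, uses counts per million (CPM) and a pseudocount-stabilized log2 fold change. Both are simple, widely used conventions, but dedicated differential-expression methods model count variance properly and would be preferable for real analyses. Labels such as `miR-X` are placeholders.

```python
import math
from collections import Counter


def counts_per_million(read_labels):
    """Normalize per-RNA read counts to counts per million (CPM).

    read_labels: one label per read, naming the RNA it was assigned to
    (e.g., by alignment against an annotated transcriptome).
    """
    counts = Counter(read_labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: n * 1_000_000 / total for name, n in counts.items()}


def log2_fold_change(cpm_a, cpm_b, pseudocount=1.0):
    """log2((B + pseudocount) / (A + pseudocount)) for every RNA seen in either sample."""
    names = set(cpm_a) | set(cpm_b)
    return {
        name: math.log2((cpm_b.get(name, 0.0) + pseudocount)
                        / (cpm_a.get(name, 0.0) + pseudocount))
        for name in names
    }


# Hypothetical labels; real ones would come from the read-assignment step.
sample_a = counts_per_million(["miR-X", "miR-X", "miR-Y", "tRNA-1"])
print(sample_a)
```

The pseudocount keeps the ratio defined for RNAs absent from one sample, at the cost of shrinking fold changes for very lowly expressed molecules.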

5 Regulatory RNA

Eukaryotic genomes are regulated by networks of interactions between transcripts, proteins, and the genome itself. One actor in a relatively recently discovered regulatory pathway is small RNA: molecules of 18–24 nt that come in two distinct classes, short interfering RNA (siRNA) and microRNA (miRNA). Both have been found to persist in archaeological material under very different preservation conditions.
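A first computational step toward isolating these molecules from sequence data is to inspect the read-length distribution and keep reads within the 18–24 nt window. The sketch below does only that; the function names are illustrative. Length alone cannot separate siRNA from miRNA, nor either from random degradation fragments of the same size, which is a particular concern for fragmented ancient material.

```python
from collections import Counter

SMALL_RNA_MIN, SMALL_RNA_MAX = 18, 24  # inclusive size window given in the text


def length_profile(reads):
    """Histogram of read lengths as {length: number_of_reads}, sorted by length."""
    return dict(sorted(Counter(len(r) for r in reads).items()))


def select_small_rna(reads, lo=SMALL_RNA_MIN, hi=SMALL_RNA_MAX):
    """Return (reads of lo..hi nt, fraction of all reads they represent)."""
    selected = [r for r in reads if lo <= len(r) <= hi]
    fraction = len(selected) / len(reads) if reads else 0.0
    return selected, fraction


# Toy reads of assorted lengths.
reads = ["A" * n for n in (17, 18, 21, 21, 24, 30)]
print(length_profile(reads))
selected, fraction = select_small_rna(reads)
print(len(selected), round(fraction, 2))
```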

MicroRNAs are involved in specific, often highly conserved regulatory pathways and as such are derived from specific genes within a genome. The primary transcripts of miRNA genes fold into a double-stranded stem-loop (or “hairpin”) structure from which the mature miRNA is processed by a series of protein complexes. DICER-like proteins cleave the dsRNA to its mature length, and the complementary strand is degraded, leaving a single-stranded mature miRNA. This is then loaded into an Argonaute (AGO) protein to form the RNA-induced silencing complex (RISC). Depending on the specifics of the RISC, the mature miRNA is directed to its genomic or transcriptomic target to enact its function, usually the downregulation of messenger RNA through complementary base pairing followed by cleavage of the transcript by the AGO protein. As is typical of regulatory networks, the miRNA target may itself be a downregulator such as a transcription factor, so increased miRNA expression does not simply equate to reduced protein-coding gene expression. This is a simplified summary; for a detailed overview of the miRNA biogenesis pathway, see Winter et al. (2009). Relevant to this chapter is that miRNAs are now known to be recoverable not only from ancient material (Keller et al. 2017; Smith et al. 2017) but even from mammalian soft tissues that would a priori have been considered unsuitable candidates because of the autolytic release of RNases. The first study of this kind, covering various tissues of Ötzi, the glacier-preserved Tyrolean “iceman,” showed that tissue-specific miRNA profiles are recoverable from ancient material, validating the principle of aRNA recovery for purposes beyond the genomic. The second, on desiccated barley grain from ancient Egypt, demonstrated that differential, environmentally induced profiles can be reconstructed in the same way and so give an in vivo snapshot of adaptive processes.
Although these two preservation conditions lie at almost opposite ends of the spectrum of the archaeological record (both are nonetheless known to be conducive to aDNA survival), they give access to information that cannot be obtained from ancient DNA alone.
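The complementary base pairing by which a mature miRNA finds its target can be illustrated with a simple scan: take the reverse complement of the miRNA and slide it along a candidate transcript, counting mismatches at each position. The sketch assumes RNA-alphabet input (U, not T) and an arbitrary mismatch cutoff, and it deliberately ignores G:U wobble pairs, seed-region weighting, and target-site accessibility, all of which dedicated target-prediction tools take into account. The sequences in the example are made up.

```python
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement_rna(seq):
    """Reverse complement of an RNA sequence (A<->U, C<->G)."""
    return seq.upper().translate(RNA_COMPLEMENT)[::-1]


def find_target_sites(mirna, target, max_mismatches=3):
    """Return (start, mismatches) for every target window complementary to the miRNA.

    A window qualifies when it differs from the miRNA's reverse complement at no more
    than max_mismatches positions; G:U wobble is treated as a mismatch here.
    """
    site = reverse_complement_rna(mirna)
    target = target.upper()
    hits = []
    for start in range(len(target) - len(site) + 1):
        window = target[start:start + len(site)]
        mismatches = sum(a != b for a, b in zip(site, window))
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


# Made-up miRNA and transcript; a perfect site begins at position 4.
print(find_target_sites("ACGUACGUAC", "CCCCGUACGUACGUCCCC", max_mismatches=0))  # [(4, 0)]
```

Raising `max_mismatches` quickly admits spurious hits in short or repetitive sequences, which is one reason real tools use position-specific scoring rather than a flat count.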

The second type of small RNA, siRNA, is of similar size but differs in function and biogenesis. Typically, siRNA directs methyltransferases to genomic targets, suppressing gene expression because transcription is often impeded at methylated sites. siRNA can also be incorporated into the RISC to neutralize transcripts and other RNA molecules. Theoretically, siRNA can be produced from any RNA molecule: any transcript can be made double stranded by endogenous RNA-dependent RNA polymerases, and the resulting dsRNA can be processed into a targeting complex, much as with miRNA. Ancient siRNA sequences isolated from the same archaeological barley mentioned above have been shown to correlate with genomic regions showing elevated methylation (Smith et al. 2014b), in what is likely a stress-induced plant response known as RNA-directed DNA methylation (RdDM).

Pertinently, siRNAs need not originate from the genome of the organism that produces them (Snead and Rossi 2010). As previously mentioned, any RNA molecule can become a template for siRNA, regardless of its origin. A known immune response in plants is to produce siRNA derived directly from the genome of an invading pathogen, allowing the RISC to be directed to that genome and to block the pathogen’s functions, such as protein production or replication. In some cases, as detailed below, this could be key to discovering exogenous RNA of interest.

6 RNA Genomes

To date, only one aRNA genome has been characterized (Smith et al. 2014a): sequence fragments of ancient barley stripe mosaic virus (BSMV) were detected during routine metagenomic screening of sRNA sequence data. Interestingly, many of the small sequences used to reconstruct the genome fell within the size range expected for siRNA, suggesting that at least some of them were not simply fragmented genomic components but the products of siRNA biogenesis, a defense mechanism of the infected host. This was essentially a serendipitous finding. Equally fortunate is that RNA genomes, which as far as we know are restricted to viruses and viroids, are small: typical RNA virus genomes are only about 3–30 kb long, making reconstruction from a relatively small NGS dataset fairly straightforward.
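Why a relatively small dataset can suffice is illustrated by the classic Lander–Waterman approximation: if N reads of length L land uniformly at random on a genome of length G, mean depth is c = NL/G and the expected fraction of the genome covered at least once is 1 − exp(−c). This is an idealization (real aRNA coverage is uneven, and very short reads may not map uniquely), and the 10 kb genome and 21 nt read length below are hypothetical values chosen from within the ranges mentioned in the text.

```python
import math


def expected_breadth(n_reads, read_length, genome_length):
    """Expected fraction of the genome covered at least once: 1 - exp(-c)."""
    coverage = n_reads * read_length / genome_length  # mean depth c = N * L / G
    return 1.0 - math.exp(-coverage)


def reads_for_breadth(target_breadth, read_length, genome_length):
    """Smallest read count whose expected breadth reaches target_breadth (< 1)."""
    coverage = -math.log(1.0 - target_breadth)
    return math.ceil(coverage * genome_length / read_length)


# Hypothetical 10 kb RNA genome and 21 nt siRNA-sized reads.
print(round(expected_breadth(1000, 21, 10_000), 3))  # 0.878
print(reads_for_breadth(0.95, 21, 10_000))           # 1427
```

Under these assumptions, well under two thousand viral reads would be expected to touch 95% of the genome, a tiny fraction of a typical NGS run.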

The concept of a genomic “arms race” between pathogen and host, particularly where humans and staple crops are concerned, has long been a concern of the medical and agricultural communities (Stahl and Bishop 2000). One tool for addressing these concerns is an understanding of pathogen evolution good enough to predict evolutionary trajectories under a given set of circumstances. Ancient or archaic pathogen strains would provide a broader basis of data for existing models. Several human pathogen genomes have been characterized from ancient DNA (Bos et al. 2015; Schuenemann et al. 2011; Müller et al. 2014), but the general lack of aRNA data means that RNA viral genomes have not. Indeed, previously reported but possibly inauthentic ancient viruses (Castello et al. 1999; Zhang et al. 2006) would have benefitted from NGS sequencing: as in aDNA, telltale patterns of cytosine deamination are known to occur toward the ends of aRNA molecules, which provides an obvious control for authenticity. If RNA work becomes routine for samples in an appropriate state of preservation, more pathogen genomes may be recovered.
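The authenticity control mentioned above can be reduced to a simple summary statistic: the C > T mismatch rate at read termini compared with the rate in read interiors, which for genuinely ancient molecules would be expected to be clearly higher at the ends. The sketch assumes strand-aware reads in which deamination appears as C > T at both ends, pre-aligned without indels; the two-base terminal window and the function name are arbitrary choices for illustration. Real damage-profiling tools model this per position and per strand.

```python
def terminal_vs_interior_ct(pairs, terminal=2):
    """Return (terminal_rate, interior_rate) of C->T mismatches.

    pairs: (reference_segment, read_sequence) tuples of equal length, no indels.
    Positions within `terminal` bases of either read end count as terminal.
    """
    term = [0, 0]   # [C->T mismatches, reference Cs] at read ends
    inner = [0, 0]  # same counts for read interiors
    for ref, read in pairs:
        if len(ref) != len(read):
            raise ValueError("reference and read must have equal length (no indels)")
        n = len(ref)
        for i, (ref_base, read_base) in enumerate(zip(ref, read)):
            if ref_base != "C":
                continue
            bucket = term if i < terminal or i >= n - terminal else inner
            bucket[1] += 1
            if read_base == "T":
                bucket[0] += 1

    def rate(mismatches, total):
        return mismatches / total if total else 0.0

    return rate(*term), rate(*inner)


pairs = [("CAAAAAAC", "TAAAAAAT"), ("AACCCCAA", "AACCCCAA")]
print(terminal_vs_interior_ct(pairs))  # (1.0, 0.0)
```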

7 Endogenous Transcriptomics

While small RNA molecules can tell us how the genome is being regulated, ribosomal and transfer RNA, due to their inherent ubiquity, can tell us relatively little. The final component of the transcriptome, messenger RNA, can potentially verify aspects of the regulatory processes identified from small RNA. As with complete aRNA genomes, only one empirical study assessing the complete transcriptome using high-throughput sequencing has been published (Fordyce et al. 2013a). This represents an untapped resource which, when combined with other RNA classes, can give truly new insights into evolution, domestication, and their drivers, such as human interaction and paleoclimate.

8 Technical Considerations for aRNA

To conclude, we recap some of the technical considerations that should be taken into account when performing this kind of work. These are not intended as anything akin to “criteria of authenticity,” which should be a given when working with any ancient biomolecule; rather, they are minor points that can make big differences.

8.1 Isolation of aRNA

At the risk of stating the obvious, all extraction and pre-PCR laboratory work should take place in a strictly controlled setting, as with ancient DNA. Reagents, tubes, and other consumables should be certified RNase-free, which is especially important to bear in mind when making one’s own reagents. Unlike with modern samples, the release of endogenous RNases from sample tissue during predigestion (i.e., crushing or grinding) is probably less of an issue, because these enzymes have likely degraded over time. This should not be taken for granted, however, so samples that may contain active RNases (e.g., historical or herbarium tissues) should first be flash frozen in liquid nitrogen to minimize potential degradation.

Analyses based on extracted RNA often require prior removal of any co-extracted DNA. There are several ways to do this, but no single method is ideal. DNase treatment is routinely used for modern tissues, but the fragmentary nature of ancient DNA limits the number of cleavage sites available to DNase (Sutton and Brown 1997) and so reduces its efficiency. Compensating for this using extended incubation may prove detrimental as DNase will preferentially degrade RNA where a suitable DNA substrate is unavailable (Smith 2012). The second method is to extract total nucleic acids in acidic (pH 4.8) phenol. Since DNA is slightly less acidic than RNA, lowering the pH of the organic solvent encourages DNA molecules to move out of the aqueous phase toward the interphase, leaving RNA alone in the aqueous phase (Chomczynski and Sacchi 2006). While this works well to eliminate DNA, it is time-consuming, reduces overall yield of RNA, and involves working with a dangerous substance. The removal of other coprecipitates often seen from ancient tissues can be achieved using repeat organic extractions.

Since many protocols for size-based selection of RNA are designed to capture small (>18 nt) RNA, no specially modified protocols are required for ancient, fragmented RNA. Generally, these protocols work on principles similar to those used for ancient DNA, binding nucleic acids to a silica matrix in the presence of chaotropic salts such as guanidinium thiocyanate.

8.2 NGS Library Building

As described earlier, the 3′ hydroxyl group of endogenously processed small RNA can be exploited for enrichment by using a pre-adenylated adapter for NGS library construction. This approach has limitations, however: RNA fragments produced by nonenzymatic shearing may lack a 3′ OH group, and the high copy number of redundant rRNA and tRNA makes extracting meaningful information difficult. No protocols currently address this directly, although certain types of molecular enrichment might become an option if refined to capture RNA instead of DNA. For the time being, we can take advantage of the ever-increasing output of existing NGS platforms and simply discard redundant data, albeit at a significantly higher cost per informative base than for DNA.
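The cost of simply “sequencing more” can be estimated with basic arithmetic: if a share f of reads is redundant and discarded, roughly 1/(1 − f) times as many reads must be sequenced to reach a given number of informative ones, and the cost per informative base rises by the same factor. The sketch assumes redundant reads are randomly interspersed and that their percentage is known (for instance from a pilot run). The 90% figure in the example is hypothetical, not a measured value for aRNA, and the function name is illustrative.

```python
def reads_required(target_informative_reads, redundant_percent):
    """Total reads to sequence so that, on average, target_informative_reads remain
    after discarding a redundant_percent share (e.g., rRNA/tRNA) of the data.

    Integer ceiling division avoids floating-point rounding surprises.
    """
    if not 0 <= redundant_percent < 100:
        raise ValueError("redundant_percent must be in [0, 100)")
    informative_percent = 100 - redundant_percent
    return -(-target_informative_reads * 100 // informative_percent)


# Hypothetical scenario: 90% of reads are redundant rRNA/tRNA.
print(reads_required(1_000_000, 90))  # 10000000, i.e., 10x the informative target
```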

8.3 Cytosine Deamination

As with aDNA, cytosines in aRNA are susceptible to hydrolytic deamination, which converts them to uracil. Again as with aDNA, this tends to occur toward the termini of the molecule, where cytosines are likely to be exposed on a single strand. Unlike with aDNA, however, where UDG (uracil-DNA glycosylase) treatment can be employed to remove uracils and repair the abasic sites (Briggs et al. 2010), RNA cannot be subjected to uracil removal, since uracil is one of its canonical bases. Damaged and undamaged bases are therefore chemically identical, and no laboratory-based treatment can distinguish them. Instead, a bioinformatic approach can be used to reconstruct contiguous sequences from shorter fragments, but only where coverage is sufficient for deaminated terminal bases to be overlapped by unaltered cytosines in the middle of other reads.
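The overlap strategy can be sketched as a majority-vote consensus in which T calls near read ends are ignored, so that interior calls from overlapping reads decide those positions. The sketch assumes reads already placed on a reference without indels, given as (start, read) pairs in the DNA alphabet of the sequencing output; the function name, the masking window, and the minimum depth are hypothetical parameters, not established defaults.

```python
from collections import Counter


def masked_consensus(alignments, ref_length, mask_ends=3, min_depth=2):
    """Majority-vote consensus that ignores T calls near read ends.

    alignments: (start, read) tuples, with 0-based start on the reference and no indels.
    A T within `mask_ends` bases of either end of a read is not counted because it
    may be a deaminated C; interior calls from overlapping reads decide instead.
    Positions with fewer than `min_depth` counted bases are reported as 'N'.
    """
    piles = [Counter() for _ in range(ref_length)]
    for start, read in alignments:
        n = len(read)
        for offset, base in enumerate(read):
            pos = start + offset
            if not 0 <= pos < ref_length:
                continue
            near_end = offset < mask_ends or offset >= n - mask_ends
            if near_end and base == "T":
                continue
            piles[pos][base] += 1
    return "".join(
        pile.most_common(1)[0][0] if sum(pile.values()) >= min_depth else "N"
        for pile in piles
    )


# The first read carries a deaminated (C -> T) base at its 5' end.
reads = [(0, "TACGCACG"), (0, "CACGCACG"), (0, "CACGCACG")]
print(masked_consensus(reads, 8, mask_ends=2))  # CACGCACG
```

Genuine T bases near the ends of every overlapping read are masked too and fall back to 'N', which is exactly why this approach only works where coverage is deep and reads are staggered.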

This is further complicated by uncertainty about secondary-structure dynamics. In general, we see a reduced propensity for mid-sequence deamination, hypothetically because of pseudo-dsRNA formation; however, to be certain for individual reads, one would need to predict all possible secondary structures and calculate a likelihood of deamination for each cytosine. Unsurprisingly, this would consume massive amounts of computational time and may be unfeasible for NGS datasets, but it may also be unnecessary if coverage is deep enough to call a consensus. For the time being, more data are needed.
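Short of enumerating every possible structure, a crude proxy is to take a single predicted secondary structure in dot-bracket notation (the format produced by tools such as ViennaRNA’s RNAfold) and list the cytosines it leaves unpaired, on the hypothesis that these are more exposed to deamination. The sketch assumes that one structure is an adequate approximation, which is precisely the simplification cautioned against above, so it is illustrative only; the function name is hypothetical.

```python
def unpaired_cytosines(sequence, structure):
    """Indices of cytosines left unpaired ('.') in one dot-bracket structure."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    open_positions = []  # validate that brackets are balanced
    for i, symbol in enumerate(structure):
        if symbol == "(":
            open_positions.append(i)
        elif symbol == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at position {i}")
            open_positions.pop()
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {i}")
    if open_positions:
        raise ValueError("unmatched '(' in structure")
    return [
        i for i, (base, symbol) in enumerate(zip(sequence.upper(), structure))
        if base == "C" and symbol == "."
    ]


# Hairpin: G1-C8 and G2-C7 pair; C0, C3 and C6 are left single-stranded.
print(unpaired_cytosines("CGGCAUCCCA", ".((....))."))  # [0, 3, 6]
```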

8.4 RNA Methylation

As previously described, cytosine methylation in RNA is a known phenomenon, although its exact function(s) remain unclear (Hussain et al. 2013). From a technical perspective specific to ancient biomolecules, however, it would be useful to (a) identify whether 5mC deamination to thymine occurs in aRNA as it does in aDNA and (b) assess whether its rate is comparable. Although thymine in RNA is chemically distinguishable from uracil, this distinction would be lost during reverse transcription or sequencing with current technology, again complicating pattern matching. More information on the extent, causes, and effects of RNA methylation is needed before the phenomenon can be explored in ancient material.

9 Future Perspectives

In something akin to Moore’s Law (the observation, loosely stated, that computing power doubles roughly every 18 months), we are seeing a steady, if not exponential, increase in the cost-effectiveness and raw power of next-generation sequencing technologies. This is allowing increasingly small, heavily contaminated samples to be sequenced with the prospect of generating enough information to be economically and scientifically worthwhile simply by “sequencing more.” We are also seeing increasingly sophisticated models of nucleic acid diagenesis and encouraging aRNA results from frozen and desiccated materials alike. These factors, combined with a renewed focus on genome activity and regulation, hold great potential for a revival of interest in ancient RNA. Such themes are especially pertinent in today’s world: for distinct evolutionary processes such as domestication, agriculture, and responses to environmental change, a deeper look at the past may give us a new perspective on the future.

10 Conclusions

For various reasons, ancient RNA has not been given the same level of attention as ancient DNA since it was realized that ancient biomolecules can be sequenced. Some of those reasons are justified, but recent research challenges some of the assumptions about the molecule’s inherent instability and lack of availability. New sequencing technologies have shown that recovery and extrapolation of meaningful information from ancient RNA are not only possible but can give insight into areas that more traditional methods cannot. In particular, in vivo processes can be reconstructed, and even more excitingly, the basis of response to changing environments can be documented. These responses are key to our understanding of the fundamentals of molecular evolution, and so being granted access to these moments as they occurred in the past is a resource definitely worthy of further exploitation.

We suggest that these recent advances, although modest, mark the beginning of a new enthusiasm for aRNA research. There are, of course, challenges in comprehensively analyzing this kind of data, but the constant stream of new ideas in molecular biology can be adapted to the study of degraded, ancient biomolecules. The rise of massively parallel sequencing technologies, combined with a renewed focus on RNA-related biological dimensions such as epigenetics, suggests that we will soon be able to track and trace molecular evolution more coherently than ever before.