
7.1 Shining a Light on the Past: The Promise of Ancient DNA

Ancient DNA (aDNA) has fostered a revolution in evolutionary genomics, as it allows direct observation of historical molecular diversity (Der Sarkissian et al. 2014). Previously, hypotheses could be based only on modern genetic diversity, which is the end product of thousands of years of evolution; the main caveat is that the same pattern of genetic variation is often consistent with different historical scenarios (Lawson et al. 2018). The analysis of aDNA allows the genomic characterization of populations at different points in time, adding a fundamentally new dimension to evolutionary studies (Gutaker and Burbano 2017; Orlando et al. 2021).

The very first aDNA analysis was conducted on a mitochondrial sequence of a museum-preserved quagga (Higuchi et al. 1984). Since then, the field of archaeogenomics has flourished rapidly (Morozova et al. 2016), allowing a better understanding of human, animal, and plant evolutionary history. Recent advances in this field include the analysis of sedimentary, epigenetic, pathogen, and microbiome aDNA (Key et al. 2020; Parducci et al. 2017; Pedersen et al. 2014; Spyrou et al. 2019; Warinner et al. 2014).

aDNA has already had a remarkable impact on our understanding of human history, shedding light on important patterns of migration (Lacan et al. 2011), admixture (Yang et al. 2020), adaptation (Marciniak and Perry 2017), and population dispersal, expansion, and decline (Nielsen et al. 2017). Notably, aDNA has made a fundamental contribution to our knowledge of the genetic relationships between modern humans and their extinct relatives, the Neanderthals (Weyrich et al. 2015) and the Denisovans (Krause et al. 2010; Reich et al. 2010), the latter of which have been identified only through aDNA analysis. Similar insights have been gained in other animals, such as dogs (Botigue et al. 2017; Leathlobhair et al. 2018), cattle (Daly et al. 2018; Verdugo et al. 2019), pigs (Frantz et al. 2019), and horses (Gaunitz et al. 2018). These studies have prompted a reassessment of previous evidence and, in some cases, overturned existing narratives (Librado et al. 2021).

Now, aDNA promises a similar revolution in our understanding of how crops have been domesticated and spread around the globe, and the ways that these processes have shaped genetic diversity. By revealing how crops have adapted to new environments and what genetic diversity has been lost, aDNA can also provide a basis for future breeding strategies (di Donato et al. 2018; Pont et al. 2019b). Crop archaeogenomics is still in its infancy, but aDNA from several important crops has been analysed, including maize (Ramos-Madrigal et al. 2016), barley (Mascher et al. 2016; Palmer et al. 2009), cotton (Palmer et al. 2012), bean (Trucchi et al. 2021), sunflower (Wales et al. 2018), sorghum (Smith et al. 2019), watermelon (Renner et al. 2019), and emmer wheat (Scott et al. 2019).

In this chapter, we first give a very brief overview of the history of wheat cultivation and the key genetic changes involved, an area in which aDNA technology promises unique insights. We then review the wheat aDNA studies carried out so far and their contribution to understanding the phenomena that have shaped wheat genomes. To conclude, we discuss the key open questions in this field and the limitations posed by wheat’s large polyploid genome and idiosyncratic preservation. Our goal is to give an overview of the important answered and unanswered questions in the history of wheat cultivation and the promise of aDNA for resolving them.

7.2 A Brief History of Wheat Cultivation

Human societies have relied on wheat for thousands of years. Thus, the history of wheat domestication, geographic expansion, and cultivation has cross-disciplinary significance (Fig. 7.1). Understanding how wheat genetic diversity has been shaped also has contemporary relevance due to its continued nutritional and economic importance. Archaeogenomic studies aim to give new information about at least three key aspects of this process: domestication, dispersal, and gene flow between different wheat species. To contextualize contributions from archaeogenomics, we briefly overview these basic tenets of wheat cultivation history.

Fig. 7.1

Wheat has been culturally important for millennia, and DNA extracted from ancient specimens can reveal how humans have shaped crop genetic diversity. Left: Facsimile of a vignette on the tomb of Sennedjem and Iineferti showing grain harvest in the abundant fields of the next life (painted by Charles K Wilkinson in 1922 CE, original ca. 1295–1213 BCE, public domain image from the Metropolitan Museum of Art). Right: Archaeological specimens of desiccated emmer wheat chaff from Egypt. Photo from Dorian Q. Fuller, University College London, Institute of Archaeology

7.2.1 Domestication

Wild tetraploid emmer wheat was one of the first species to be domesticated (Haas et al. 2018), during the so-called Neolithic Transition, in parallel with the human shift from hunting and gathering to agriculture and animal husbandry (Diamond 2002). The quintessential trait of cereal domestication is the loss of rachis brittleness: in wild cereals, the spikelets disarticulate spontaneously from the rachis upon maturity, ensuring seed dispersal and germination. In domestic cereals, the rachis is non-brittle; spikelets remain attached, which makes harvesting easier but means that the seed must be sown in the following season in order to germinate. Because plants with a non-brittle rachis depend on human action for dispersal, this phenotype has been used to define domestication in cereals (Abbo et al. 2014; Snir et al. 2015). Loss-of-function mutations in the TtBtr1-A and TtBtr1-B genes on chromosomes 3A and 3B are the main determinants of this phenotype (Avni et al. 2017; Nave et al. 2019). Therefore, alleles at these two loci essentially distinguish wild from domesticated emmer wheat. Other traits that are favourable in the human-mediated environment but most likely deleterious in the wild (Kantar et al. 2017; Purugganan and Fuller 2009), such as the loss of seed dormancy and larger seed size (Haas et al. 2018; Zohary 2013), contribute to a broader definition of the “domestication syndrome” (Larson et al. 2014).

Wild emmer wheat has a very restricted distribution, growing only in the Fertile Crescent region of Southwest (SW) Asia (Vavilov et al. 1992). The exact location where domestic emmer emerged has long been controversial. In the 2000s, early genetic studies began to address this issue, most notably with the so-called “cradle of agriculture” theory (Lev-Yadun et al. 2000). Further genetic studies pointed to the Northern Fertile Crescent, and specifically to the Karaca Dağ Mountain region, as the centre of emmer wheat domestication (Luo et al. 2007; Ozkan et al. 2002, 2005), mostly because the genomes of modern domestic landraces are more similar to those of wild emmer from the Northern Levant than to those of wild emmer from the Southern Levant (Avni et al. 2017).

However, this monophyletic origin has been challenged by increasing evidence that different wild populations contributed to domestic wheats. Several authors argue that domestic emmer wheat arose from an admixed wild population and that mutations for domestication traits appeared on different chromosomes at different times, and possibly in different places (Civáň et al. 2013; Jorgensen et al. 2017; Oliveira et al. 2020). This is in line with the observation that the domestic phenotype, which requires at least two independent recessive mutations, took millennia to become established (Avni et al. 2017; Fuller et al. 2014). As attested by the archaeological record, wild emmer wheat was first exploited in the Southern Levant, where increasing, though small, proportions of phenotypically domestic emmer wheat are found at different archaeological sites as early as the Early Pre-Pottery Neolithic B (8700–8200 BCE) (Arranz-Otaegui et al. 2018). By contrast, domesticated emmer is found in very high proportions in the Northern Levant from the Middle/Late Pre-Pottery Neolithic B (8200–6300 BCE) onwards (Arranz-Otaegui et al. 2016). This indicates that wild emmer was managed (a phenomenon often referred to as “pre-domestication cultivation”) (Fuller et al. 2010) long before the domestic forms emerged, and that wild populations from across the Fertile Crescent probably contributed to the domestic pool (Feldman and Kislev 2007). Several studies have demonstrated introgression from wild to domestic wheat (e.g. Cheng et al. 2019; Pont et al. 2019b; Przewieslik-Allen et al. 2021), although the context in which these introgression events took place remains unknown.

Overall, archaeology and genetics point to a slow and geographically widespread domestication process in which both the Northern Levant and the Southern Levant played an important role.

7.2.2 Evolution

Domestic emmer wheat (Triticum turgidum subsp. dicoccon) gave rise to today’s most economically important wheats: tetraploid durum wheat (T. turgidum subsp. durum) and hexaploid bread wheat (T. aestivum subsp. aestivum). These descendants differ from their ancestor in one trait of great agricultural importance: the free-threshing phenotype. Emmer is a hulled, non-free-threshing wheat, and extracting the seeds from the husks requires substantial mechanical processing. By contrast, durum and bread wheat are naked and free-threshing: threshing readily releases the seeds from their loose glumes, without further processing. While durum wheat is tetraploid (BBAA), bread wheat is hexaploid (BBAADD) and evolved from the hybridization of a tetraploid wheat with the diploid wild goatgrass (Aegilops tauschii), donor of the D subgenome (Haas et al. 2018; Pont et al. 2019a). The identity of the tetraploid that contributed the B and A subgenomes to bread wheat has been a matter of debate (Sharma et al. 2019), but given that multiple mutations are needed to produce the free-threshing phenotype, the best-supported (and most parsimonious) models indicate that the hybridization with A. tauschii involved a free-threshing tetraploid (Zhou et al. 2020).

The emergence of modern wheat is therefore the result of three processes: (I) domestication of wild emmer wheat, associated with the loss of rachis brittleness; (II) crop evolution (often also referred to as crop improvement under cultivation), which includes the emergence of the free-threshing phenotype and adaptation to new ecological niches; and (III) allopolyploidization between a free-threshing tetraploid and A. tauschii, giving rise to bread wheat. We summarize these changes in Fig. 7.2.

Fig. 7.2

Schematic representation of the domestication and evolution of the most economically important wheats today, showing important phenotypes and the mutations that determine them. Basic information about the appearance of the different wheats in the archaeological record is given on the right. The small white hand represents the investment of human labour in processing the harvest. *For simplicity, we use the common name “durum wheat” for all free-threshing tetraploids, although other free-threshing tetraploids are known by other common names, and it is not known which of them was involved in this allopolyploidization event. This scheme is an adaptation of the model proposed by Sharma et al. (2019)

Perhaps surprisingly, hulled wheats continued to be used for thousands of years after the appearance of free-threshing durum and bread wheat. The slow and regionally specific shifts in wheat usage probably reflect cultural practices and preferences (Nesbitt and Samuel 1996). Moreover, increasing archaeological evidence shows that early farmers relied on a wide range of other domestic wheats for their subsistence, including einkorn, spelt, and Triticum timopheevii, alongside emmer and free-threshing wheats (Özbaşaran et al. 2018). This is consistent with the evidence for intra- and interspecific introgression detected in modern wheat (Cheng et al. 2019; Zhou et al. 2020).

7.3 Archaeogenomics of Wheat

Wheat archaeogenomics is a powerful tool to investigate how wild wheat evolved into domestic forms and how these domestic wheat varieties adapted to different ecological niches and cultural preferences through history.

However, the limitations and characteristics of ancient genomes have, to some extent, shaped the approaches taken in this research field. Before high-quality reference genomes were available, most studies avoided whole-genome analysis and instead used targeted PCR amplification of specific loci. This mitigates the challenges of a large genome but yields much less genomic information. Furthermore, the primers used for amplification mask the characteristic patterns of degradation that are useful for ruling out contamination by confirming the antiquity of the DNA. Unlike amplicons, whole-genome libraries can also be re-analysed to obtain more data without further destructive sampling of rare material. For these reasons, amplification approaches are no longer recommended for ancient samples (Gutaker and Burbano 2017; Prüfer and Meyer 2015).

We first overview wheat aDNA studies that use amplification and then describe the first two whole-genome analyses. Even though wheat archaeogenomics is at an early stage, the results have already shifted our understanding of wheat genetics in important ways.

7.3.1 Target Gene Amplification

The most common uses of target gene amplification have been to interrogate key genes and to identify wheat remains at the species level. The x and y copies of the Glu-1 loci were often the focus of early studies. These genes, present in all wheat subgenomes, are located on the long arms of the group 1 chromosomes and encode the high molecular weight glutenin subunits (HMW-GSs), storage proteins present in the starchy endosperm cells of wheat. Allelic variants in these genes affect the properties of dough for bread making. Because of their effect on bread quality, the evolution of the HMW genes can provide insights into the nature of human selective pressures during wheat evolution (Allaby et al. 1999). In that study, the authors surveyed these loci in a collection of modern and ancient wheats, constructed a phylogenetic tree, and obtained time estimates by using a substitution rate to calibrate the observed variation. By comparing the genetic variability of the x and y copies in each genome, they determined that the genetic variability at these loci in the cultivated species predates domestication, pointing to incomplete lineage sorting, multiple domestication events, or introgression after domestication.

Another study used a similar approach with the same loci to investigate the origins of spelt (Blatter et al. 2002). The authors surveyed a collection of modern and ancient bread wheat and spelt specimens and found that the high genetic variability of spelt in the A and B genomes, compared with that of bread wheat, is compatible with spelt originating from a hybridization event between bread wheat and hulled tetraploid emmer.
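
The time estimates in Allaby et al. (1999) rest on the logic of a molecular clock. The sketch below illustrates that logic under stated assumptions only: a strict clock, a Jukes-Cantor correction for multiple substitutions at the same site, and a substitution rate supplied by the user. It is not a reconstruction of the authors' procedure, and the function names are hypothetical. For two sequences separated by a corrected distance d, each lineage has accumulated roughly half of the changes since the split, so the divergence time is T = d / (2μ).

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences (gaps/N ignored)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            differences += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def jukes_cantor(p: float) -> float:
    """Correct an observed p-distance for multiple substitutions at the same site."""
    if not 0 <= p < 0.75:
        raise ValueError("p-distance outside the range of the Jukes-Cantor model")
    return -0.75 * math.log(1 - 4 * p / 3)


def divergence_time(seq_a: str, seq_b: str, rate_per_site_per_year: float) -> float:
    """Strict-clock estimate T = d / (2 * mu): both lineages accumulate changes."""
    d = jukes_cantor(p_distance(seq_a, seq_b))
    return d / (2 * rate_per_site_per_year)
```

The estimate scales inversely with the assumed rate μ, so any error in the calibration propagates directly into the date. With ancient sequences, damage-induced substitutions would also inflate d unless they are filtered out.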

HMW genes have also been used to identify wheat remains at the subspecies level and to trace their dispersal. Without associated chaff, it is difficult to distinguish between free-threshing wheats (e.g. bread wheat and durum wheat). Bilgic et al. (2016) targeted the HMW promoter region in 8400-year-old specimens from Çatalhöyük, a renowned Neolithic site in central Turkey, to determine whether the genetic variability characteristic of the D genome could be recovered, as proof that the wheat was hexaploid. The recovery of HMW glutenin gene sequences from the A, B, and D genomes is remarkable, since it provides evidence for the presence of hexaploid wheat at a very early date and highlights the importance of this settlement in the expansion of hexaploid wheat cultivation. Another study used the Internal Transcribed Spacer regions (ITS1 and ITS2) and the Inter-Genic Spacer region (IGS) of the nuclear ribosomal DNA for species-level identification (Li et al. 2011), and also found early evidence for hexaploid wheat in Northwest China around 1760–1540 BCE.

These results highlight the high diversity of wheats consumed by humans during early agricultural expansion. Free-threshing naked wheats first appear in the archaeological record between 7000 and 5500 BCE (Feldman and Kislev 2007). Early naked wheats co-existed with domestic and wild emmer populations (Bilgic et al. 2016), providing opportunities for genetic exchange. Together with the protracted period of emmer domestication, this probably explains the higher genetic diversity of the A and B subgenomes of modern bread wheat compared with the D subgenome (Cheng et al. 2019). This demonstrates how the details of agricultural history directly affect modern wheat diversity and breeding. Moreover, other wild Triticum species gave rise to domestic forms during the Neolithic. These include the diploid einkorn wheat, Triticum monococcum subsp. monococcum, which emerged from wild einkorn, T. monococcum subsp. aegilopoides (Nesbitt and Samuel 1996); spelt (Triticum spelta), a hulled hexaploid; and tetraploid T. timopheevii (domesticated from T. timopheevii araraticum) (Wagenaar 1966), whose domestic form has only recently been recognized in the archaeological record thanks to aDNA analysis.

The position of T. timopheevii within the domestication process of wheat in SW Asia exemplifies the value of aDNA for gaining insights into domestication processes. Briefly, because T. timopheevii is technically difficult to identify, its existence as an early crop was long questioned, and its remains were often left unclassified or ascribed to other wheat categories, such as “New Glume Wheat”. Recently, archaeological remains described as “New Glume Wheat” have been designated as domestic T. timopheevii based on aDNA evidence (Czajkowska et al. 2020). The authors used the Ppd1 locus to identify G genome alleles in “New Glume Wheat” remains. This study has sparked considerable interest in the archaeobotanical community. Decades have passed since an archaeological specimen was first classified as “New Glume Wheat”, but it was not until numerous remains of this type of wheat were found at several Neolithic and Bronze Age sites in northern Greece and compared with material from other locations (Jones et al. 2000) that archaeologists were able to describe its distinctive features (Ulaş and Fiorentino 2021). Nevertheless, identification based on grain morphology is still problematic. The identification of New Glume Wheat as domestic T. timopheevii through ancient DNA analysis has had important ramifications for our understanding of the complexity of the domestication process in SW Asia, confirming that multiple species evolved into domestic forms and moving away from the “founder crops” theory. In certain regions, T. timopheevii was cultivated for a very long period of time. New efforts are now being undertaken to revisit archaeobotanical assemblages and reassess the relative abundance of plant species, with the expectation that many grains previously classified as emmer wheat will be reclassified as T. timopheevii.

The HMW loci were also used, together with two chloroplast markers, the ribulose-1,5-bisphosphate carboxylase large subunit gene (rbcL) and the microsatellite WCT12, to study the feasibility of DNA extraction from ancient plant specimens (Fernández et al. 2013). In this study, 126 grains of naked wheat in different preservation conditions (charred, partially charred, and waterlogged) were analysed (Fig. 7.3 shows different preservation conditions of ancient wheat samples). The results showed that DNA extraction from fully charred remains is virtually impossible, while amplification of DNA from modern contaminants is pervasive. Unfortunately, almost all of the oldest archaeological wheat specimens are charred, which is a severe limitation for future aDNA studies.

Fig. 7.3

Examples of different preservation conditions of archaeobotanical wheat. Left: charred emmer wheat seeds from the Vinča culture in Serbia (middle/late Neolithic; c. 5400–4600/4500 BC), published in Filipovic (2014). Right: Waterlogged chaff remains of Triticum cf. durum/turgidum from the end of the 5th millennium BC at the site of Les Bagnoles. Photo by Raül Soteras, AgriChange Project, reproduced with permission

As mentioned above, one important limitation of amplification-based studies is the difficulty of confidently ruling out contamination. Commonly used indicators such as the fragment length distribution or deamination patterns are difficult to assess in target-specific PCR amplification studies. In addition, Allaby et al. (1999) reported PCR jumping, probably related to the short length of some fragments. Their results showed patterns of linked diversity that did not exist in the modern pool, and the authors had to manually rearrange the observed diversity to match known modern haplotypes, with potential resulting biases.

Different strategies have been used to increase confidence in the antiquity of the data. Allaby et al. (1999) replicated the results from the same specimen and processed extraction blanks with each extraction run. Czajkowska et al. (2020) performed the extractions in laboratory facilities where no wheat had previously been processed, to minimize the risk of contamination. Bilgic et al. (2016) processed all samples in two different facilities, so that replication of the results acts as proof of authenticity. Even so, when contamination can be ruled out, amplicon data still do not allow damage-induced substitutions to be distinguished from true polymorphisms. Therefore, phylogenetic analyses and interpretations of the accumulation of variation through time should be treated with caution unless transitions (C/T or G/A SNPs) are excluded.
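
Excluding transitions is straightforward once SNPs are classified by the chemical class of their alleles: transitions swap a purine for a purine (A/G) or a pyrimidine for a pyrimidine (C/T), and therefore include every change that C-to-T or G-to-A damage can mimic, whereas transversions cannot arise from deamination. A minimal sketch, assuming SNPs stored as plain dictionaries with hypothetical `ref` and `alt` keys:

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref: str, alt: str) -> bool:
    """True for A<->G or C<->T changes, the classes that include C-to-T/G-to-A damage."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or not {ref, alt} <= PURINES | PYRIMIDINES:
        raise ValueError(f"not a biallelic nucleotide SNP: {ref}/{alt}")
    return {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES


def transversions_only(snps):
    """Drop transition SNPs; each SNP is a dict with hypothetical 'ref'/'alt' keys."""
    return [snp for snp in snps if not is_transition(snp["ref"], snp["alt"])]
```

The cost of this filter is data: transitions typically outnumber transversions, so a transversion-only analysis discards much of the available variation.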

7.3.2 Whole-Genome Analyses

As with modern wheat samples, the genomic scale of archaeological wheat genetics has expanded since the publication of reference genomes (Table 7.1). Nevertheless, only two studies have so far reported whole-genome sequences from archaeological wheat specimens. One analysed several bread wheat remains from China to infer the crop's dispersal into the region (Wu et al. 2019). The earliest bread wheat remains found in China date to approximately 4500 years ago in the north-western part of the country; a particularly interesting aspect of this dispersal is that, upon arrival, wheat had to adapt to a wide variety of climatic conditions. Ancient wheat from two archaeological sites within the Xinjiang winter-spring wheat zone was analysed. Even though coverage was extremely low (0.01–0.25x), the authors were able to call more than 7000 SNP sites, compare them with modern data from neighbouring regions, and provide new evidence on wheat dispersal in China, which is still a controversial topic. Their results were consistent with one of the previously suggested routes, an early dispersal into the Qinghai-Tibetan Plateau, based on the highest genetic similarity between the ancient samples and modern samples from that region. Conversely, another proposed route, involving an introduction towards the eastern region, was not supported. However, more data are needed to determine whether different gene pools were introduced to China and to confirm that modern landraces correspond to ancient ones from the same area.
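
Coverages such as 0.01–0.25x mean that most sites are never observed at all. Under the idealized assumption that reads fall uniformly at random, so that the depth at a site follows a Poisson distribution with mean c, the chance that a site is covered by at least one read is 1 - e^(-c). The sketch below computes this and the expected number of callable sites in a hypothetical SNP panel; real ancient libraries are less uniform, and mapping and quality filters reduce coverage further.

```python
import math


def fraction_covered(mean_depth: float, min_reads: int = 1) -> float:
    """P(depth >= min_reads) at a site when depth ~ Poisson(mean_depth)."""
    if mean_depth < 0 or min_reads < 0:
        raise ValueError("mean_depth and min_reads must be non-negative")
    # Probability mass of observing fewer than min_reads reads at the site.
    p_below = sum(
        math.exp(-mean_depth) * mean_depth**k / math.factorial(k)
        for k in range(min_reads)
    )
    return 1.0 - p_below


def expected_called_sites(n_sites: int, mean_depth: float, min_reads: int = 1) -> float:
    """Expected number of panel sites carrying at least min_reads reads."""
    return n_sites * fraction_covered(mean_depth, min_reads)
```

Under these assumptions, about 22% of sites would carry at least one read at 0.25x and only about 1% at 0.01x, which is why such studies work with genotypes at a subset of panel sites rather than with complete genomes.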

Table 7.1 Genomic information available for wheats and relatives mentioned in the text

Another whole-genome analysis of archaeobotanical specimens examined two desiccated samples of 3000-year-old emmer wheat chaff (Fig. 7.4) from Egypt (Scott et al. 2019) to investigate early wheat dispersal and introgression from wild populations. The ancient samples were used to genotype exonic SNPs that segregate in modern accessions; coverage at these sites was 0.48x after quality control, yielding approximately 100,000 high-confidence genotypes. The authors used a haplotype-based approach to overcome, as far as possible, the limitations of aDNA analysis in polyploid species.

Nearby sites that have not been separated by recombination are inherited together in blocks called haplotypes. A “haplotype reference panel” combines information from multiple modern genomes to characterise the haplotypic variation at each genomic location (McCarthy et al. 2016). In the analysis of ancient data, when enough genotypes can be identified within a region, it is possible to assign a known haplotype to the ancient sample (or no known haplotype, as may be the case when ancient diversity has been lost from existing populations). Genotypes at unsequenced sites within the region can then be inferred from the assigned haplotype, a method called imputation. Haplotypes are relatively long in wheat (Walkowiak et al. 2020) because self-fertilization breaks haplotypes apart less than outcrossing does: in largely homozygous selfing lineages, recombination between identical chromosomes produces no new allele combinations. As a consequence, low-coverage data are more likely to yield enough sites to assign an individual to a haplotype. This method allowed Scott et al. (2019) to identify genomic tracts tens of megabases long containing hundreds of genotypes that matched a modern sample in the haplotype reference panel. These included regions where important domestication QTLs had been identified, so that the domestication allele could be imputed and the phenotype inferred. In contrast, other genomic regions did not match anything in the haplotype reference panel.
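
The assignment and imputation steps described above can be illustrated with a deliberately simplified toy. The sketch assumes one genomic window, a panel of modern haplotypes written as allele strings (one character per SNP), and sparse ancient calls keyed by SNP index; the names `assign_haplotype` and `impute_window` are hypothetical. It is not the method of Scott et al. (2019): practical imputation tools use probabilistic models that allow for recombination, sequencing error and damage, whereas this version simply requires a near-exact and unambiguous match.

```python
from typing import Dict, Optional


def assign_haplotype(panel: Dict[str, str], observed: Dict[int, str],
                     max_mismatches: int = 0, min_sites: int = 3) -> Optional[str]:
    """Assign sparse ancient calls in one genomic window to a panel haplotype.

    panel: haplotype name -> alleles at the window's SNPs (one character per SNP).
    observed: SNP index -> allele called from the ancient reads.
    Returns None if too few SNPs were observed, if no haplotype fits (e.g. a
    haplotype absent from the modern panel), or if the fit is ambiguous.
    """
    if not panel or len(observed) < min_sites:
        return None
    mismatches = {
        name: sum(alleles[i] != allele for i, allele in observed.items())
        for name, alleles in panel.items()
    }
    best = min(mismatches.values())
    if best > max_mismatches:
        return None
    tied = [name for name, m in mismatches.items() if m == best]
    if len({panel[name] for name in tied}) > 1:
        return None  # observed SNPs cannot separate distinct haplotypes
    return tied[0]


def impute_window(panel: Dict[str, str], observed: Dict[int, str],
                  **kwargs) -> Optional[str]:
    """Fill unobserved SNPs from the assigned haplotype, keeping observed calls."""
    name = assign_haplotype(panel, observed, **kwargs)
    if name is None:
        return None
    return "".join(observed.get(i, allele) for i, allele in enumerate(panel[name]))
```

Returning `None` when no panel haplotype fits mirrors the observation that some ancient regions match nothing in the panel; in a real analysis, such windows are the candidates for lost diversity.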

Fig. 7.4

Desiccated emmer wheat chaff from Hememiah North Spur (Egypt) 14C dated 1300–1000 BC, analysed by Scott et al. 2019. Photo by Chris J. Stevens, reproduced with permission

The data essentially confirmed that the genetic changes associated with domestication were completed by 3000 years ago, prior to emmer wheat's dispersal to Egypt. However, the ancient Egyptian sample carried more “unique” haplotypes than any other domesticated sample in the dataset, indicating regions where genetic diversity has been lost. It is not yet possible to say whether this lost variation was associated with adaptation to local environmental conditions or conferred other useful traits. Nevertheless, these results highlight geographic and genomic regions that may harbour genetic diversity that was used in the past and might therefore be useful in the present and future. Moreover, although the highly repetitive nature of the wheat genome increases the risk of misalignment and consequently of inflated heterozygosity, Scott et al. (2019) found that the estimated heterozygosity of the ancient sample fell within the range of the modern samples. This suggests that reliable genotypes can be obtained from ancient wheat, provided that appropriate quality filters are used to restrict attention to sites that do not suffer from alignment problems.
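
The heterozygosity check works because wheat is largely self-fertilizing, so true heterozygous sites should be rare; reads from paralogous or homoeologous copies that are misaligned to the same position instead create spurious heterozygous calls. A minimal sketch of restricting the estimate to well-behaved sites, assuming VCF-style genotype strings and a precomputed set of accessible sites (both hypothetical inputs here):

```python
def observed_heterozygosity(genotypes: dict, accessible: set) -> float:
    """Fraction of called, accessible sites with a heterozygous genotype.

    genotypes: site id -> VCF-style genotype string ("0/0", "0/1", "1|1", "./.").
    accessible: site ids passing alignment/mappability filters (assumed given).
    """
    called = het = 0
    for site, gt in genotypes.items():
        if site not in accessible:
            continue  # e.g. repetitive or multi-copy region prone to misalignment
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles:
            continue  # missing genotype
        called += 1
        het += len(set(alleles)) > 1
    if called == 0:
        raise ValueError("no called sites among the accessible set")
    return het / called
```

For example, with genotypes `{1: "0/0", 2: "0/1", 3: "1/1", 5: "0/1"}` and site 5 excluded from the accessible set, the estimate is 1/3; admitting site 5 raises it to 0.5, showing how a few problematic sites can inflate the statistic.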

Important results from this study concern early emmer wheat dispersal. Ancient dispersal routes generally shape modern population structure and overall genetic similarity, but with the changing use of different wheat species and the adoption of modern elite varieties, we have little grasp of historical population dispersal and replacement. Contemporary emmer wheat subpopulations (landraces) reflect dispersal outside SW Asia to the West (Mediterranean), to the Balkans (Eastern Europe), to Transcaucasia (Caucasus), and towards India and the Arabian Peninsula (Indian Ocean) (Avni et al. 2017). The authors found that the ancient sample from Egypt resembles modern cultivars from the Indian Ocean subgroup, indicating a connection between early emmer dispersal to the East (across the Iranian Plateau and into the Indus Valley) and to the South-West (Nile Valley). This is particularly interesting given that Ethiopia currently represents a region of genetic isolation and differentiation for tetraploid wheat. The ancient Egyptian sample also carries signatures of gene flow with wild populations in the Southern Levant, which could have occurred during dispersal towards Egypt or during Egyptian conquests in the Ramesside era. We expect further aDNA studies to connect historical events with changes in wheat genetics. Answering these questions will bring a deeper understanding not only of wheat evolution but also of human history, which has been intimately linked to wheat cultivation for millennia.
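
A simple way to ask which modern subgroup an ancient sample most resembles is to measure allele sharing at overlapping sites. The sketch below assumes one allele per site for the ancient sample (for example, a single randomly sampled read, as is common for low-coverage data) and effectively homozygous modern accessions; the subgroup names and data layout are hypothetical. It is a cruder summary than the haplotype-based comparison used by Scott et al. (2019), and in practice transitions and sites prone to reference bias would be filtered first.

```python
def allele_sharing(ancient: dict, modern: dict) -> float:
    """Share of overlapping sites where the ancient allele matches a modern genome.

    ancient: site -> one allele per site (e.g. a randomly sampled read base).
    modern: site -> allele of an inbred, effectively homozygous modern accession.
    """
    matches = [ancient[site] == modern[site] for site in ancient if site in modern]
    if not matches:
        raise ValueError("no overlapping sites")
    return sum(matches) / len(matches)


def rank_groups(ancient: dict, groups: dict) -> list:
    """Return (subgroup, mean allele sharing) pairs, most similar first.

    groups: subgroup name -> non-empty list of modern genomes (site -> allele).
    """
    scores = {}
    for name, genomes in groups.items():
        if not genomes:
            raise ValueError(f"subgroup {name!r} has no genomes")
        values = [allele_sharing(ancient, genome) for genome in genomes]
        scores[name] = sum(values) / len(values)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```
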

Overall, the field of wheat archaeogenomics has yet to reach its full potential. However, the field is primed for new advances with the availability of reference genomes and a wealth of resequenced modern landraces for comparison. While the prospects for studying DNA from charred remains are poor, many desiccated or waterlogged samples have great potential for further study. Archaeological research on waterlogged sites is increasing, which promises new material to complement the specimens currently in museums and collections.

7.4 Analysing Degraded DNA from Ancient Polyploid Wheat

Degradation and contamination are key complications for the reliable analysis of ancient DNA. To mitigate these problems, specific methods have been developed for sample preparation and downstream analysis (reviewed in Orlando et al. 2021). Even with appropriate methodology, DNA from ancient and historical samples cannot be used for all the applications that modern sequence data allows. We briefly overview these general principles of ancient DNA analysis, before discussing the specific issues posed by wheat, as all these factors should be considered during study design and analysis. We expect future methodological improvements to address these challenges, raising the possibility of resolving further important questions in the history of wheat domestication and evolution.

7.4.1 aDNA Damage

A prominent difference between ancient and modern DNA is that ancient DNA is already highly fragmented prior to extraction (Fig. 7.5a). Most DNA fragmentation occurs rapidly after death (Kistler et al. 2017), as the DNA “backbone” breaks down through a process called hydrolytic depurination, which is biochemically predicted to occur more rapidly with exposure to water and high temperatures (Lindahl 1993). Thus, local preservation and environmental conditions are key in determining DNA yield and quality in different samples. Nevertheless, DNA has been successfully sequenced from plant tissue that is thousands of years old, including from tropical and warm environments (Fornaciari et al. 2018; Mascher et al. 2016; Ramos-Madrigal et al. 2016; Renner et al. 2019). Overall, excellent DNA preservation has been reported from plant remains in desiccated and waterlogged conditions (Kistler et al. 2020).

Fig. 7.5

Characteristic patterns of DNA degradation in sequence from a 3000-year-old emmer wheat sample (Scott et al. 2019). (a) Raw distribution of fragment sizes. (b) Misincorporations relative to the reference genome after alignment. In this case, the sequenced library was partially UDG-treated, so that the misincorporations caused by post-mortem damage are confined to a few base pairs at the fragment ends, which are removed for further analysis

Besides fragmentation, the DNA sequence itself undergoes modifications. Notably, some cytosine residues lose an amine group and become uracil residues, which are read as thymine during sequencing (Briggs et al. 2007). This hydrolytic deamination occurs more commonly on the single-stranded overhangs of fragmented DNA molecules. As a result, when aligned to a reference genome, sequenced ancient DNA has a higher proportion of C-to-T misincorporations at the 5′ end of each fragment. Double-stranded DNA libraries also show a higher proportion of the complementary misincorporation, G-to-A, at the 3′ end of each fragment after alignment.
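
The damage pattern described above is usually summarized as the frequency of C-to-T mismatches at each position from the 5′ end of the reads, as done by dedicated tools such as mapDamage. A minimal sketch of that count, assuming reads already aligned to the reference base-to-base in read orientation with no indels (a simplification; real tools work from alignment files):

```python
from collections import Counter


def c_to_t_profile(alignments, max_pos: int = 10) -> list:
    """C-to-T mismatch rate at each read position, counted from the 5' end.

    alignments: iterable of (reference, read) string pairs, aligned base-to-base
    in read orientation, without indels (a simplifying assumption).
    Element i is (#reference C read as T) / (#reference C) at read position i.
    """
    ref_c, c_to_t = Counter(), Counter()
    for ref, read in alignments:
        for i, (r, q) in enumerate(zip(ref[:max_pos], read[:max_pos])):
            if r == "C":
                ref_c[i] += 1
                c_to_t[i] += q == "T"
    return [c_to_t[i] / ref_c[i] if ref_c[i] else 0.0 for i in range(max_pos)]
```

An elevated rate at the first position that decays over the next few positions is the expected signature of authentic ancient DNA. The complementary G-to-A signal would be obtained by counting reference G read as A from the 3′ end instead.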

These characteristic patterns of degradation can be useful to the analysis, as they provide evidence of the sample's antiquity. Therefore, rather than removing all damage, a common approach is to follow a protocol developed for partial UDG treatment (Rohland et al. 2015). With this method, uracil-DNA glycosylase (UDG) is used to remove uracils (Briggs et al. 2010) in the inner region of the fragments, but not at their ends. In this way, some damage is retained, but it is confined to the fragment ends (Fig. 7.5b). Similarly, the distribution of fragment lengths is used to confirm that the sequenced DNA is ancient, as a large proportion of long fragments may indicate contamination. Finally, paired-end sequencing of short fragments often results in the same base pair being sequenced twice, which can be used to improve confidence in the sequence (Jonsson et al. 2014).
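
When a fragment is shorter than twice the read length, the two reads of a pair overlap, and the bases in the overlap are observed twice. A minimal sketch of collapsing such a pair, assuming adapters have already been trimmed: read 2 is reverse-complemented, the first offset whose overlap has few mismatches is accepted, agreeing bases are kept, and disagreeing bases are masked as N. The function name and thresholds are illustrative assumptions; dedicated read-merging tools also use base qualities to resolve disagreements rather than simply masking them.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(read1: str, read2: str, min_overlap: int = 10,
               max_mismatch_frac: float = 0.1):
    """Collapse an adapter-trimmed read pair into one fragment sequence.

    read2 is reverse-complemented into read1's orientation, then offsets are
    tried in increasing order (so larger overlaps are tried first). Bases seen
    in both reads are kept when they agree and masked as N when they disagree.
    Returns None if no overlap of at least min_overlap bases fits.
    """
    r2 = revcomp(read2)
    for shift in range(len(read1) - min_overlap + 1):
        length = min(len(read1) - shift, len(r2))
        if length < min_overlap:
            break
        a, b = read1[shift:shift + length], r2[:length]
        mismatches = sum(x != y for x, y in zip(a, b))
        if mismatches <= max_mismatch_frac * length:
            overlap = "".join(x if x == y else "N" for x, y in zip(a, b))
            # Bases beyond the overlap come from whichever read extends further.
            tail = r2[length:] if len(r2) > length else read1[shift + length:]
            return read1[:shift] + overlap + tail
    return None
```

A side benefit is that the merged sequence gives the length of the original molecule, which feeds directly into the fragment length distribution used to assess antiquity.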

Standard bioinformatic protocols have been established for processing fragmented and damaged DNA, including approaches for mapping short-read data to reference genomes, and automated tools and pipelines are available for calling genotypes in ancient samples for downstream analyses (Peltzer et al. 2016; Schubert et al. 2014). Common methods involve trimming off all base pairs at the fragment ends that are potentially affected by damage (Jonsson et al. 2014) and verifying that analyses are unaffected when transitions (SNPs whose two alleles are either C/T or G/A, and which can therefore reflect post-mortem damage) are excluded (Korneliussen et al. 2014). We further note that “reference bias” (the preferential alignment of reads carrying the reference allele) is stronger in ancient data because of the shorter fragment size, so correction methods should be used (Günther and Nettelblad 2019).
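The two safeguards described above can be sketched as simple filters. This is an illustrative sketch only: the number of bases trimmed from each end (2 here) is a hypothetical default that in practice would be chosen from the observed damage profile, and variants are assumed to be plain `(chrom, pos, ref, alt)` tuples rather than records from any particular library.

```python
# SNP classes where deamination can turn one allele into the other.
TRANSITIONS = {frozenset("CT"), frozenset("AG")}

def is_transition(ref, alt):
    """True for C/T and G/A SNPs, which post-mortem damage can mimic."""
    return frozenset((ref.upper(), alt.upper())) in TRANSITIONS

def trim_read_ends(seq, n_bases=2):
    """Remove n_bases from each end of a read (empty string if nothing remains).

    Both ends are trimmed because double-stranded libraries carry C-to-T
    damage at the 5' end and G-to-A damage at the 3' end.
    """
    if len(seq) <= 2 * n_bases:
        return ""
    return seq[n_bases:len(seq) - n_bases]

def transversions_only(variants):
    """Keep (chrom, pos, ref, alt) records that are transversions."""
    return [v for v in variants if not is_transition(v[2], v[3])]

variants = [("1A", 1000, "C", "T"), ("1A", 2000, "A", "C"), ("2B", 500, "G", "A")]
kept = transversions_only(variants)
```

Re-running an analysis on `kept` and comparing it with the full variant set is one way to check that conclusions are not driven by damage.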

For all these reasons, whole-genome sequencing has become the standard in ancient DNA studies. PCR-based approaches are no longer considered except for very specific goals such as genome identification, since they do not allow the characteristic patterns of post-mortem damage to be verified or contamination to be excluded.

Contamination is a significant concern in ancient DNA studies. Because the amount of DNA preserved in ancient samples tends to be low, relatively small amounts of contamination from contemporary material can overwhelm the target DNA in the library (Renaud et al. 2019). Extraction and manipulation of ancient DNA therefore require specialized facilities with protocols that minimize contamination by modern DNA (Fulton 2012). Standard practice is to create a control sequencing library without any sample tissue (an “extraction blank”). The control data are analysed alongside the main sample to quantify the contamination and spurious signals likely to have been introduced during DNA extraction. Contamination can also come from microbial decomposers that invade tissues after death. A simple proxy for overall contamination is the percentage of reads that align to the reference genome of the target species (the endogenous fraction); reads from modern material of the same species would also align, however, and other methods are available to estimate contamination (Peyrégne and Prüfer 2020). So far, the percentage of endogenous DNA (the DNA of interest) reported in whole-genome studies of ancient plants has been high compared with animal studies. For example, reported endogenous fractions have been 33–66% in emmer wheat (Scott et al. 2019), 5–90% in bread wheat (Wu et al. 2019), 7–54% (mean 44%) in common bean (Trucchi et al. 2021), and 70% in maize (Ramos-Madrigal et al. 2016).

Degradation and contamination limit the applications of ancient DNA relative to modern DNA. First, the fraction of endogenous DNA even in well-preserved ancient DNA libraries is far below that of modern DNA (usually > 99%). Second, because endogenous fragments are short, the sequencer will often read through the DNA fragment and continue into the adapter sequences used for library preparation, so the sequenced adapter must be trimmed and discarded. Furthermore, if paired-end sequencing is used, the forward and reverse reads will overlap and are then collapsed into a single consensus sequence. Given the low endogenous content and the short fragments, more sequence data are needed to reach reasonable coverage. However, when only small amounts of DNA are present in the sample, it may not be possible to keep sequencing to increase the coverage, since the library yields diminishing returns as more duplicate reads are sequenced (Link et al. 2017). For all these reasons, coverage tends to be substantially lower in aDNA studies than is typical for modern data.
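One simple way to see the diminishing returns is to treat sequencing as drawing reads uniformly at random, with replacement, from a library of N distinct molecules; the expected number of distinct molecules observed after r reads is then approximately N(1 − e^(−r/N)). This is a textbook sampling approximation, not necessarily the model used by Link et al. (2017), and the library size below is hypothetical. Real libraries are amplified unevenly, so they typically saturate sooner than this idealized curve.

```python
import math

def expected_unique(n_molecules, n_reads):
    """Expected distinct molecules seen after n_reads uniform draws.

    Poisson approximation: N * (1 - exp(-reads / N)), computed with expm1
    for numerical accuracy when reads << N.
    """
    return n_molecules * -math.expm1(-n_reads / n_molecules)

def duplicate_rate(n_molecules, n_reads):
    """Fraction of reads that duplicate an already-sequenced molecule."""
    return 1.0 - expected_unique(n_molecules, n_reads) / n_reads

library_size = 10_000_000  # hypothetical number of distinct endogenous molecules
for reads in (1_000_000, 10_000_000, 50_000_000):
    print(reads,
          round(expected_unique(library_size, reads)),
          round(duplicate_rate(library_size, reads), 3))
```

Multiplying the sequencing effort fivefold beyond the library size adds comparatively few new molecules, while most additional reads are duplicates.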

Overall, because of the low coverage and short fragments in ancient DNA, a typical approach is to identify variable sites (e.g. SNPs) using modern samples only, and then to use the ancient DNA alignments to genotype the ancient samples at those sites. Fortunately, this approach often yields enough high-quality genotypes to perform analyses of interest, such as estimating genome-wide relatedness, introgression, and population genetic parameters.
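At very low coverage, one widely used option for the genotyping step is pseudo-haploid calling: at each SNP ascertained in modern samples, a single read base from the ancient sample is drawn at random and recorded as a haploid genotype. The sketch below is one common variant of this technique, not necessarily the procedure of the studies cited here; the input layout, the function name, and the base-quality threshold are hypothetical, and bases matching neither panel allele are simply ignored.

```python
import random

def pseudohaploid_calls(pileups, panel, min_baseq=20, seed=0):
    """Random-read genotype for each panel site.

    pileups: {site: [(base, base_quality), ...]} from the ancient alignments.
    panel:   {site: (ref_allele, alt_allele)} ascertained in modern samples.
    Returns {site: allele or None}; None means no usable read.
    """
    rng = random.Random(seed)  # fixed seed for reproducible calls
    calls = {}
    for site, (ref, alt) in panel.items():
        usable = [base for base, qual in pileups.get(site, [])
                  if qual >= min_baseq and base in (ref, alt)]
        calls[site] = rng.choice(usable) if usable else None
    return calls

panel = {("1A", 10): ("C", "T"), ("1A", 20): ("A", "G"), ("1A", 30): ("G", "T")}
pileups = {("1A", 10): [("T", 30), ("T", 35)], ("1A", 20): [("C", 30)]}
calls = pseudohaploid_calls(pileups, panel)
```

Because each site contributes one allele, heterozygosity cannot be measured from such calls, but allele-sharing and frequency-based statistics remain usable.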

7.4.2 Large Polyploid Wheat Genomes

The large genome of wheat (17 gigabases for bread wheat) means that whole-genome sequencing of each wheat sample requires more resources than for organisms with smaller genomes. This cost is exacerbated in ancient DNA studies by the lower fraction of endogenous DNA, which requires further sequencing effort to obtain the same genomic coverage. In wheat, pre-designed probes are available for exons and promoters (Gardiner et al. 2019; Jordan et al. 2015), which reduce sequencing costs by enriching libraries for the targeted sequences. In ancient DNA, capture can enrich endogenous DNA (Hofreiter et al. 2015) but can also increase clonality (the proportion of duplicate reads) and introduce biases towards the probe sequences (Ávila-Arcos et al. 2011). Exome-wide capture has not yet been reported for ancient wheat. However, targeted capture might be useful to avoid repetitive regions, since short aDNA fragments often cannot be mapped unambiguously within repeats and therefore give little information about this class of DNA.
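A back-of-the-envelope estimate makes the combined effect of genome size and endogenous content concrete. The sketch assumes that every endogenous, non-duplicate read contributes one mean fragment length of coverage and ignores mapping and quality filters; the 50 bp fragment length and the function name are hypothetical, while the 17 Gb genome size and the endogenous fractions echo the figures quoted in this chapter.

```python
def reads_needed(target_depth, genome_size_bp, endogenous_fraction,
                 mean_fragment_bp, duplicate_fraction=0.0):
    """Rough number of (collapsed) reads needed for a target mean depth.

    Each read is assumed to add mean_fragment_bp of coverage if it is
    endogenous and not a duplicate; mapping filters are ignored.
    """
    if not (0 < endogenous_fraction <= 1 and 0 <= duplicate_fraction < 1):
        raise ValueError("fractions out of range")
    useful_bp_per_read = mean_fragment_bp * endogenous_fraction * (1 - duplicate_fraction)
    return target_depth * genome_size_bp / useful_bp_per_read

# 1x depth on a 17 Gb genome, 50 bp fragments, 50% endogenous DNA
print(f"{reads_needed(1, 17e9, 0.5, 50):.3g}")  # prints 6.8e+08
```

Under the same assumptions, dropping the endogenous fraction from 50% to 5% multiplies the required sequencing tenfold, which is why low-endogenous samples quickly become impractical for a genome of this size.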

Ploidy and the high identity between subgenomes, estimated to be as high as 97–98%, pose another challenge for ancient DNA studies. Even with modern samples, wheat resequencing studies can only reliably observe genomic regions that can be aligned unambiguously with the available read lengths. The shorter fragment length of ancient DNA places a further practical limit on the portion of the genome that can be directly observed by mapping to reference genomes.
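The effect of fragment length on mappability between near-identical homeologues can be illustrated with a toy model. The sketch simulates two copies of a random sequence that differ at about 2.5% of sites (roughly 97.5% identity, within the range quoted above) and measures, for several fragment lengths k, the fraction of start positions whose k-mer occurs exactly once. Exact k-mer uniqueness on one strand is a deliberately crude stand-in for real read mapping; all names and parameters are hypothetical.

```python
import random
from collections import Counter

def mutate(seq, divergence, rng):
    """Copy seq, substituting each base with probability `divergence`."""
    out = list(seq)
    for i, base in enumerate(out):
        if rng.random() < divergence:
            out[i] = rng.choice([b for b in "ACGT" if b != base])
    return "".join(out)

def unique_fraction(genome, k, n_positions):
    """Fraction of the first n_positions start sites whose k-mer occurs once."""
    counts = Counter(genome[i:i + k] for i in range(len(genome) - k + 1))
    return sum(counts[genome[i:i + k]] == 1 for i in range(n_positions)) / n_positions

rng = random.Random(42)
copy_a = "".join(rng.choice("ACGT") for _ in range(10_000))
copy_b = mutate(copy_a, 0.025, rng)    # toy homeologue, about 97.5% identical
genome = copy_a + copy_b
k_max = 150
n_positions = len(genome) - k_max + 1  # same start sites for every k
fractions = {k: unique_fraction(genome, k, n_positions) for k in (30, 50, 150)}
for k, frac in fractions.items():
    print(f"k={k}: {frac:.2f} of positions unique")
```

In this model a k-mer is shared between the copies with probability of roughly 0.975^k, about 0.47 for k = 30 but about 0.02 for k = 150, so short fragments leave a much larger share of the duplicated sequence unresolvable.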

Heterozygosity is commonly used as an indicator of misalignment problems. Because wheat is predominantly selfing (Golenberg 1988), most sites should be homozygous in most individuals. However, various structural variants can cause reads from different genomic regions in the sample to be aligned to the same position in the reference genome (Fig. 7.6) with high mapping-quality scores, so that they pass quality filters. As a consequence, sample heterozygosity will be inflated after genotype calling. A common solution is to remove variants that are heterozygous in multiple samples (e.g. Gardiner et al. 2019; He et al. 2019). Recent data indicate that undetected gene duplicates are common within wheat subgenomes in reference assemblies (Alonge et al. 2020). In general, polyploid wheat resequencing data will suffer from additional misalignments due to homeologous sequences on different subgenomes. Reliable genotypes can nevertheless be obtained from both modern and ancient wheat, provided that appropriate quality filters are used to restrict attention to sites that do not suffer from alignment problems. Even so, we emphasize that care should be taken when measuring heterozygosity in polyploid wheats, especially from ancient genomes. This limitation is unfortunate, because heterozygosity is a common indicator of outcrossing and of genetic variation within a population, and changes in these are key questions in the history of cultivation practices (Smith et al. 2019; Trucchi et al. 2021).
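The filter described above, removing variants that are heterozygous in multiple samples, can be sketched as follows. Genotypes are assumed to be VCF-style strings such as `0/1` or `1|1`, and the threshold of two heterozygous samples is a hypothetical choice; suitable values depend on sample size and on how strictly selfing the material is expected to be.

```python
def is_het(gt):
    """True for a called genotype whose alleles differ, e.g. '0/1' or '1|2'."""
    alleles = gt.replace("|", "/").split("/")
    return "." not in alleles and len(set(alleles)) > 1

def flag_excess_het(genotypes, min_het_samples=2):
    """Sites heterozygous in at least min_het_samples individuals.

    In a largely selfing species such sites are more likely to reflect
    collapsed duplicates or homeologue mismapping than true heterozygosity.
    """
    return {site for site, gts in genotypes.items()
            if sum(is_het(g) for g in gts) >= min_het_samples}

genotypes = {
    "1A:100": ["0/0", "0/1", "0/1", "1/1"],  # two heterozygotes: suspicious
    "1A:200": ["0/0", "0/1", "1/1", "./."],  # one heterozygote
    "2B:300": ["1/1", "1|1"],                # all homozygous
}
suspect_sites = flag_excess_het(genotypes)
```

Sites in `suspect_sites` would be excluded before any downstream analysis, accepting some loss of genuinely variable sites in exchange for fewer alignment artefacts.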

Fig. 7.6

False heterozygosity introduced by mismapping to the reference. Here, we consider two genomic regions (blue and yellow), which are homeologues or duplicated regions that are relatively similar to one another. A site in each region is genotyped (coloured purple and green). In a, the sample is similar to the reference, so reads are aligned to the correct region and all genotype calls are homozygous, as expected for most sites in a largely selfing species. In b, there is a difference between the reference genome and the sequenced genome (indicated in grey). The sample reads from the blue genomic region in b align best to the yellow region of the reference. This results in a heterozygous genotype call, although all the true genotypes are homozygous. Thus, inaccurate reference genome assemblies, deletions, insertions, or duplications can all result in spurious heterozygous genotypes.

7.5 The Future of the Past: Open Questions and Prospects for Wheat aDNA

Crop archaeogenomics has already proved to be a powerful tool for investigating phenomena such as domestication, crop dispersal, and subsequent adaptation (Kistler et al. 2020; Orlando et al. 2021). Studies on bean (Trucchi et al. 2021), sunflower (Wales et al. 2019), and sorghum (Smith et al. 2019) showed that the “domestication bottleneck” (i.e. the initial loss of genetic diversity associated with domestication) may not be as intense as previously assumed. Ancient DNA analysis has been used to trace the origin of some important winemaking grape cultivars (Ramos-Madrigal et al. 2019) and has provided insights into the genetic basis of potato adaptation to the European climate (Gutaker et al. 2019). In maize, adaptation to climatic constraints (selected from ancient standing variation within the domestic forms) has been identified as the main driver of modern differentiation between populations (Da Fonseca et al. 2015; Swarts et al. 2017).

7.5.1 Open Questions in Domestication

In recent years, some paradigms of domestication have been challenged by new scientific discoveries, and wheat is a good example of such changing perspectives. Because we now know that domestic forms took thousands of years to dominate archaeological assemblages and that different wild populations seem to have contributed to modern diversity, it is likely that wheat domestication was not as severe, abrupt, or geographically restricted as expected under the assumption of a “domestication bottleneck” (see Sect. 7.2). The presence of distinctive haplotypes in an ancient emmer wheat sample from Egypt suggested that genetic diversity may have been lost after emmer wheat domestication and dispersal to Egypt (Scott et al. 2019), in line with findings in other species (e.g. Trucchi et al. 2021). In the case of wheat, more ancient samples are needed to determine the association (or lack thereof) between domestication and losses of genetic diversity.

Second, it is unclear whether there was a monophyletic “centre of domestication” for emmer wheat in the Northern Levant. The contribution of the Southern Levant gene pool to domestic emmer has been detected in several studies, but its origin remains unresolved. Whether emmer was domesticated from a proto-domestic admixed population, or whether early domestic populations benefited from extensive gene flow from the wild, remains to be determined. It has been proposed that the high genetic similarity of modern domestic emmer to Turkish wild emmer could be explained by feralization of the very first proto-domestic population (Civáň et al. 2013; Oliveira et al. 2020). The analysis of wild and domestic samples from this region dating back to the Pre-Pottery Neolithic and Neolithic could help to determine the origin of the domestic pool and its relationships with ancient and extant wild populations.

The recent genetic identification of domesticated T. timopheevii has triggered a re-evaluation of its importance and abundance in the archaeological record. This effort will be greatly aided by a genetic survey of modern wild specimens together with ancient seeds. In general, it will be interesting to use ancient and modern genetic data to compare the origins in space and time of parallel domestication events in wheat (emmer wheat, einkorn wheat, and T. timopheevii).

Prospects for the analysis of DNA from fully charred remains are poor, which limits the use of direct genetic analysis to unveil some of the earliest and most crucial events in wheat domestication. Nevertheless, we expect that improvements in the modelling of genomic evolution and the increasing availability of waterlogged remains will allow alternative scenarios to be tested, in addition to addressing questions concerning the adaptation and spread of wheat.

7.5.2 Open Questions in Dispersal and Adaptation

The dispersal of wheat was accompanied by adaptation to different environments, leading to the evolutionary success of this species. An interesting example is adaptation to altitude along certain dispersal routes. Wild emmer wheat from the Northern Levant, the closest to all domestic landraces, is always found at high altitude. Its dispersal towards Egypt entailed cultivation at sea level, yet emmer wheat on the Ethiopian plateau is again cultivated at high altitude. There are two possible dispersal routes leading to Ethiopia: one through Africa and another through the Iranian plateau and the Arabian Peninsula. The first would entail a second adaptation event to high altitude. The second would have allowed cultivation at high altitude throughout, but would imply a longer dispersal route. How did emmer wheat arrive in Ethiopia? The analysis of desiccated specimens from the Arabian Peninsula, Sudan, and ideally Iran could help to answer this question, as well as potentially unveiling genetic mechanisms of adaptation to high altitude.

7.5.3 Open Questions in Hybridization and Speciation

Archaeological data increasingly suggest that different wheat species were used in a complex geographical mosaic that shifted through time. Given that several wheat species, namely emmer, einkorn, naked wheats, and T. timopheevii (and their wild relatives), co-existed in the same area for millennia, we can ask how much genetic exchange was ongoing in Neolithic settlements. While the vast majority of wheat cultivated today is bread wheat, other free-threshing hexaploids such as Indian dwarf wheat or Yunnan wheat could have arisen from different hybridization events, since the phylogeny of the A and B genomes differs from that of the D genome (Zhou et al. 2020). Furthermore, forms such as T. compactum (club wheat) have been described (e.g. Kaplan et al. 1992), although it is unclear whether these morphotypes are the product of different hybridization events or the consequence of differing selective pressures. A comparison of the D subgenome in ancient hexaploids with modern Aegilops specimens could tackle this question and narrow down the geographic origin of these hybridizations.

Even more intriguingly, we can ask whether genetic variation introgressed between different wheats was important for crop evolution and for adaptation to different environments, such as northern latitudes or heat stress. Einkorn wheat and spelt were important crops in central and northern Europe, whereas hexaploid free-threshing wheats such as Indian dwarf wheat and T. compactum are more commonly found in warm environments. Studying changes in allele frequencies as these crops spread into new environments would identify candidate adaptive regions, whose phenotypic effects and usefulness could then be analysed through crossing and genetic mapping. Knowledge of the phylogenetic relationships among ancient wheat specimens would greatly increase the power to detect the genomic regions conferring these adaptations.
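As a minimal illustration of such an allele-frequency scan, the sketch below compares two hypothetical groups of samples (for instance, ancient specimens from before and after a crop's arrival in a new environment) and ranks sites by the absolute difference in allele frequency. Calls are assumed to be haploid 0/1 values, as produced by pseudo-haploid calling or by homozygous inbred lines, with `None` for missing data; the thresholds and function names are hypothetical. Raw frequency differences are also shaped by drift and population structure, so outliers are candidates for follow-up rather than evidence of selection.

```python
import math

def allele_frequency(calls, min_called):
    """Frequency of allele 1 among non-missing haploid calls, or None."""
    called = [c for c in calls if c is not None]
    if len(called) < min_called:
        return None
    return sum(called) / len(called)

def delta_p_outliers(group1, group2, top_fraction=0.01, min_called=5):
    """Rank shared sites by |p1 - p2| and return the top fraction.

    group1, group2: {site: [0, 1, None, ...]} haploid calls per sample.
    Returns (top_sites, scores) where scores maps site -> |delta p|.
    """
    scores = {}
    for site in sorted(group1.keys() & group2.keys()):
        p1 = allele_frequency(group1[site], min_called)
        p2 = allele_frequency(group2[site], min_called)
        if p1 is not None and p2 is not None:
            scores[site] = abs(p1 - p2)
    n_top = max(1, math.ceil(top_fraction * len(scores))) if scores else 0
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n_top], scores

before = {"a": [0, 0, 0, 0], "b": [1, 1, 0, 0], "c": [1, 1, 1, 1]}
after = {"a": [1, 1, 1, 1], "b": [1, 0, 0, 1], "c": [1, 1, 1, None]}
top_sites, scores = delta_p_outliers(before, after, top_fraction=0.3, min_called=3)
```

With real data, the scan would typically be summarized over genomic windows and compared against a null distribution from simulations or genome-wide background.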

Furthermore, besides its impact on our understanding of the past, archaeogenomics also has the potential to lay the foundations for future food security (Pont et al. 2019b) and for conservation and breeding strategies in the current context of climate change (di Donato et al. 2018). During the dispersal of domestic plants, crops adapted to a multitude of environments, and aDNA can reveal genetic diversity present in historical landraces but lost from the modern domestic pool (e.g. Scott et al. 2019). Detecting signals of positive selection in such lost diversity may therefore be particularly valuable, especially when it is the source of adaptations to extreme environments. Once identified, such diversity can be prioritized for preservation or introduced into modern cultivars through breeding, if it is still present in seed banks, landraces, or wild relatives (di Donato et al. 2018). Plant aDNA studies can also lead to the identification of lost crops and their wild relatives, revealing their genetic makeup. Such knowledge could set the ground for de novo domestication and ultimately aid the diversification of our food system, which currently relies on a rather small number of domestic species (Estrada et al. 2018). Finally, aDNA can be informative about past plant-pathogen interactions and their co-evolution (e.g. Yoshida et al. 2013), providing valuable insights for crop management (di Donato et al. 2018; Estrada et al. 2018; Przelomska et al. 2020).

In conclusion, archaeogenomics makes it possible to address a wide range of questions about wheat evolutionary history, such as population continuity and demographic change through time, the climatic or cultural conditions that correspond to germplasm shifts, and relationships with other wheats. We expect these questions to be addressed in future aDNA studies. Answering them will not only bring a deeper understanding of wheat evolution but will also help to answer questions about human cultural evolution and trade.