Introduction

Ancient genetics is a very broad research area that can be described as the exploration of DNA more than 70 years old, whereas archaeogenetics can also include extrapolation from modern genetic data and comparison between modern and ancient DNA. Archaeogenetics is used to study many different areas of our past, including animal, plant and pathogen evolution and domestication events, but some of the first ancient DNA (aDNA) research was conducted on humans [1]. Although much of this and other early aDNA work has since been shown to be the likely result of contamination, we have still learned much about the history of our health. With the tighter anti-contamination regulations of the 2000s [2] and the technological advances of the last decades [3, 4], the field has become both much more reliable and more informative. Here, we shall review the different aspects of ancient human DNA research that are relevant to furthering our understanding of how we have evolved and adapted, both as a species and at the smaller population level, in relation to our environment and health.

Techniques and precautions in aDNA analysis

There are two main issues with aDNA research: degradation and contamination. DNA is damaged in living cells by natural and artificial processes, such as radiation, water (hydrolysis) and chemicals; in a living cell this damage can be repaired, or, if repair is not possible, the cell can undergo autolysis. Once a cell dies, the biological destruction of DNA begins, with the release of DNases amongst other enzymes. When an animal or plant dies, its cells are put under stress and so undergo autolysis, and the DNA within is deliberately degraded and, no longer protected by the cell, is left exposed to further damage.

This means that after 70 years, DNA is normally very fragmented and carries many base changes; the rate of degradation depends on many factors, such as temperature, pH and the flow of water through the remains. Therefore, frozen remains show much better preservation than remains from deserts, although the biological processes of death will still have damaged the DNA even in permafrost samples.

This means that aDNA techniques need to be more sensitive than modern genetic techniques and optimised for much lower concentrations and shorter DNA fragment sizes. Generally, aDNA researchers concentrate on fragments under 150 bp in length and amplify by PCR for 35–60 cycles rather than the 20–30 cycles used in conventional PCR, for example.
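
As a rough illustration of why so many more cycles are needed, the sketch below compares the theoretical yield of a conventional reaction with that of an aDNA reaction; it assumes perfect doubling every cycle and uses hypothetical template counts, so the numbers are purely indicative.

```python
# Purely illustrative: ideal PCR yield assuming perfect doubling every cycle.
# Real reactions plateau far below these figures; the template counts below are
# hypothetical, chosen only to contrast fresh and ancient extracts.

def theoretical_copies(starting_templates: int, cycles: int) -> int:
    """Ideal yield: every template molecule is doubled in every cycle."""
    return starting_templates * 2 ** cycles

# A fresh sample with ~10,000 intact templates and a conventional 25-cycle run:
print(f"{theoretical_copies(10_000, 25):.2e}")  # ~3.4e+11 copies

# An ancient extract with only ~10 surviving templates needs ~45 cycles
# to reach a comparable (theoretical) yield:
print(f"{theoretical_copies(10, 45):.2e}")      # ~3.5e+14 copies
```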

This increased sensitivity, alongside the sub-optimal quality of aDNA, means that modern contaminating DNA can be preferentially amplified, even when present in very low concentrations. Therefore, there are very strict guidelines in ancient genetics research, especially with human DNA, in order to minimise and identify contamination. These include the extensive use of negative controls, UV irradiation and bleaching of equipment and consumables, work-spaces for pre- and post-amplification samples that are separated both physically and temporally, the avoidance of positive controls run in parallel, and protective clothing. Even with these precautions, some contamination is inevitable, such as environmental contamination from the burial environment.

Traditionally, DNA was amplified using the polymerase chain reaction (PCR), whereby the two strands are separated and small sequences of complementary bases (primers, normally 18–25 bp long) attach to the single strands, after which a polymerase adds bases to each single strand to make double-stranded DNA, thereby theoretically doubling the amount of the target sequence every round. Primers have to be designed specifically for each target and, for ancient DNA, tend to amplify a target of between 70 and 200 bp. In order to look at chromosomal DNA and to identify contamination in mtDNA, the individual DNA strands in the PCR product have to be separated by cloning. Here, the DNA is attached to a vector, which is then introduced into Escherichia coli. The vector carries an antibiotic-resistance gene, so only bacteria containing a vector will grow, and each bacterial cell duplicates both itself and the vector with the inserted DNA sequence, thereby producing enough copies to be sequenced. Sequencing works in a similar way to PCR, but with only one primer and with the addition of modified bases that fluoresce and terminate the replication. Each base fluoresces differently and is read by a detector, in order, to give the sequence.
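
As a minimal sketch of the primer logic described above (the template and primer sequences are invented for illustration, not real aDNA targets), the following shows how a forward and a reverse primer bracket an amplicon on a template strand:

```python
# Minimal illustration of how a primer pair defines a PCR target: the forward
# primer matches the template strand, the reverse primer matches the opposite
# strand (so we search for its reverse complement), and the distance between
# them is the amplicon length. All sequences are made up for demonstration.

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def amplicon_length(template: str, fwd_primer: str, rev_primer: str) -> int | None:
    """Length of the product the two primers would amplify, if any."""
    start = template.find(fwd_primer)
    end = template.find(reverse_complement(rev_primer))
    if start == -1 or end == -1 or end < start:
        return None  # primers do not bracket a target on this template
    return end + len(rev_primer) - start

template = "ATGCGTACCTGA" * 10                 # 120 bp toy 'ancient' fragment
fwd = "ATGCGTACCTGAATGCGT"                     # 18-mer forward primer
rev = reverse_complement(template[-18:])       # 18-mer reverse primer
print(amplicon_length(template, fwd, rev))     # 120 (bp), within the 70-200 bp aDNA range
```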

High-throughput sequencing (HTS, or next-generation sequencing, NGS) works in a similar way, but rather than designing primers that attach to a region of interest, complementary single-stranded DNA is added to the ends of all the DNA molecules in the extract, allowing universal primers to amplify everything. Strands are separated physically in the machine, and the sequencing procedure is adapted so that DNA replication is not terminated; instead, sequencing by synthesis occurs, which requires far fewer starting molecules.
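
The library-preparation logic can be caricatured as follows; the adapter sequences below are toy placeholders, not real platform adapters, and the point is simply that identical adapters on every fragment let one universal primer pair amplify the whole extract.

```python
# Caricature of HTS library preparation: the same adapters are attached to every
# fragment, regardless of its sequence, so one universal primer pair can amplify
# all of them. Adapter and fragment sequences are made-up placeholders.

ADAPTER_5 = "AAAATTTT"   # hypothetical 5' adapter
ADAPTER_3 = "GGGGCCCC"   # hypothetical 3' adapter

def build_library(fragments: list[str]) -> list[str]:
    """Attach the same adapters to every fragment in the extract."""
    return [ADAPTER_5 + frag + ADAPTER_3 for frag in fragments]

extract = ["ACGTACGTAGCTAG", "TTGACCGTA", "GGCATTACGGATCCATG"]  # toy aDNA fragments
for molecule in build_library(extract):
    print(molecule)
```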

In general, HTS is more technically challenging and more expensive and time consuming per sample, but yields much more information and is therefore much cheaper per sequence.

Ancient DNA is highly degraded, with only very short fragments (less than 200 bp and generally less than 100 bp) surviving in very low concentrations, and so it benefits greatly from more sensitive technologies such as HTS, whereby all DNA present in a sample, including fragments as short as 35 bp, can be amplified and sequenced, and from computational improvements that allow these large datasets to be analysed more easily. HTS data can be acquired as ‘shotgun’ sequences, where everything present is sequenced, or the DNA can be enriched beforehand, either by sequencing PCR products or by isolating the DNA of interest with probes of the required sequences and removing the rest of the material.
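
A typical first processing step on such shotgun data is a simple length filter; the sketch below keeps reads of 35 bp or more, discarding fragments too short to map reliably (the threshold follows the figure quoted above; the read lengths are invented).

```python
# A minimal length filter of the kind applied to shotgun aDNA reads: fragments
# shorter than ~35 bp are generally too short to map uniquely and are discarded.
# The read lengths below are invented for illustration.

MIN_LENGTH = 35

def filter_reads(read_lengths: list[int], min_length: int = MIN_LENGTH) -> list[int]:
    """Keep only reads long enough to be mapped with confidence."""
    return [length for length in read_lengths if length >= min_length]

reads = [28, 34, 35, 47, 61, 88, 102, 153]
usable = filter_reads(reads)
print(f"{len(usable)} of {len(reads)} reads retained "
      f"(mean length {sum(usable) / len(usable):.0f} bp)")
```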

Early evolution from a health perspective

Researchers use aDNA to better understand different aspects of our past. Here, we shall concentrate on those aspects that reflect on how we arrived at where we are now with regard to our health.

One of the most interesting questions that aDNA has been able to answer is ‘were Homo sapiens and Homo neanderthalensis separate species?’. The initial ancient data, from mitochondrial DNA (which is inherited only through the maternal line), indicated that Neanderthals have no known descendants among modern humans, by showing that the haplogroup split predates the modern human common ancestor and that no Neanderthal-type haplogroup has been recovered from modern humans [5–9]. These data were acquired using traditional PCR methods; this process only allows a specific known target to be amplified and sequenced from an ancient source. Therefore, comparing a 600-bp region of the mitochondrial genome in modern and ancient samples requires multiple amplifications and is very time-consuming and costly.
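
To make the effort concrete, the following back-of-the-envelope calculation estimates how many overlapping PCRs are needed to tile a 600-bp mitochondrial region; the amplicon length and overlap are assumed values within the 70–200 bp range given above.

```python
# Back-of-the-envelope estimate of the number of overlapping amplicons needed to
# tile a region of interest with conventional PCR. Amplicon length and overlap
# are assumed, illustrative values.
import math

def amplicons_needed(region_bp: int, amplicon_bp: int = 120, overlap_bp: int = 30) -> int:
    """Number of overlapping amplicons needed to cover a region of interest."""
    step = amplicon_bp - overlap_bp
    return math.ceil((region_bp - amplicon_bp) / step) + 1

n = amplicons_needed(600)
print(n)        # 7 separate PCRs per sample
print(n * 20)   # 140 reactions for a modest 20-sample study, before any replication
```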

The development of high-throughput sequencing techniques meant that more health-informative chromosomal DNA could be recovered, including the discovery of bitter-taste receptor genes [10], which are important to ensure that poisonous plants are not accidentally consumed, and of a unique allele that appears to code for lighter hair, which suggests that paler skin tones may also have evolved in response to the lower UV levels of the much more northern range [11], allowing more vitamin D to be produced. Vitamin D is essential for a healthy immune system, and deficiency is associated not only with bone problems such as rickets but also with depression [12].

By 2006, HTS had allowed nearly entire Neanderthal genomes to be recovered. This early work showed that humans and Neanderthals split approximately 500,000 years ago [13, 14], and it was still difficult to see any evidence of introgression with modern humans.

Today, however, scientists have compared the increasing number of ancient hominid genomes (including three Denisovan genomes, a separate species known only through aDNA research) [15–17] with modern genomes from around the globe, and it can be seen that some introgression among the three species occurred in those Homo sapiens that had left Africa. The evidence for this introgression includes alleles that are specifically important for health and environmental adaptation. For example, the high-altitude alleles found in Himalayan populations can also be seen in Denisovan DNA, indicating that the introgression was advantageous to H. sapiens in expanding their range [18]. Also, less advantageously, the discovery of viral insert sequences common to all three species that may be correlated with an increased risk of cancer shows that this genetic exchange is a double-edged sword [19].

With the sequencing of the Homo sp. remains from Sima de los Huesos, Spain [20, 21], which appear to be basal to the Neanderthal–Denisovan split, we should be able to understand even further how our genus has genetically adapted to different environments, and where we have exchanged these adaptations to the advantage, and disadvantage, of our health today.

Micro-evolution in modern humans

There are a number of genetic adaptations that have (as far as we know) only occurred in modern human populations in reaction to different environmental factors.

For example, the first ancient African genome shows evidence that people living in the Ethiopian highlands had already, independently, adapted to the high altitude by 4500 years ago [22]. This adaptation is associated with being less affected by the lower oxygen levels present at higher altitudes. The fact that humans had adapted to living in this extreme environment so early shows a degree of plasticity in our genome, which indicates that humans can adapt relatively quickly to other extreme environments. This is a great example of how modern genetics and ancient genetics can work together to inform on how our ancestors have evolved to remain healthy in different ecological niches, and of why aDNA is essential to increase our understanding of the timing of these adaptations.

In addition to moving to new ecological niches, we have adapted to niches we have formed ourselves, for example, the genetic adaptation to post-weaning milk consumption. This is very much tied to the period after the Neolithic revolution and the domestication of animals. Numerous studies have shown that in populations with a history of pastoralism, genetic changes have occurred independently that allow continued production into adulthood of the enzyme lactase, which breaks down the carbohydrate lactose, found nearly exclusively in milk [23, 24]. This adaptation allows milk to be consumed in greater quantities without processing (e.g. being made into hard cheeses, whereby the carbohydrate is degraded by bacteria during cheese making) and without the digestive problems that can cause complications, especially in individuals who are already suffering from malnutrition. This allows essential nutrients and liquids to be obtained from the consumption of milk during times of shortage.

Ancient genetic surveys have found that the European lactase persistence allele, 13910T, was absent throughout much of the Neolithic (e.g. [25]), only starting to increase in the late Neolithic and Bronze Age (e.g. [26]), before stabilising at the high modern frequency by the Medieval period (e.g. [27]). This indicates that milk consumption conferred a great evolutionary advantage in our recent past; whether this is due to its calorific value, its nutritional advantages (e.g. increased vitamin D in the UV-poor northern regions), its being a safe source of liquid, or a combination of all three is still debated. However, allele frequency information from the Iron Age is needed to better understand the timing and selective advantage of this shift.

Disease susceptibility

Susceptibility to disease is multifactorial, with many genes playing a role as well as environmental factors. To date, very few of the alleles known to play a role in genetic diseases or in susceptibility to pathogenic diseases have been studied in ancient individuals. This is mainly due to the limitations of traditional aDNA methods, which mean that only a small area of the genome can be studied at one time.

For the rheumatic disease ankylosing spondylitis (AS), one allele (HLA-B27) is known to be associated with the majority of sufferers (c. 90 %); this allele has been recovered from skeletons showing signs of the disease at least twice [28, 29]. However, this is not especially informative, as in both cases a secure diagnosis had already been achieved through traditional osteological techniques. The allele could be investigated in skeletons with no physical manifestations of the disease; however, testing positive would not be diagnostic of AS, as the allele is also present in a significant proportion (c. 8 %) of people without AS. The fact that this and other alleles are also associated with multiple immunological and rheumatic diseases is a further issue for ancient geneticists examining disease-associated alleles, as no true diagnosis can be achieved. In addition, many such pathologies are associated with too many alleles to be easily studied.
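
A rough Bayesian calculation makes the point explicit. Using the figures quoted above (the allele in c. 90 % of AS sufferers and c. 8 % of people without AS) and an assumed, purely illustrative population prevalence of 0.5 %, a positive HLA-B27 result on its own implies only a small probability of AS:

```python
# Rough Bayesian illustration of why a positive HLA-B27 result alone is not
# diagnostic of ankylosing spondylitis (AS). Carrier rates follow the text;
# the population prevalence (0.5 %) is an assumed, illustrative value.

def p_as_given_b27(sensitivity=0.90, carrier_rate_non_as=0.08, prevalence=0.005):
    """Posterior probability of AS given a positive HLA-B27 result."""
    true_pos = sensitivity * prevalence
    false_pos = carrier_rate_non_as * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

print(f"P(AS | HLA-B27 positive) ~ {p_as_given_b27():.1%}")  # ~5 %
```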

However, the MHC/HLA region of chromosome 6 contains many genes that are responsible for our innate immunity and, thus, many alleles that are associated with susceptibility to pathogenic disease and with immunological diseases. With the advances in HTS and capture technology, this four-million-base-pair region of the human genome can be studied much more effectively in ancient humans. In the near future, it is highly likely that a number of studies will be published on this region, although understanding the complex nature of recent adaptation in this area relies on modern medical genetics taking the lead and confirming the selective advantages and disadvantages of different alleles in modern humans, as much of our information comes from genome-wide association studies, which only indicate a statistical link between pathologies and genetic alleles and do not explain such a link.

One human allele that we are very confident is linked to disease susceptibility is the CCR5∆32 allele. The CCR5 gene encodes a 352-amino-acid, seven-transmembrane, G-protein-coupled chemokine receptor expressed on the surface of macrophages and monocytes. The CCR5∆32 allele is a 32-bp deletion in this gene that results in a truncated protein that is not properly expressed on the cell surface and can therefore no longer serve as a gateway into the cell. People who carry both copies of the ∆32 allele show a great deal of resistance to HIV infection. What is interesting, however, is that this allele is found at notably high frequencies in northern Europeans (up to c. 15 % [30]). Obviously, this cannot be due to adaptation to HIV, as HIV is a recent disease. Early calculations indicated that this allele was around 700 years old [31], and so it was associated with adaptation to the Black Death pandemic of that period. However, there is no evidence that CCR5 deficiency protects against Yersinia sp. infection [32]. Ancient DNA studies have also shown that there is no significant difference in the frequency of the CCR5∆32 allele between individuals known to have died in the Black Death pandemic and those who died of other causes (most likely famine) in the same century [33]. In addition, the allele has been shown to be present in skeletons from much earlier time periods, indicating that it either conferred a selective advantage against an older disease or is the fortuitous result of genetic drift [34]. CCR5∆32 does not appear to be very disadvantageous in modern populations, apart from a possible increase in susceptibility to West Nile virus infection [35].
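
The mechanistic point about the deletion can be reduced to simple arithmetic: because 32 is not a multiple of 3, the deletion shifts the ribosome's reading frame downstream of the lesion, garbling all subsequent codons and truncating the receptor. The snippet below is a minimal illustration of that arithmetic only; it does not use the actual CCR5 sequence.

```python
# Why a 32 bp deletion breaks the protein: codons are read in triplets, so any
# deletion whose length is not a multiple of 3 shifts the reading frame.
# Arithmetic illustration only; no real CCR5 sequence is used.

deletion_bp = 32
frame_shift = deletion_bp % 3
print(f"Reading frame shifted by {frame_shift} base(s)")  # 2 -> downstream codons garbled
```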

There are other alleles known to be strongly associated with infectious diseases, and probably the most famous examples are those associated with malarial resistance. Malaria is spread by mosquitoes and is therefore most associated with warm, humid areas, which are optimal for the mosquito’s life cycle. In areas with a history of endemic malaria, there are higher frequencies of genetic anaemias (i.e. thalassemia and sickle-cell anaemia) and of glucose-6-phosphate dehydrogenase (G6PD) deficiency, conditions which not only compromise the health of the individual but also protect against malarial infection. Thalassemia and sickle-cell anaemia are caused by mutations in the α- and β-globin genes (located on chromosomes 16 and 11, respectively) and are generally recessive traits, i.e. the diseases only manifest when both copies of the gene carry the mutation; heterozygosity, however, appears to protect against malarial infection. The G6PD gene is carried on the X chromosome and again is generally recessive, with heterozygous individuals at lower risk of malarial infection; however, males have only one copy of the X chromosome, and so the disease is more commonly expressed in these hemizygous individuals.
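
The inheritance logic can be illustrated with a simple Hardy-Weinberg calculation; the allele frequencies below are assumed purely for illustration and are not taken from any particular population.

```python
# Illustrative Hardy-Weinberg calculation (allele frequencies are assumed, not
# from any real population) showing why heterozygote advantage can keep an
# otherwise harmful recessive allele at appreciable frequency, and why X-linked
# G6PD deficiency is expressed more often in hemizygous males.

q = 0.10          # assumed frequency of the autosomal disease allele (e.g. a beta-globin mutation)
p = 1 - q

protected_heterozygotes = 2 * p * q   # carriers: malaria-protected, usually healthy
affected_homozygotes = q ** 2         # express the anaemia

q_x = 0.10        # assumed frequency of a G6PD-deficiency allele on the X chromosome
affected_females = q_x ** 2           # need two copies
affected_males = q_x                  # hemizygous: one X, so one copy suffices

print(f"autosomal carriers: {protected_heterozygotes:.1%}, affected: {affected_homozygotes:.1%}")
print(f"G6PD-deficient females: {affected_females:.1%}, males: {affected_males:.1%}")
```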

The clear correlation between malaria and these genetic diseases means that these are some of the earliest and most commonly researched evolutionary-medicine-related mutations in ancient DNA. The first reported direct genetic evidence of β-thalassemia was from an 8-year-old child dated to around 12,000 years ago [36]. Although this was just a single individual, it allowed for the expansion of this area of research, and since then many publications have attempted to calculate the frequency of these mutations in historic and prehistoric Mediterranean populations, with little success (e.g. [37, 38]), probably due more to the low frequency of these alleles than to their complete absence.

Ancient pathogen DNA

In addition to investigating adaptation to diseases, archaeogeneticists also research the pathogens themselves. Initially, there was speculation as to whether ancient pathogen DNA would survive in the archaeological record, but the first ancient pathogen DNA paper was published in 1993 [39] and described the amplification of Mycobacterium tuberculosis complex (MTBC) DNA. Early MTBC work concentrated on multi-copy insertion sequences (IS6110 and IS1081), which could not distinguish between different strains of the complex, but by the beginning of the millennium typing to determine the strain (M. tuberculosis, the ‘human form’, vs. M. bovis, the ‘cattle form’) and lineage was being performed (e.g. [40, 41]). The advent of HTS has allowed the pathogen to be investigated much more deeply: Bouwman et al. [42] typed an historic strain of MTBC using a combination of traditional and HTS techniques, and entire genomes have since been recovered, which have clarified the phylogenetic history of the complex. We now know, for instance, that individuals with tuberculosis in pre-Columbian South America carried a strain more commonly found in seals than in humans today [43].

Mycobacterium leprae (the causative agent of leprosy) has also been extensively studied using both conventional and HTS methods (e.g. [44, 45]), showing that the pathogen has changed remarkably little in the last 1000 years.

It is thought that Mycobacterium species survive remarkably well in the archaeological record due to a lipid-rich cell wall that protects the pathogen from external degradation. Other pathogens survive less well in the archaeological record; for example, only very few papers on Plasmodium sp. (causing malaria, e.g. [46, 47]) or Treponema sp. (causing syphilis, e.g. [48]) have been published. It is thought that DNA from these pathogens is harder to obtain due to their weaker cell walls and the lower pathogen load at the time of death (see Bouwman et al. [49] and von Hunnius et al. [50] for an explanation of the difficulties experienced with ancient Treponema sp. DNA). However, with the advent of HTS, shorter, more damaged and less concentrated DNA can be recovered, and so it is highly likely that ancient and historic malaria and syphilis genomes will shortly be published.

The plague bacterium is a great example of how HTS has rapidly increased our ability to isolate ancient pathogen DNA. The first recorded Yersinia pestis DNA from archaeological remains was recovered from French plague victims of the sixteenth and eighteenth centuries using conventional PCR on dental pulp [51]. This work was controversial at the time, not least because one of the PCR fragments was around 300 bp long. However, the same group further refined their results using ‘suicide PCR’, whereby primer sets are used only once to minimise contamination, on samples from the fourteenth-century pandemic [52]. Yet the controversy continued, with two other groups being unable to reproduce the data from the same (and additional) archaeological samples and protocols, and showing that environmental bacteria can obscure the target DNA [53]. At the same time, the original group were extending their research to the first (known) pandemic of plague (the Justinian Plague), finding that there had been little change in the DNA since the end of the sixth century AD [54, 55], and different teams were detecting Y. pestis DNA in archaeological remains from different parts of Europe [56–58].

In 2011, the first HTS study of Y. pestis was reported, looking at the virulence-associated plasmid pPCP1 [59]; this was the first ancient pathogen genome to be recovered (albeit a plasmid rather than the main chromosome) and allowed for the flowering of ancient pathogen genomics.

Since then, multiple ancient genome papers have shown that the bacterial DNA has changed rather little in the last 1500 years (e.g. [60–62]). More importantly, an examination of Bronze Age genomes has found evidence of an earlier, less virulent form of Y. pestis; this ancient strain, found in six individuals from a wide range of Eurasian sites, lacked, until around 1000 BC, the ymt gene, which allows the bacterium to survive in the gut of its present-day vector, the flea. This indicates that this devastating disease has been with us for many millennia, but has increased its virulence through new transmission possibilities, as bubonic plague would have been very unlikely before this gene was acquired [63].

Future of archaeogenetics in evolutionary medicine

Much of our understanding of human adaptation to pathogens and environment is based on modern data; this is primarily due to the importance of large sample sizes. Ancient geneticists are unlikely ever to be able to type thousands of individuals from one population; however, it is possible to track known variants in the past using ancient genetics, and more and more data can now be studied: for example, the deep sequencing of 101 Bronze Age humans allows us to see that humans were already well on the way to adapting to the consumption of unprocessed milk. In addition, we can compare host and pathogen genomes from the same individuals in the past; for example, these same data were also screened for the plague-causing bacterium Y. pestis.

With the increase in both our understanding of modern medical genetics and our ability to deep sequence ancient genetic material, the field of archaeogenetic evolutionary medicine is blossoming (see Fig. 1). Ancient DNA research is an essential tool for understanding how we evolved as a genus and how the adaptations of different species have shaped modern humans’ health. It is also important for better dating when humans adapted to specific aspects of modern life, diet and disease pressures. We will shortly understand our own genetic past in much more detail than we could have imagined understanding even modern human genetics 20 years ago.

Fig. 1

Schematic graph of articles with ‘Ancient DNA’, ‘archaeogenetic’, ‘archaeogenomic’, ‘paleogenetic’ or ‘paleogenomic’ in the title or abstract, according to the PubMed database. x number of publications, y year of publication, a first aDNA article, b first report on ancient pathogen DNA, c first report on ancient Neanderthal DNA, d first report on ancient real-time PCR, e first report on ancient high-throughput sequencing, f first report of a Neanderthal genome, g first report of an ancient pathogen genome
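
Counts like those in the figure can be reproduced, in outline, with a PubMed query. The sketch below uses Biopython's Entrez interface and assumes that the title/abstract terms listed in the legend are combined with OR and restricted to one publication year at a time; the e-mail address is a placeholder.

```python
# Sketch of reproducing per-year publication counts from PubMed with Biopython's
# Entrez ESearch interface. The search term mirrors the terms in the figure
# legend; the e-mail address is a placeholder required by NCBI.
from Bio import Entrez

Entrez.email = "your.name@example.org"  # placeholder

TERM = ('"ancient DNA"[Title/Abstract] OR archaeogenetic[Title/Abstract] OR '
        'archaeogenomic[Title/Abstract] OR paleogenetic[Title/Abstract] OR '
        'paleogenomic[Title/Abstract]')

def publications_in_year(year: int) -> int:
    """Number of PubMed records matching the search term in a given year."""
    handle = Entrez.esearch(db="pubmed", term=TERM, datetype="pdat",
                            mindate=str(year), maxdate=str(year), retmax=0)
    record = Entrez.read(handle)
    handle.close()
    return int(record["Count"])

for year in range(2010, 2016):
    print(year, publications_in_year(year))
```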