Introduction

Intense research efforts have focused on identifying rare and common variants that increase disease risk in humans, for both rare and common diseases. Several non-mutually exclusive models have been proposed to explain the functional properties of such variants and their contributions to pathological conditions, and this topic has been reviewed elsewhere [1–10]. These studies implicated multiple variants in disease susceptibility, but the relative importance of rare and common variants in phenotypic diversity, both benign and disease-related, has yet to be explored in detail [11]. We can use an evolutionary approach to tackle this question, as population genetics models can predict the allelic architecture of disease susceptibility [12, 13]. They are able to do so because rare and common disease-risk alleles are a subset of global human genetic diversity, and their occurrence, frequency, and population distribution are governed by evolutionary forces, such as mutation, genetic drift and demographic processes (e.g., migration, admixture, and changes in population size), and natural selection.

The plethora of genetic information generated in the last ten years, thanks largely to the publication of sequencing datasets for both modern human populations and ancient DNA samples [14–18], is making it possible to reconstruct the genetic history of our species, and to define the parameters characterizing human demographic history: expansion out of Africa, the loss of genetic diversity with increasing distance from Africa (i.e., the “serial founder effect”), demographic expansions over different time scales, and admixture with ancient hominins [16–21]. These studies are also revealing the extent to which selection has acted on the human genome, providing insight into the way in which selection removes deleterious variation and the potential of human populations to adapt to the broad range of climatic, nutritional, and pathogenic environments they have occupied [22–28]. It has thus become essential to dissect the role of selection, in its diverse forms and intensities, in shaping the patterns of population genetic diversity (Fig. 1a), not only to improve our understanding of human evolutionary history, but also to obtain insight into phenotypic diversity and differences in the risk of developing rare and common diseases [12, 13, 24, 29–32].

Fig. 1

Modes in which selection or admixture can remove, maintain, or increase genetic diversity. a Schematic representation of the different types of natural selection. Purifying selection removes deleterious alleles (in black) from the population, and genes evolving under strong purifying selection are usually associated with rare, severe disorders. Conversely, mutations conferring a selective advantage (e.g., increased resistance to complex infectious disease) can increase in frequency in the population, or be maintained, through different forms of positive and balancing selection. Positive selection is represented here by the classic hard-sweep model where, following an environmental change, a newly arisen advantageous mutation or a mutation at very low frequency (in red) will be immediately targeted by positive selection and will ultimately reach fixation. Balancing selection is illustrated here by the case of heterozygote advantage (or overdominance), where the presence of heterozygotes (in blue) is favored in the population. b Long-term balancing selection. Advantageous genetic diversity can be maintained over long periods of time and survive speciation, resulting in “trans-species polymorphism” (represented by black and red arrows). In this example, a trans-species polymorphism that is present in the modern European population (where it has survived the known bottleneck out of Africa) is shared with other primates, such as chimpanzees and gorillas. c Modern humans can also acquire genetic diversity (whether advantageous or not) through admixture with other hominins, such as Neanderthals or Denisovans (Box 2). The green and blue arrows represent the direction and estimated magnitude of admixture between modern humans and Neanderthals and Denisovans, respectively (see [17])

The removal of mutations deleterious to human health

Studies of the occurrence, frequency, and population distribution of deleterious mutations are of fundamental importance if we are to understand the genetic architecture of human disease. Theoretical and empirical population genetics studies have shown that most new mutations resulting in amino acid substitutions (non-synonymous) are rapidly culled from the population through purifying selection (Fig. 1a) [33, 34]. Indeed, the small number of non-synonymous variants observed relative to the rate of non-synonymous mutation indicates that most non-synonymous mutations are lethal or highly deleterious, strongly compromising the reproductive success of their carriers [34–36]. Purifying selection, the most common form of selection, refers to the selective removal of alleles that are deleterious, such as those associated with severe Mendelian disorders, or their maintenance at low population frequencies (i.e., mutation–selection balance) [32, 37]. The efficacy of purifying selection for eliminating deleterious mutations from a population depends not only on the selection coefficient (s), but also on population size (N), which determines the magnitude of genetic drift. Unlike highly deleterious mutations, variants subject to weaker selection (i.e., weakly deleterious mutations) behave like “nearly neutral mutations”; they may, therefore, reach relatively high population frequencies [38–40]. In large outbred populations, with low levels of drift, deleterious mutations will eventually be eliminated. By contrast, in small populations, deleterious mutations behave very much like neutral mutations and may be subject to strong drift, resulting in moderate-to-high frequencies, or even fixation [39].
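The dependence of purifying selection on both s and N can be illustrated with a short numerical sketch. It uses Kimura's diffusion approximation for the fixation probability of a single new mutation, and it assumes a diploid population of constant size with additive selection and an effective size equal to N; the function names and the parameter values are illustrative assumptions, not quantities taken from the studies cited above.

```python
import math


def fixation_probability(s: float, n: int) -> float:
    """Fixation probability of one new mutation (Kimura's diffusion approximation).

    Assumptions: diploid population of constant size n, effective size equal
    to n, additive selection with coefficient s (negative if deleterious).
    """
    if s == 0:
        return 1.0 / (2 * n)  # neutral case: equals the initial frequency
    try:
        # (1 - e^{-2s}) / (1 - e^{-4ns}), written with expm1 for accuracy
        return math.expm1(-2 * s) / math.expm1(-4 * n * s)
    except OverflowError:
        # exponent too large to represent: strongly deleterious allele in a
        # large population, fixation probability is effectively zero
        return 0.0


def relative_to_neutral(s: float, n: int) -> float:
    """Fixation probability divided by the neutral expectation 1/(2n)."""
    return fixation_probability(s, n) * 2 * n


if __name__ == "__main__":
    s = -0.001  # hypothetical weakly deleterious mutation
    for n in (100, 1_000, 10_000):
        print(n, relative_to_neutral(s, n))
```

With the same weakly deleterious s, the ratio stays close to 1 in the smallest population (the mutation is nearly neutral) and falls towards zero as N grows, which is the contrast between small and large populations described above.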

Rare variants are widespread in the human genome

Recent deep-sequencing studies have revealed a surprisingly high proportion of rare and low-frequency variants in different human populations [14, 15, 41–47]. The Exome Variant Server, for example, reports frequency information from 6515 exomes of individuals of African American and European American ancestry [46]. The most recent release of the 1000 Genomes Project, based on full-genome information for 2504 individuals from 26 populations from around the world, revealed a large number of rare variants in the global dataset (~64 million autosomal variants have a frequency <0.5 %, and only ~8 million have a frequency >5 %), with each individual genome harboring between 40,000 and 200,000 rare variants [15]. A more recent report of high-quality exome data from 60,706 individuals of diverse geographic ancestry, generated as part of the Exome Aggregation Consortium (ExAC), has provided unprecedented resolution for the analysis of low-frequency variants as well as an invaluable resource for the clinical interpretation of genetic variants observed in disease patients [47].

The contribution of rare variants to human disease is a matter of considerable debate, together with the distribution of these variants in the population, as they may underlie early-onset disease and increase susceptibility to common diseases [1, 44, 45, 48–50]. Most rare variants are private to a population, whereas common variants tend to be shared by different populations [51]. Rare variants, particularly those specific to a particular population, tend to have stronger deleterious effects than common variants [42, 52, 53]. Consequently, as shown by population genetics studies, most variants with large functional effects tend to be rare and private, and only a small proportion of variants with large effects are common to different populations. Genome-wide association studies (GWAS), which focus on common variants, have been only moderately successful in explaining the genetic basis of complex diseases [3]. Furthermore, theoretical studies have shown that a large proportion of the so-called “missing heritability” is explained by rare variants, particularly those that affect fitness as well as causing disease [54].

The increasing number of sequence-based datasets available, both in basic and medically oriented research, is accelerating the investigation into the contribution of rare variants to disease susceptibility. In this context, diverse variant annotation tools and predictive algorithms have been developed to systematically evaluate the potential functional impacts of genetic variants (e.g., PolyPhen, SIFT, and GERP) [55–57], helping to prioritize the study of putative causal variants in further detail. These methods, which use different statistics and types of information, generally assess the “deleteriousness” of each genetic variant by considering different measures, such as evolutionary conservation scores, changes in amino acid sequence, or potential effect on protein function and structure [58]. Novel methods are increasingly being developed, providing improved power and resolution. For example, CADD, which integrates both evolutionary and functional importance, generates a single prediction from multiple annotation sources, including other variant effect predictors [59]. Likewise, MSC provides gene-level and gene-specific phenotypic impact cutoff values to improve the use of existing variant-level methods [60].

Quantification of the burden of deleterious, mostly rare, variants across human populations and an understanding of the ways in which this burden has been shaped by demographic history are now key issues in medical research, because they could help to optimize population sampling and, ultimately, to identify disease risk variants.

Expansion out of Africa and the patterns of rare, deleterious variants

The sizes of human populations have changed radically over the last 100,000 years, due to range expansions, bottlenecks, and rapid growth over different timescales [18–21]. Several studies have evaluated the impact of such demographic events on the distribution of deleterious variants and have shown that populations that have experienced bottlenecks, such as non-Africans, have higher proportions of deleterious variants of essential genes than African populations. This pattern has been interpreted as resulting from weaker purifying selection due to the Out-of-Africa bottleneck [45, 52, 61]. Nevertheless, an absolute increase in the number of rare functional variants has been observed in populations of African and European descent, relative to neutral expectations, due to the combined effects of an explosive expansion over recent millennia and weak purifying selection [41–46]. Furthermore, ~85 % of known deleterious variants appear to have arisen during the last 5000 to 10,000 years, and these variants are enriched in mutations with a (relatively) large effect as there has not yet been sufficient time for selection to eliminate them from the population [46]. Moreover, deleterious mutations in Europeans appear to have occurred after those in Africans (~3000 vs. 6200 years ago, respectively) [46], highlighting the effects of demographic history on the distribution of deleterious variants within the population.

However, some studies have suggested that demographic history may have a less straightforward impact on the mean burden of deleterious variants [62–64]. Simons and coworkers concluded that individual mutation load is insensitive to recent population history [64], and Do and coworkers suggested that selection is equally effective across human populations [62]. Several factors underlie these apparently conflicting conclusions, including differences in the choice of statistics and the features of genetic variation used to assess the burden of deleterious variation, and differences in the choice of predictive algorithms for defining deleteriousness, together with differences in the interpretations of the results; these factors have been reviewed in detail elsewhere [22, 65]. Nevertheless, all these studies converge to suggest that demographic history affects deleterious and neutral variants differently (Fig. 2), and that mutation and drift have stronger effects on the frequency of weakly deleterious mutations in bottlenecked populations than in large, expanding populations.

Fig. 2

Demographic history affects the proportion of deleterious variants in the human population. The proportion of deleterious variants currently segregating in the population can vary depending on the past demographic regime of each population. Under a regime of demographic expansions alone, populations display higher levels of genetic diversity (in total absolute counts) and lower proportions of deleterious variants (in brown) than under regimes in which populations have experienced bottlenecks or recent founder events, where the opposite patterns are observed. The schematic demographic models presented here illustrate the broad demographic history of some modern human populations (e.g., Africans, Europeans, and French Canadians), but they do not attempt to capture their precise changes in population size over time

Founder effects and bottlenecks increase the burden of deleterious variation

Besides the impact of long-term population demographics (i.e., African vs. non-African populations) on the distribution of deleterious variants, a few studies have evaluated the effects of more recent, or stronger, changes in population demography. For example, it has been shown that French Canadians have both lower levels of diversity and a larger proportion of deleterious variants than the present-day French population. These findings highlight how a recent major change in population demographics (i.e., a small founder population of ~8500 French settlers subsequently growing by about 700-fold to attain its present size) can profoundly affect the population’s genetic landscape within as little as 400 years [66]. Likewise, the Finnish population, which experienced a recent population bottleneck estimated to have occurred ~4000 years ago, has larger proportions of rare deleterious alleles, including loss-of-function variants and complete gene knockouts, than other populations in Europe or of European descent [67].

Henn and coworkers investigated the consequences of a serial founder effect model for the distribution of deleterious mutations using a set of African populations and several groups located at different geographic distances from Africa [68]. Using explicit demographic models and considering different selection coefficients and dominance parameters, they found that non-African individuals carried larger proportions of deleterious alleles, mostly of modest effect, than African individuals, and that the number of homozygous deleterious genotypes carried by individuals increased with distance from Africa [68]. These results highlight the interaction between drift and purifying selection by showing that deleterious alleles previously maintained at low frequencies by purifying selection may have surfed to higher frequencies in populations at the edge of the wave expanding out of Africa, due to stronger drift [53, 68, 69]. Together, these studies suggest that demographic history has played a central role in shaping differences in the genetic architecture of disease between human populations through its effects on the frequency of deleterious alleles [64, 70].
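The stronger drift experienced at each step of a serial founder model can be made concrete with a small deterministic sketch. It uses the standard expectation that heterozygosity decays by a factor of 1 − 1/(2N) per generation in a diploid population of size N. The function names, the founder size, the number of bottleneck generations, and the number of steps are hypothetical illustrations; this is not the explicit demographic model used by Henn and coworkers.

```python
def heterozygosity_after(h0: float, sizes: list[int]) -> float:
    """Expected heterozygosity after drift: H_t = H_0 * prod(1 - 1/(2 N_t)).

    `sizes` lists the diploid population size in each successive generation.
    """
    h = h0
    for n in sizes:
        h *= 1.0 - 1.0 / (2 * n)
    return h


def serial_founder(h0: float, n_founders: int,
                   bottleneck_gens: int, n_steps: int) -> list[float]:
    """Expected heterozygosity after each of n_steps successive founder events.

    Each event is modelled (as a simplifying assumption) by bottleneck_gens
    generations at size n_founders; recovery between events is ignored.
    """
    trajectory = [h0]
    for _ in range(n_steps):
        sizes = [n_founders] * bottleneck_gens
        trajectory.append(heterozygosity_after(trajectory[-1], sizes))
    return trajectory


if __name__ == "__main__":
    # Hypothetical values chosen only to make the trend visible.
    for step, h in enumerate(serial_founder(1.0, 100, 10, 5)):
        print(step, round(h, 4))
```

Diversity falls monotonically with the number of founder events. This tracks only neutral diversity; the increase in homozygous deleterious genotypes reported in [68] additionally depends on selection and dominance, which the sketch leaves out.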

Favoring advantageous variants to increase adaptation

Besides the interplay between drift and selection to remove deleterious mutations, other de novo or already existing variants can be advantageous and can increase in population frequency through various forms of positive and balancing selection [23–28, 71, 72]. Humans occupy diverse habitats and have gone through many different cultural and technological transitions; human populations have had to adapt to such shifts in habitat and mode of subsistence [25]. Dissecting the legacy of past genetic adaptation is thus key to identifying the regions of the genome underlying the broad morphological and physiological diversity observed across populations, and to increasing our understanding of the genetic architecture of adaptive phenotypes in health and disease.

Positive selection targets Mendelian and complex traits

Positive selection can manifest in different guises: from the classic, hard-sweep model, in which a new mutation can confer an immediate fitness benefit (Fig. 1a), to alternative models of genetic adaptation, such as selection on standing variation or polygenic adaptation [73, 74], with each type of selection leaving a specific molecular signature in the targeted region (reviewed in [23, 26]). Most studies have focused on signals of positive selection according to the hard-sweep model, providing insight into the nature of adaptive phenotypes (see [23, 24, 26, 29, 31, 72, 75–77] and references therein). These phenotypes range from Mendelian traits (or almost so), including the well-supported lactase persistence trait in various populations [78–82] and traits relating to infectious disease resistance (e.g., G6PD, DARC, FUT2) in particular (reviewed in [76]), to complex traits, such as skin pigmentation [83–86], adaptation to climate variables or high altitude [87–93], and the immune response and host–pathogen interactions [24, 29, 31, 77, 94–107]. These examples reveal the potent selective pressures that have been exerted by nutritional resources, climatic conditions, and infectious agents since humans first began to spread over the globe [29, 31, 72, 77, 96, 108].

Many selection signals were detected by candidate-gene approaches, based on a priori choices of the genes and functions to be investigated. However, a large number of genome-wide scans for positive selection have identified several hundred genomic regions displaying selection signals, consistent with the likely presence in these regions of beneficial, functional variants [28, 37, 109–124]. For example, Grossman and coworkers identified about 400 candidate regions subject to selection, using whole-genome sequencing data from the 1000 Genomes Project [28]. These regions either contain genes involved in skin pigmentation, metabolism and infectious disease resistance, or overlap with elements involved in regulatory functions, such as long intergenic noncoding RNAs and expression quantitative trait loci (eQTL). The presence of non-synonymous variants in less than 10 % of the candidate-selected regions suggests that regulatory variation has played a predominant role in recent human adaptation and phenotypic variation [28], as previously suggested [125–128].

The large number of studies searching for selection signals contrasts with the much smaller number of studies trying to determine when selection effects occurred [83, 129, 130]. Nevertheless, such studies could identify specific time periods corresponding to abrupt changes in environmental pressures. Studies aiming to date the lactase persistence allele in Europe have suggested that this allele was selected in farmers some 6000 to 11,000 years ago [79, 81, 95, 129, 130], although estimates based on ancient DNA point to a more recent time [131, 132] (see below). A recent study, using an approximate Bayesian computation framework, found that skin pigmentation alleles were generally much older than alleles involved in autoimmune disease risk, whose ages are consistent with selection during the spread of agriculture [129]. A report suggesting that many selective events targeting innate immunity genes have occurred in the last 6000 to 13,000 years [95] provides additional support for the notion that the adoption of agriculture and animal domestication modified human exposure to pathogens, leading to genetic adaptations of immune response functions.

Selection studies have thus increased our knowledge of the nature of several adaptive phenotypes at different timescales (Box 1), but the relative importance of selection according to the classic sweep model remains unclear. Several studies have reported the prevalence of classic sweeps for human adaptation to be non-negligible [28, 109–113, 115–118, 122], whereas others have suggested that such sweeps are rare and that the corresponding signals probably result from background selection [74, 93, 123, 124]. There is also increasing evidence to suggest that other, largely undetected forms of genetic adaptation, such as selection on standing variation, polygenic adaptation, and adaptive introgression [73, 74], may have occurred more frequently in the course of human evolution than previously thought (see for example [108, 130, 133–135]).

Maintaining diversity through balancing selection

Balancing selection can preserve functional diversity, through heterozygote advantage (or overdominance; Fig. 1a), frequency-dependent selection, advantageous diversity fluctuating over time and space in specific populations or species, and pleiotropy [27, 136, 137]. Unlike other forms of selection, balancing selection can maintain functional diversity over periods of millions of years, provided that selection conditions remain constant over time and are strong enough to avoid the loss of selected polymorphisms due to drift. In some cases, polymorphisms subject to balancing selection can persist during speciation events, resulting in trans-species polymorphism (long-term balancing selection; Fig. 1b). In other cases, balancing selection may occur only in particular species or populations, owing to specific environmental pressures (see [27, 136] and references therein). Until a few years ago, evidence for the action of balancing selection was restricted to a few loci, including the sickle cell hemoglobin polymorphism (HbS), which protects against malaria in the heterozygous state [138], and several genes of the major histocompatibility complex (MHC, or HLA in humans), which presents intracellular peptides to cells involved in immune surveillance and triggers immune responses against diverse pathogens [139–141].
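Heterozygote advantage can be sketched with the textbook one-locus model, in which genotype fitnesses are 1 − s (AA), 1 (Aa), and 1 − t (aa) and the stable equilibrium frequency of allele A is t/(s + t). The code below iterates the deterministic recursion in an infinite population (so drift is ignored) and compares the result with that analytic value. The function names and the values of s and t are illustrative assumptions, not estimates for HbS or any other locus.

```python
def next_freq(p: float, s: float, t: float) -> float:
    """Frequency of allele A after one generation of viability selection.

    Assumed fitnesses: AA = 1 - s, Aa = 1, aa = 1 - t (heterozygote favored).
    """
    q = 1.0 - p
    mean_w = p * p * (1 - s) + 2 * p * q + q * q * (1 - t)
    return (p * p * (1 - s) + p * q) / mean_w


def iterate(p0: float, s: float, t: float, generations: int) -> float:
    """Apply next_freq repeatedly, starting from frequency p0."""
    p = p0
    for _ in range(generations):
        p = next_freq(p, s, t)
    return p


def equilibrium(s: float, t: float) -> float:
    """Analytic equilibrium frequency of allele A under overdominance."""
    return t / (s + t)


if __name__ == "__main__":
    s, t = 0.1, 0.8  # hypothetical selection coefficients
    for p0 in (0.01, 0.5, 0.99):
        print(p0, iterate(p0, s, t, 2000), equilibrium(s, t))
```

Whatever the starting frequency, the recursion converges to the same interior equilibrium, which is why both alleles are retained. In a finite population, drift can still remove one of them unless selection is strong enough, as noted above.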

Recent studies, bolstered by the whole-genome sequence data published for humans and other species, have suggested that balancing selection is more prevalent than previously thought (see [27] for a review). Several studies searching for the occurrence of trans-species polymorphism have shown that advantageous variants in the human population may have been inherited from distant ancestral species [142–145]. For example, functional diversity in ABO blood group has been maintained across primates for millions of years, probably due to host–pathogen coevolution [142]. Likewise, a scan of long-term balancing selection in the genomes of humans and chimpanzees has detected 125 regions containing trans-species polymorphisms, principally in genes involved in immune function, such as IGFBP7 and membrane glycoprotein genes; these findings suggest that there has long been functional variation in response to pressures exerted by pathogens in these species [144]. Other studies have searched for balancing selection within humans through the use of genome-wide approaches or by focusing on particular gene families. Selection signatures have been detected in multiple regions, including the KIR gene regions (KIR genes are known to co-evolve with their HLA ligands [146]), and regions encoding various molecules involved in cell migration, host defense, or innate immunity [146–155]. These studies indicate that, despite its low occurrence, balancing selection has maintained functional diversity at genes involved in functions relating to the immune response, as observed for other types of selection [24, 29, 31, 77, 103].

Tracking selection signatures from ancient DNA data

Population genetics methods can be used to estimate the approximate age and selection coefficient of adaptive mutations from data from modern human populations, with various degrees of confidence. However, the use of ancient human samples from different time periods is making it possible to determine how rapidly the frequency of adaptive mutations has increased in populations. Until a few years ago, ancient DNA data were available only for single individuals or specimens, limiting the analysis to questions of comparative genomics. We learned a great deal about the degree of admixture between modern humans and ancient hominins, such as Neanderthals and Denisovans, a topic that has been reviewed elsewhere [16, 17, 156–158]. These studies have also revealed the existence of advantageous “archaic” variants in the genomes of modern humans [16, 158]. These variants, which were acquired through admixture with archaic humans, have improved adaptation and survival in modern humans (Fig. 1c, Box 2).

However, much less is known about genetic diversity levels in populations of modern humans from different eras, such as the Paleolithic and Neolithic periods. Deep sequencing is making it possible to sequence multiple samples per species or population, opening up new possibilities for the analysis of ancient DNA data within a population genetics framework (see [156] for a review). For example, in one recent study, 230 human samples from West Eurasia dating from between 8500 and 2300 years ago were sequenced [132]. The authors searched for abrupt changes in allele frequencies over time across the genome. They identified 12 loci containing variants with frequencies that rapidly increased over time, consistent with positive selection. The lactase persistence variant yielded one of the strongest signals and appeared to have reached appreciable frequencies in Europe only recently (less than 4000 years ago), as previously suggested [131]. The other strong signals identified were either directly or indirectly related to diet, corresponding to genes encoding proteins involved in fatty acid metabolism, vitamin D levels, and celiac disease, or corresponded to genes involved in skin pigmentation [132]. Interestingly, the authors also detected strong selection signals in immunity-related genes, such as the TLR1–TLR6–TLR10 gene cluster, which is essential for the induction of inflammatory responses and is associated with susceptibility to infectious diseases [159, 160]. Thus, ancient DNA studies can help us to understand the mode of selection following changes in human lifestyle, and the extent to which such selective events increased the frequency of functional alleles associated with specific traits or disease conditions [131, 132, 161, 162].
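To show how dated samples constrain the speed of a frequency increase, the sketch below back-calculates a per-generation selection coefficient from two allele frequency estimates separated by a known time span. It assumes a deterministic haploid (genic) selection model, under which the odds p/(1 − p) are multiplied by 1 + s each generation, and it ignores drift, sampling error, and migration. It is a simplified stand-in and not the method used in [132]; the function names, the 29-year generation time, and the frequencies in the example are all assumptions chosen for illustration.

```python
import math


def logit(p: float) -> float:
    """Natural log of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))


def selection_coefficient(p_start: float, p_end: float, years: float,
                          generation_time: float = 29.0) -> float:
    """Per-generation s implied by a change from p_start to p_end.

    Deterministic haploid model: the odds grow by a factor (1 + s) per
    generation. generation_time (in years) is an assumed value.
    """
    generations = years / generation_time
    return math.exp((logit(p_end) - logit(p_start)) / generations) - 1.0


def project(p0: float, s: float, generations: int) -> float:
    """Forward iteration of the same model: p' = p (1 + s) / (1 + s p)."""
    p = p0
    for _ in range(generations):
        p = p * (1 + s) / (1 + s * p)
    return p


if __name__ == "__main__":
    # Hypothetical frequencies at two sampling times, 4000 years apart.
    print(selection_coefficient(0.05, 0.60, years=4000))
```

Because real ancient DNA samples are small and populations are finite, published analyses rely on statistical frameworks that account for drift and sampling noise; the sketch only conveys why a large frequency change over a short, dated interval implies strong selection.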

Insight into rare and common diseases from natural selection

Genes associated with Mendelian or complex diseases would be expected to be subject to unequal selective pressures. We can therefore use selection signatures to predict the involvement of genes in human disease [11, 12, 32, 37, 115, 163]. Mendelian disorders are typically severe, compromising survival and reproduction, and are caused by highly penetrant, rare deleterious mutations. Mendelian disease genes should therefore fit the mutation–selection balance model, with an equilibrium between the rate of mutation and the rate of risk allele removal by purifying selection [12]. The use of population genetics models is less straightforward when it comes to predicting the genes involved in complex disease risk. Models of adaptive evolution based on positive or balancing selection apply to a few Mendelian traits or disorders, most notably, but not exclusively, those related to malaria resistance (reviewed in [76, 98]). However, the complex patterns of inheritance observed for common diseases, including incomplete penetrance, late onset and gene-by-environment interactions, make it more difficult to decipher the connection between disease risk and fitness [12].
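The mutation–selection balance expectation for Mendelian disease alleles can be written down with the standard textbook approximations: an equilibrium frequency of about sqrt(μ/s) for a fully recessive deleterious allele, and about μ/(hs) when heterozygotes also suffer a fitness cost hs. The sketch below encodes these two approximations; the function name and the parameter values are illustrative assumptions, and the formulas hold only in a large population at equilibrium.

```python
import math


def equilibrium_frequency(mu: float, s: float, h: float) -> float:
    """Approximate equilibrium frequency of a deleterious allele.

    Assumed genotype fitnesses: 1 (wild type), 1 - h*s (heterozygote),
    1 - s (mutant homozygote); mu is the mutation rate towards the
    deleterious allele per generation.
    """
    if mu <= 0 or not 0 < s <= 1 or not 0 <= h <= 1:
        raise ValueError("expected mu > 0, 0 < s <= 1 and 0 <= h <= 1")
    if h == 0:
        return math.sqrt(mu / s)  # fully recessive allele
    return mu / (h * s)  # valid when h*s is much larger than mu


if __name__ == "__main__":
    mu = 1e-6  # hypothetical per-locus mutation rate
    print("recessive lethal:", equilibrium_frequency(mu, 1.0, 0.0))
    print("dominant lethal: ", equilibrium_frequency(mu, 1.0, 1.0))
```

For the same mutation rate, a recessive lethal allele is expected to segregate at a far higher frequency than a dominant one, because selection acts on it only in the rare homozygotes.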

Purifying selection, rare variants, and severe disorders

According to population genetics theory, strongly deleterious mutations are rapidly removed from the population by purifying selection, whereas mildly deleterious mutations generally remain present, albeit at low frequencies, depending on population sizes and fitness effects. Genome-wide studies are providing increasing amounts of support for these predictions, as “essential” genes (identified as such on the basis of association with Mendelian diseases or experimental evidence from model organisms) are enriched in signs of purifying selection [32, 37, 115, 164]. Purifying selection has also been shown to be widespread in regulatory variation, acting against variants with large effects on transcription, conserved noncoding regions of the genome, and genes that are central in regulatory and protein–protein interaction networks [8, 10, 165–171].

Mutations associated with Mendelian diseases or with deleterious effects on the phenotype of the organism are generally rare and display familial segregation, but such mutations may also be restricted to specific populations [11]. This restriction, in some cases, may be due to a selective advantage provided by the disease risk allele (e.g., the sickle cell allele in populations exposed to malaria [98]), but it mostly reflects a departure from the mutation–selection balance. Small population sizes or specific demographic events may randomly increase the frequency of some disease risk alleles, because too little time has elapsed for purifying selection to remove them from the population, as observed in French Canadians, Ashkenazi Jews, or Finns [11, 66, 67].

According to these principles of population genetics, searches for genes or functional elements evolving under strong purifying selection can be used to identify the genes of major relevance for survival, mutations of which are likely to impair function and lead to severe clinical phenotypes. In this context, the immune response and host defense functions appear to be the prime targets of purifying selection [37, 95, 102]. For example, a recent study based on whole-genome sequences from the 1000 Genomes Project estimated the degree to which purifying selection acted on ~1500 innate immunity genes. The genes of this class, taken as a whole, were found to have evolved under globally stronger purifying selection than the rest of the protein-coding genome [95]. This study also assessed the strength of selective constraints in the different innate immunity modules, organizing these constraints into a hierarchy of biological relevance, and providing information about the degree to which the corresponding genes were essential or redundant [95].

Population genetics has also facilitated the identification of immune system genes and signaling pathways that fulfill essential, non-redundant functions in host defense, variants of which are associated with severe, life-threatening infectious diseases (for examples, see [94, 95, 101, 106], and for reviews [29, 103, 172, 173]). This is well illustrated by the cases of STAT1 and TRAF3; they belong to the 1 % of genes presenting the strongest signals of purifying selection at the genome-wide level [95], and mutations in these genes have been associated with severe viral and bacterial diseases, Mendelian susceptibility to mycobacterial disease, and herpes simplex virus 1 encephalitis [174, 175]. Using the paradigm of immunity and infectious disease risk, these studies highlight the value of population genetics as a complement to clinical and epidemiological genetic studies, for determining the biological relevance of human genes in natura and in predicting their involvement in human disease [29, 103, 173, 176].

Genetic adaptation, common variants, and complex disease

The relationship between selection and complex disease risk is less clear than for Mendelian disorders, but patterns are beginning to emerge. Genes associated with complex disease display signs of less pervasive purifying selection than Mendelian disease genes [32, 173], and are generally enriched in signals of positive selection [23, 28, 32, 37, 110, 122, 169]. There is also increasing evidence to suggest that genetic adaptations can alter complex disease susceptibility, and the population distribution of common susceptibility alleles is unlikely to result from neutral processes alone [12, 91, 177–179]. For example, the difference in susceptibility to hypertension and metabolic disorders between populations is thought to result from past adaptation to different environmental pressures [91, 179, 180]. Another study characterized the structure of complex genetic risk for 102 diseases in the context of human migration [178]. Differences between populations in the genetic risk of diseases such as type 2 diabetes, biliary liver cirrhosis, inflammatory bowel disease, systemic lupus erythematosus, and vitiligo could not be explained by simple genetic drift, providing evidence of a role for past genetic adaptation [178]. Likewise, Grossman and coworkers found overlaps between their candidate positively selected regions and genes associated with traits or diseases in GWAS [28], including height, and multiple regions associated with infectious and autoimmune disease risks, including tuberculosis and leprosy.

Like purifying selection, positive selection is prevalent among genes related to immunity and host defense [24, 37, 95, 109, 112, 115, 181]. Notable examples of immunity-related genes evolving in an adaptive manner, through different forms of positive or balancing selection, and reported to be associated with complex traits or diseases include: TLR1 and TLR5, which have selection signals that seem to be related to decreases in NF-κB signaling in Europe and Africa, respectively [28, 94, 95]; many genes involved in malaria resistance in Africa and Southeast Asia [98, 100]; type-III interferon genes in Europeans and Asians, related to higher levels of spontaneous viral clearance [101, 182]; LARGE and IL21, which have been implicated in Lassa fever infectivity and immunity in West Africans [181]; and components of the NF-κB signaling pathway and inflammasome activation related to cholera resistance in a population from the Ganges river delta [97]. These cases of selection related to infectious disease and many others (see [29–31, 96, 103] for reviews and references therein) indicate that the pressures imposed by infectious disease agents have been paramount among the different threats faced by humans [183]. They also highlight the value of population genetics approaches in elucidating the variants and mechanisms underlying complex disease risk.

Changes in selective pressures and advantageous/deleterious variants

Most of the rare and common variants associated with susceptibility to disease in modern populations have emerged through neutral processes [184]. However, there is increasing evidence to suggest that, following changes in environmental variables or human lifestyle, alleles that were previously adaptive can become “maladaptive” and associated with disease risk [12, 13, 29, 30, 105]. For example, according to the popular “thrifty genotype” hypothesis based on epidemiological data, the high prevalence of type 2 diabetes and obesity in modern societies results from the selection of alleles associated with efficient fat and carbohydrate storage during periods of famine in the past. Increases in food abundance and a sedentary lifestyle have rendered these alleles detrimental [185]. The strongest evidence that past selection can lead to present-day maladaptation and disease susceptibility is provided by infectious and inflammatory disorders [12, 29–31, 77, 105]. According to the hygiene hypothesis, decreases in the diversity of the microbes we are exposed to, following improvements in hygiene and the introduction of antibiotics and vaccines, have led to an imbalance in the immune response, with alleles that helped us to fight infection in the past now being associated with a higher risk of inflammation or autoimmunity [105].

Population genetics studies have provided strong support for the hygiene hypothesis, by showing that genetic variants associated with susceptibility to certain autoimmune, inflammatory, or allergic diseases, such as inflammatory bowel disease, celiac disease, type 1 diabetes, multiple sclerosis, and psoriasis, also display strong positive selection signals [29, 30, 106, 186–188]. For example, genes conferring susceptibility to inflammatory diseases have been shown to be enriched in positive selection signals, with the selected loci forming a highly interconnected protein–protein interaction network, suggesting that a shared molecular function was adaptive in the past but now affects susceptibility to various inflammatory diseases [187]. Greater protection against pathogens is thought to be the most likely driver of past selection, but it has been suggested that other traits, such as anti-inflammatory conditions in utero, skin color, and hypoxic responses, might account for the past selective advantage of variants, contributing to the higher frequencies of chronic disease risk alleles in current populations [30]. Additional molecular, clinical, and epidemiological studies are required to support this hypothesis, but these observations highlight, more generally, the evolutionary trade-offs between past selection and current disease risk in the context of changes in environmental pressures and human lifestyle.

Conclusions and future directions

Population genetics offers an alternative approach, complementary to clinical and epidemiological genetic studies, for the identification of disease risk alleles/genes, the characterization of their properties, and the understanding of the relative contributions of human genetic variation to rare, severe disorders and complex disease phenotypes. Recent studies have shown that both ancient and recent demographic changes have modified the burden of rare, deleterious variants segregating in the population, whereas the population frequencies of other variants have increased because they conferred advantages in terms of better survival and reproduction.

These studies have made a major contribution, but further theoretical and empirical work is needed. Rare-variant studies should consider different fitness and dominance effects, epistatic interactions, and detailed demographic modeling to evaluate the potential impact of local changes in population size and admixture on the efficiency of purifying selection. Furthermore, rare-variant association studies involving complex traits or diseases should seek to account for the evolutionary forces that affect genetic architecture, such as selection and population demography, and integrate elaborate models of population genetics that consider the relationship between allele frequency and effect size and the distribution of phenotypes, as recently reported [189]. Independently of the complex interactions between demography and selection, additional sequence-based studies are required to catalog rare variants in different worldwide populations (including isolated populations), focusing not only on point mutations but also on indels, inversions, or copy-number variation, and to evaluate their contribution to disease risk.

Studies of genetic adaptation, particularly those aiming to make connections with disease in populations historically exposed to different environmental variables, should generate whole-genome data for different worldwide populations with greatly contrasting demographic histories, lifestyles, and subsistence strategies. There is also a need to develop and improve statistical approaches to facilitate the detection of positive selection following alternative modes of genetic adaptation, such as selection on standing variation, polygenic adaptation, and adaptive introgression. These selection studies, if combined with data for molecular phenotypes (e.g., gene expression, protein and metabolite levels, epigenetic marks) and organismal phenotypes (in health and disease), should provide great insight into adaptive phenotypes of major relevance in human evolution and the genetic architecture of rare and common human diseases.