Variation of the adaptive substitution rate between species and within genomes

The importance of adaptive mutations in molecular evolution is extensively debated. Recent developments in population genomics allow inferring rates of adaptive mutations by fitting a distribution of fitness effects to the observed patterns of polymorphism and divergence at sites under selection and sites assumed to evolve neutrally. Here, we summarize the current state-of-the-art of these methods and review the factors that affect the molecular rate of adaptation. Several studies have reported extensive cross-species variation in the proportion of adaptive amino-acid substitutions (α) and predicted that species with larger effective population sizes undergo less genetic drift and higher rates of adaptation. Disentangling the rates of positive and negative selection, however, revealed that mutations with deleterious effects are the main driver of this population size effect and that adaptive substitution rates vary comparatively little across species. Conversely, rates of adaptive substitution have been documented to vary substantially within genomes. On a genome-wide scale, gene density, recombination and mutation rate were observed to play a role in shaping molecular rates of adaptation, as predicted under models of linked selection. At the gene level, it has been reported that the gene functional category and the macromolecular structure substantially impact the rate of adaptive mutations. Here, we deliver a comprehensive review of methods used to infer the molecular adaptive rate, the potential drivers of adaptive evolution and how positive selection shapes molecular evolution within genes, across genes within species and between species.


Introduction
After Darwin proposed that natural selection acts as a main driver of evolution, a major goal of evolutionary biologists has been to understand how beneficial mutations shape species adaptation to their environment. Over the years, the number of approaches used to detect positive selection has increased substantially, making use of the increasing amount of genome data available. In particular, methods have been developed to pinpoint genes, or positions within these genes, that exhibit a pattern of genetic variation statistically incompatible with a pure nearly-neutral scenario (Ohta 1992), where mutations are considered to be neutral, nearly neutral or deleterious (i.e. Nielsen et al. 2005;Ometto et al. 2005;Kosiol et al. 2008). The ecological relevance of such candidate genes can be further tested using functional annotations, when available, or experimentally, for instance, by using reverse genetics and ancestral allele reconstruction (i.e. Hilson et al. 2004;Nielsen et al. 2005;Voight et al. 2006;Roux et al. 2014). This allowed to detect instances of adaptive evolution in many functional categories, such as immune genes in ants (Roux et al. 2014) and in hominids , virulence associated genes in pathogens (Stukenbrock et al. 2011;Dong et al. 2014), and coat-color related genes in hares (Jones et al. 2018) and mice (Hoekstra et al. 2006). While such methods allow a detailed understanding of case-studies, they do not enable one to assess the genome-wide distribution of the fitness effects of mutations.
By contrast, mutation accumulation (MA) experiments are specifically designed to estimate a genome-wide rate of mutation and distribution of effects of mutations on fitness (i.e. Shaw et al. 2002;Bataillon 2003;Rutter et al. 2012). With this approach, one can infer (1) the number of mutations that led to the divergence between MA lines, and (2) the fitness effects of these mutations on the (fitness-related) trait of interest (i.e. viability or lifetime reproductive success; see Glossary). Previous studies have inferred the presence of beneficial mutations in MA line experiments both in the field and in greenhouse studies of A. thaliana (Shaw et al. 2002;Rutter et al. 2012). Nonetheless, MA approaches can only give insight on recent adaptive events, and, therefore, provide little information regarding the proportion of adaptive genetic differences between species. Furthermore, MA experiments yield too few beneficial mutations to be able to test for the occurrence of genomic regions where adaptive mutations are more likely to occur. Conversely, population genomic approaches only offer indirect insights on mutation rates and fitness effects but can leverage patterns of sequence variation between and within species to infer rates of adaptive evolution, thus providing knowledge on the drivers of adaptation at deeper scales of evolution.
The role of positive (a.k.a. Darwinian) selection in molecular evolution is still widely debated (Hey 1999;Gillespie 2000;Kern and Hahn 2018;Jensen et al. 2019). The neutral theory of molecular evolution (Kimura 1968) states that the bulk of segregating polymorphisms is either neutral or deleterious and that the genetic differences between species are explained mainly by neutral substitutions (see Glossary), while beneficial mutations are considered to be too rare to contribute much to the observed polymorphism and divergence. With an increasing amount of data becoming available, however, the question of whether adaptive mutations play a role in molecular evolution can be investigated with a greater precision. "How much of the genetic variation can be explained by adaptive evolution? What is the frequency of adaptive mutations along the genome? Are there regions where adaptive mutations are more likely to occur?" are some of the questions that can now be addressed with population genomics data and statistical methods for the inference of selection.
Here, we present the current state-of-the-art methods used to model the distribution of fitness effects (DFE) and infer the frequency of adaptive mutations. We then review evidence for variation in the rate of adaptive evolution within genes, within genomes and between species.

Synthesis of methods
In the following section, we review the methods that can be used to estimate the rate of adaptive evolution from sequence data. We distinguish two main approaches: phylogenetic methods, based on the divergence between multiple species; and population genetics approaches, which contrast within-species polymorphism to the divergence with an outgroup species.
α: proportion of amino-acid substitutions that are adaptive. Genetic drift: random changes in allele frequencies produced by the sampling of the genetic variants that compose a population every new generation. Genetic draft: a process that induces allele frequency changes through recurrent selective sweeps at linked positions. Selective sweep: the process by which a beneficial substitution reduces genetic diversity at linked positions. Background selection: the process by which negatively selected deleterious mutations reduce neutral genetic diversity at linked positions. ω a : rate of adaptive amino-acid non-synonymous substitutions relative to the mutation rate. K a+ : rate of adaptive amino-acid substitutions, denoted as: αK a , where K a represents an alternative notation of d N .

Quantifying the proportion of adaptive substitutions
(1) Phylogenetic methods The strength and direction of selection on the branch of a phylogenetic tree can be measured by contrasting the nonsynonymous (d N ) and synonymous divergence (d S ) in a given gene (e.g. Miyata et al. 1979;Li et al. 1985;Yang and Nielsen 2002;Eyre-Walker 2006). The d N /d S ratio, noted as ω, provides an estimate of the rate of nonsynonymous substitutions relative to the rate of synonymous substitutions. Assuming that mutation rates at synonymous and non-synonymous sites are constant and equal, and that synonymous substitutions are selectively neutral, genes with ω > 1 are considered to be evolving under positive selection, while genes with ω < 1 are evolving under negative selection. Because ω is based on averages of substitution rates across multiple nucleotide sites that undergo both positive and negative selection, this statistic can only detect strong positive selection (e.g. Yang and Nielsen 2002;Eyre-Walker 2006). As most nonsynonymous mutations are expected to be either neutral or deleterious, d N will tend to be much lower than d S , hence ω will tend to be globally lower than one (i.e. Yang and Nielsen 2002;Eyre-Walker 2006).
In order to consider variation in selective constraints in space and time, models have been developed to account for variable selective pressure among sites Yang et al. 2000Yang et al. , 2005, branches , or both (so-called branch-site models; Yang and Nielsen 2002;Kosakovsky Pond et al. 2011). In site-based models, the ω ratio varies across sites and positive selection is inferred at a specific site if the average d N is higher than d S over all lineages. In branch-based models, the ω ratio varies among lineages and positive selection is detected if the average d N is higher than d S across all sites in a certain branch or a series of branches defining a lineage in a phylogenetic tree. In turn, branch-site models allow the ω ratio to vary both across sites and lineages. Using this framework, distinct models can be compared to test for the occurrence of positive selection at particular sites or branches (e.g. Yang and Nielsen 2002;. Although these methods detect adaptation at the site level, it has been shown that they are conservative in measuring selection over a certain region and/or lineage (Rodrigue and Lartillot 2017). This higher conservatism could be due to adaptive processes not being concentrated on a small number of sites but rather scattered across a large number of positions in a certain genomic region (Rodrigue and Lartillot 2017). Moreover, 1 3 branch-site models assume that evolution on the majority of branches is neutral and that adaptive processes are rare and usually isolated. Hence, events of frequent adaptation over long evolutionary periods would not be captured, leading to underestimates of the rate of adaptive evolution in the tested proteins Yang et al. 2000Yang et al. , 2005Rodrigue and Lartillot 2017). Besides, as these approaches are based on multiple-species alignments, the analysis is focused on genes that are shared by all species, which are more ancient and typically more conserved. Rapidly evolving genes are typically discarded from such analysis since their alignment becomes less reliable as the divergence between species increases.
(2) Population genetics methods a. The McDonald and Kreitman (MK) test Population genetic methods pioneered by Hudson, Kreitman, and Aguadé (1987) test a neutral evolution scenario by comparing the number of polymorphic sites within a population with the number of substitutions with a distinct species (HKA test). Under a neutral scenario, the relative amount of polymorphism and divergence is constant between loci. The HKA test compares these values between at least two genomic regions to test this prediction (Hudson et al. 1987). McDonald and Kreitman (1991) first extended this approach to detect adaptive protein evolution (Fig. 1). The so-called MK test requires data from as little as two closely-related species, typically including several individuals in the study species and one individual from an outgroup species. It compares the number of polymorphisms to the number of substitutions for a locus in two classes of sites: synonymous, which are assumed to evolve neutrally, and non-synonymous, which are potentially under selection (McDonald and Kreitman 1991). The number of nonsynonymous substitutions is denoted as D n , the number of synonymous substitutions as D s , the number of nonsynonymous polymorphisms as P n and the number of synonymous polymorphisms as P s (see Glossary), leading to the so-called MK- Under a scenario where all mutations are either strongly deleterious or neutral, D n /D s is expected to be equal to P n /P s . Conversely, D n /D s higher than P n /P s is taken as a signature of positive selection, and D n /D s lower than P n /P s can be observed in case of balancing selection. As a beneficial mutation reaches fixation at a faster rate than a neutral mutation, it contributes comparatively more to divergence than to polymorphism levels (McDonald and Kreitman 1991;Eyre-Walker 2006). b. Extensions of the MK-test: Estimation of the proportion of amino-acid substitutions (α) By applying a derivative of the MK-table, Charlesworth (1994) estimated the proportion of amino-acid substitutions that are driven by positive selection, a measure referred to as α ( Fig. 1; see Glossary) (Charlesworth 1994;Smith and Eyre-Walker 2002): α = 1 − (D s P n )/ (D n P s ). However, as the levels of nucleotide diversity and amino-acid divergence are generally low, the numbers of polymorphic sites and nonsynonymous substitutions are very small for most genes taken individually. Hence, estimates of α for single genes have inherently large sampling variances, leading to the need for pooling data across many genes (Stoletzki and Eyre-Walker 2011). Such pooling is often done by summing counts of polymorphisms and divergence in each category (Fay et al. 2001) or by taking the average across genes (Smith and Eyre-Walker 2002). By using a different parametrization of the MK test, Sawyer and Hartl (1992) used a Poisson random field (PRF) model to derive expectations for the counts of D n , D s , P n and P s by considering the processes of mutation, selection, and genetic drift (see Glossary) acting independently and simultaneously at multiple sites (Sawyer and Hartl 1992). From the PRF model, one can relate the scaled selection coefficient (γ = N e S , where N e represents the effective population size and s the selection coefficient) and counts of polymorphism and divergence. Based on this approach, Bayesian models were developed where the posterior distribution of scaled selection coefficients for a given locus is inferred either by assuming a fixed-effects model, where γ is constant across sites (Bustamante et al. 2002); or a random-effects model, where γ of each new mutation is drawn from a single underlying normal distribution (Sawyer et al. 2003).
However, a limitation of these approaches is that they do not account for the segregation of slightly deleterious mutations, which can bias estimates of α in a demography-dependent 1987 1991 MK test (2)

Fig. 1
Timeline presenting the state-of-the-art population genetic methods to infer the rate of adaptive evolution (top) and the major findings on the factors impacting the variation of the molecular adaptive rate across species, along the genome, between and within genes (bottom). References for DFE methods can be found in Table 1. Light orange boxes correspond to the variation of the molecular adaptive rate between species; dark orange boxes represent the variation along the genome; blue boxes represent variation between protein-coding genes; and the green box correspond to the factors impacting the molecular adaptive rate at the intra-genic level. References for these studies can be found in the corresponding section in the main text. α: proportion of adaptive amino-acid substitutions; N e : effective population size; s: selection coefficient; ω a : rate of adaptive non-synonymous substitutions; RSA: relative solvent accessibility. manner (Eyre-Walker and Keightley 2009). On the one hand, α can be underestimated if the population size has been relatively constant or decreased since the divergence from the outgroup species, because slightly deleterious mutations may be observed as polymorphisms while having a much lower chance of fixation when compared to neutral mutations. This, however, can be controlled by removing polymorphisms segregating at low frequencies (Charlesworth 1994;Smith and Eyre-Walker 2002). On the other hand, α can be overestimated if the tested population experienced a demographic expansion: as the level of polymorphism is much lower, it leads to an apparent excess of substitutions (Eyre-Walker 2002). Modelling of the full range of the fitness effects of mutations and proper accounting of the underlying demography of the sample is, therefore, needed to achieve more accurate estimates of α.

Inferring α and the distribution of fitness effects (DFE) from the site frequency spectrum (SFS)
In the following, we briefly present methods that are specifically designed to infer the distribution of fitness effects from the frequency of the derived alleles across the genome in order to estimate the rate of adaptive evolution.

a. The folded/unfolded Site Frequency Spectrum (SFS)
The site frequency spectrum (SFS) is used to summarize the levels of polymorphisms in a sample of individuals. It represents the empirical distribution of the allelic frequencies for a given set of loci in the population. If the information on the ancestral allele at each variable position is available, the unfolded SFS can be computed, where the set of counts of the derived allele will be given. Conversely, if the ancestral allele cannot be inferred, the folded SFS may be calculated instead, representing the distribution of the minor allele frequencies. In these approaches, the SFS of potentially selected sites is compared to a neutral SFS. Most methods do so by comparing a non-synonymous to a synonymous SFS, however, this can also be done by contrasting genic with intergenic regions (Racimo and Schraiber 2014) or protein-binding with non-binding sites (Jenkins et al. 1995). The shape of both SFS provides crucial information on the underlying population genetic processes, such as demography and selection (Schraiber and Akey 2015; Barroso et al. 2019). For instance, slightly deleterious mutations segregate more often at low frequencies relative to neutral ones, while positively selected mutations are typically segregating at a higher frequency. But demography can also impact the SFS. For example, an expanding population has an excess of rare variants relative to what is expected in a stable population (Tajima 1989; Schraiber and Akey 2015; Barroso et al. 2019). The challenge is, therefore, to distinguish between the effect of selection and demography. This is done by assuming a neutral reference, for instance, the synonymous SFS, to which a demographic model is fitted. Selection is then inferred from the non-synonymous SFS. This assumption, together with the assumption of site independence is central to all methods inferring the distribution of fitness effects from the SFS.
b. The use of divergence data The number of substitutions is usually computed at the codon level, distinguishing nonsynonymous from synonymous substitutions, or an equivalent if non-coding DNA is used, 1 3 by comparing the study species with at least one outgroup species. The outgroup sequences have to be selected with care. First, a closely-related outgroup species can potentially bias estimates of the rate of adaptive substitutions due to potentially shared polymorphisms. Second, a distantly-related outgroup species may lead to an underestimation of the divergence, and consequently of the rate of adaptive evolution, due to the possible presence of multiple "invisible" substitutions between the two species. One can potentially overcome this limitation by using multiple outgroup species, in order to span several levels of divergence and get more accurate estimates of the local substitution rate (Keightley and Jackson 2018). Moreover, if the divergence between the outgroup and the ingroup species is too high, we may suffer from the same bias as phylogenetic methods towards the more conserved genes, as fast evolving genes will not yield reliable sequence alignments. This would potentially underestimate the rate of adaptive substitutions by losing information on lineage-specific genes.
c. First likelihood models of DFE accounting for slightly deleterious mutations The first likelihood model used to estimate the molecular rate of adaptive evolution was developed by Bierne and Eyre-Walker (2004) (Fig. 1). The authors developed an extension of the MK test allowing nonsynonymous mutations to be potentially strongly advantageous. This model assumes that, for a given gene, estimates of D n , D s , P n and P s are Poisson distributed and infers the number of adaptive amino-acid substitutions (η) and α by assuming that the selection parameters are either constant across all loci or that they follow a certain DFE, in this case, a Gamma or a Beta distribution (see Box 1). Welch (2006) extended the method developed by Bierne and Eyre-Walker (2004) by including models with a continuous distribution of selection coefficients and a two weighted spikes probability distribution of α, where α takes the value α 0 or α 1 with probabilities q and 1 − q (Eqs. 4 and 8, respectively ;Welch 2006). This likelihood framework has the advantage of enabling the comparison between nested models (Mangel and Hilborn 1996; Barton 2000): to test the occurrence of positive selection, we compare a model that potentially includes adaptive substitutions (η or α > 0) with a neutral model (η or α = 0) (Bierne and Eyre-Walker 2004;Welch 2006).
Further extensions of these methods model a deleterious DFE in the form of a Gamma distribution Keightley and Eyre-Walker 2007;Eyre-Walker and Keightley 2009). Each mutation arising at a site is ascribed a scaled selection coefficient, 4N e s, where the effective population size (N e ) is constant among loci, and s is drawn from an underlying DFE to be estimated from the data. Moreover, the SFS jointly estimates demographic parameters that allow for temporal changes in the effective population size Keightley and Eyre-Walker 2007;Eyre-Walker and Keightley 2009). These models come together in two of the most widely used inference methods: DoFE and dfe-alpha (Fig. 1, Table 1).

d. Extensions accounting for beneficial mutations
The fitness effect of new mutations is unlikely to be uniform within a given gene, but is rather expected to vary according to the sequence context and the nature of the functional changes that are incurred. It is, therefore, also important to consider the contribution of beneficial mutations to the SFS in addition to deleterious mutations. Some model-based inference methods account for mutations with positive effects in the DFE. Some of these  (2014) for a review of theoretically plausible distributions) while others are motivated by statistical convenience (to fit the data with a flexible distribution). An extension of the dfe-alpha method described above (Schneider et al. 2011) uses the unfolded SFS together with divergence data to model a Gamma DFE that also accounts for positively selected mutations (Table 1, Fig. 1). The Grapes method (Galtier 2016) can be used with both unfolded and folded SFS combined with divergence data (which is optional when the unfolded SFS is used) to model five different DFE, including the traditional Gamma distribution of deleterious mutations and four other models that account for mutations with beneficial effects (Table 1, Fig. 1). Galtier (2016) analyzed the performance of these models over 44 different datasets and observed that the GammaExponential model, which combines a Gamma distribution of deleterious mutations with an exponential distribution of beneficial mutations, and the ScaledBeta model, which uses a Beta-shaped distribution of slightly deleterious and advantageous mutations, were the ones with the best AIC scores, thus highlighting the important role of beneficial mutations in shaping the SFS. Using a similar framework, polyDFE (Tataru et al. 2017) infers the DFE from an unfolded SFS but does not require divergence data, thus allowing the estimation of the molecular adaptive rate on the branch of the study species. PolyDFE can model different DFE, including a model comprising a combination of gamma and exponential distributions to model mutations with negative and positive effects, respectively (Table 1, Fig. 1). At the level of noncoding DNA, INSIGHT (Gronau et al. 2013) contrasts the unfolded SFS and divergence The method developed by Bierne and Eyre-Walker (2004) represents the first likelihood model that extends the MK test to estimate the rate of adaptive evolution. We further describe the parameters and the underlying assumptions of this model, which constitute the foundation for the methods developed hereafter For a sample of n i sequences at a locus i the expected numbers of synonymous polymorphisms P si and substitutions D si and numbers of nonsynonymous polymorphisms P ni and substitutions D ni are denoted as

Polymorphisms
Substitutions By assuming that sites evolve independently (i.e. are in linkage equilibrium), this method uses a likelihood framework to model the data for n loci where observed data at each locus is summarized via the statistics P si ,P ni ,D si andD ni that are each Poisson distributed. This model has four parameters per locus and a maximum of 4n parameters. It is possible to reduce the number of parameters by assuming that, either some parameters are constant across loci, or selection parameters follow a certain probability density function, which constitutes the distribution of fitness effects. The authors evaluated different models where η and α are constant over all loci, or where η follows a gamma distribution and αis beta distributed in the non-coding elements of interest with those in flanking neutral sites. This method applies a generative probabilistic model by pooling data across non-coding elements considering the within-genome variation in mutation rates and coalescent times. INSIGHT models a categorical DFE, where each site is assumed to evolve under one of four different selective processes: neutral drift, strong negative selection, weak negative selection or positive selection (Table 1).
Despite their similarity, the methods above make slightly different assumptions when modeling polymorphism (SFS counts) and divergence (divergent sites relative to an outgroup). All methods assume a Poisson random field model and that the polymorphism data can be summarized by counts of the unfolded or folded SFS. Grapes, dfe-alpha and DoFE assume that the SFS is known without error, while polyDFE can model an independent rate of misorientation in the data, and INSIGHT uses a low dimensional projection of the SFS, by treating the ancestral allele as a hidden random variable in the model. Demography is either modeled via a set of nuisance parameters (Grapes, polyDFE) or assuming a fixed demographic model featuring a specific change of population size back in time that is also estimated (DFE-alpha, DoFE). Last but not least, most methods model a single SFS (synonymous versus non-synonymous) across genes, but a recent extension of polyDFE allows for fitting jointly several SFS datasets simultaneously (Tataru and Bataillon 2019). This can be used to determine whether distinct genomic regions and/or species share a common DFE, or provide evidence for differences in DFE among genomic regions/species.

e. aMK and ABC-MK models
The previously described methods assume that sites evolve independently. However, there has been growing evidence that selection at linked sites might be shaping genomewide patterns of polymorphism (Barton 1995;Andolfatto 2007;Macpherson et al. 2007). Theoretical and empirical studies showed that, besides genetic drift and purifying selection, the frequency of a given allele can also be affected by recurrent selective sweeps at closely linked positions, a process known as genetic draft (see Glossary) (Gillespie 2000). Moreover, background selection (see Glossary) can also affect polymorphism levels at neutral sites if slightly deleterious mutations are segregating, creating interference at linked sites (Charlesworth et al. 1993;Bustamante et al. 2005;Keightley and Eyre-Walker 2007;Charlesworth 2012). Messer and Petrov (2012) developed an extension of the MK test that accounts for the effects of background selection and genetic draft on the levels of polymorphisms. They define α(x) as a function of the frequency of the derived mutation: α(x) = 1 − (d 0 · p (x) )/d · p 0(x) , where p (x) and p 0(x) represent the polymorphism levels at nonsynonymous and synonymous sites, for a specific derived allele frequency x. Here, any bias affecting the synonymous and nonsynonymous SFS, either demography or selection at linked sites, will be excluded, as α(x) only depends on the ratio p (x) /p 0(x) . The asymptotic value of α(x) is then estimated in the limit x → 1, where it should converge to the true value of α under the MK assumptions: in practice, this is done by fitting an exponential function to the data, given by: α(x) ≈ α + bexp(− cx). This function, however, assumes that all deleterious mutations have the same selection coefficient and that levels of nonsynonymous mutations decrease roughly exponentially with increasing frequency of neutral polymorphisms. Uricchio et al. (2019) extended this method by exploring the impact of background selection on the rate of adaptation using an approximate Bayesian computation (ABC) method, which the authors call ABC-MK (Table 1, Fig. 1). As in the αMK approach, this model is less sensitive to the demography of the population. Besides, it separately infers α for both weakly and strongly beneficial alleles, thus accounting for the strength of selection. To do so, ABC-MK assumes that deleterious mutations are gamma-distributed and allows α to follow a continuous distribution, from weakly to strongly beneficial mutations. As these models are less sensitive to the uncertainty associated with the demography of the population, they have the power to deliver more robust estimates of the molecular rate of adaptation on non-model organisms.
f. Statistics used to infer the rate of adaptive substitutions From the above-described methods, three major statistics are often used to qualify the rate of adaptive non-synonymous substitutions: ω a , α and K a+ . The rate of adaptive nonsynonymous substitutions relative to the mutation rate, denoted as ω a , is given by ω − ω na , where ω na represents the fraction of the ω ratio contributed by neutral and deleterious mutations. The proportion of positively selected amino-acid substitutions, α, is then estimated as ω a /ω. Finally, K a+ represents the rate of adaptive amino-acid substitutions and is given by αK a , where K a is an alternative symbol of d N , which is the number of nonsynonymous substitutions per site. Each of these statistics has its limitations. For instance, α depends both on ω a and ω na , thus differences in α may be due to variations in any of the two rates or both, making it unsuitable for distinguishing the impact of negative and positive selection. On the other hand, ω a is normalized by the mutation rate and, therefore, cannot be used to assess the impact of the mutation rate itself, which is an important varying factor along the genome. In this case, K a+ is more appropriate (Castellano et al. 2016).

Between-species variation in the molecular adaptive rate
Several studies investigated the prevalence of positive selection in the evolution of distinct species. Here, we provide a summary of their main conclusions.

a. Drosophila
Building on a long history of genetic studies, the Drosophila species complex was used in some of the pioneering research on adaptive evolution (Haudry et al. 2019). Brookfield and Sharp (1994) were the first to use the MK test to scan for signs of positive selection in Drosophila. They reported that three out of the seven genes analyzed had an excess of non-synonymous substitutions, thus suggesting that adaptive evolution was pervasive. By studying 35 genes, Smith and Eyre-Walker (2002) confirmed this hypothesis by reporting that ~ 45% of the amino-acid substitutions between D. simulans and D. yakuba were driven by positive selection. In the same year, Fay et al. (2002) estimated that ~ 70% of the amino-acid substitutions between D. simulans and D. melanogaster were adaptive. Further genome-wide studies also reported similar levels of adaptive evolution in the Drosophila genome (reviewed in Sella et al. 2009): 25 ± 20% (Bierne and Eyre-Walker 2004;Shapiro et al. 2007); 40 ± 10% (Welch 2006b); ~ 50% (Andolfatto 2007). Looking at the divergence between D. pseudoobscura and D. affinis, Haddrill et al. (2010) estimated even higher values of α, suggesting that 70-90% of the amino-acid substitutions differentiating the two species were driven by positive selection. By applying a Bayesian approach (Sawyer and Hartl 1992;Bustamante et al. 2001), Sawyer et al. (2003 estimated that ~ 94% of the substitutions were adaptive, although weakly selected (N e s ≈ 5, where s is the selection coefficient). It has been suggested, however, that these values of α could be overestimated if the current N e is larger than the ancestral species (Eyre-Walker 2006; Rousselle et al. 2018). Nonetheless, analyses across the Drosophila genus led to similar estimates of α and, at least for D. melanogaster, the population size was inferred to have decreased (Akashi 1996;Haudry et al. 2019). Moreover, a recent study considering the past demography of the ancestral species found similar values of α to those previously reported in D. melanogaster (~ 49%, Zhen et al. 2018). These studies, therefore, provide evidence that positive selection may indeed be a prevalent mode of evolution in Drosophila genus.

b. Hominids
Alongside Drosophila, humans and apes have been focal species for studies of adaptive evolution. Fay et al. (2001) reported that ~ 35% of the fixed amino-acid differences between humans and old-world monkeys were positively selected. This study, however, had the shortcoming of using a very conserved set of polymorphisms, which can overestimate the rate of non-synonymous substitutions, and consequently α (Eyre-Walker 2006). Conversely, several studies proposed that the rate of adaptive evolution is almost zero in chimpanzees (Mikkelsen et al. 2005;Hvilsom et al. 2012;Castellano et al. 2019) and within hominids (Zhang and Li 2005;Boyko et al. 2008;Eyre-Walker and Keightley 2009), suggesting that only ~ 10% of the fixed differences between humans and chimpanzees are adaptive Boyko et al. 2008). In turn, Enard et al. (2014) found genomewide signals of positive selection in the human genome after correcting for the effects of background selection and suggested that adaptation in humans is mainly driven by regulatory rather than by coding differences. A recent study using an improved modeling of segregating weakly deleterious mutations and accounting for the demographic history of the ancestral species reported an α value around 20%, which is consistent when using the chimpanzee or the macaque as the outgroup species (Zhen et al. 2018). The authors argued that considering the same population size for the outgroup and ancestral species could bias estimations of α, especially in humans, where the human ancestral population is known to be much smaller than that of, for example, chimpanzees or macaques. We discuss in more detail these differences across studies in the last section of this topic (f).
c. Non-primate mammals Halligan et al. (2010) reported that 57% of the amino-acid substitutions were adaptively driven in Mus musculus castaneus, a species of murid rodents. In two subspecies of the European rabbit, Oryctolagus cuniculus algirus and O. c. cuniculus, more than 60% of the amino-acid substitutions were found to be adaptive (Carneiro et al. 2012). Furthermore, a study performed on 44 non-model organisms, reported a mean value of α of around 50% in twelve mammal species (Galtier 2016).

d. Plants
Studies of plants led to a huge variation in the inferred rate of molecular adaptation across species. High rates of adaptive evolution have been measured for the grand sheperd's-purse (Slotte et al. 2010), the European aspen (Ingvarsson 2010) and species of sunflowers (Gossmann et al. 2010;Strasburg et al. 2011), where more than 30% of the aminoacid substitutions were estimated to be driven by positive selection. For the majority of 1 3 plant species studied, though, α was observed to be close to zero (Gossmann et al. 2010). For example, in Arabidopsis thaliana, amino-acid substitutions are predominantly deleterious (Bustamante et al. 2002) with an average adaptive substitution rate very close to zero (Slotte et al. 2011). Authors proposed that this could be due to the Arabidopsis mating system, which by having a high frequency of inbreeding makes it harder to remove deleterious mutations (Bustamante et al. 2002). There are studies, however, reporting signs of adaptive evolution in the Arabidopsis genome. Barrier et al. (2003) found signs of positive selection in ~ 5% of the genes and Moutinho et al. (2019) showed that rates of adaptive evolution of sites at the surface of proteins are higher than the average across the genome, thus suggesting that some regions of the Arabidopsis genome are undergoing positive selection.
Slightly deleterious mutations were also observed to be prevalent in the genomes of A. lyrata (Barnaud et al. 2008;Foxe et al. 2008), Sorghum bicolor (Hamblin et al. 2006), and Zea species, (Bijlsma et al. 1986;Ross-Ibarra et al. 2009), thus suggesting very low rates of adaptive evolution also for these organisms. The reason behind such low rates of adaptive evolution in plant species is still unclear and further studies are needed to link plant adaptation at the ecological and molecular levels.

e. Other species
The rate of adaptive evolution was also studied in a wide range of other organisms. For yeast (Liti et al. 2009) and the giant Galapagos tortoise (Loire et al. 2013), α was observed to be close to zero. Conversely, studies on the sea squirt (Tsagkogeorga et al. 2012) and enterobacteria (Charlesworth and Eyre-Walker 2006) reported that ~ 50% of the aminoacid substitutions are adaptive. For viruses, a high rate of adaptive substitutions is also observed: Williamson (2003) suggested that ~ 50% of the substitutions in the env gene of HIV-1 were positively selected. By accounting for the distribution of d N /d S across codons, Nielsen and Yang (2003) inferred slightly higher rates of adaptive evolution (75%). Moreover, they reported an α of about 85% in the hemagglutinin gene of the human influenza virus.
f. What causes the across species variation of the rate of molecular adaptive evolution?
In the previous sections, we gave an overview of the wide range of data obtained across taxa, highlighting the great variation in the inferred rate of adaptive evolution across species (Fig. 2a). The factors determining this variability, however, remain unclear. Several studies have proposed that cross-species variation is explained by differences in effective population size (Eyre-Walker 2006; Eyre-Walker and Keightley 2009;Gossmann et al. 2012). According to this hypothesis, species with smaller N e accumulate more weakly deleterious mutations simply by chance, thus increasing ω na and consequently reducing estimates of α. Conversely, large-N e species are under more efficient purifying selection, hence removing mutations with negative effects from the allele pool at a faster rate. By performing a study on 44 different species, Galtier (2016) confirmed this hypothesis by showing that N e was positively correlated with α and ω na , but not ω a .
On the other hand, if the population size decreases, α can also be strongly underestimated due to segregating slightly deleterious mutations, which will remain within the population (Eyre-Walker and Keightley 2009;Zhen et al. 2018). Such a scenario was reported to be the cause of very low rates of adaptive evolution in the human genome (Zhen et al. 2018). By considering the demography of the ancestral population, Zhen et al. (2018) revealed an α value of around 30%, higher than previous estimates for this species (Boyko et al. 2008;Eyre-Walker and Keightley 2009). Moreover, they found more strongly selected and/or more abundant advantageous mutations in humans when compared with mice and fruit flies. The authors proposed that these differences could reflect the number of traits under selection (Lourenço et al. 2013;Zhen et al. 2018). According to this hypothesis, larger long-lived organisms, such as humans, have less capacity to adapt to new environments, due to the greater number of traits under selection. Such organisms are theoretically expected to need more consecutive beneficial mutations to reach their fitness optimum, and thus a higher proportion of beneficial mutations should be accordingly detected in these Variation of the rate of adaptive non-synonymous substitutions (ω a ; in black) and the rate of nonadaptive non-synonymous substitutions (ω na ; in grey) between species (a), within genomes (b) and within genes (c). The R 2 Pearson's correlation coefficient is given along with significance denoted by asterisks (**P value < 0.01, ***P-value < 0.001). a Relationship between ω a and ω na with the level of species nucleotide diversity (π), used as a proxy for effective population size, obtained from Galtier (2016). Each sample point represents one species. Dots with bigger sizes correspond to D. melanogaster (data from Moutinho et al. 2019), which is the focus species of plots (b) and (c). b Relationship between ω a and ω na with the recombination rate in cM/Mb, taken from Moutinho et al. (2019). Each dot represents the mean value of ω a or ω na for each recombination rate class. c Relationship between ω a and ω na with the relative solvent accessibility (RSA), obtained from Moutinho et al. (2019). Each dot represents the mean value of ω a or ω na for each RSA class 1 3 species (Lourenço et al. 2013;Rousselle et al. 2018Rousselle et al. , 2019b. More studies are needed to clarify what is causing the observed differences between species.

Within-genome variation of the molecular rate adaptation
Several studies provided evidence for a substantial variation in the rate of adaptive substitutions along the genome. In this section, we summarize the factors that were found to influence the distribution of adaptive substitutions within species (Fig. 1).

a. Genome-wide variables
At the genome level, recombination, mutation and gene density are important determinants of the rate of adaptive substitutions (ω a ) (Marais and Charlesworth 2003;Campos et al. 2014;Castellano et al. 2016). Recombination rate is predicted to favor the fixation of adaptive substitutions (Fig. 2b) by breaking down linkage disequilibrium (Marais and Charlesworth 2003;Campos et al. 2014;Castellano et al. 2016). Advantageous mutations occurring at linked sites but in distinct individuals will interfere, so that only one will ultimately reach fixation unless a recombination event creates a haplotype carrying both of them (Hill-Robertson interference, HRi;Hill and Robertson 1966;Felsenstein 1974). As a result, genes in low recombining regions are expected to have overall lower rates of adaptive substitutions. Following a similar rationale, genes present in regions with high gene density may be subject to stronger HRi and slow rates of adaptive evolution (Castellano et al. 2016). In turn, genes with high mutation rates potentially adapt faster because they increase the levels of genetic diversity, which, consequently, increases the chance of selection operating such that adaptive processes may occur. Interestingly, Castellano et al. (2016) found that the positive correlation between mutation rate and the rate of adaptive substitutions no longer holds for genes located in regions with low recombination rate and high gene density, thus suggesting a strong effect of HRi in the presence of a large number of selected mutations with a small genetic distance between them. Similarly, Gossmann et al. (2011) observed that variations in N e resulting from linked selection along the genome significantly impact the efficiency of natural selection in C. grandiflora and A. thaliana, where regions with larger N e are subject to stronger purifying selection.
b. Protein-coding: gene-wide variables On a gene-wide scale, it has been reported that protein function strongly influences the rate of adaptive evolution, with genes involved in the immune response presenting the highest rates of adaptation in Drosophila (Sackton et al. 2007;Obbard et al. 2009), Arabidopsis (Slotte et al. 2011), hominids (Nielsen et al. 2005Kosiol et al. 2008) and other mammals (Kosiol et al. 2008). Sex-related genes were also reported to present higher rates of adaptive evolution in Drosophila (Pröschel et al. 2006;Haerty et al. 2007) chimpanzees (Hvilsom et al. 2012) and in plants (Gossmann et al. 2014;Crowson et al. 2017). Moreover, a recent study showed that genes involved in protein biosynthesis and signaling for protein degradation exhibit the highest rates of adaptive substitutions in Drosophila and Arabidopsis . Cytochrome P450 proteins, which are involved in defense response in plants, were also characterized by high rates of adaptation in Arabidopsis ). Several studies have described that host-pathogen interactions act as key drivers of protein evolution in several taxa (Sackton et al. 2007;Obbard et al. 2009;Enard et al. 2016;Ebel et al. 2017;Mauch-Mani et al. 2017;Uricchio et al. 2019;Grandaubert et al. 2019), which could explain the observed high levels of adaptive evolution in the functions described above. Moreover, mean gene expression levels and the breadth of expression negatively impact the rate of adaptive evolution in Drosophila, where the two factors may be acting together (Duret and Mouchiroud 2000;Salvador-Martínez et al. 2018;Moutinho et al. 2019). This relationship with expression may be a consequence of stronger purifying selection in highly expressed genes, where selection acts by favoring proteins with the lowest probability of misfolding, which occurs if the protein sequence accumulates translational missense errors (Drummond et al. 2005). Additionally, the macromolecular structure of the protein was also observed to substantially impact the rate of protein adaptation in humans (Afanasyeva et al. 2018), Drosophila and Arabidopsis . In this case, proteins with a higher proportion of disordered regions (Afanasyeva et al. 2018;Moutinho et al. 2019) and/or exposed residues ) are prone to accumulate more adaptive mutations, acting as important targets of positive selection.

c. Protein-coding: intra-molecular factors
There is growing evidence that adaptive substitution rates also vary significantly at the intra-genic level. Studies both at the population and divergence level, have shown that the relative solvent accessibility (RSA) significantly impacts the rate of amino-acids substitutions (Fig. 2c), with exposed residues accumulating more adaptive mutations than buried ones (Goldman et al. 1998;Mirny and Shakhnovich 1999;Franzosa and Xia 2009;Liberles et al. 2012;Moutinho et al. 2019). When contrasted with the effect of residue intrinsic disorder, RSA was observed to contribute with most of the variation in ω a (95% and 87% of variance explained for A. thaliana and D. melanogaster, respectively; Moutinho et al. 2019). This suggests that solvent exposure is the main determinant of adaptive evolution at the level of protein structure, and that protein intrinsic disorder contributes with a mere additive small effect to the rate of protein adaptation . Furthermore, the type of amino-acid mutation was also reported to be an important factor affecting the rate of adaptive evolution, with more similar amino-acid changes presenting higher rates of adaptive substitutions (Grantham 1974;Miyata et al. 1979;Bergman and Eyre-Walker 2019).

d. Non-coding DNA
While much attention has been given to the study of the adaptive evolution of proteincoding genes, there is increasing evidence that the non-coding regions of the genome are also key targets of positive selection. By using an MK-like approach, contrasting numbers of polymorphisms and substitutions at protein-binding and non-binding sites, Jenkins et al. (1995) reported signatures of adaptive change in the control for gene expression in D. melanogaster. Kohn et al. (2004) estimated that ~ 50% of all substitutions in the 5′ region of eight Drosophila genes were adaptively driven. By extending these approaches, Andolfatto (2005) investigated patterns of molecular evolution in multiple classes of non-coding DNA in D. melanogaster and found that around 60% and 20% of the total nucleotide divergence with D. simulans were fixed by positive selection, in UTRs and intronic/intergenic regions respectively. These findings suggest that the noncoding regions of the D. melanogaster genome are key determinants of adaptive evolution. Likewise, Haddrill et al. (2008) found signs of adaptive evolution in the non-coding regions of the D. simulans genome. These patterns go beyond the Drosophila genus since there is evidence of widespread positive selection in noncoding conserved regions along the Brassicaceae phylogeny (Williamson et al. 2014). In hominids, however, the opposite pattern is observed. Keightley et al. (2005) analyzed the downstream and upstream regions of protein-coding genes using an MK approach and found no signs of adaptive evolution. This result might reflect the overall low levels of adaptive evolution in hominid genomes due to the lower effective population sizes. With the thrive of full genome sequence data, adaptive evolution can now be more extensively studied outside the coding regions (Gronau et al. 2013), which, until now, were the focus of most studies.

Current limitations and future perspectives
In the last two decades, numerous methods have been developed to detect and quantify adaptive evolution. This, together with the availability of datasets spanning many genes and species, increased our knowledge of the factors underlying the heterogeneity of rates of molecular adaptation within genomes and between species. However, existing methods rely on several assumptions that can create biases in the estimates of adaptive evolution when not met. For instance, the methods reviewed here assume that synonymous mutations are neutral, which may not always be a valid approximation, especially in species with large effective population sizes (Lawrie et al. 2013). Several studies have documented that selection for codon usage also affects the rate of synonymous substitutions in several species, including Drosophila (Akashi 1994;Comeron et al. 1999), the European aspen (Ingvarsson 2010) and non-model animals (Galtier et al. 2018), mammals and birds (Rousselle et al. 2019a). Finding a proper neutral reference remains a challenging goal. Yet, a similar approach to that used in codon models (Yang and Nielsen 2008;Spielman and Wilke 2016;Rodrigue and Lartillot 2017) could, in principle, be considered for methods inferring the rate of adaptive evolution by accounting for the evolution of synonymous sites. This would lead to a more realistic null model of neutral evolution and, consequently, less biased estimates of the molecular rate of adaptation (Rodrigue and Lartillot 2017).
Another challenge consists of better accounting for the confounding effects of demography. Some methods fit a simplified demographic model (DFE-alpha, DoFE) while others correct for demography by adding extra parameters, one per frequency category of the SFS (Grapes, polyDFE). The number of such parameters, therefore, increases with the sample size and can quickly lead to model overparameterization issues. Extending the methods to use a continuous SFS constitutes one perspective to accommodate increasingly larger datasets. Alternatively, the demography of the population could also be estimated from the currently available coalescent methods (i.e. the SMC ++, Terhorst et al. 2017;or ∂a∂i, Gutenkunst et al. 2009).
Besides, current models often assume a constant DFE across the whole genome. This can lead to a bad model fit because selection varies within and between genomic regions. Such an assumption can be relaxed by allowing DFE parameters to vary along the genome. Moreover, the use of an outgroup species to infer the ancestral allele (polymorphism orientation) can lead to biases in the estimates of adaptive evolution, whether the outgroup is a very closely-related species or a very distantly-related one (Hernandez et al. 2007). This can be alleviated by using multiple outgroup species and probabilistic ancestral allele reconstructions (e.g. Keightley and Jackson 2018). Furthermore, by using only one outgroup sequence, these methods are estimating divergence on the total branch separating the focal and the outgroup species. Using a second outgroup species and a phylogenetic approach, however, would allow restricting the estimation of the divergence parameters to the branch of the study species.
Furthermore, these methods assume that all sites are equally sampled in all individuals and do not intrinsically account for the possibility of missing data. Pre-processing of the data is therefore required, which can introduce biases if too many sites have to be discarded. Finally, methods relying on patterns of polymorphism cannot track positively selected mutations of individual sites, limiting the power of these analyses in detecting positive selection at the site level. Combing such population genetics approaches with mutation accumulation experiments is a promising avenue to further understand the fitness effect of particular mutations. This, however, would have to be done across several generations so that enough mutations could be generated.

Conclusions
The development of statistical approaches based on the pioneering work of McDonald and Kreitman (1991), together with the increasing availability of genome sequences at the population level, paved the way for the qualitative and quantitative assessment of rates of adaptive evolution, both between species and within genomes. Growing evidence suggests a substantial variation of the molecular adaptive rate at distinct levels of molecular evolution, emphasizing the multitude of factors that can influence the rate of adaptation. These studies introduced a conceptual and theoretical framework that, we posit, will serve as a basis for increasingly realistic models that will strengthen our understanding of the fitness effect of new mutations and, therefore, the molecular basis of adaptation.