# Divergence and Polymorphism Under the Nearly Neutral Theory of Molecular Evolution

DOI: 10.1007/s00239-008-9146-9

- Cite this article as:
- Welch, J.J., Eyre-Walker, A. & Waxman, D. J Mol Evol (2008) 67: 418. doi:10.1007/s00239-008-9146-9


## Abstract

The nearly neutral theory attributes most nucleotide substitution and polymorphism to genetic drift acting on weakly selected mutants, and assumes that the selection coefficients for these mutants are drawn from a continuous distribution. This means that parameter estimation can require numerical integration, and this can be computationally costly and inaccurate. Furthermore, the leading parameter dependencies of important quantities can be unclear, making results difficult to understand. For some commonly used distributions of mutant effects, we show how these problems can be avoided by writing equations in terms of special functions. Series expansion then allows for their rapid calculation and, also, illuminates leading parameter dependencies. For example, we show that if mutants are gamma distributed, the neutrality index is largely independent of the effective population size. However, we also show that such results are not robust to misspecification of the functional form of distribution. Some implications of these findings are then discussed.

### Keywords

- Genetic drift
- Distribution of mutant effects
- Neutrality index
- Special functions

## Introduction

The neutral and nearly neutral theories of molecular evolution placed interpopulation divergence and intrapopulation polymorphism within a single explanatory framework, attributing both to the action of genetic drift (Kimura 1983; Ohta and Gillespie 1996). While alternative theories, emphasizing positive selection, and/or linkage effects, continue to receive attention (e.g., Gillespie 1991, 2001), the drift-based theories remain central to the study of molecular evolution.

Neutral theory, in the strict sense, assumes that most mutants are either strongly deleterious or wholly neutral, with only the latter contributing to divergence or polymorphism. This assumption yields tractable equations, and easily interpretable results, but is almost certainly unrealistic. The nearly neutral theory, by contrast, assumes that selection coefficients are drawn from a continuous range, with a large class of mildly deleterious mutants—an assumption that has a great deal of empirical support (e.g., Eyre-Walker et al. 2002, 2006; Piganeau and Eyre-Walker 2003; Yampolsky et al. 2005; Loewe and Charlesworth 2006; Loewe et al. 2006; Eyre-Walker and Keightley 2007).

When selection coefficients are drawn from a continuous distribution, the equations for many quantities of interest involve an integral over this distribution, and this has some unfortunate consequences. First, on a purely practical level, when equations are implemented in a likelihood framework for parameter estimation, the estimation procedure must involve numerical integration, and this can impose a significant computational burden, especially when multidimensional integrals are required (e.g., Nielsen and Yang 2003; Williamson et al. 2004). Second, the inclusion of a continuous distribution makes the equations more difficult to understand, and the predicted parameter dependencies of important quantities less transparent, than under strict neutrality. Finally, and more fundamentally, there is continuing debate about the functional form of the distribution of selection coefficients in nature (e.g., Nielsen and Yang 2003; Loewe and Charlesworth 2006; Eyre-Walker and Keightley 2007), and this calls into question the generality of conclusions reached by imposing an arbitrarily chosen functional form (Tachida 1996; Eyre-Walker 2002; Sawyer et al. 2003; Loewe and Charlesworth 2006; Woodhams 2006; Eyre-Walker and Keightley 2007).

Here we investigate the expected levels of divergence and polymorphism under the nearly neutral theory, when selection coefficients are drawn from a continuous distribution. The study has three main aims. First, it is shown that, for some commonly used distributions, the relevant likelihood equations can be written in terms of special functions; this means that the quantities of interest can be calculated rapidly without the need for numerical integration. Second, approximate forms of these expressions are shown to follow directly from the definitions of the special functions, and these approximations show clearly the leading dependencies on the parameters of biological interest. Finally, results from some different distributions of selection coefficients are compared, to test the robustness of the conclusions.

## Expected Levels of Polymorphism and Divergence

Consider, first, expected levels of polymorphism and divergence at a collection of independent sites, when all mutants are subject to a common strength of selection. These results were derived in detail by Kimura (1962, 1969) and others (Ewens 1979; Sawyer and Hartl 1992; Hartl et al. 1994). Here we just give brief heuristic derivations.

The expected divergence, *d*, along a lineage of *t* generations, at a site where all mutants have the same selection coefficient, *s*, is the product of the number of mutants expected to appear, and their probability of reaching fixation, and so takes the form

where *N* is the census population size, *N*_{e} the effective population size, *μ* the mutation rate per generation, and \( \pi \,(s,\,N_{e} ,\,N) \) the fixation probability. (Both here and elsewhere, we use a zero subscript to indicate a model where all mutants have the same selection coefficient.) The expected level of polymorphism at the site is

In the last equation, the representation of \( p_{0} (\gamma ) \) as an infinite sum follows from expanding the term \( e^{ - \gamma (1 - x)} \) in powers of \( - \gamma (1 - x) \) and using \( \int_{0}^{1} {x^{n - 1} (1 - x)^{m - 1} dx = B(n,m)} ,\) where \( B(n,m) \) is the beta function (Abramowitz and Stegun 1965). For integer arguments, \( B(n,m) = (n - 1)!(m - 1)!/(n + m - 1)!. \)
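The term-by-term integration described above can be checked numerically. The following is a minimal sketch rather than the expression for \( p_{0} (\gamma ) \) itself: it assumes a generic integrand \( x^{n - 1} (1 - x)^{m - 1} e^{ - \gamma (1 - x)} \), and the function names and test values are illustrative choices, not taken from the article.

```python
import math

def beta_int(n: int, m: int) -> float:
    """Beta function B(n, m) for positive integers: (n-1)!(m-1)!/(n+m-1)!."""
    return (math.factorial(n - 1) * math.factorial(m - 1)
            / math.factorial(n + m - 1))

def series_integral(n: int, m: int, gamma: float, terms: int = 60) -> float:
    """Integrate x^(n-1) (1-x)^(m-1) exp(-gamma (1-x)) over [0, 1] as a series.

    Expanding the exponential in powers of -gamma (1-x) and integrating
    term by term gives sum_j (-gamma)^j / j! * B(n, m + j).
    """
    return sum((-gamma) ** j / math.factorial(j) * beta_int(n, m + j)
               for j in range(terms))

def midpoint_integral(n: int, m: int, gamma: float, steps: int = 20_000) -> float:
    """Brute-force midpoint rule for the same integral, used only as a check."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += x ** (n - 1) * (1 - x) ** (m - 1) * math.exp(-gamma * (1 - x))
    return total * h

if __name__ == "__main__":
    # Illustrative values only (n = 2, m = 3, gamma = 2).
    print(series_integral(2, 3, 2.0), midpoint_integral(2, 3, 2.0))
```

The series form needs no quadrature, which is the practical point made in the text. For large positive \( \gamma \) the alternating terms cancel heavily in floating point, so this naive sum is only suitable for moderate values.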

The following sections evaluate these expressions for some important forms of \( F_{i} (\gamma). \)

### Strict Neutrality

Under strict neutrality, a proportion, *f*, of mutants is selectively neutral, with \( \gamma = 0, \) while the remaining proportion, \( (1 - f), \) is severely deleterious, and so contributes nothing to either divergence or polymorphism. With these assumptions, we reproduce standard results (Kimura 1983):

### Single-Sided Gamma Distribution

The shape parameter, *β*, is related to the coefficient of variation of the distribution via CV(\( \gamma \)) = *β*^{−1/2}, and to the excess kurtosis via \( \kappa \left( \gamma \right) = 6/\beta. \) This distribution was used in the earliest work on the nearly neutral theory by Ohta (1977), who used the single-sided exponential distribution (equivalent to setting *β* = 1 in Eq. 14), and by Kimura (1979), who introduced the arbitrary shape parameter, *β*.
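The two relations between *β* and the shape of the distribution can be confirmed from the raw moments of a gamma variable. This is a small sketch under the assumption that the distribution is parameterized by its shape and by the magnitude of its mean; the function names are hypothetical.

```python
import math

def gamma_raw_moment(k: int, shape: float, mean: float) -> float:
    """E[X^k] for a gamma variable with the given shape and mean.

    With scale = mean / shape, E[X^k] = scale^k * Gamma(shape + k) / Gamma(shape).
    """
    scale = mean / shape
    return scale ** k * math.gamma(shape + k) / math.gamma(shape)

def cv_and_excess_kurtosis(shape: float, mean: float) -> tuple[float, float]:
    """Coefficient of variation and excess kurtosis computed from raw moments."""
    m1, m2, m3, m4 = (gamma_raw_moment(k, shape, mean) for k in (1, 2, 3, 4))
    variance = m2 - m1 ** 2
    fourth_central = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4
    return math.sqrt(variance) / m1, fourth_central / variance ** 2 - 3.0

if __name__ == "__main__":
    # Illustrative values: shape 0.25, mean magnitude 100.
    print(cv_and_excess_kurtosis(0.25, 100.0))
```

Both quantities depend on the shape alone, so changing the mean argument leaves the output unchanged up to rounding.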

#### Exact Results

*j*, and so for numerical calculation, approximations of arbitrary accuracy can be obtained by truncating the series at a suitable point.
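To illustrate how a truncated series can replace numerical integration, the sketch below averages a standard diffusion result over a gamma distribution. It is an assumption-laden illustration and not a reproduction of the exact results referred to here: it takes the substitution rate of a deleterious mutant, relative to a neutral one, to be \( S/(e^{S} - 1) \), where \( S > 0 \) is the magnitude of the scaled selection coefficient, and it leaves the scaling convention unspecified. The function names and the tail correction are likewise illustrative.

```python
import math

def relative_divergence(beta: float, mean_s: float, terms: int = 10_000) -> float:
    """Mean of S / (exp(S) - 1) for gamma-distributed S (shape beta, mean mean_s).

    Writing S / (exp(S) - 1) = S * sum_{k>=1} exp(-k S) and integrating each
    term against the gamma density gives
        beta * b**beta * sum_{k>=1} (k + b)**-(beta + 1),   b = beta / mean_s.
    The sum is truncated after `terms` terms, and the remainder is approximated
    by the integral of the summand from terms + 1/2 to infinity.
    """
    b = beta / mean_s
    head = sum((k + b) ** -(beta + 1) for k in range(1, terms + 1))
    tail = (terms + 0.5 + b) ** -beta / beta
    return beta * b ** beta * (head + tail)

def brute_force(beta: float, mean_s: float, upper: float = 80.0,
                steps: int = 80_000) -> float:
    """Midpoint-rule check; only sensible for beta >= 1 (bounded density)."""
    b = beta / mean_s
    h = upper / steps
    total = 0.0
    for i in range(steps):
        s = (i + 0.5) * h
        density = b ** beta * s ** (beta - 1) * math.exp(-b * s) / math.gamma(beta)
        total += density * s / math.expm1(s)
    return total * h

if __name__ == "__main__":
    print(relative_divergence(2.0, 5.0), brute_force(2.0, 5.0))
    # Log-log slope against the mean strength of selection, for beta = 0.3.
    r1, r2 = relative_divergence(0.3, 1e3), relative_divergence(0.3, 1e4)
    print((math.log(r2) - math.log(r1)) / math.log(10.0))
```

Under these assumptions the log-log slope for large mean selection strength is close to −*β*, and the series version remains usable for *β* < 1, where the density is unbounded at the origin and naive quadrature struggles.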

*β*, but *β* < 1 requires analytic continuation (Fine 1951), and for *β* = 1 (Ohta 1977), we use \( \lim_{\beta \to 1} \left[ {\zeta \left( {\beta ,a} \right) - 1/\left( {\beta - 1} \right)} \right] = - \Psi_{0} \left( a \right), \) where \( \Psi_{0} \left( \bullet \right) \) is the digamma function (Abramowitz and Stegun 1965); this yields an exact result.
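The limit used for *β* = 1 can be verified numerically. The sketch below reads \( \zeta (\beta ,a) \) as the Hurwitz zeta function, \( \sum_{k \ge 0} (k + a)^{ - \beta } \), which is an assumption, because its definition is not reproduced in this section. Both special functions are implemented with standard textbook approximations (Euler-Maclaurin summation and the asymptotic series for the digamma function) so that the example runs without external libraries.

```python
import math

def hurwitz_zeta(s: float, a: float, head_terms: int = 50) -> float:
    """Hurwitz zeta sum_{k>=0} (k + a)**-s for s > 1 and a > 0.

    The first `head_terms` terms are summed directly; the remainder uses the
    Euler-Maclaurin formula (integral, half-term and two derivative terms).
    """
    head = sum((k + a) ** -s for k in range(head_terms))
    x = head_terms + a
    tail = (x ** (1 - s) / (s - 1) + 0.5 * x ** -s
            + s * x ** (-s - 1) / 12
            - s * (s + 1) * (s + 2) * x ** (-s - 3) / 720)
    return head + tail

def digamma(x: float) -> float:
    """Digamma for x > 0: shift upward by recurrence, then asymptotic series."""
    shift = 0.0
    while x < 6.0:
        shift -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return (shift + math.log(x) - 0.5 / x
            - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252)))

def zeta_minus_pole(eps: float, a: float) -> float:
    """zeta(1 + eps, a) - 1/eps, which tends to -digamma(a) as eps -> 0."""
    s = 1.0 + eps
    d = s - 1.0  # the offset actually represented in floating point
    return hurwitz_zeta(s, a) - 1.0 / d

if __name__ == "__main__":
    eps = 2.0 ** -20  # exactly representable, so 1 + eps is exact
    for a in (1.0, 2.5):
        print(a, zeta_minus_pole(eps, a), -digamma(a))
```

Subtracting the pole using the floating-point offset `d`, rather than the nominal `eps`, avoids a rounding mismatch that would otherwise be magnified by the factor 1/eps.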

#### Approximations

To derive approximate expressions for the divergence and polymorphism, we require information about the typical magnitudes of *β* and \( \left| {\overline{\gamma } } \right|.\) Studies fitting gamma distributions to data from various taxa and loci have almost all agreed that *β* < 1 provides the best fit (Keightley 1994; Piganeau and Eyre-Walker 2003; Eyre-Walker et al. 2006; Loewe et al. 2006; Loewe and Charlesworth 2006; but see Nielsen and Yang 2003). This finding accords with the high kurtosis and a high concentration of mutants of negligible effect inferred from more direct approaches to estimating the distribution (Davies et al. 1999; Lynch et al. 1999; Eyre-Walker and Keightley 2007). Estimates of \( \left| {\overline{\gamma } } \right| \) from bioinformatic studies have tended to be large, often of order 100, which again accords with results from more direct approaches (Keightley and Eyre-Walker 1999; Lynch et al. 1999; Loewe et al. 2006) and with broader surveys of selective constraint (Eyre-Walker et al. 2002; Subramanian and Kumar 2006). Seemingly contradictory estimates of order 1 have appeared in the literature (e.g., Bustamante et al. 2002; Sawyer et al. 2003), but this disagreement is only apparent, because these authors estimated a different quantity: the mean value of *N*_{e} |*s*| for mutants eligible to become polymorphic or fixed, i.e., excluding severely deleterious mutants.

*NI* > 1 will always hold (Rand and Kann 1996). This reflects the presence of weakly deleterious mutants, which are able to contribute to transient polymorphism but unlikely to reach fixation.
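For readers less familiar with the statistic, the neutrality index is conventionally computed from the four counts of a McDonald-Kreitman table as the ratio of the polymorphism ratio to the divergence ratio. The sketch below uses that conventional count-based definition, which is assumed here to correspond to the expected-value form used in this section; the counts in the usage example are hypothetical.

```python
def neutrality_index(pn: float, ps: float, dn: float, ds: float) -> float:
    """Neutrality index NI = (Pn / Ps) / (Dn / Ds).

    pn, ps: nonsynonymous and synonymous polymorphisms.
    dn, ds: nonsynonymous and synonymous fixed differences.
    """
    if ps <= 0 or dn <= 0 or ds <= 0:
        raise ValueError("ps, dn and ds must be positive for NI to be defined")
    return (pn / ps) / (dn / ds)

if __name__ == "__main__":
    # Hypothetical counts: an excess of nonsynonymous polymorphism gives NI > 1.
    print(neutrality_index(pn=20, ps=40, dn=10, ds=40))  # 2.0
```

A value above 1 indicates an excess of nonsynonymous polymorphism relative to divergence, which is the pattern expected from weakly deleterious mutants.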

*m* = 1 and *n* = 8, such as might be used to model the frequency of singletons in a sample of eight alleles (Hartl et al. 1994). The figure shows that the approximations are good when \( \left| {\overline{\gamma } } \right| \) is not too small and *β* is not too large, and this is generally consistent with the empirical results discussed above.

*NI*_{2} is shown to increase linearly with the shape parameter, *β*, but to be largely independent of the mean strength of selection \( \left| {\overline{\gamma } } \right|. \) Second, the effective population size, *N*_{e}, appears solely in the parameters \( \left| {\overline{\gamma } } \right| \) and *θ* (Eqs. 4 and 5), and so we have

*β*, is very small; this is because when *β* ≪ 1, divergence and polymorphism will be relatively insensitive to changes in *N*_{e}, and the neutrality index will remain close to its neutral value of unity. This behavior can be understood by recalling that the excess kurtosis of the gamma distribution is given by \( \kappa \left( \gamma \right) = 6/\beta, \) and that a highly leptokurtic distribution approximates the situation under strict neutrality, in that mutations are concentrated in a peak around \( \gamma \) = 0 and in a tail of large negative values.

### Partially Reflected Gamma Distribution

#### Results

*H*_{3} (Eq. 20) are found to be

### Single-Sided Lognormal Distribution

The gamma distributions investigated above are flexible and widely used. They also have some, if limited, theoretical justification, arising from simple models of selection on quantitative traits (Martin and Lenormand 2006; Gu 2007a, b). But a theoretical case can also be made for other distributions. For example, it follows from the central limit theorem that normal or lognormal distributions might apply if each mutant has many pleiotropic effects on independent components of fitness (Sawyer et al. 2003; Loewe and Charlesworth 2006).

There is also evidence from bioinformatic studies that the gamma distribution might not be the most appropriate choice. For example, Nielsen and Yang (2003; see also Yang et al. 2000) fitted various distributions to divergence data from animal mitochondrial genes and found that the best-fitting distribution was normal, with a class of invariable sites (see also Sawyer et al. 2003). However, the normal distribution was not a significant improvement over a gamma with a shape parameter of *β* ≈ 3, and the gamma distribution generally approximates a normal when *β* ≫ 1 (although in this case our approximate results would not apply).

A different conclusion was drawn by Loewe and Charlesworth (2006), who fitted various distributions to polymorphism data and estimates of the lethal mutation rate from *Drosophila*. They found that a normal distribution could not adequately fit the polymorphism data and that a gamma distribution regularly underpredicted the lethal mutation rate. A lognormal distribution, by contrast, gave a good fit to both kinds of data (although, again, its improvement over the gamma was not formally significant).

Because of the success of the lognormal distribution in the study by Loewe and Charlesworth (2006), and to investigate the robustness of the conclusions reached above, we now examine expected levels of divergence and polymorphism when scaled selection coefficients are lognormally distributed.

The parameter σ^{2}, which is the variance of the associated normal distribution, functions like the shape parameter, *β*, of the gamma distribution, with the coefficient of variation and excess kurtosis both increasing with σ^{2}. In detail, we have \( {\text{CV}}(\gamma ) = ({\text{e}}^{{\sigma^{2} }} - 1)^{1/2} \) and \( \kappa (\gamma ) = {\text{e}}^{{4\sigma^{2} }} + 2{\text{e}}^{{3\sigma^{2} }} + 3{\text{e}}^{{2\sigma^{2} }} - 6, \) which, like the equivalent expression for the gamma distribution, depends only on the shape parameter and not on the mean.
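The dependence of the coefficient of variation and the excess kurtosis on σ^{2} can be checked directly from the raw moments of a lognormal variable. This sketch assumes the distribution is specified by the magnitude of its mean and by σ^{2}; the function names and the chosen values are illustrative.

```python
import math

def lognormal_raw_moment(k: int, mean: float, sigma2: float) -> float:
    """E[X^k] for a lognormal variable with the given mean and log-variance.

    The associated normal has variance sigma2 and mean mu = ln(mean) - sigma2/2,
    so that E[X^k] = exp(k * mu + k^2 * sigma2 / 2).
    """
    mu = math.log(mean) - sigma2 / 2.0
    return math.exp(k * mu + k * k * sigma2 / 2.0)

def cv_and_excess_kurtosis(mean: float, sigma2: float) -> tuple[float, float]:
    """Coefficient of variation and excess kurtosis computed from raw moments."""
    m1, m2, m3, m4 = (lognormal_raw_moment(k, mean, sigma2) for k in (1, 2, 3, 4))
    variance = m2 - m1 ** 2
    fourth_central = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4
    return math.sqrt(variance) / m1, fourth_central / variance ** 2 - 3.0

if __name__ == "__main__":
    for sigma2 in (0.25, 0.5, 1.0):
        print(sigma2, cv_and_excess_kurtosis(100.0, sigma2))
```

Both outputs grow with σ^{2}, matching the role of σ^{2} as a shape parameter. The raw-moment route loses precision for large σ^{2}, where the moments differ by many orders of magnitude, so closed forms are preferable there.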

Equation 35 shows that when ln \( \left| {\overline{\gamma } } \right| \) and σ^{2} are very close in value, then linear approximations relating ln(*d*_{i}/*μt*) to ln \( \left | {\overline{\gamma } } \right| \) are quite similar in the gamma and lognormal cases, with the shape parameters governing the slopes. But in the lognormal case, this linear approximation will be accurate over a very limited range of parameter space, and so the slope of the relationship will vary with *N*_{e} in general. This is confirmed in Fig. 2, where exact results are shown, with σ^{2} values chosen to match the curves in Fig. 1. In addition to the curvature in the plots for the divergence and polymorphism, it is clear that the neutrality index does not approach a constant value when selection coefficients are lognormally distributed but, instead, continues to increase with \( \left| {\overline{\gamma } } \right|, \) and therefore *N*_{e}.

## Discussion

We have derived expressions for the expected levels of divergence and polymorphism under the nearly neutral theory, with the assumption that the distribution of scaled selection coefficients is either gamma (containing only deleterious mutants) or partially reflected gamma (including back mutations of beneficial effect). These results have been given in exact forms (Eqs. 17–21 and 29–31) which can be calculated rapidly and accurately, without the need for numerical integration. Results have also been presented in approximate forms (Eqs. 23–27 and 32–33), which show clearly the leading dependencies on important parameters.

where *β* is the shape parameter of the gamma distribution and K is a constant determined by the way in which polymorphism is measured (see Eqs. 3, 24, and 33). Under these assumptions, therefore, the neutrality index is quite independent of the strength and efficacy of selection on deleterious mutants, and this has a number of interesting implications.

For example, Presgraves (2005) studied a set of 98 protein-coding loci from *Drosophila melanogaster* and showed that the neutrality index correlated negatively with the local recombination rate. The level of recombination is an important determinant of local *N*_{e} values, and so this correlation was interpreted as an effect of within-genome variation in the efficacy of selection (Presgraves 2005). Equation 36 allows us to make a further inference, because it shows that if deleterious mutations are gamma distributed, with or without back mutation, then a correlation between *NI* and *N*_{e} cannot be attributed to mildly deleterious mutants alone. However, the correlation could be explained if a nonnegligible fraction of substitutions was strongly adaptive. (This can be shown by adding a constant number of adaptive substitutions to the divergence, before calculating the neutrality index as in Eq. 24 or 33.) The conclusion that adaptive substitutions would create a dependency of the neutrality index on *N*_{e} follows under quite general conditions. For example, it does not depend on the rate of adaptive substitution itself increasing with *N*_{e} (Gillespie 2001), nor does it rely on the direct effects of genetic hitchhiking, although both effects would increase the reported correlation (Gillespie 2001). If Presgraves’ (2005) result does indeed imply high rates of adaptive substitution, then this would be consistent with other lines of evidence suggesting widespread adaptive substitution in *D. melanogaster* (Eyre-Walker 2006).
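The argument about adaptive substitutions can be made concrete with a toy calculation. The sketch below is not Eq. 24 or 33. It assumes, for illustration only, that the contributions of mildly deleterious mutants to both the polymorphism ratio and the divergence ratio scale as *N*_{e} raised to the power −*β*, so that their quotient is a constant, and then adds a constant adaptive term to the divergence ratio. All names and values are hypothetical.

```python
def ni_with_adaptive(n_e: float, beta: float, ni_deleterious: float,
                     adaptive_ratio: float, scale: float = 1.0) -> float:
    """Toy neutrality index with a constant adaptive contribution to divergence.

    deleterious: dn/ds contributed by mildly deleterious mutants,
                 assumed to scale as scale * n_e**-beta.
    pn/ps is assumed to be ni_deleterious times that quantity, so that NI
    equals ni_deleterious when adaptive_ratio is zero.
    """
    deleterious = scale * n_e ** -beta
    pn_ps = ni_deleterious * deleterious
    dn_ds = deleterious + adaptive_ratio
    return pn_ps / dn_ds

if __name__ == "__main__":
    for n_e in (1e4, 1e5, 1e6):
        print(n_e,
              ni_with_adaptive(n_e, 0.3, 1.2, 0.0),    # no adaptive substitutions
              ni_with_adaptive(n_e, 0.3, 1.2, 0.01))   # constant adaptive term
```

With no adaptive term the index is flat in *N*_{e}; with a constant adaptive term it declines as *N*_{e} increases, because the deleterious contribution to divergence shrinks while the adaptive one does not. This is the direction of the correlation discussed above.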

The results above assume that *N*_{e} remained constant throughout the divergence, and that polymorphism is at equilibrium. However, because we have derived the leading dependencies of all quantities on *N*_{e} (Eqs. 25–27), our results can also show how certain tests of the neutral theory are misled by demographic change. Consider, for example, the behavior of the neutrality index under a simple model of population expansion. Let us assume that *N*_{e} took a single constant value for a proportion 0 ≤ *q* ≤ 1 of the total divergence time, and then increased by a factor *z*, to *zN*_{e}, for the remaining period. This higher *N*_{e} value will then govern the levels of polymorphism observed. Under this scenario, our results show that the neutrality index is expected to be

If the expansion is sufficiently large or recent (*z* is large or *q* close to unity), then the neutrality index is expected to be <1. This has important implications, because *NI* < 1 is generally taken as a signature of widespread adaptive substitution (McDonald and Kreitman 1991; Rand and Kann 1996). Such demographic artifacts in tests of positive selection have previously been investigated numerically (e.g., Eyre-Walker 2002; Charlesworth and Eyre-Walker 2007), but Eq. 37 allows them to be studied analytically. In addition, because Eq. 37 contains just three parameters, only one of which (*β*) could be locus-specific, it could also be used to test the hypothesis of population expansion in a formal likelihood framework.
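The qualitative effect of expansion can be illustrated with a short calculation. The sketch below is not a transcription of Eq. 37. It is derived under the stated assumption that the deleterious contributions to divergence and polymorphism both scale as *N*_{e} raised to the power −*β*, with divergence accumulating at the old rate for a fraction *q* of the time and at the new rate for the remainder. The constant-population value `ni_const` is a hypothetical input.

```python
def ni_after_expansion(ni_const: float, beta: float, q: float, z: float) -> float:
    """Toy neutrality index after a z-fold expansion of effective population size.

    Divergence ratio:   proportional to q * N**-beta + (1 - q) * (z * N)**-beta
    Polymorphism ratio: proportional to ni_const * (z * N)**-beta
    Their quotient simplifies to ni_const / (q * z**beta + 1 - q).
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    if z <= 0.0:
        raise ValueError("z must be positive")
    return ni_const / (q * z ** beta + 1.0 - q)

if __name__ == "__main__":
    # Hypothetical values: constant-size NI of 1.2 and shape parameter 0.3.
    for z in (1.0, 10.0, 100.0):
        print(z, ni_after_expansion(1.2, 0.3, q=0.9, z=z))
```

With no expansion (*z* = 1) the constant-population value is recovered. A large, recent expansion pushes the index below 1 even though no adaptive substitutions are present in the model, which is the artifact described in the text.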

While expressions such as Eqs. 36 and 37 are pleasingly simple, another conclusion of the present work is that such relationships might not hold unless the distribution of mutant effects is adequately described by a gamma distribution. In particular, we have shown that the behavior of polymorphism and divergence is qualitatively different when selection coefficients are lognormally distributed (Fig. 2, Eq. 35).

It is important to ask, therefore, how far empirical evidence allows us to choose between the various functional forms. As has been mentioned, the lognormal was preferred over the gamma distribution in the recent study by Loewe and Charlesworth (2006), and its greater success was attributed to its weightier tail, with a correspondingly higher concentration of lethal mutants (Loewe and Charlesworth 2006). This feature of the distribution has wide empirical support, both from bioinformatic approaches (Nielsen and Yang 2003; Sawyer et al. 2003; Loewe and Charlesworth 2006), and from mutation accumulation studies (Keightley 1994; Elena et al. 1998; Sanjuan et al. 2004; Eyre-Walker and Keightley 2007).

But high concentrations of severely deleterious mutants can be modeled using distributions other than the lognormal, by adding a parameter such as *f* that appears in Eqs. 11 and 12 (Sawyer et al. 2003; Nielsen and Yang 2003; Eyre-Walker et al. 2006). And while less elegant than fitting a continuous distribution to all mutants, this approach may be more realistic, as direct studies have sometimes found the distribution to be bimodal, containing peaks of both weakly and strongly deleterious effects (Elena et al. 1998; Sanjuan et al. 2004; Eyre-Walker and Keightley 2007). Furthermore, Nielsen and Yang (2003) and Eyre-Walker et al. (2006) found that the inclusion of such a parameter made little difference to estimates of the gamma shape parameter, *β*, obtained from divergence and polymorphism data.

However, the qualitative differences between Fig. 1 and Fig. 2 cannot be attributed to differences in the proportion of strongly deleterious mutants. Instead, these differences are probably best explained by another feature of the lognormal distribution: its suppression of probability density around the origin, in the region of strict neutrality. Unlike the concentration of effectively lethal mutants, empirical evidence of this aspect of the distribution is much more equivocal. Loewe and Charlesworth’s (2006) own method could not reject models where a discrete class of strictly neutral mutants was added to the lognormal, and experimental approaches have limited power to resolve very small selection coefficients (Eyre-Walker and Keightley 2007).

Because of this uncertainty, at present we have reason to remain skeptical of any quantitative conclusion that relies on the assumption that the distribution of selection coefficients can be well described by any single functional form (Tachida 1996; Eyre-Walker 2002; Sawyer et al. 2003; Loewe and Charlesworth 2006; Woodhams 2006; Eyre-Walker and Keightley 2007). Clearly, we also require a more detailed knowledge of the distribution of mutant effects in nature. Numerical methods, such as those reported here, combined with model selection techniques, should aid progress toward this goal.

## Acknowledgments

It is a pleasure to thank Laurence Loewe, Andrea Betancourt, and Fraser Lewis for their help with this work.