Journal of Molecular Evolution, Volume 67, Issue 4, pp 418–426

Divergence and Polymorphism Under the Nearly Neutral Theory of Molecular Evolution


  • J. J. Welch
    • Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh
    • Centre for the Study of Evolution, School of Life Sciences, University of Sussex
  • Adam Eyre-Walker
    • Centre for the Study of Evolution, School of Life Sciences, University of Sussex
  • David Waxman
    • Centre for the Study of Evolution, School of Life Sciences, University of Sussex

DOI: 10.1007/s00239-008-9146-9

Cite this article as:
Welch, J.J., Eyre-Walker, A. & Waxman, D. J Mol Evol (2008) 67: 418. doi:10.1007/s00239-008-9146-9


The nearly neutral theory attributes most nucleotide substitution and polymorphism to genetic drift acting on weakly selected mutants, and assumes that the selection coefficients for these mutants are drawn from a continuous distribution. This means that parameter estimation can require numerical integration, which can be computationally costly and inaccurate. Furthermore, the leading parameter dependencies of important quantities can be unclear, making results difficult to understand. For some commonly used distributions of mutant effects, we show how these problems can be avoided by writing equations in terms of special functions. Series expansion then allows for their rapid calculation and also illuminates leading parameter dependencies. For example, we show that if mutants are gamma distributed, the neutrality index is largely independent of the effective population size. However, we also show that such results are not robust to misspecification of the functional form of the distribution. Some implications of these findings are then discussed.


Genetic drift · Distribution of mutant effects · Neutrality index · Special functions


The neutral and nearly neutral theories of molecular evolution placed interpopulation divergence and intrapopulation polymorphism within a single explanatory framework, attributing both to the action of genetic drift (Kimura 1983; Ohta and Gillespie 1996). While alternative theories, emphasizing positive selection, and/or linkage effects, continue to receive attention (e.g., Gillespie 1991, 2001), the drift-based theories remain central to the study of molecular evolution.

Neutral theory, in the strict sense, assumes that most mutants are either strongly deleterious or wholly neutral, with only the latter contributing to divergence or polymorphism. This assumption yields tractable equations, and easily interpretable results, but is almost certainly unrealistic. The nearly neutral theory, by contrast, assumes that selection coefficients are drawn from a continuous range, with a large class of mildly deleterious mutants—an assumption that has a great deal of empirical support (e.g., Eyre-Walker et al. 2002, 2006; Piganeau and Eyre-Walker 2003; Yampolsky et al. 2005; Loewe and Charlesworth 2006; Loewe et al. 2006; Eyre-Walker and Keightley 2007).

When selection coefficients are drawn from a continuous distribution, the equations for many quantities of interest involve an integral over this distribution, and this has some unfortunate consequences. First, on a purely practical level, when equations are implemented in a likelihood framework for parameter estimation, the estimation procedure must involve numerical integration, and this can impose a significant computational burden, especially when multidimensional integrals are required (e.g., Nielsen and Yang 2003; Williamson et al. 2004). Second, the inclusion of a continuous distribution makes the equations more difficult to understand, and the predicted parameter dependencies of important quantities less transparent, than under strict neutrality. Finally, and more fundamentally, there is continuing debate about the functional form of the distribution of selection coefficients in nature (e.g., Nielsen and Yang 2003; Loewe and Charlesworth 2006; Eyre-Walker and Keightley 2007), and this calls into question the generality of conclusions reached by imposing an arbitrarily chosen functional form (Tachida 1996; Eyre-Walker 2002; Sawyer et al. 2003; Loewe and Charlesworth 2006; Woodhams 2006; Eyre-Walker and Keightley 2007).

Here we investigate the expected levels of divergence and polymorphism under the nearly neutral theory, when selection coefficients are drawn from a continuous distribution. The study has three main aims. First, it is shown that, for some commonly used distributions, the relevant likelihood equations can be written in terms of special functions; this means that the quantities of interest can be calculated rapidly without the need for numerical integration. Second, approximate forms of these expressions are shown to follow directly from the definitions of the special functions, and these approximations show clearly the leading dependencies on the parameters of biological interest. Finally, results from some different distributions of selection coefficients are compared, to test the robustness of the conclusions.

Expected Levels of Polymorphism and Divergence

Consider, first, expected levels of polymorphism and divergence at a collection of independent sites, when all mutants are subject to a common strength of selection. These results were derived in detail by Kimura (1962, 1969) and others (Ewens 1979; Sawyer and Hartl 1992; Hartl et al. 1994). Here we just give brief heuristic derivations.

The expected divergence, d, along a lineage of t generations, at a site where all mutants have the same selection coefficient, s, is the product of the number of mutants expected to appear, and their probability of reaching fixation, and so takes the form
$$ d_{0} \, = \,2N\mu t\, \times \,\pi \,(s,\,N_{e} ,\,N) $$
where N is the census population size, Ne the effective population size, μ the mutation rate per generation, and \( \pi \,(s,\,N_{e} ,\,N) \) the fixation probability. (Both here and elsewhere, we use a zero subscript to indicate a model where all mutants have the same selection coefficient). The expected level of polymorphism at the site is
$$ p_{0} \, = \,4N_{e} \mu \int_{0}^{1} {\psi (x;\,s,\,N_{e} ,\,N)k(x)dx} $$
where \( \psi (x;\,s,\,N_{e} ,\,N) \)\( dx \) is the probability of mutant alleles segregating in the population between frequency \( x \) and frequency \( x + dx, \) and the function \( k(x) \) describes the sampling of alleles from the population. The exact form of \( k(x) \) will vary according to the measure of polymorphism that is required (e.g., mean heterozygosity, number of singletons, or total number of polymorphic sites). However, most quantities of interest can be represented by a sampling function comprising one or more terms of the form
$$ k(x)\, \propto \,x^{n} \,(1 - x)^{m} $$
where \( n \) and \( m \) are nonnegative integers. For example, to model heterozygosity, we would set \( n\, = \,m = 1 \) and specify \( k(x)\, = 2x(1 - x) \) (e.g., Kimura 1979). Sawyer and Hartl (1992) and Hartl et al. (1994) describe more complex sampling functions of the same form but with different values of \( n \) and \( m, \) and Nielsen et al. (2004) describe corrections that may be appropriate for real-world data. All results below use the generalized form of the sampling function, Eq. 3, and so apply to each of these particular cases.
Approximate forms of the remaining functions, \( \pi (s,N_{e} ,\,N) \) and \( \psi (x;s,N_{e} ,\,N) ,\) were obtained using diffusion analysis by Kimura (1962, 1969, Eqs. 13 and 37). Using these results allows us to write the expressions for \( d_{0} \) and \( p_{0} \) in terms of the scaled parameters:
$$ \gamma \equiv \,4N_{e} s $$
$$ \theta \equiv \,4N_{e} \mu $$
$$ d_{0} (\gamma ) = \mu t\frac{\gamma }{{1 - e^{ - \gamma } }} $$
$$ p_{0} (\gamma ) = \theta \int_{0}^{1} {\frac{{1 - e^{ - \gamma (1 - x)} }}{{1 - e^{ - \gamma } }}} x^{n - 1} (1 - x)^{m - 1} dx $$
$$\quad = - \frac{\theta }{{1 - e^{ - \gamma } }}\sum\limits_{j = 1}^{\infty } {\frac{B(n,m + j)}{j!}} ( - \gamma )^{j} $$

In the last equation, the representation of \( p_{0} (\gamma ) \) as an infinite sum follows from expanding the term \( e^{ - \gamma (1 - x)} \) in powers of \( - \gamma (1 - x) \) and using \( \int_{0}^{1} {x^{n - 1} (1 - x)^{m - 1} dx = B(n,m)} ,\) where \( B \) is the beta function (Abramowitz and Stegun 1965). For integer arguments, \( B(n,m) = (n - 1)!(m - 1)!/(n + m - 1)!. \)
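As a numerical sanity check on the series form of \( p_{0}(\gamma) \), the following minimal sketch evaluates it both from the truncated sum and from a midpoint-rule approximation of the integral in Eq. 7. The function names, the truncation at 200 terms, and the number of quadrature steps are illustrative assumptions, not part of the original analysis; the series is only practical for moderate \( |\gamma| \), since its terms grow before they decay.

```python
import math

def beta_fn(a, b):
    """Beta function B(a, b), computed through log-gamma for stability."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

def p0_series(gamma, theta, n, m, terms=200):
    """p0(gamma) from the series: -theta/(1-e^-gamma) * sum_j B(n,m+j)(-gamma)^j/j!."""
    total = 0.0
    power = 1.0  # running value of (-gamma)^j / j!
    for j in range(1, terms + 1):
        power *= -gamma / j
        total += beta_fn(n, m + j) * power
    # -expm1(-gamma) equals 1 - e^{-gamma} without cancellation for small gamma
    return -theta * total / (-math.expm1(-gamma))

def p0_quadrature(gamma, theta, n, m, steps=20000):
    """p0(gamma) from a midpoint-rule approximation of the integral in Eq. 7."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        # (1 - e^{-gamma(1-x)}) / (1 - e^{-gamma}), written with expm1
        weight = math.expm1(-gamma * (1 - x)) / math.expm1(-gamma)
        total += weight * x ** (n - 1) * (1 - x) ** (m - 1)
    return theta * total * h

if __name__ == "__main__":
    for g in (-2.0, 3.0):
        print(g, p0_series(g, 1.0, 1, 1), p0_quadrature(g, 1.0, 1, 1))
```

For \( \gamma \to 0 \) both routes should approach the neutral value \( \theta B(n,m+1) \), which is the \( j = 1 \) term of the sum.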

To relax the assumption that all mutants are subject to the same strength of selection, we generalize Eqs. 6 and 7 by incorporating a distribution of scaled selection coefficients, denoted \(F_{i}(\gamma){:}\)
$$ d_{i} \, = \,\int_{ - \infty }^{\infty } {d_{0} (\gamma )F_{i} (\gamma )d\gamma} $$
$$ p_{i} \, = \,\int_{ - \infty }^{\infty } {p_{0} (\gamma )F_{i} (\gamma )d\gamma } $$

The following sections evaluate these expressions for some important forms of \( F_{i} (\gamma). \)

Strict Neutrality

To derive results for strict neutrality (Kimura 1983), assume that a proportion f of mutants is selectively neutral, with \( \gamma = 0, \) while the remaining proportion, \( (1 - f), \) is severely deleterious, and so contributes nothing to either divergence or polymorphism. With these assumptions, we reproduce standard results (Kimura 1983):
$$ d_{1} \, = \,\mu t\,f $$
$$ p_{1} \, = \theta \,f\,B(n,m + 1) $$
with the subscript 1 denoting strict neutrality. The extent to which results from other distributions deviate from these neutral expectations can be quantified using the “neutrality index” (Rand and Kann 1996). The index, \( N\,I_{i}, \) is defined as the ratio of polymorphism to divergence when both quantities are standardized by their strictly neutral equivalents:
$$ N\,I_{i} \, = \,\frac{{p_{i} }}{{p_{1} }}\frac{{d_{1} }}{{d_{i} }}, $$
and so is equal to unity under strict neutrality.

Single-Sided Gamma Distribution

The distribution most commonly used to describe deleterious mutations is the single-sided gamma distribution:
$$ F_{2} \left( {\gamma ;\left| {\overline{\gamma } } \right|,\beta } \right) = \frac{{\left| \gamma \right|^{\beta - 1} e^{{ - \left| \gamma \right|\beta /\left| {\overline{\gamma } } \right|}} \left( {\beta /\left| {\overline{\gamma } } \right|} \right)^{\beta } }}{\Gamma \left( \beta \right)},\quad \gamma \le 0. $$
We have parameterized the distribution in terms of the absolute value of its mean, \( \left| {\overline{\gamma } } \right|, \) and a dimensionless shape parameter, β. The parameter β is related to the coefficient of variation of the distribution via CV(\( \gamma \)) = \( \beta^{ - 1/2} , \) and to the excess kurtosis via \( \kappa \left( \gamma \right) = 6/\beta. \) This distribution was used in the earliest work on the nearly neutral theory by Ohta (1977), who used the single-sided exponential distribution (equivalent to setting β = 1 in Eq. 14), and by Kimura (1979), who introduced the arbitrary shape parameter, β.

Exact Results

From Eqs. 9 and 14, we have
$$ d_{2} = \mu t\int_{0}^{\infty } {\frac{\gamma }{{e^{\gamma } - 1}}} \frac{{\gamma^{\beta - 1} e^{{ - \gamma \beta /\left| {\overline{\gamma } } \right|}} \left( {\beta /\left| {\overline{\gamma } } \right|} \right)^{\beta } }}{\Gamma \left( \beta \right)}d\gamma $$
which can be expressed in terms of a special function. To obtain this we use
$$ \left( {e^{\gamma } - 1} \right)^{ - 1}\,=\,e^{ - \gamma } \sum\limits_{j = 0}^{\infty } {e^{ - j\gamma } } $$
and interchange the order of summation and integration to obtain
$$ \begin{aligned}d_{2}& = \mu t\frac{{\left({\beta /\left| {\overline{\gamma } } \right|}\right)^{\beta }}}{\Gamma \left( \beta \right)}\sum\limits_{j = 0}^{\infty }{\int_{0}^{\infty }{e^{{ - \gamma \left[ {1 + j + \beta /\left|{\overline{\gamma } } \right|}\right]}} } } \gamma^{\beta }d\gamma \\ & = \mu t\frac{{\Gamma \left( {\beta + 1}\right)}}{\Gamma \left( \beta \right)}\left( {\beta /\left|{\overline{\gamma } } \right|}\right)^{\beta } \sum\limits_{j =0}^{\infty } {\left[ {1 + j +\beta /\left| {\overline{\gamma } }\right|} \right]}^{{ - \left( {1 + \beta } \right)}} \\ & = \mu t\beta \left( {\beta /\left| {\overline{\gamma } } \right|}\right)^{\beta } \zeta \left( {1 + \beta ,1 + \beta /\left|{\overline{\gamma } } \right|} \right).\\ \end{aligned} $$
where
$$ \zeta \left( {\Upsilon ,\alpha } \right) = \sum\limits_{j = 0}^{\infty } {\left( {\alpha + j} \right)^{ - \Upsilon } } $$
is the Hurwitz zeta function (Abramowitz and Stegun 1965). Equation 17, first obtained by Kimura (1979, Eq. 8), can be calculated rapidly using various well-established series approximations. The relevant numerical methods are implemented in commercially available symbolic mathematics software and in publicly available software libraries, such as the GNU Scientific Library (Galassi et al. 2006), which is also accessible from the R environment (R Development Core Team 2006).
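To illustrate how Eq. 17 can be evaluated without numerical integration, the sketch below computes the Hurwitz zeta function by direct summation with an Euler-Maclaurin tail correction, and compares the resulting \( d_{2} \) with a brute-force midpoint-rule evaluation of Eq. 15. This is a standard-library illustration under stated assumptions, not the routine used in the original work: the function names, the number of summed terms, the integration cutoff `g_max`, and the substitution \( \gamma = w^{1/\beta} \) (used to remove the integrable singularity at the origin when β < 1) are all choices made here.

```python
import math

def hurwitz_zeta(s, a, terms=2000):
    """Hurwitz zeta (Eq. 18): direct sum of the first `terms` terms plus
    the leading Euler-Maclaurin corrections for the remaining tail (s != 1)."""
    head = sum((a + j) ** -s for j in range(terms))
    b = a + terms
    tail = b ** (1 - s) / (s - 1) + 0.5 * b ** -s + s * b ** (-s - 1) / 12
    return head + tail

def d2_exact(mean_abs_gamma, beta, mu_t=1.0):
    """Expected divergence under the single-sided gamma distribution (Eq. 17)."""
    r = beta / mean_abs_gamma
    return mu_t * beta * r ** beta * hurwitz_zeta(1 + beta, 1 + r)

def d2_quadrature(mean_abs_gamma, beta, mu_t=1.0, steps=100000, g_max=60.0):
    """Midpoint rule on Eq. 15 after substituting gamma = w**(1/beta)."""
    r = beta / mean_abs_gamma
    h = g_max ** beta / steps
    total = 0.0
    for i in range(steps):
        g = ((i + 0.5) * h) ** (1.0 / beta)
        # gamma / (e^gamma - 1) times the exponential factor of the gamma density
        total += g / math.expm1(g) * math.exp(-g * r)
    # prefactor (beta/mean)^beta / Gamma(beta) times 1/beta from the substitution
    return mu_t * r ** beta / math.gamma(beta + 1) * total * h

if __name__ == "__main__":
    for beta in (0.5, 1.0):
        print(beta, d2_exact(100.0, beta), d2_quadrature(100.0, beta))
```

The closed form needs a few thousand floating-point operations, whereas the quadrature needs a fine grid to reach comparable accuracy, which is the practical point made in the text.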
A result similar to Eq. 17 can be obtained for polymorphism in terms of the infinite series:
$$ p_{2} = \theta \left( {\beta /\left| {\overline{\gamma } } \right|}\right)^{\beta } \sum\limits_{j = 1}^{\infty } {\frac{{B\left( {n,m + j}\right)}}{jB(\beta ,j)}} \zeta \left( {\beta + j,1+ \beta /\left| {\overline{\gamma } } \right|} \right). $$
The terms of this sum decrease in magnitude with j, and so for numerical calculation, approximations of arbitrary accuracy can be obtained by truncating the series at a suitable point.
An alternative is to replace the double integral of Eq. 10 with a one-dimensional integral. This can be done by defining a function \( H_{i} \left( x \right) \):
$$ H_{i} \left( x \right) \equiv \int_{ - \infty }^{\infty } {\frac{{1 - e^{{ - \gamma \left( {1 - x} \right)}} }}{{1 - e^{ - \gamma } }}} F_{i} \left( \gamma \right)d\gamma $$
such that \( p_{i} = \theta \int_{0}^{1} {H_{i} \left( x \right)} x^{n - 1} \left( {1 - x} \right)^{m - 1} dx. \) For the single-sided gamma distribution, this function can be calculated in closed form:
$$ H_{2} \left( x \right) = \left( {\beta /\left| {\overline{\gamma } } \right|} \right)^{\beta } \left[ {\zeta \left( {\beta ,\beta /\left| {\overline{\gamma } } \right| + x} \right) - \zeta \left( {\beta ,\beta /\left| {\overline{\gamma } } \right| + 1} \right)} \right]. $$
This exact result is defined for all positive β, but β < 1 requires analytic continuation (Fine 1951), and for β = 1 (Ohta 1977), we use \( \lim_{\beta \to 1} \left[ {\zeta \left( {\beta ,a} \right) - 1/\left( {\beta - 1} \right)} \right] = - \Psi_{0} \left( a \right), \) where \( \Psi_{0} \left( \bullet \right) \) is the digamma function (Abramowitz and Stegun 1965) yielding the exact result,
$$ \mathop {\lim }\limits_{\beta \to 1} H_{2} \left( x \right) = \left| {\overline{\gamma } } \right|^{ - 1} \left[ {\Psi_{0} \left( {1/\left| {\overline{\gamma } } \right| + 1} \right) - \Psi_{0} \left( {1/\left| {\overline{\gamma } } \right| + x} \right)} \right]. $$
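The closed form for \( H_{2}(x) \) can be checked numerically, including for β < 1, where the Hurwitz zeta function is defined only by analytic continuation. The sketch below relies on the fact that the Euler-Maclaurin tail formula continues to hold for 0 < s < 1, and compares Eq. 21 with a midpoint-rule evaluation of Eq. 20. All names, the truncation parameters, and the substitution \( \gamma = w^{1/\beta} \) are assumptions made for illustration; the case β = 1 is excluded here because it requires the digamma limit of Eq. 22.

```python
import math

def hurwitz_zeta(s, a, terms=2000):
    """Hurwitz zeta via direct sum plus Euler-Maclaurin tail terms.
    The same expression provides the analytic continuation for 0 < s < 1."""
    head = sum((a + j) ** -s for j in range(terms))
    b = a + terms
    tail = b ** (1 - s) / (s - 1) + 0.5 * b ** -s + s * b ** (-s - 1) / 12
    return head + tail

def h2_closed(x, mean_abs_gamma, beta):
    """H_2(x) from the closed form (Eq. 21); requires beta != 1."""
    r = beta / mean_abs_gamma
    return r ** beta * (hurwitz_zeta(beta, r + x) - hurwitz_zeta(beta, r + 1))

def h2_quadrature(x, mean_abs_gamma, beta, steps=100000, g_max=200.0):
    """H_2(x) from Eq. 20 by the midpoint rule, with gamma = -g and g = w**(1/beta)."""
    r = beta / mean_abs_gamma
    h = g_max ** beta / steps
    total = 0.0
    for i in range(steps):
        g = ((i + 0.5) * h) ** (1.0 / beta)
        # overflow-safe form of (1 - e^{g(1-x)}) / (1 - e^{g})
        weight = math.exp(-g * x) * math.expm1(-g * (1 - x)) / math.expm1(-g)
        total += weight * math.exp(-g * r)
    return r ** beta / math.gamma(beta + 1) * total * h

if __name__ == "__main__":
    print(h2_closed(0.5, 100.0, 0.5), h2_quadrature(0.5, 100.0, 0.5))
```

Two boundary values follow directly from Eq. 20 and make convenient checks: \( H_{i}(0) = 1 \), because the distribution integrates to one, and \( H_{i}(1) = 0 \).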

Approximate Results

To derive approximate expressions for the divergence and polymorphism, we require information about the typical magnitudes of β and \( \left| {\overline{\gamma } } \right|.\) Studies fitting gamma distributions to data from various taxa and loci have almost all agreed that β < 1 provides the best fit (Keightley 1994; Piganeau and Eyre-Walker 2003; Eyre-Walker et al. 2006; Loewe et al. 2006; Loewe and Charlesworth 2006; but see Nielsen and Yang 2003). This finding accords with the high kurtosis and a high concentration of mutants of negligible effect inferred from more direct approaches to estimating the distribution (Davies et al. 1999; Lynch et al. 1999; Eyre-Walker and Keightley 2007). Estimates of \( \left| {\overline{\gamma } } \right| \) from bioinformatic studies have tended to be large, often of order 100, which again accords with results from more direct approaches (Keightley and Eyre-Walker 1999; Lynch et al. 1999; Loewe et al. 2006) and with broader surveys of selective constraint (Eyre-Walker et al. 2002; Subramanian and Kumar 2006). Seemingly contradictory estimates of order 1 have appeared in the literature (e.g., Bustamante et al. 2002; Sawyer et al. 2003), but this disagreement is only apparent, because these authors estimated a different quantity: the mean value of Ne |s| for mutants eligible to become polymorphic or fixed, i.e., excluding severely deleterious mutants.

Together, these studies suggest that \( \beta /\left| {\overline{\gamma } } \right| \ll 1 \) will tend to hold in nature. When this is so, the Hurwitz zeta function in Eq. 17 is well approximated by the Riemann zeta function, \( \zeta \left( {1 + \beta } \right) = \sum\nolimits_{j = 1}^{\infty } {j^{{ - \left( {1 + \beta } \right)}} \approx 1 + \left( {2/3} \right)^{\beta } } /\beta, \) and this yields
$$ d_{2} \approx \mu t\beta^{\beta + 1} \zeta \left( {1 + \beta } \right)\left| {\overline{\gamma } } \right|^{ - \beta } $$
(Kimura 1979; see also Gillespie 1991).
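The quality of the approximation in Eq. 23 can be gauged with a short calculation. The sketch below compares Eq. 23 with the exact Eq. 17, checks the quoted closed-form approximation to the Riemann zeta function, and estimates the log-log slope of the exact divergence against \( \left| {\overline{\gamma } } \right| \). Function names and the parameter values (β = 0.5, mean scaled effects of 100 and 200) are illustrative assumptions.

```python
import math

def hurwitz_zeta(s, a, terms=2000):
    """Hurwitz zeta via direct sum plus Euler-Maclaurin tail terms (s != 1)."""
    head = sum((a + j) ** -s for j in range(terms))
    b = a + terms
    tail = b ** (1 - s) / (s - 1) + 0.5 * b ** -s + s * b ** (-s - 1) / 12
    return head + tail

def d2_exact(mean_abs_gamma, beta, mu_t=1.0):
    """Eq. 17."""
    r = beta / mean_abs_gamma
    return mu_t * beta * r ** beta * hurwitz_zeta(1 + beta, 1 + r)

def d2_approx(mean_abs_gamma, beta, mu_t=1.0):
    """Eq. 23, with the Riemann zeta function evaluated as zeta(s, 1)."""
    riemann = hurwitz_zeta(1 + beta, 1.0)
    return mu_t * beta ** (beta + 1) * riemann * mean_abs_gamma ** -beta

def zeta_quick(beta):
    """Closed-form approximation to zeta(1 + beta) quoted in the text."""
    return 1 + (2.0 / 3.0) ** beta / beta

def loglog_slope(beta, g1=100.0, g2=200.0):
    """Finite-difference slope of ln d2 against ln(mean |gamma|), from Eq. 17."""
    rise = math.log(d2_exact(g2, beta)) - math.log(d2_exact(g1, beta))
    return rise / (math.log(g2) - math.log(g1))

if __name__ == "__main__":
    beta = 0.5
    print("relative error of Eq. 23:", d2_approx(100.0, beta) / d2_exact(100.0, beta) - 1)
    print("slope:", loglog_slope(beta), "expected about", -beta)
```

With these inputs the relative error of Eq. 23 is below one percent, and the slope is close to −β.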
Equation (23) shows that the expected divergence will be approximately loglinear in \( \left| {\overline{\gamma } } \right|, \) with the slope determined by the shape parameter, i.e., that ln\( d_{2} \approx - \beta\,{\text{ln}}\left | {\overline{\gamma}}\right| \,+\,const. \) Furthermore, applying the same approximation to Eq. 19 shows that the same applies to polymorphism. Together, this means that we can approximate the neutrality index, Eq. 13, as follows:
$$ \begin{aligned} NI_{2} & \approx 1 + \sum\limits_{j = 2}^{\infty } {\frac{{\zeta \left( {\beta + j} \right)}}{{j\beta B\left( {\beta ,j} \right)\zeta \left( {\beta + 1} \right)}}\frac{{B\left( {m + j,n} \right)}}{{B\left( {m + 1,n} \right)}}} \\ & \approx 1 + \beta \sum\limits_{j = 2}^{\infty } {\frac{\zeta (j)}{j}\frac{{B\left( {m + j,n} \right)}}{{B\left( {m + 1,n} \right)}}} + o(\beta^{2} ) \\ \end{aligned} $$
Equation 24 confirms the general finding that under the assumptions of the nearly neutral theory, NI > 1 will always hold (Rand and Kann 1996). This reflects the presence of weakly deleterious mutants, which are able to contribute to transient polymorphism but unlikely to reach fixation.
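The two lines of Eq. 24 are straightforward to evaluate. The sketch below computes the β-independent sum in the second line (called `k_constant` here, an illustrative name) and compares the linear form with the first line of Eq. 24 for the sampling parameters m = 1 and n = 8. The truncation at 200 terms is an assumption; it is ample for n = 8, where terms fall off rapidly with j, but convergence is much slower for small n.

```python
import math

def hurwitz_zeta(s, a, terms=2000):
    """Hurwitz zeta via direct sum plus Euler-Maclaurin tail terms (s != 1)."""
    head = sum((a + j) ** -s for j in range(terms))
    b = a + terms
    tail = b ** (1 - s) / (s - 1) + 0.5 * b ** -s + s * b ** (-s - 1) / 12
    return head + tail

def log_beta(a, b):
    """Natural log of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_ratio(j, n, m):
    """B(m + j, n) / B(m + 1, n), the sampling factor in Eq. 24."""
    return math.exp(log_beta(m + j, n) - log_beta(m + 1, n))

def k_constant(n, m, terms=200):
    """Sum over j >= 2 of zeta(j)/j * B(m+j, n)/B(m+1, n) (second line of Eq. 24)."""
    return sum(hurwitz_zeta(j, 1.0) / j * beta_ratio(j, n, m)
               for j in range(2, terms + 1))

def ni2_linear(beta, n, m):
    """Second line of Eq. 24: NI is approximately 1 + beta * K."""
    return 1 + beta * k_constant(n, m)

def ni2_first_line(beta, n, m, terms=200):
    """First line of Eq. 24, keeping the full dependence on beta."""
    z1 = hurwitz_zeta(beta + 1, 1.0)
    total = 1.0
    for j in range(2, terms + 1):
        inv_beta_fn = math.exp(-log_beta(beta, j))  # 1 / B(beta, j)
        total += (hurwitz_zeta(beta + j, 1.0) * inv_beta_fn
                  / (j * beta * z1) * beta_ratio(j, n, m))
    return total

if __name__ == "__main__":
    print("K =", k_constant(8, 1))
    print(ni2_first_line(0.5, 8, 1), ni2_linear(0.5, 8, 1))
```

Every term in either sum is positive, which is the numerical counterpart of the statement that NI > 1 under these assumptions.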
Figure 1 plots the approximate and exact forms of the divergence, polymorphism, and neutrality index, arbitrarily setting the polymorphism sampling parameters (Eq. 3) to m = 1 and n = 8, such as might be used to model the frequency of singletons in a sample of eight alleles (Hartl et al. 1994). The figure shows that the approximations are good when \( \left| {\overline{\gamma } } \right| \) is not too small and β is not too large, and this is generally consistent with the empirical results discussed above.
Fig. 1

Expected values of divergence, polymorphism, and the neutrality index under the assumptions of the nearly neutral theory, with deleterious selection coefficients drawn from a single-sided gamma distribution, Eq. 14. Solid lines show exact results obtained from Eqs. 11–13 and 17 and by numerical integration from Eqs. 10 and 20–22. Dashed lines show analytical approximations obtained from Eq. 23, and from Eqs. 8 and 24, truncating the series expansions after three terms in both cases. Results are shown for four different values of the shape parameter, β, and are plotted as a function of the mean absolute scaled selection coefficient, Eqs. 4 and 14

From our approximations, Eqs. 23 and 24, some leading parameter dependencies follow directly. First, the neutrality index, NI2, is shown to increase linearly with the shape parameter, β, but to be largely independent of the mean strength of selection, \( \left| {\overline{\gamma } } \right|. \) Second, the effective population size, Ne, appears solely in the parameters \( \left| {\overline{\gamma } } \right| \) and θ (Eqs. 4 and 5), and so we have
$$ d_{2}\,\propto\,N_{e}^{ - \beta } $$
$$ p_{2}\,\propto\,N_{e}^{1 - \beta } $$
$$ NI_{2} \approx {\text{independent}}\,{\text{of }}N_{e} $$
Equations (24) and (25)–(26) also show that the differences between neutrality and near neutrality are least marked when the shape parameter, β, is very small; this is because when β ≪ 1, divergence and polymorphism will be relatively insensitive to changes in Ne, and the neutrality index will remain close to its neutral value of unity. This behavior can be understood by recalling that the kurtosis of the gamma distribution is given by \( \kappa \left( \gamma \right) = 6/\beta, \) and that a highly leptokurtic distribution approximates the situation under strict neutrality, in that mutations are concentrated in a peak around \( \gamma \) = 0 and in a tail of large negative values.
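The step from Eq. 23 to Eqs. 25–27 can be spelled out as follows. This is a sketch that writes \( \left| {\overline{s} } \right| \) for the mean absolute unscaled selection coefficient, a symbol introduced here for illustration, so that \( \left| {\overline{\gamma } } \right| = 4N_{e} \left| {\overline{s} } \right| \) by Eq. 4. Substituting into Eq. 23, and applying the same approximation to Eq. 19 with \( \theta = 4N_{e} \mu \) from Eq. 5, gives
$$ d_{2} \approx \mu t\,\beta^{\beta + 1} \zeta (1 + \beta )\left( {4N_{e} \left| {\overline{s} } \right|} \right)^{ - \beta } \propto N_{e}^{ - \beta } $$
$$ p_{2} \propto \theta \left| {\overline{\gamma } } \right|^{ - \beta } = 4N_{e} \mu \left( {4N_{e} \left| {\overline{s} } \right|} \right)^{ - \beta } \propto N_{e}^{1 - \beta } $$
Under strict neutrality, \( d_{1} = \mu t f \) does not depend on Ne, while \( p_{1} = \theta f B(n,m + 1) \propto N_{e} . \) The powers of Ne therefore cancel in the neutrality index of Eq. 13:
$$ NI_{2} = \frac{{p_{2} }}{{p_{1} }}\frac{{d_{1} }}{{d_{2} }} \propto \frac{{N_{e}^{1 - \beta } }}{{N_{e} }} \cdot \frac{1}{{N_{e}^{ - \beta } }} = N_{e}^{0} $$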

Partially Reflected Gamma Distribution

Single-sided distributions, such as Eq. 14, are unrealistic in that they contain no beneficial mutations and, so, embody the implicit assumption that populations will degenerate indefinitely (Gillespie 1995; Tachida 1996). A related model that avoids this assumption is the “partially reflected” gamma distribution introduced by Piganeau and Eyre-Walker (2003; Bulmer 1991). This distribution was derived from a mechanical model of evolution, in which deleterious mutations are generated from a gamma distribution, but in which back mutations from the deleterious state to the wild type are also permitted. These assumptions lead to the following equilibrium distribution of scaled selection coefficients:
$$ F_{3} \left( {\gamma ;\left| {\overline{\gamma } } \right|,\beta } \right) = \frac{{\left| \gamma \right|^{\beta - 1} e^{{ - \left| \gamma \right|\beta /\left| {\overline{\gamma } } \right|}} \left( {\beta /\left| {\overline{\gamma } } \right|} \right)^{\beta } }}{{\Gamma \left( \beta \right)\left( {1 + e^{\gamma } } \right)}} $$
which applies to both negative and positive \( \gamma \) (see plots in Piganeau and Eyre-Walker 2003). While Eq. 28 has been parameterized to match the single-sided Eq. 14, \( \left| {\overline{\gamma } } \right| \) now represents the mean selection coefficient only when all loci are fixed for the beneficial allele, but because only weakly deleterious mutants can reach fixation, it will differ little from the true mean of Eq. 28.


The divergence for the partially reflected gamma distribution is
$$ \begin{aligned} d_{3} & = 2\mu t\frac{{\left( {\beta /\left| {\overline{\gamma } } \right|} \right)^{\beta } }}{\Gamma \left( \beta \right)}\int_{0}^{\infty } {\frac{{\gamma^{\beta } e^{{ - \gamma \beta /\left| {\overline{\gamma } } \right|}} }}{{e^{\gamma } - e^{ - \gamma } }}} d\gamma \\ & = \mu t\beta \left( {\frac{\beta }{{2\left| {\overline{\gamma } } \right|}}} \right)^{\beta } \zeta \left( {\beta + 1,(1 + \beta /\left| {\overline{\gamma } } \right|)/2} \right). \\ \end{aligned} $$
where we used the expansion \( (e^{\gamma } - e^{ - \gamma } )^{ - 1} = e^{ - \gamma } \sum\nolimits_{j = 0}^{\infty } {e^{ - 2j\gamma } }. \) By similar means, expected polymorphism and the function H3 (Eq. 20) are found to be
$$ p_{3} = \theta \left( {\frac{\beta }{{2\left| {\overline{\gamma } } \right|}}} \right)^{\beta } \sum\limits_{j = 0}^{\infty } {\frac{{2^{1 - j}B (n,m + j)}}{jB(\beta ,j)}} \,\zeta \,\left( {\beta + j,\left( {1 + \beta /\left| {\overline{\gamma } } \right|} \right)/2} \right) $$
$$ H_{3} (x) = \left( {\frac{\beta }{{2\left| {\overline{\gamma } } \right|}}} \right)^{\beta } \left[ {\zeta \left( {\beta ,(\beta /\left| {\overline{\gamma } } \right| + x)/2} \right) - \zeta \left( {\beta ,1 + (\beta /\left| {\overline{\gamma } } \right| - x)/2} \right)} \right] $$
Approximations for these expressions can be derived using \( \zeta (x,1/2 + \varepsilon ) \approx \zeta (x,1/2) = (2^{x} - 1)\zeta (x) \) (Truesdell 1950). Then, following the same procedures used for the single-sided case, we find
$$ d_{3} \approx (2 - 2^{ - \beta } )d_{2} $$
$$ NI_{3} \approx 1 + \beta \sum\limits_{j = 2}^{\infty } {\frac{{2(1 - 2^{ - j} )\zeta (j)}}{j}} \frac{B(m + j,n)}{B(m + 1,n)} $$
As such, the leading parameter dependencies are identical to those of the single-sided case. This shows that the conclusions above are robust to the inclusion in the model of weakly beneficial substitutions.
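The relationship between the two gamma models can be illustrated numerically. The sketch below evaluates the exact divergence for the single-sided (Eq. 17) and partially reflected (Eq. 29) cases and compares their ratio with the factor \( 2 - 2^{ - \beta } \) of Eq. 32; it also evaluates the closed form for \( H_{3}(x) \) (Eq. 31) at its two boundary values. Function names and parameter values are illustrative assumptions, and the zeta routine is a simple Euler-Maclaurin sketch rather than a library implementation.

```python
import math

def hurwitz_zeta(s, a, terms=2000):
    """Hurwitz zeta via direct sum plus Euler-Maclaurin tail terms (s != 1)."""
    head = sum((a + j) ** -s for j in range(terms))
    b = a + terms
    tail = b ** (1 - s) / (s - 1) + 0.5 * b ** -s + s * b ** (-s - 1) / 12
    return head + tail

def d2_exact(mean_abs_gamma, beta, mu_t=1.0):
    """Single-sided gamma divergence (Eq. 17)."""
    r = beta / mean_abs_gamma
    return mu_t * beta * r ** beta * hurwitz_zeta(1 + beta, 1 + r)

def d3_exact(mean_abs_gamma, beta, mu_t=1.0):
    """Partially reflected gamma divergence (second line of Eq. 29)."""
    r = beta / mean_abs_gamma
    return mu_t * beta * (r / 2) ** beta * hurwitz_zeta(beta + 1, (1 + r) / 2)

def h3_closed(x, mean_abs_gamma, beta):
    """H_3(x) from Eq. 31; requires beta != 1."""
    r = beta / mean_abs_gamma
    first = hurwitz_zeta(beta, (r + x) / 2)
    second = hurwitz_zeta(beta, 1 + (r - x) / 2)
    return (r / 2) ** beta * (first - second)

if __name__ == "__main__":
    beta, mean = 0.5, 100.0
    print("exact ratio d3/d2:", d3_exact(mean, beta) / d2_exact(mean, beta))
    print("Eq. 32 factor:   ", 2 - 2 ** -beta)
    print("H3(0), H3(1):    ", h3_closed(0.0, mean, beta), h3_closed(1.0, mean, beta))
```

As for the single-sided case, \( H_{3}(0) = 1 \) and \( H_{3}(1) = 0 \) follow directly from the definition in Eq. 20.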

Single-Sided Lognormal Distribution

The gamma distributions investigated above are flexible and widely used. They also have some, albeit limited, theoretical justification, arising from simple models of selection on quantitative traits (Martin and Lenormand 2006; Gu 2007a, b). But a theoretical case can also be made for other distributions. For example, it follows from the central limit theorem that normal or lognormal distributions might apply if each mutant has many pleiotropic effects on independent components of fitness (Sawyer et al. 2003; Loewe and Charlesworth 2006).

There is also evidence from bioinformatic studies that the gamma distribution might not be the most appropriate choice. For example, Nielsen and Yang (2003; Yang et al. 2000) fitted various distributions to divergence data from animal mitochondrial genes and found that the best-fitting distribution was normal, with a class of invariable sites (see also Sawyer et al. 2003). However, the normal distribution was not a significant improvement over a gamma with a shape parameter of β ≈ 3, and the gamma distribution generally approximates a normal when β ≫ 1 (although in this case our approximate results would not apply).

A different conclusion was drawn by Loewe and Charlesworth (2006), who fitted various distributions to polymorphism data and estimates of the lethal mutation rate from Drosophila. They found that a normal distribution could not adequately fit the polymorphism data and that a gamma distribution regularly underpredicted the lethal mutation rate. A lognormal distribution, by contrast, gave a good fit to both kinds of data (although, again, its improvement over the gamma was not formally significant).

Because of the success of the lognormal distribution in the study by Loewe and Charlesworth (2006), and to investigate the robustness of the conclusions reached above, we now examine expected levels of divergence and polymorphism when scaled selection coefficients are lognormally distributed.

The single-sided lognormal distribution is nonzero for \( \gamma \) < 0 and in this range has the form
$$ F_{4} (\gamma ;\left| {\overline{\gamma } } \right|,\sigma^{2} ) = \frac{1}{{\left| \gamma \right|\sqrt {2\pi \sigma^{2} } }}\exp \left( { - \frac{{[{\text{ln}}\left| \gamma \right| - {\text{ln}}\left| {\overline{\gamma } } \right| + \sigma^{2} /2]^{2} }}{{2\sigma^{2} }}} \right). $$
We have parameterized this distribution with its mean, \( \left| {\overline{\gamma } } \right|, \) but it is commonly parameterized with the mean of the associated normal distribution, \( E[\hbox{ln}\left| \gamma \right|] = \hbox{ln}\left| {\overline{\gamma } } \right| - \sigma^{2} /2, \) or in other ways (Loewe and Charlesworth 2006). The second parameter, \( \sigma^{2} , \) which is the variance of the associated normal distribution, functions like the shape parameter, β, of the gamma distribution, with the coefficient of variation and excess kurtosis both increasing with \( \sigma^{2} . \) In detail, we have \( {\text{CV(}}\gamma )\,=\,{\text{(e}}^{{\sigma^{2} }} - 1 )^{1/2} ,\,{\text{and}}\,\kappa (\gamma ) = \left[ {\left| {\overline{\gamma } } \right|({\text{e}}^{{\sigma^{2} }} - 1)} \right]^{ - 4} ( {\text{e}}^{{6\sigma^{2} }} - 4 {\text{e}}^{{3\sigma^{2} }}\,+\,{\text{6e}}^{{\sigma^{2} }} - 3 ), \) which, unlike the equivalent expression for the gamma distribution, depends on the mean as well as the shape parameter.
To understand the expected divergence under the lognormal distribution, consider the following crude approximation:
$$ \begin{aligned} \ln (d_{4} /\mu t) & \approx \ln \left( {\int_{0}^{1} {F_{4} (\gamma ;\left| {\overline{\gamma } } \right|,\sigma^{2} )d\gamma } } \right) \\ & \approx - \frac{{\ln \left| {\overline{\gamma } } \right|}}{\sigma }\sqrt {\frac{2}{\pi }} \left[ {1 + \frac{{\ln \left| {\overline{\gamma } } \right| - \sigma^{2} }}{{\sqrt {2\pi } \,\sigma }}} \right] + const, \\ \end{aligned} $$
where the constant is \( \sigma /\sqrt {2\pi } \left( {1 - \sigma /\sqrt {2\pi } } \right) - {\text{ln}}2 \) (Fig. 2). This approximation was obtained from Eq. 9 by treating mutants with \( \left| \gamma \right| \) < 1 as strictly neutral, and all others as severely deleterious, and then using a series expansion of the complementary error function (Abramowitz and Stegun 1965).
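To give a feel for how crude this approximation is, the sketch below compares the first line of Eq. 35, i.e., the probability that \( \left| \gamma \right| < 1 \) under the lognormal distribution, with a direct numerical evaluation of Eq. 9. The integration is carried out over \( u = \ln \left| \gamma \right|, \) which is normally distributed with mean \( \ln \left| {\overline{\gamma } } \right| - \sigma^{2} /2 \) and variance \( \sigma^{2} . \) The function names, the integration width of ten standard deviations, and the guards against overflow and underflow are assumptions made for this illustration.

```python
import math

def d4_over_mut(mean_abs_gamma, sigma2, steps=20000, width=10.0):
    """d_4 / (mu t): Eq. 9 integrated numerically for lognormal |gamma|,
    using the midpoint rule in u = ln|gamma|."""
    sigma = math.sqrt(sigma2)
    mu = math.log(mean_abs_gamma) - sigma2 / 2  # mean of ln|gamma|
    lo = mu - width * sigma
    h = 2 * width * sigma / steps
    total = 0.0
    for i in range(steps):
        u = lo + (i + 0.5) * h
        g = math.exp(u)
        if g < 1e-12:
            fix = 1.0            # effectively neutral
        elif g > 700.0:
            fix = 0.0            # avoids overflow; contribution is negligible
        else:
            fix = g / math.expm1(g)  # g / (e^g - 1), from Eq. 6 with gamma = -g
        dens = math.exp(-(u - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)
        total += fix * dens
    return total * h

def crude_neutral_fraction(mean_abs_gamma, sigma2):
    """First line of Eq. 35 before taking logs: P(|gamma| < 1)."""
    z = (math.log(mean_abs_gamma) - sigma2 / 2) / math.sqrt(2 * sigma2)
    return 0.5 * math.erfc(z)

if __name__ == "__main__":
    for mean in (10.0, 100.0, 1000.0):
        print(mean, d4_over_mut(mean, 4.0), crude_neutral_fraction(mean, 4.0))
```

The two quantities are of the same order for the parameter values shown, but they are not expected to agree closely, since mutants with \( \left| \gamma \right| \) somewhat above 1 still fix at an appreciable rate.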
Fig. 2

Expected values of divergence, polymorphism, and the neutrality index when selection coefficients are drawn from a single-sided lognormal distribution, Eq. 34. Solid lines show exact results obtained by numerical integration, and the dashed line shows the crude approximation of Eq. 35. Other details match Fig. 1

Equation 35 shows that when ln \( \left| {\overline{\gamma } } \right| \) and \( \sigma^{2} \) are very close in value, then linear approximations relating ln(di/μt) to ln \( \left | {\overline{\gamma } } \right| \) are quite similar in the gamma and lognormal cases, with the shape parameters governing the slopes. But in the lognormal case, this linear approximation will be accurate over a very limited range of parameter space, and so the slope of the relationship will vary with Ne in general. This is confirmed in Fig. 2, where exact results are shown, with \( \sigma^{2} \) values chosen to match the curves in Fig. 1. In addition to the curvature in the plots for the divergence and polymorphism, it is clear that the neutrality index does not approach a constant value when selection coefficients are lognormally distributed but, instead, continues to increase with \( \left| {\overline{\gamma } } \right|, \) and therefore Ne.


We have derived expressions for the expected levels of divergence and polymorphism under the nearly neutral theory, with the assumption that the distribution of scaled selection coefficients is either gamma (containing only deleterious mutants) or partially reflected gamma (including back mutations of beneficial effect). These results have been given in exact forms (Eqs. 17–21 and 29–31) which can be calculated rapidly and accurately, without the need for numerical integration. Results have also been presented in approximate forms (Eqs. 23–27 and 32–33), which show clearly the leading dependencies on important parameters.

A particularly interesting result is the expected value of the neutrality index, Eq. 13, which we have shown to be of the form
$$ NI \approx 1 + \beta K $$
where β is the shape parameter of the gamma distribution and K is a constant determined by the way in which polymorphism is measured (see Eqs. 3, 24, and 33). Under these assumptions, therefore, the neutrality index is quite independent of the strength and efficacy of selection on deleterious mutants, and this has a number of interesting implications.

For example, Presgraves (2005) studied a set of 98 protein-coding loci from Drosophila melanogaster and showed that the neutrality index correlated negatively with the local recombination rate. The level of recombination is an important determinant of local Ne values, and so this correlation was interpreted as an effect of within-genome variation in the efficacy of selection (Presgraves 2005). Equation 36 allows us to make a further inference, because it shows that if deleterious mutations are gamma distributed, with or without back mutation, then a correlation between NI and Ne cannot be attributed to mildly deleterious mutants alone. However, the correlation could be explained if a nonnegligible fraction of substitutions was strongly adaptive. (This can be shown by adding a constant number of adaptive substitutions to the divergence, before calculating the neutrality index as in Eq. 24 or 33). The conclusion that adaptive substitutions would create a dependency of the neutrality index on Ne follows under quite general conditions. For example, it does not depend on the rate of adaptive substitution itself increasing with Ne (Gillespie 2001), nor does it rely on the direct effects of genetic hitchhiking – although both effects would increase the reported correlation (Gillespie 2001). If Presgraves’ (2005) result does indeed imply high rates of adaptive substitution, then this would be consistent with other lines of evidence suggesting widespread adaptive substitution in D. melanogaster (Eyre-Walker 2006).

It is important to note that Eq. 36, like all the results above, assumed demographic stability, i.e., that Ne remained constant throughout the divergence, and that polymorphism is at equilibrium. However, because we have derived the leading dependencies of all quantities on Ne (Eqs. 25–27), our results can also show how certain tests of the neutral theory are misled by demographic change. Consider, for example, the behavior of the neutrality index under a simple model of population expansion. Let us assume that Ne took a single constant value for a proportion 0 ≤ q ≤ 1 of the total divergence time, and then increased by a factor z, to zNe, for the remaining period. This higher Ne value will then govern the levels of polymorphism observed. Under this scenario, our results show that the neutrality index is expected to be
$$ NI \approx \frac{1 + \beta K}{{1 + q(z^{\beta } - 1)}}. $$
It follows that when population expansion is substantial and/or recent (i.e., when z is large or q is close to unity), the neutrality index is expected to be less than 1; specifically, NI < 1 whenever q(z^β − 1) > βK. This has important implications, because NI < 1 is generally taken as a signature of widespread adaptive substitution (McDonald and Kreitman 1991; Rand and Kann 1996). Such demographic artifacts in tests of positive selection have previously been investigated numerically (e.g., Eyre-Walker 2002; Charlesworth and Eyre-Walker 2007), but Eq. 37 allows them to be studied analytically. In addition, because Eq. 37 contains just three parameters, only one of which (β) could be locus-specific, it could also be used to test the hypothesis of population expansion in a formal likelihood framework.
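The expansion formula can be explored with a short script. The sketch below evaluates the expression for NI under population expansion and compares it with a direct time-weighted calculation. That second calculation rests on an assumption made here for illustration, namely that the nearly neutral substitution rate and polymorphism both scale as Ne^(−β). All function names and parameter values (including K) are hypothetical.

```python
def ni_expansion(beta, K, q, z):
    """Neutrality index under the simple expansion model:
    NI ~ (1 + beta*K) / (1 + q*(z**beta - 1))."""
    return (1.0 + beta * K) / (1.0 + q * (z ** beta - 1.0))


def ni_from_scaling(beta, K, q, z, ne=1.0):
    """Same quantity built from its parts, assuming (for illustration)
    that nearly neutral divergence and polymorphism scale as Ne**(-beta)."""
    rate_before = ne ** (-beta)          # substitution rate at the original Ne
    rate_after = (z * ne) ** (-beta)     # substitution rate after expansion
    divergence = q * rate_before + (1.0 - q) * rate_after
    polymorphism = (1.0 + beta * K) * rate_after  # governed by the larger Ne
    return polymorphism / divergence


if __name__ == "__main__":
    beta, K = 0.3, 1.0                   # hypothetical values
    for q, z in [(0.0, 10.0), (0.5, 10.0), (0.9, 10.0), (0.9, 100.0)]:
        print(f"q = {q:.1f}, z = {z:5.1f}: "
              f"NI = {ni_expansion(beta, K, q, z):.3f} "
              f"(from scaling: {ni_from_scaling(beta, K, q, z):.3f})")
```

With no expansion (z = 1) or with q = 0, the function returns 1 + βK, recovering the constant-Ne result. NI drops below 1 once q(z^β − 1) exceeds βK.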

While expressions such as Eqs. 36 and 37 are pleasingly simple, another conclusion of the present work is that such relationships might not hold unless the distribution of mutant effects is adequately described by a gamma distribution. In particular, we have shown that the behavior of polymorphism and divergence is qualitatively different when selection coefficients are lognormally distributed (Fig. 2, Eq. 35).

It is important to ask, therefore, how far empirical evidence allows us to choose between the various functional forms. As has been mentioned, the lognormal was preferred over the gamma distribution in the recent study by Loewe and Charlesworth (2006), and its greater success was attributed to its weightier tail, with a correspondingly higher concentration of lethal mutants (Loewe and Charlesworth 2006). This feature of the distribution has wide empirical support, both from bioinformatic approaches (Nielsen and Yang 2003; Sawyer et al. 2003; Loewe and Charlesworth 2006), and from mutation accumulation studies (Keightley 1994; Elena et al. 1998; Sanjuan et al. 2004; Eyre-Walker and Keightley 2007).

But high concentrations of severely deleterious mutants can be modeled using distributions other than the lognormal, by adding a parameter such as f that appears in Eqs. 11 and 12 (Sawyer et al. 2003; Nielsen and Yang 2003; Eyre-Walker et al. 2006). And while less elegant than fitting a continuous distribution to all mutants, this approach may be more realistic, as direct studies have sometimes found the distribution to be bimodal, containing peaks of both weakly and strongly deleterious effects (Elena et al. 1998; Sanjuan et al. 2004; Eyre-Walker and Keightley 2007). Furthermore, Nielsen and Yang (2003) and Eyre-Walker et al. (2006) found that the inclusion of such a parameter made little difference to estimates of the gamma shape parameter, β, obtained from divergence and polymorphism data.

However, the qualitative differences between Fig. 1 and Fig. 2 cannot be attributed to differences in the proportion of strongly deleterious mutants. Instead, these differences are probably best explained by another feature of the lognormal distribution: its suppression of probability density around the origin, in the region of strict neutrality. Unlike the concentration of effectively lethal mutants, empirical evidence of this aspect of the distribution is much more equivocal. Loewe and Charlesworth’s (2006) own method could not reject models where a discrete class of strictly neutral mutants was added to the lognormal, and experimental approaches have limited power to resolve very small selection coefficients (Eyre-Walker and Keightley 2007).

Because of this uncertainty, at present we have reason to remain skeptical of any quantitative conclusion that relies on the assumption that the distribution of selection coefficients can be well described by any single functional form (Tachida 1996; Eyre-Walker 2002; Sawyer et al. 2003; Loewe and Charlesworth 2006; Woodhams 2006; Eyre-Walker and Keightley 2007). Clearly, we also require a more detailed knowledge of the distribution of mutant effects in nature. Numerical methods, such as those reported here, combined with model selection techniques, should aid progress toward this goal.


It is a pleasure to thank Laurence Loewe, Andrea Betancourt, and Fraser Lewis for their help with this work.

© Springer Science+Business Media, LLC 2008