An introduction to the mathematical structure of the Wright–Fisher model of population genetics
Abstract
In this paper, we develop the mathematical structure of the Wright–Fisher model for evolution of the relative frequencies of two alleles at a diploid locus under random genetic drift in a population of fixed size in its simplest form, that is, without mutation or selection. We establish a new concept of a global solution for the diffusion approximation (Fokker–Planck equation), prove its existence and uniqueness and then show how one can easily derive all the essential properties of this random genetic drift process from our solution. Thus, our solution turns out to be superior to the local solution constructed by Kimura.
Keywords
Random genetic drift, Wright–Fisher model, Fokker–Planck equation
Introduction
In population genetics, one considers the effects of recombination, selection, mutation, and perhaps others like migration on the distribution of alleles in a population, see e.g. (Ewens 2004; Bürger 2000; Rice 2004) as mathematical textbook references. The most basic and at the same time important model is the Wright–Fisher model for random genetic drift [developed implicitly by Fisher (1922) and explicitly by Wright (1931)]. In its simplest version—the one to be treated in the present paper—it is concerned with the evolution of the relative frequencies of two alleles at a single diploid locus in a finite population of fixed size with non-overlapping generations under the sole force of random genetic drift, without any other influences like mutations or selection. The model can be generalised—and so can our approach—to multiple alleles, several loci, with mutations, selections, spatial population structures, etc, see the above references. To find an exact solution (for the approximating diffusion process for the probability densities of the allele frequencies described by a Fokker–Planck equation) from which the properties of the resulting stochastic process can be deduced, however, is difficult. For the basic two-allele case, this was first achieved in the important work of Kimura (1955), and he then went on to treat the case of several alleles (Kimura 1955, 1956). His solution, however, is local in the sense that it does not naturally incorporate the transitions resulting from the irreversible loss of one or several of the alleles initially present in the population. Consequently, the resulting probability distribution does not integrate to 1, and it is difficult to read off the quantitative properties of the process from his solution.
In the present paper, we introduce and describe a new global approach. This approach is mathematically more transparent than Kimura’s scheme. We prove the existence of a unique such global solution (see Theorem 3.7), and we can deduce all desired quantities of the underlying stochastic process from our solution. The purpose of the present paper is thus to display the method in the simplest case, that of two alleles at a single locus, so that the structure becomes clear. The case of multiple alleles is presented in our companion paper (Tran et al. 2012) on the basis of the first author’s thesis, and further generalisations will be systematically developed elsewhere within the mathematical framework of information geometry (Amari and Nagaoka 2000) and more specifically (Ay and Jost 2012; Jost 2012) on the basis of the second author’s thesis.
The Wright–Fisher model
This is the basic model. One can then derive expressions for the expected time for the allele A _{1} to become either fixed, that is, Y _{ n } = 2N, or become extinct, Y _{ n } = 0, given its initial number Y _{0}.
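As an illustration, the discrete process can be sketched in a few lines of Python. The sketch assumes the standard formulation of the model, in which the 2N allele copies of generation n + 1 are drawn independently, with replacement, from the allele pool of generation n; the function names, the seed and the parameter values are hypothetical choices made for this example only.

```python
import random


def wright_fisher_step(y, two_n, rng):
    """One generation: draw 2N allele copies with replacement from the parent pool."""
    freq = y / two_n  # current relative frequency of allele A_1
    return sum(1 for _ in range(two_n) if rng.random() < freq)


def simulate_until_absorption(y0, two_n, rng, max_generations=100_000):
    """Iterate until Y_n hits 0 or 2N; return (final copy number, generations elapsed)."""
    y, n = y0, 0
    while 0 < y < two_n and n < max_generations:
        y = wright_fisher_step(y, two_n, rng)
        n += 1
    return y, n


rng = random.Random(0)   # fixed seed, only for reproducibility of the example
two_n = 20               # 2N = 20 allele copies (assumed toy value)
final_y, generations = simulate_until_absorption(y0=6, two_n=two_n, rng=rng)
print("absorbed at Y =", final_y, "after", generations, "generations")
```

Averaging the number of generations over many such runs gives a Monte Carlo estimate of the expected absorption time for the chosen Y _{0}.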
Let us also explain the interpretation of (8) for those not sufficiently versed in this mathematical formalism. The initial condition u(x,0) = δ_{ p }(x) then simply says that at time 0, the relative frequency of allele A _{1} is precisely p, without any uncertainty (this assumption is not essential, however, and the scheme also works for more general initial conditions involving uncertainty about the initial distribution of the alleles). Subsequently, this allele frequency evolves stochastically, according to the equation \(u_t(x,t)=\frac{1}{2}\frac{\partial^2}{\partial x^2}\left(x(1-x)u(x,t)\right),\) and therefore, for t > 0, we no longer know the precise value of this relative frequency, but only its probability density given by u(x, t). That is, for every x, the probability density that the allele frequency at time t has the value x is given by u(x, t).
This leads to our concept of a solution of the Fokker–Planck equation in
Definition 2.1
This solution concept will allow us to prove the existence of a unique solution from which we can then derive all features of interest of the Wright–Fisher process. We should point out that (12) is not just the integration by parts of (10), but also includes the boundary behaviour (of course, this may not be overt, but the mathematical trick here is to represent this boundary behaviour in an implicit form best suited for formal manipulation). It thus reflects transitions from the presence of both alleles to the irreversible loss of one of them. This is the crucial difference from Kimura’s (1955) solution concept and the key to the properties of our solution.
Existence and uniqueness of solutions
We shall now apply a familiar mathematical scheme for the construction of a solution of a differential equation, an expansion in terms of eigenfunctions of the differential operator involved. For our problem, as formalised in Definition 2.1, these eigenfunctions can be constructed from a classical family of polynomials, the Gegenbauer polynomials, which we shall now introduce.
Preliminaries
Lemma 3.1
- The Gegenbauer polynomials satisfy the recurrence relation$$ \begin{aligned} Y_0(z) &= 1\\ Y_1(z) &= 3 z\\ Y_n (z) &=\frac{1}{n}\left[2z(n+\frac{1}{2})Y_{n-1} (z) - (n+1)Y_{n-2} (z)\right]. \end{aligned} $$
- The Gegenbauer polynomials solve the differential equation$$ (1-z^{2})y^{\prime\prime}-4zy^{\prime}+n(n+3)y=0. $$(15)
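The two statements of Lemma 3.1 can be checked against each other with exact rational arithmetic. The following sketch builds the coefficients of Y _{ n } from the recurrence relation and then substitutes them into the left-hand side of (15); the helper names are hypothetical, and polynomials are stored as coefficient lists with the lowest degree first.

```python
from fractions import Fraction


def gegenbauer_coeffs(n):
    """Coefficients of Y_n(z), lowest degree first, from the three-term recurrence."""
    y_prev, y_curr = [Fraction(1)], [Fraction(0), Fraction(3)]  # Y_0 = 1, Y_1 = 3z
    if n == 0:
        return y_prev
    for k in range(2, n + 1):
        # Y_k = (1/k) [ 2 z (k + 1/2) Y_{k-1} - (k + 1) Y_{k-2} ], and 2 (k + 1/2) = 2k + 1
        term = [(2 * k + 1) * c for c in [Fraction(0)] + y_curr]
        for i, c in enumerate(y_prev):
            term[i] -= (k + 1) * c
        y_prev, y_curr = y_curr, [c / k for c in term]
    return y_curr


def derivative(p):
    """Derivative of a polynomial given as a coefficient list."""
    return [i * c for i, c in enumerate(p)][1:] or [Fraction(0)]


def ode_residual(n):
    """Coefficients of (1 - z^2) y'' - 4 z y' + n (n + 3) y for y = Y_n."""
    y = gegenbauer_coeffs(n)
    dy = derivative(y)
    d2y = derivative(dy)
    res = [Fraction(0)] * (len(y) + 2)
    for i, c in enumerate(d2y):
        res[i] += c           # y''
        res[i + 2] -= c       # -z^2 y''
    for i, c in enumerate(dy):
        res[i + 1] -= 4 * c   # -4 z y'
    for i, c in enumerate(y):
        res[i] += n * (n + 3) * c
    return res


print(gegenbauer_coeffs(2))   # coefficients of Y_2(z) = (15/2) z^2 - 3/2
print(all(c == 0 for n in range(8) for c in ode_residual(n)))
```

Working with `Fraction` instead of floating-point numbers means that a vanishing residual is an exact identity for the tested degrees rather than a numerical coincidence.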
Lemma 3.2
Auxiliaries
Lemma 3.3
Proof
Lemma 3.4
If X is an eigenvector of L corresponding to the eigenvalue λ then wX is an eigenvector of L ^{*} corresponding to the eigenvalue λ.
Proof
Lemma 3.5
Proof
Construction of the solution
In this subsection, we construct the solution and prove its uniqueness. We shall firstly find the general solution of the Fokker–Planck equation (10) by the separation of variables method. Then we shall construct a solution depending on parameters. We shall use (11, 12) to determine the parameters. Finally, we shall verify the solution.
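The separation of variables step can be made concrete as follows. Under the assumption that one substitutes z = 1 − 2x in (15), a short calculation suggests that X _{ n }(x) = Y _{ n }(1 − 2x) satisfies \(\frac{1}{2}\frac{\partial^2}{\partial x^2}\left(x(1-x)X_n\right)=-\lambda_n X_n\) with \(\lambda_n=\frac{(n+1)(n+2)}{2}\), so that \(X_n(x)e^{-\lambda_n t}\) solves the Fokker–Planck equation in the interior. Both the substitution and the value of λ _{ n } are derived here for illustration and are not quoted from the text above; the sketch below merely checks them with exact polynomial arithmetic, using hypothetical helper names.

```python
from fractions import Fraction


def poly_mul(a, b):
    """Product of two polynomials given as coefficient lists (lowest degree first)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ca in enumerate(a):
        for j, cb in enumerate(b):
            out[i + j] += ca * cb
    return out


def poly_add(a, b, scale_b=1):
    """a + scale_b * b, padding the shorter list with zeros."""
    n = max(len(a), len(b))
    a = a + [Fraction(0)] * (n - len(a))
    b = b + [Fraction(0)] * (n - len(b))
    return [x + scale_b * y for x, y in zip(a, b)]


def derivative(p):
    return [i * c for i, c in enumerate(p)][1:] or [Fraction(0)]


def shifted_gegenbauer(n):
    """X_n(x) = Y_n(1 - 2x) as coefficients in x, built from the recurrence of Lemma 3.1."""
    z = [Fraction(1), Fraction(-2)]                      # z = 1 - 2x (assumed substitution)
    prev, curr = [Fraction(1)], [3 * c for c in z]
    if n == 0:
        return prev
    for k in range(2, n + 1):
        nxt = poly_add(poly_mul([(2 * k + 1) * c for c in z], curr), prev, scale_b=-(k + 1))
        prev, curr = curr, [c / k for c in nxt]
    return curr


def eigen_defect(n):
    """Coefficients of (1/2) (x (1 - x) X_n)'' + lambda_n X_n; all zero if X_n is an eigenfunction."""
    x_n = shifted_gegenbauer(n)
    weighted = poly_mul([Fraction(0), Fraction(1), Fraction(-1)], x_n)   # x (1 - x) X_n
    lx = [c / 2 for c in derivative(derivative(weighted))]
    lam = Fraction((n + 1) * (n + 2), 2)
    return poly_add(lx, x_n, scale_b=lam)


for n in range(5):
    print(n, all(c == 0 for c in eigen_defect(n)))
```

The check covers only the interior equation (10); the boundary behaviour encoded in (11, 12) is what fixes the remaining parameters of the solution.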
Therefore, u is indeed a solution of the Fokker–Planck equation associated with the Wright–Fisher model.
Altogether, we obtain our main result.
Theorem 3.7.
The Fokker–Planck equation associated with the Wright–Fisher model possesses a unique solution.
This behaviour coincides with the discrete one (Figs. 2, 3).
Applications
Our global solution readily yields the quantities of interest of the evolution of the process (X _{ t })_{ t ≥ 0} such as the expectation and the second moment of the absorption time, nth moments, fixation probabilities, the probability of coexistence, or the probability of heterogeneity.
Absorption time
Let V _{0} : = {0,1} be the domain representing a population of 1 allele. Here, 0 corresponds to the loss of A _{1}, that is, the fixation of A _{2}, and 1 corresponds to the opposite situation. Either of these irreversible events is called an absorption.
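For orientation, the classical diffusion value of the expected absorption time can be evaluated numerically. The sketch below assumes the well-known formula \(-2\left(p\ln p+(1-p)\ln(1-p)\right)\), with time measured in units of 2N generations; this formula is taken from the classical literature rather than from the present section, and the function names are hypothetical. As a consistency check, the code also verifies by a central difference that T satisfies \(\frac{1}{2}p(1-p)T''(p)=-1\) with T(0) = T(1) = 0.

```python
import math


def mean_absorption_time(p):
    """Classical diffusion value of the expected absorption time (units of 2N generations, assumed)."""
    if p in (0.0, 1.0):
        return 0.0  # already absorbed
    return -2.0 * (p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def backward_residual(p, h=1e-4):
    """(1/2) p (1 - p) T''(p) + 1 with T'' from a central difference; close to 0 if T is correct."""
    second = (
        mean_absorption_time(p + h)
        - 2.0 * mean_absorption_time(p)
        + mean_absorption_time(p - h)
    ) / h**2
    return 0.5 * p * (1.0 - p) * second + 1.0


for p in (0.1, 0.3, 0.5):
    print(p, mean_absorption_time(p), backward_residual(p))
```

The expected absorption time is largest for p = 0.5, where it equals 2 ln 2 in these units, and it is symmetric under exchanging p and 1 − p.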
Remark 4.1
nth moments
Fixation probabilities and probability of coexistence of 2 alleles
Remark 4.2
- (i) \({\mathbb{P}(X_t\in [0,1]|X_0=p)=\mathbb{P}(X_t=0|X_0=p)+\mathbb{P}(X_t=1|X_0=p)+ \mathbb{P}(X_t\in (0,1)|X_0=p)=1;}\)
- (ii) \({\mathbb{P}(X_t=0|X_0=p)}\) and \({\mathbb{P}(X_t=1|X_0=p)}\) increase quickly from 0 for \(t\in(0,5)\) (10N generations) and then tend slowly to 1 − p and p, respectively;
- (iii) When p = 0.5, the situation is symmetric between the two alleles, that is, \({\mathbb{P}(X_t=0|X_0=0.5)=\mathbb{P}(X_t=1|X_0=0.5). }\)
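The three observations of Remark 4.2 can be reproduced for the discrete model by propagating the exact distribution of Y _{ n } with the binomial transition matrix. This is a minimal sketch under the assumption of binomial sampling of 2N allele copies per generation; the population size 2N = 20 and the function names are hypothetical choices for the example.

```python
from math import comb


def transition_matrix(two_n):
    """P[i][j]: probability of j copies of A_1 next generation given i copies now."""
    rows = []
    for i in range(two_n + 1):
        q = i / two_n
        rows.append([comb(two_n, j) * q**j * (1 - q) ** (two_n - j) for j in range(two_n + 1)])
    return rows


def evolve(dist, matrix, generations):
    """Push a probability vector over {0, ..., 2N} forward by the given number of generations."""
    size = len(dist)
    for _ in range(generations):
        dist = [sum(dist[i] * matrix[i][j] for i in range(size)) for j in range(size)]
    return dist


def summary(y0, two_n, generations):
    """Return (P(loss of A_1), P(fixation of A_1), P(coexistence)) after the given time."""
    start = [0.0] * (two_n + 1)
    start[y0] = 1.0  # discrete analogue of the initial condition delta_p
    dist = evolve(start, transition_matrix(two_n), generations)
    return dist[0], dist[-1], sum(dist[1:-1])


two_n = 20
for generations in (0, 5 * two_n, 2000):   # 5 * 2N generations corresponds to t = 5
    print(generations, summary(6, two_n, generations))   # p = 6/20 = 0.3
```

For large times the printed fixation and loss probabilities approach p = 0.3 and 1 − p = 0.7, while the coexistence probability decays to 0 and the three values always sum to 1.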
Heterogeneity
Conclusion
We have constructed a unique global solution of the Fokker–Planck equation associated with the Wright–Fisher model. This solution leads to explicit formulae for the absorption time, fixation probabilities, the probability of coexistence, nth moments, heterogeneity, and other quantities.
Footnotes
- 1. Here is a remark for readers not familiar with this mathematical construction: This is a formal definition, as δ _{ p } defined in this manner is not a function itself, but rather operates on continuous functions by assigning to them their value at the particular point p. Thus, while the product (f, g) had been first defined for square integrable functions f, g, we now apply it to the pair (δ _{ p }, ϕ) where δ _{ p } is a more general object and in turn ϕ is a more restricted function (continuous instead of simply square integrable).
- 2. The Gegenbauer polynomials generalise other important classes of polynomials, like the Legendre and the Chebyshev polynomials, and they constitute in turn special cases of the Jacobi polynomials.
References
- Abramowitz M, Stegun I (1965) Handbook of mathematical functions with formulas, graphs, and mathematical tables. Dover, New York
- Amari S, Nagaoka H (2000) Methods of information geometry. In: Translations of mathematical monographs, vol 191. American Mathematical Society, Providence
- Ay N, Jost J (2012) Information geometry (in preparation)
- Bürger R (2000) The mathematical theory of selection, recombination, and mutation. Wiley, New York
- Ewens WJ (2004) Mathematical population genetics I. Theoretical introduction. In: Interdisciplinary applied mathematics, 2nd edn. Springer, New York
- Fisher RA (1922) On the dominance ratio. Proc. R. Soc. Edinb 42:321–341
- Jost J (2012) Mathematical methods in biology and neurobiology (in preparation)
- Kimura M (1955) Solution of a process of random genetic drift with a continuous model. Proc. Natl. Acad. Sci. USA 41(3):144–150
- Kimura M (1955) Random genetic drift in multi-allele locus. Evolution 9:419–435
- Kimura M (1956) Random genetic drift in a tri-allelic locus; exact solution with a continuous model. Biometrics 12:57–66
- Rice S (2004) Evolutionary theory. Sinauer, Sunderland
- Suetin PK (2001) Ultraspherical polynomials. In: Hazewinkel M (ed) Encyclopaedia of mathematics. Springer, Berlin
- Tran TD, Hofrichter J, Jost J (2012) A general solution of the Wright–Fisher model of random genetic drift. arxiv.org/abs/1207.6623
- Wright S (1931) Evolution in Mendelian populations. Genetics 16:97–159
- Wright S (1945) The differential equation of the distribution of gene frequencies. Proc. Natl. Acad. Sci. USA 31:382–389
Copyright information
Open Access. This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.