Phase-Type Distribution Approximations of the Waiting Time Until Coordinated Mutations Get Fixed in a Population

  • Conference paper
  • First Online:
Stochastic Processes and Applications (SPAS 2017)

Part of the book series: Springer Proceedings in Mathematics & Statistics (PROMS, volume 271)

Abstract

In this paper we study the waiting time until a number of coordinated mutations occur in a population that reproduces according to a continuous time Markov process of Moran type. It is assumed that any individual can have one of \(m+1\) different types, numbered as \(0,1,\ldots ,m\), where initially all individuals have the same type 0. The waiting time is the time until all individuals in the population have acquired type m, under different scenarios for the rates at which forward mutations \(i\rightarrow i+1\) and backward mutations \(i\rightarrow i-1\) occur, and the selective fitness of the mutations. Although this waiting time is the time until the Markov process reaches its absorbing state, the state space of this process is huge for all but very small population sizes. The problem can be simplified though if all mutation rates are smaller than the inverse population size. The population then switches abruptly between different fixed states, where one type at a time dominates. Based on this, we show that phase-type distributions can be used to find closed form approximations for the waiting time law. Our results generalize work by Schweinsberg [60] and Durrett et al. [20], and they have numerous applications. This includes onset and growth of cancer for a cell population within a tissue, with type representing the severity of the cancer. Another application is temporal changes of gene expression among the individuals in a species, with type representing different binding sites that appear in regulatory sequences of DNA.


References

  1. Asmussen, S., Nerman, O., Olsson, M.: Fitting phase-type distributions via the EM algorithm. Scand. J. Stat. 23, 419–441 (1996)

  2. Axe, D.D.: The limits of complex adaptation: an analysis based on a simple model of structured bacterial populations. BIO-Complexity 2010(4) (2010)

  3. Barton, N.H.: The probability of fixation of a favoured allele in a subdivided population. Genet. Res. 62, 149–158 (1993)

  4. Beerenwinkel, N., Antal, T., Dingli, D., Traulsen, A., Kinzler, K.W., Velculescu, V.W., Vogelstein, B., Nowak, M.A.: Genetic progression and the waiting time to cancer. PLoS Comput. Biol. 3(11), e225 (2007)

  5. Behe, M., Snoke, D.W.: Simulating evolution by gene duplication of protein features that require multiple amino acid residues. Protein Sci. 13, 2651–2664 (2004)

  6. Behe, M., Snoke, D.W.: A response to Michael Lynch. Protein Sci. 14, 2226–2227 (2005)

  7. Behrens, S., Vingron, M.: Studying evolution of promoter sequences: a waiting time problem. J. Comput. Biol. 17(12), 1591–1606 (2010)

  8. Behrens, S., Nicaud, C., Nicodéme, P.: An automaton approach for waiting times in DNA evolution. J. Comput. Biol. 19(5), 550–562 (2012)

  9. Bobbio, A., Horvath, Á., Scarpa, M., Telek, M.: Acyclic discrete phase type distributions: properties and a parameter estimation algorithm. Perform. Eval. 54, 1–32 (2003)

  10. Bodmer, W.F.: The evolutionary significance of recombination in prokaryotes. Symp. Soc. General Microbiol. 20, 279–294 (1970)

  11. Carter, A.J.R., Wagner, G.P.: Evolution of functionally conserved enhancers can be accelerated in large populations: a population-genetic model. Proc. R. Soc. Lond. 269, 953–960 (2002)

  12. Cao, Y., et al.: Efficient step size selection for the tau-leaping simulation method. J. Chem. Phys. 124, 44109–44119 (2006)

  13. Chatterjee, K., Pavlogiannis, A., Adlam, B., Nowak, M.A.: The time scale of evolutionary innovation. PLOS Comput. Biol. 10(9), e1003818 (2014)

  14. Christiansen, F.B., Otto, S.P., Bergman, A., Feldman, M.W.: Waiting time with and without recombination: the time to production of a double mutant. Theor. Popul. Biol. 53, 199–215 (1998)

  15. Crow, J.F., Kimura, M.: An Introduction to Population Genetics Theory. The Blackburn Press, Caldwell (1970)

  16. Desai, M.M., Fisher, D.S.: Beneficial mutation-selection balance and the effect of linkage on positive selection. Genetics 176, 1759–1798 (2007)

  17. Durrett, R.: Probability Models for DNA Sequence Evolution. Springer, New York (2008)

  18. Durrett, R., Schmidt, D.: Waiting for regulatory sequences to appear. Ann. Appl. Probab. 17(1), 1–32 (2007)

  19. Durrett, R., Schmidt, D.: Waiting for two mutations: with applications to regulatory sequence evolution and the limits of Darwinian evolution. Genetics 180, 1501–1509 (2008)

  20. Durrett, R., Schmidt, D., Schweinsberg, J.: A waiting time problem arising from the study of multi-stage carcinogenesis. Ann. Appl. Probab. 19(2), 676–718 (2009)

  21. Ewens, W.J.: Mathematical Population Genetics. I. Theoretical Introduction. Springer, New York (2004)

  22. Fisher, R.A.: On the dominance ratio. Proc. R. Soc. Edinb. 42, 321–341 (1922)

  23. Fisher, R.A.: The Genetical Theory of Natural Selection. Oxford University Press, Oxford (1930)

  24. Gerstung, M., Beerenwinkel, N.: Waiting time models of cancer progression. Math. Popul. Stud. 20(3), 115–135 (2010)

  25. Gillespie, D.T.: Approximate accelerated simulation of chemically reacting systems. J. Chem. Phys. 115, 1716–1733 (2001)

  26. Gillespie, J.H.: Molecular evolution over the mutational landscape. Evolution 38(5), 1116–1129 (1984)

  27. Gillespie, J.H.: The role of population size in molecular evolution. Theor. Popul. Biol. 55, 145–156 (1999)

  28. Greven, A., Pfaffelhuber, C., Pokalyuk, A., Wakolbinger, A.: The fixation time of a strongly beneficial allele in a structured population. Electron. J. Probab. 21(61), 1–42 (2016)

  29. Gut, A.: An Intermediate Course in Probability. Springer, New York (1995)

  30. Haldane, J.B.S.: A mathematical theory of natural and artificial selection. Part V: selection and mutation. Math. Proc. Camb. Philos. Soc. 23, 838–844 (1927)

  31. Hössjer, O., Tyvand, P.A., Miloh, T.: Exact Markov chain and approximate diffusion solution for haploid genetic drift with one-way mutation. Math. Biosci. 272, 100–112 (2016)

  32. Iwasa, Y., Michor, F., Nowak, M.: Stochastic tunnels in evolutionary dynamics. Genetics 166, 1571–1579 (2004)

  33. Iwasa, Y., Michor, F., Komarova, N.L., Nowak, M.: Population genetics of tumor suppressor genes. J. Theor. Biol. 233, 15–23 (2005)

  34. Kimura, M.: Some problems of stochastic processes in genetics. Ann. Math. Stat. 28, 882–901 (1957)

  35. Kimura, M.: On the probability of fixation of mutant genes in a population. Genetics 47, 713–719 (1962)

  36. Kimura, M.: Average time until fixation of a mutant allele in a finite population under continued mutation pressure: studies by analytical, numerical and pseudo-sampling methods. Proc. Natl. Acad. Sci. USA 77, 522–526 (1980)

  37. Kimura, M.: The role of compensatory neutral mutations in molecular evolution. J. Genet. 64(1), 7–19 (1985)

  38. Kimura, M., Ohta, T.: The average number of generations until fixation of a mutant gene in a finite population. Genetics 61, 763–771 (1969)

  39. Knudson, A.G.: Two genetic hits (more or less) to cancer. Nat. Rev. Cancer 1, 157–162 (2001)

  40. Komarova, N.L., Sengupta, A., Nowak, M.: Mutation-selection networks of cancer initiation: tumor suppressor genes and chromosomal instability. J. Theor. Biol. 223, 433–450 (2003)

  41. Lambert, A.: Probability of fixation under weak selection: a branching process unifying approach. Theor. Popul. Biol. 69(4), 419–441 (2006)

  42. Li, T.: Analysis of explicit tau-leaping schemes for simulating chemically reacting systems. Multiscale Model. Simul. 6, 417–436 (2007)

  43. Lynch, M.: Simple evolutionary pathways to complex proteins. Protein Sci. 14, 2217–2225 (2005)

  44. Lynch, M., Abegg, A.: The rate of establishment of complex adaptations. Mol. Biol. Evol. 27(6), 1404–1414 (2010)

  45. MacArthur, S., Brookfield, J.F.Y.: Expected rates and modes of evolution of enhancer sequences. Mol. Biol. Evol. 21(6), 1064–1073 (2004)

  46. Maruyama, T.: On the fixation probability of mutant genes in a subdivided population. Genet. Res. 15, 221–225 (1970)

  47. Maruyama, T., Kimura, M.: Some methods for treating continuous stochastic processes in population genetics. Jpn. J. Genet. 46(6), 407–410 (1971)

  48. Maruyama, T., Kimura, M.: A note on the speed of gene frequency changes in reverse direction in a finite population. Evolution 28, 161–163 (1974)

  49. Moran, P.A.P.: Random processes in genetics. Proc. Camb. Philos. Soc. 54, 60–71 (1958)

  50. Neuts, M.F.: Matrix-Geometric Solutions in Stochastic Models: An Algorithmic Approach. Johns Hopkins University Press, Baltimore (1981)

  51. Nicodéme, P.: Revisiting waiting times in DNA evolution (2012). arXiv:1205.6420v1

  52. Nowak, M.A.: Evolutionary Dynamics: Exploring the Equations of Life. Belknap Press, Cambridge (2006)

  53. Phillips, P.C.: Waiting for a compensatory mutation: phase zero of the shifting balance process. Genet. Res. 67, 271–283 (1996)

  54. Radmacher, M.D., Kelsoe, G., Kepler, T.B.: Predicted and inferred waiting times for key mutations in the germinal centre reaction: evidence for stochasticity in selection. Immunol. Cell Biol. 76, 373–381 (1998)

  55. Rupe, C.L., Sanford, J.C.: Using simulation to better understand fixation rates, and establishment of a new principle: Haldane’s Ratchet. In: Horstmeyer, M. (ed.) Proceedings of the Seventh International Conference of Creationism. Creation Science Fellowship, Pittsburgh, PA (2013)

  56. Sanford, J., Baumgardner, J., Brewer, W., Gibson, P., Remine, W.: Mendel’s accountant: a biologically realistic forward-time population genetics program. Scalable Comput.: Pract. Exp. 8(2), 147–165 (2007)

  57. Sanford, J., Brewer, W., Smith, F., Baumgardner, J.: The waiting time problem in a model hominin population. Theor. Biol. Med. Model. 12, 18 (2015)

  58. Schinazi, R.B.: A stochastic model of cancer risk. Genetics 174, 545–547 (2006)

  59. Schinazi, R.B.: The waiting time for a second mutation: an alternative to the Moran model. Phys. A. Stat. Mech. Appl. 401, 224–227 (2014)

  60. Schweinsberg, J.: The waiting time for \(m\) mutations. Electron. J. Probab. 13(52), 1442–1478 (2008)

  61. Slatkin, M.: Fixation probabilities and fixation times in a subdivided population. Evolution 35, 477–488 (1981)

  62. Stephan, W.: The rate of compensatory evolution. Genetics 144, 419–426 (1996)

  63. Stone, J.R., Wray, G.A.: Rapid evolution of cis-regulatory sequences via local point mutations. Mol. Biol. Evol. 18, 1764–1770 (2001)

  64. Tuğrul, M., Paixão, T., Barton, N.H., Tkačik, G.: Dynamics of transcription factor binding site evolution. PLOS Genet. 11(11), e1005639 (2015)

  65. Whitlock, M.C.: Fixation probability and time in subdivided populations. Genetics 164, 767–779 (2003)

  66. Wodarz, D., Komarova, N.L.: Computational Biology of Cancer. Lecture Notes and Mathematical Modeling. World Scientific, New Jersey (2005)

  67. Wright, S.: Evolution in Mendelian populations. Genetics 16, 97–159 (1931)

  68. Wright, S.: The roles of mutation, inbreeding, crossbreeding and selection in evolution. In: Proceedings of the 6th International Congress on Genetics, vol. 1, pp. 356–366 (1932)

  69. Wright, S.: Statistical genetics and evolution. Bull. Am. Math. Soc. 48, 223–246 (1942)

  70. Yona, A.H., Alm, E.J., Gore, J.: Random sequences rapidly evolve into de novo promoters (2017). bioRxiv.org, https://doi.org/10.1101/111880

  71. Zhu, T., Hu, Y., Ma, Z.-M., Zhang, D.-X., Li, T.: Efficient simulation under a population genetics model of carcinogenesis. Bioinformatics 6(27), 837–843 (2011)


Acknowledgements

The authors wish to thank an anonymous reviewer for several helpful suggestions that improved the clarity and presentation of the paper.

Author information

Correspondence to Ola Hössjer.

Appendices

Appendix A. A Simulation Algorithm

Recall from Sect. 12.2 that the allele frequency process \({\varvec{Z}}_t\) of the Moran model is a continuous time and piecewise constant Markov process with exponentially distributed holding times at each state \({\varvec{z}}=(z_0,\ldots ,z_m)\in \mathcal{Z}\). For all but very small population sizes, it is infeasible to simulate this process directly, since the waiting times between subsequent jumps are very small, of size \(O_p(N^{-1})\). The \(\tau \)-leaping algorithm was introduced (Gillespie [25], Li [42]) in order to speed up computations for a certain class of continuous time Markov processes. It is an approximate simulation algorithm with time increments of size \(\tau \). According to the leaping condition of Cao et al. [12], one chooses \(\tau =\tau (\varepsilon )\) in such a way that

$$\begin{aligned} E\left[ |Z_{t+\tau ,i}-Z_{ti}| \,\Big |\, Z_{ti}=z_i\right] \le \varepsilon z_i \end{aligned}$$
(12.118)

for \(i=0,\ldots ,m\) and some fixed, small number \(\varepsilon >0\), typically in a range between 0.01 and 0.1.

Zhu et al. [71] pointed out that it is not appropriate to use \(\tau \)-leaping for the Moran model when very small allele frequencies are updated. For this reason they defined a hybrid algorithm that combines features of exact simulation and \(\tau \)-leaping. Although most time increments are of length \(\tau \), some critical ones are shorter. Then they showed that (12.118) will be satisfied by the hybrid algorithm for a neutral model with small mutation rates, when

$$\begin{aligned} \tau \le \varepsilon /2. \end{aligned}$$
(12.119)

We will extend the method of Zhu et al. [71] to our setting, where forward and backward mutations are possible. In order to describe the simulation algorithm, we first need to define the transition rates of the Moran model. From any state \({\varvec{z}}\in \mathcal{Z}\), there are at most \((m+1)m\) jumps \({\varvec{z}}\rightarrow {\varvec{z}}+ {\varvec{\delta }}_{ij}/N\) possible, where \({\varvec{\delta }}_{ij}={\varvec{e}}_j-{\varvec{e}}_i\), \(0\le i,j \le m\) and \(i\ne j\). Each such change corresponds to an event where a type i individual dies and gets replaced by another one of type j. Since the process remains unchanged when \(i=j\), we need not include these events in the simulation algorithm. It follows from Sect. 12.2 that the transition rate from \({\varvec{z}}\) to \({\varvec{z}}+ {\varvec{\delta }}_{ij}/N\) is

$$\begin{aligned} \begin{array}{l} a_{ij} = a_{ij}({\varvec{z}})\\ = Nz_i \times \frac{z_js_j}{\sum _{k=0}^m z_k s_k} (1-u_{j+1}-v_{j-1}) + Nz_i \times \frac{z_{j-1}s_{j-1}}{\sum _{k=0}^m z_k s_k}u_j + Nz_i \times \frac{z_{j+1}s_{j+1}}{\sum _{k=0}^m z_k s_k}v_j\\ = \frac{Nz_i}{\sum _{k=0}^m z_k s_k} \left[ z_js_j(1-u_{j+1}-v_{j-1}) + z_{j-1}s_{j-1}u_j+z_{j+1}s_{j+1}v_j\right] , \end{array} \end{aligned}$$
(12.120)

with \(u_{m+1}=v_{-1}=z_{-1}=z_{m+1}=0\). Let \(N_c\) be a threshold. For any given state \({\varvec{z}}\), define the non-critical set \(\varOmega \) of events as those pairs (ij) with \(i\ne j\) such that both of \(z_i\) and \(z_j\) exceed \(N_c/N\). The remaining events (ij) are referred to as critical, since at least one of \(z_i\) and \(z_j\) is \(N_c/N\) or smaller. The idea of the hybrid simulation method is to simulate updates of critical events exactly, whereas non-critical events are updated approximately. In more detail, the algorithm is defined as follows:

  1. Set \(t=0\) and \({\varvec{Z}}_t={\varvec{e}}_0={\varvec{z}}\).

  2. Compute the \(m(m+1)\) transition rates \(a_{ij}=a_{ij}({\varvec{z}})\) for \(0\le i,j \le m\) and \(i\ne j\).

  3. Compute the set \(\varOmega =\varOmega ({\varvec{z}})\) of non-critical events for the current state \({\varvec{z}}\).

  4. Determine the exponentially distributed waiting time \(e\, {\mathop {\in }\limits ^\mathcal{L}}\text{ Exp }(a)\) until the next critical event occurs, where \(a=\sum _{(i,j)\notin \varOmega } a_{ij}\) is the rate of the exponential distribution.

  5. If \(e<\tau \), simulate a critical event \((I,J)\notin \varOmega \) from the probability distribution \(\{a_{ij}/a;\, (i,j)\notin \varOmega \}\), and update the allele frequency vector as \({\varvec{z}}\leftarrow {\varvec{z}}+ {\varvec{\delta }}_{IJ}/N\). Otherwise, if \(e\ge \tau \), simulate no critical event and leave \({\varvec{z}}\) intact.

  6. Let \(h=\min (e,\tau )\). Then simulate non-critical events over a time interval of length h, and increment the allele frequency vector as

     $$ {\varvec{z}}\leftarrow {\varvec{z}}+ \frac{1}{N}\sum _{(i,j)\in \varOmega } n_{ij}{\varvec{\delta }}_{ij}, $$

     where \(n_{ij}\sim \text{ Po }(a_{ij}h)\) are independent and Poisson distributed random variables.

  7. Update the time (\(t\leftarrow t+h\)) and the allele frequency process (\({\varvec{Z}}_t\leftarrow {\varvec{z}}\)).

  8. If \({\varvec{z}}={\varvec{e}}_m\), set \(T_m=t\) and stop. Otherwise go back to step 2.

We have implemented the hybrid algorithm, with \(N_c\) and \(\varepsilon \) as input parameters and \(\tau =\varepsilon /2\). When the selection coefficients \(s_i\) are highly variable, however, a smaller value of \(\tau \) is needed in order to guarantee that (12.118) holds.
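
To make the algorithm concrete, here is a minimal Python sketch of the hybrid scheme. Two assumptions of ours are built in: the rates of (12.120) are scaled by the total birth rate N, so that they agree with (12.121)–(12.122) for \(m=1\), and small numerical safeguards (clipping and renormalizing after a leap, a step budget) are added that are not part of the algorithm above. The array u holds the forward rates \(u_j\) (mutation \(j-1\rightarrow j\)) and v the backward rates \(v_j\) (mutation \(j+1\rightarrow j\)); all parameter names are our own.

```python
import numpy as np

def _g(arr, k):
    # arr[k] with out-of-range indices treated as 0, implementing the
    # conventions u_{m+1} = v_{-1} = z_{-1} = z_{m+1} = 0
    return arr[k] if 0 <= k < len(arr) else 0.0

def transition_rates(z, s, u, v, N):
    """Rates of the jumps z -> z + (e_j - e_i)/N, following (12.120),
    scaled by the total birth rate N (compare (12.121)-(12.122))."""
    m1 = len(z)
    denom = float(np.dot(z, s))
    a = np.zeros((m1, m1))
    for i in range(m1):
        if z[i] == 0.0:
            continue                      # no type-i individual can die
        for j in range(m1):
            if i == j:
                continue
            birth = (_g(z, j) * _g(s, j) * (1.0 - _g(u, j + 1) - _g(v, j - 1))
                     + _g(z, j - 1) * _g(s, j - 1) * _g(u, j)
                     + _g(z, j + 1) * _g(s, j + 1) * _g(v, j))
            a[i, j] = N * z[i] * birth / denom
    return a

def hybrid_step(z, s, u, v, N, tau, Nc, rng):
    """One step of the hybrid exact / tau-leaping update (steps 2-7)."""
    a = transition_rates(z, s, u, v, N)
    m1 = len(z)
    big = z > Nc / N                      # frequencies above the threshold
    crit = [(i, j) for i in range(m1) for j in range(m1)
            if i != j and a[i, j] > 0.0 and not (big[i] and big[j])]
    a_crit = sum(a[i, j] for i, j in crit)
    e = rng.exponential(1.0 / a_crit) if a_crit > 0.0 else np.inf
    h = min(e, tau)
    dz = np.zeros(m1)
    if e < tau:                           # simulate one critical event exactly
        p = np.array([a[i, j] for i, j in crit]) / a_crit
        i, j = crit[rng.choice(len(crit), p=p)]
        dz[i] -= 1.0 / N
        dz[j] += 1.0 / N
    for i in range(m1):                   # tau-leap the non-critical events
        for j in range(m1):
            if i != j and big[i] and big[j] and a[i, j] > 0.0:
                n = rng.poisson(a[i, j] * h)
                dz[i] -= n / N
                dz[j] += n / N
    z = np.clip(z + dz, 0.0, 1.0)
    return z / z.sum(), h                 # renormalize after a possible overshoot

def waiting_time(N, s, u, v, eps=0.05, Nc=10, max_steps=200_000, seed=1):
    """Simulate until type m is fixed; returns (T_m, number of steps)."""
    rng = np.random.default_rng(seed)
    m = len(s) - 1
    z = np.zeros(m + 1)
    z[0] = 1.0                            # all individuals start with type 0
    t, tau = 0.0, eps / 2.0               # tau = eps/2, as in (12.119)
    for step in range(max_steps):
        if z[m] >= 1.0 - 1e-12:
            return t, step
        z, h = hybrid_step(z, s, u, v, N, tau, Nc, rng)
        t += h
    return None, max_steps
```

For \(m=1\) and zero mutation rates, `transition_rates` reproduces (12.121)–(12.122) exactly, which provides a quick sanity check of the implementation.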

Appendix B. The Expected Waiting Time for One Mutation

In this appendix we will motivate formula (12.34). It approximates the expected number of generations \(\alpha (s)\) until a single mutant with fitness s spreads and gets fixed in a population where the remaining \(N-1\) individuals have fitness 1, given that such a fixation will happen and that no further mutations occur. This corresponds to the Moran model of Sect. 12.2 with \(m=1\) mutant type, zero mutation rates (\(u_1=v_0=0\)), and initial allele frequency distribution \({\varvec{Z}}_0 = (1-p,p)\), where \(p=1/N\). For simplicity of notation we write \(Z_t=Z_{t1}\) for the frequency of the mutant allele 1.

Kimura and Ohta [38] derived a diffusion approximation of \(\alpha (s)\), for a general class of models. It involves the infinitesimal mean and variance functions M(z) and V(z) of the allele frequency process, defined through

$$ \begin{aligned} E(Z_{t+h}|Z_t=z)&= z + M(z)h + o(h),\\ \text{ Var }(Z_{t+h}|Z_t=z)&= V(z)h + o(h) \end{aligned} $$

as \(h\rightarrow 0\). In order to apply their formula to a mutation-free Moran model, we first need to find M(z) and V(z). To this end, suppose \(Z_t=z\). Then use formula (12.120) with \(m=1\) to deduce that

$$\begin{aligned} z \rightarrow z+1/N \text{ at } \text{ rate } a_{01}(z) = N(1-z)\frac{zs}{1-z+zs}, \end{aligned}$$
(12.121)

whereas

$$\begin{aligned} z \rightarrow z-1/N \text{ at } \text{ rate } a_{10}(z) = Nz\frac{1-z}{1-z+zs}. \end{aligned}$$
(12.122)

From this it follows that

$$\begin{aligned} M(z) = \frac{1}{N}\left[ a_{01}(z)-a_{10}(z)\right] = (s-1)\frac{(1-z)z}{1+z(s-1)} \end{aligned}$$
(12.123)

and

$$\begin{aligned} V(z) = \frac{1}{N^2} \left[ a_{01}(z)+a_{10}(z)\right] = \frac{1}{N}(1+s)\frac{(1-z)z}{1+z(s-1)}. \end{aligned}$$
(12.124)

We will also need the function

$$ G(z) = \exp \left( -\int _0^z \frac{2M(y)}{V(y)} dy\right) = \exp (-2Ns^\prime z), $$

with \(s^\prime = (s-1)/(s+1)\). The formula of Kimura and Ohta [38] takes the form

$$\begin{aligned} \alpha (s) = \int _p^1 \psi (z)\hat{\beta }(z)\left[ 1-\hat{\beta }(z)\right] dz + \frac{1-\hat{\beta }(p)}{\hat{\beta }(p)}\int _0^p \psi (z)\hat{\beta }^2(z)dz, \end{aligned}$$
(12.125)

where

$$\begin{aligned} \hat{\beta }(z) = \hat{\beta }(s;z) = \frac{\int _0^z G(y)dy}{\int _0^1 G(y)dy} = \frac{1-e^{-2Ns^\prime z}}{1-e^{-2Ns^\prime }} \end{aligned}$$
(12.126)

approximates the fixation probability of a mutant allele that starts at frequency \(Z_0=z\). In particular, \(\hat{\beta }(1/N)\) approximates the exact probability (12.32) that one single copy of an allele with fitness s takes over a population where all other individuals have fitness 1. This diffusion approximation is increasingly accurate in the limit of weak selection (\(s\rightarrow 1\)).

The other function appearing in the two integrands of (12.125) is

$$\begin{aligned} \psi (z) = \frac{2\int _0^1 G(y)dy}{V(z)G(z)} = \frac{1-e^{-2Ns^\prime }}{e^{-2Ns^\prime z}} \times \frac{1+z(s-1)}{1+s} \times \frac{1}{s^\prime z(1-z)}. \end{aligned}$$
(12.127)

In order to verify (12.34) we will approximate (12.125) separately for neutral (\(s=1\)), advantageous (\(s>1\)), and deleterious (\(s<1\)) alleles. In the neutral case \(s=1\) we let \(s^\prime \rightarrow 0\) and find that \(\hat{\beta }(z)=z\) and \(\psi (z)=N/[z(1-z)]\). Inserting these functions into (12.125), we obtain an expression

$$ \alpha (1) = -\frac{1}{p}\left[ N(1-p)\log (1-p)\right] $$

for the expected fixation time. This is essentially the middle part of (12.34) when \(p=1/N\).
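
Both quantities derived above are straightforward to evaluate numerically. The sketch below (with helper names of our own choosing) implements the diffusion fixation probability (12.126) and the neutral expected fixation time just displayed.

```python
import math

def fixation_prob(N, s, z):
    """Diffusion approximation (12.126) of the fixation probability of a
    mutant with fitness s at initial frequency z, where s' = (s-1)/(s+1)."""
    sp = (s - 1.0) / (s + 1.0)
    if sp == 0.0:                 # neutral limit s -> 1, where beta(z) = z
        return z
    return (1.0 - math.exp(-2.0 * N * sp * z)) / (1.0 - math.exp(-2.0 * N * sp))

def neutral_fixation_time(N, p):
    """alpha(1) = -(1/p) N (1-p) log(1-p), the neutral expected fixation time."""
    return -(1.0 / p) * N * (1.0 - p) * math.log(1.0 - p)
```

As \(N\rightarrow \infty \), \(\hat{\beta }(1/N)\rightarrow 1-e^{-2s^\prime }\) for \(s>1\), while \(-(1/p)(1-p)\log (1-p)\rightarrow 1\) as \(p\rightarrow 0\), so the neutral fixation time is close to N generations for large N.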

When \(s>1\), we similarly insert (12.126)–(12.127) into (12.125). After some lengthy calculations, it can be shown that

$$\begin{aligned} \begin{aligned} \alpha (s)&\sim \frac{1+s}{s-1}\log (N)\\&+ \frac{s}{s-1}\left[ \log (2s^\prime )+\int _0^1 \frac{1-e^{-y}}{y}dy - \int _1^\infty \frac{e^{-y}}{y}dy - \frac{1}{s}\int _{2s^\prime }^\infty \frac{e^{-y}}{y}dy\right] \\&+ \frac{e^{-2s^\prime }}{1-e^{-2s^\prime }} \times \frac{1}{s-1}\int _0^{2s^\prime } \frac{1}{y}e^y(1-e^{-y})^2 dy \end{aligned} \end{aligned}$$
(12.128)

as \(N\rightarrow \infty \). The first term of this expression dominates for large N, and it agrees with the lower part of (12.34).

When \(s<1\), a similar calculation yields

$$\begin{aligned} \begin{aligned} \alpha (s)&\sim \frac{1+s}{1-s}\log (N)\\&+ \frac{s}{1-s}\left[ \log (2s^{\prime \prime })+\int _0^1 \frac{1-e^{-y}}{y}dy - \int _1^\infty \frac{e^{-y}}{y}dy - \frac{1}{s}\int _{2s^{\prime \prime }}^\infty \frac{e^{-y}}{y}dy\right] \\&+ \frac{e^{-2s^{\prime \prime }}}{1-e^{-2s^{\prime \prime }}} \times \frac{1}{1-s}\int _0^{2s^{\prime \prime }} \frac{1}{y}e^y(1-e^{-y})^2 dy \end{aligned} \end{aligned}$$
(12.129)

as \(N\rightarrow \infty \), with \(s^{\prime \prime }=(1-s)/(s+1)\). The first, leading term of this formula is consistent with the upper part of (12.34). The various approximations of \(\alpha (s)\) are shown in Table 12.5.

Table 12.5 Approximations of the expected waiting time \(\alpha (s)=\alpha _N(s)\) of fixation, in units of generations, for a single mutant with selection coefficient s, in a population of size N. The columns marked Diff are based on the diffusion approximation (12.125), whereas the columns marked AsDiff are asymptotic approximations of the diffusion solution, based on the middle part of (12.34) for \(s=1\), Eq. (12.128) for \(s>1\) and Eq. (12.129) for \(s<1\). The latter two formulas only work well when \(|s-1|\gg 1/N\). They have been omitted when they depart from the diffusion solution by more than 10%

Appendix C. Sketch of Proofs of Main Results

Lemma 12.1

Let \(\{\tau _k\}_{k=0}^M\) be the fixation times of the process \({\varvec{Z}}_t\), defined in (12.13), and let \(\tau _{k+1}^\prime \) be the time point when a successful mutation first occurs between two successive fixation events (\(\tau _k<\tau _{k+1}^\prime < \tau _{k+1}\)). Let also \(\mu _i\) be the rate in (12.15) at which successful mutations appear in a homogeneous type i population. Then

$$\begin{aligned} P\left( \tau _{k+1}^\prime -\tau _k > \frac{\zeta }{\mu _i}\Big |{\varvec{Z}}_{\tau _k}={\varvec{e}}_i\right) \rightarrow \exp (-\zeta ) \end{aligned}$$
(12.130)

as \(N\rightarrow \infty \) for all \(\zeta >0\) and \(i=0,1,\ldots ,m-1\).

Sketch of proof. Let \(f_i({\varvec{z}})=f_{i,N}({\varvec{z}})\) and \(b_i({\varvec{z}})=b_{i,N}({\varvec{z}})\) be the probabilities that the offspring of a type \(i\in \{0,\ldots ,m-1\}\) individual who mutates to \(i+1\) or \(i-1\) is a successful forward or backward mutation, given that the allele frequency configuration is \({\varvec{z}}\) just before replacement occurs with the individual that dies (when \(i=0\) we put \(b_0({\varvec{z}})=0\)). Notice in particular that \(f_i=f_i({\varvec{e}}_i)\) and \(b_i=b_i({\varvec{e}}_i)\), since these two quantities are defined as the probabilities of a successful forward or backward mutation in an environment where all individuals have type i just before the mutation, that is, when \({\varvec{z}}={\varvec{e}}_i\).

When an individual is born in a population with allele configuration \({\varvec{z}}\), its birth is not the first successful mutation between two fixation events \(\tau _k\) and \(\tau _{k+1}\) with probability \(1-u_{i+1}f_i({\varvec{z}})-v_{i-1}b_i({\varvec{z}})\), given that no other successful mutation has occurred between these two time points. Let \(0\le t_1< t_2 < \cdots \) be the time points when a type i individual gets an offspring. If we choose \(\{{\varvec{Z}}_t\}\) to be left-continuous, the probability of no successful mutation \(i\rightarrow i\pm 1\) at time \(t_l\), where \(\tau _k< t_l < \tau _{k+1}\), is \(1-u_{i+1}f_i({\varvec{Z}}_{t_l})-v_{i-1}b_i({\varvec{Z}}_{t_l})\), given that no other successful mutation has occurred so far (\(\tau _{k+1}^\prime \ge t_l\)). Since the left hand side of (12.130) is the probability that no mutation \(i\rightarrow i\pm 1\) arriving at some time point in \({\mathbb {T}}_i(\zeta )=\{t_l;\, \tau _k < t_l \le \tau _k+\zeta /\mu _i\}\) is successful, we find that

$$\begin{aligned} \begin{array}{l} P(\tau _{k+1}^\prime -\tau _k > \zeta /\mu _i|{\varvec{Z}}_{\tau _k}={\varvec{e}}_i)\\ = E\left[ \prod _{t_l\in {\mathbb {T}}_i(\zeta )} \left( 1-u_{i+1}f_i({\varvec{Z}}_{t_l})-v_{i-1}b_i({\varvec{Z}}_{t_l})\right) \right] \\ \approx E\left[ \exp \left( - u_{i+1}\sum _{t_l\in {\mathbb {T}}_i(\zeta )} f_i({\varvec{Z}}_{t_l}) - v_{i-1}\sum _{t_l\in {\mathbb {T}}_i(\zeta )} b_i({\varvec{Z}}_{t_l})\right) \right] , \end{array} \end{aligned}$$
(12.131)

where expectation is with respect to variations in the allele frequency process \({\varvec{Z}}_t\) for \(t\in {\mathbb {T}}_i(\zeta )\).

Because of (12.4)–(12.5), with a probability tending to 1 as \(N\rightarrow \infty \), \({\varvec{Z}}_{t}\) will stay close to \({\varvec{e}}_i\) most of the time in \((\tau _k,\tau _{k+1}^\prime )\), that is, all alleles \(l\ne i\) will most of the time be kept at low frequencies. In order to motivate this, we notice that by definition, all mutations that arrive in \((\tau _k,\tau _{k+1}^\prime )\) are unsuccessful. It is known that the expected lifetime of an unsuccessful mutation is bounded by \(C\log (N)\) for a fairly large class of Moran models with selection, where C is a constant that depends on the model parameters, but not on N (Crow and Kimura [15], Section 8.9). Since mutations arrive at rate \(N(v_{i-1}+u_{i+1})\), this suggests that all alleles \(l\ne i\) are expected to have low frequency before the first successful mutation arrives, if

$$ C\log (N)\times N(v_{i-1}+u_{i+1}) = o(1) $$

as \(N\rightarrow \infty \), i.e. if the convergence rate towards zero in (12.4)–(12.5) is faster than logarithmic. This implies that it is possible to approximate the sums on the right hand sides of (12.131) by

$$\begin{aligned} \begin{array}{rcl} {\sum }_{t_l\in {\mathbb {T}}_i(\zeta )} f_i({\varvec{Z}}_{t_l}) \approx f_i |{\mathbb {T}}_i(\zeta )| \approx f_i N \times \zeta /\mu _i,\\ {\sum }_{t_l\in {\mathbb {T}}_i(\zeta )} b_i({\varvec{Z}}_{t_l}) \approx b_i |{\mathbb {T}}_i(\zeta )| \approx b_i N \times \zeta /\mu _i, \end{array} \end{aligned}$$
(12.132)

where \(|{\mathbb {T}}_i(\zeta )|\) refers to the number of elements in \({\mathbb {T}}_i(\zeta )\). In the first step of (12.132), we used that \(f_i({\varvec{z}})\rightarrow f_i\) and \(b_i({\varvec{z}})\rightarrow b_i\) as \({\varvec{z}}\rightarrow {\varvec{e}}_i\) respectively, and therefore \(f_i({\varvec{Z}}_{t_l})\approx f_i\) and \(b_i({\varvec{Z}}_{t_l})\approx b_i\) for most of the terms in (12.132). In the second step of (12.132) we used that \(|{\mathbb {T}}_i(\zeta )|\) counts the number of births of type i individuals within a time interval of length \(\zeta /\mu _i\), and that each \(t_{l+1}-t_l\) is approximately exponentially distributed. By the definition of the Moran model in Sect. 12.2, the intensity of this exponential distribution is approximately

$$ N \times \frac{Z_{t_li}s_i}{\sum _{j=0}^m Z_{t_lj}s_j} \approx N, $$

for the majority of time points \(t_l\) such that \({\varvec{Z}}_{t_l}\) stays close to \({\varvec{e}}_i\). Consequently, \(|{\mathbb {T}}_i(\zeta )|\) is approximately Poisson distributed with expected value \(N\zeta /\mu _i\). We know from (12.4)–(12.5) and (12.15) that \(\mu _i=o(1)\). This implies that \(N\zeta /\mu _i\gg 1\), and since the coefficient of variation of a Poisson distribution tends to zero as its expected value increases, \(|{\mathbb {T}}_i(\zeta )|/(N\zeta /\mu _i)\) converges to 1 in probability as \(N\rightarrow \infty \); we may therefore approximate \(|{\mathbb {T}}_i(\zeta )|\) by \(N\zeta /\mu _i\). To conclude, (12.130) follows from (12.15), (12.131), and (12.132).    \(\square \)
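
The essence of (12.131)–(12.132) is the elementary thinning approximation \((1-q)^n\approx e^{-nq}\): with \(n=|{\mathbb {T}}_i(\zeta )|\approx N\zeta /\mu _i\) births, each yielding a successful mutation with probability \(q\approx u_{i+1}f_i+v_{i-1}b_i=\mu _i/N\), the no-success probability tends to \(e^{-\zeta }\). A quick numerical check, with made-up rates of our own:

```python
import math

# Hypothetical numbers: q is the per-birth success probability
# u_{i+1} f_i + v_{i-1} b_i = mu_i / N, and zeta the rescaled horizon.
q = 1e-6
zeta = 1.3
n = zeta / q                     # approx. |T_i(zeta)| births up to time zeta/mu_i
no_success = (1.0 - q) ** n      # probability that none of them is successful
print(no_success, math.exp(-zeta))   # the two numbers agree to about 6 decimals
```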

Proof of Theorem 12.1. Let \({\varvec{X}}_{\zeta } = {\varvec{Z}}_{\zeta /\mu _{\text{ min }}}\) denote the allele frequency process after changing time scale by a factor \(\mu _{\text{ min }}\). Let \(S_k=\mu _{\text{ min }}\tau _k\) refer to time points of fixation when \(\{{\varvec{X}}_\zeta \}\) visits new fixed states in \(\mathcal{Z}_{\text{ hom }}\), defined in (12.6), \(S_{k+1}^\prime = \mu _{\text{ min }}\tau _{k+1}^\prime \) the time point when a successful mutation first appears after \(S_k\), and \(S=\mu _{\text{ min }}T_m = S_M\) the time when allele m gets fixed. We need to show that

$$\begin{aligned} S \, {\mathop {\longrightarrow }\limits ^\mathcal{L}}\text{ PD }(\tilde{{\varvec{e}}}_0,{\varvec{\varSigma }}_0) \text{ as } N\rightarrow \infty . \end{aligned}$$
(12.133)

To this end, write

$$\begin{aligned} S = \sum _{k=0}^{M-1} (S_{k+1}^\prime -S_k) + \sum _{k=1}^{M} (S_k-S_{k}^\prime ) =: S_{\text{ appear }} + S_{\text{ tunfix }}, \end{aligned}$$
(12.134)

where \(S_{\text{ appear }}\) is the total waiting time for new successful mutations to appear, and \(S_{\text{ tunfix }}\) is the total waiting time for tunneling and fixation, after successful mutations have appeared. We will first show that

$$\begin{aligned} S_{\text{ appear }} \, {\mathop {\longrightarrow }\limits ^\mathcal{L}}\text{ PD }(\tilde{{\varvec{e}}}_0,{\varvec{\varSigma }}_0) \text{ as } N\rightarrow \infty . \end{aligned}$$
(12.135)

It follows from (12.14) to (12.17) that \(\{{\varvec{X}}_{S_k}\}\) is a Markov chain that starts at \({\varvec{X}}_{S_0}={\varvec{e}}_0\), with transition probabilities

$$\begin{aligned} \begin{array}{rr} P({\varvec{X}}_{S_{k+1}}={\varvec{e}}_j|{\varvec{X}}_{S_k}={\varvec{e}}_i) = p_{ij,N} \rightarrow \pi _{ij}&{} \\ \text{ for } i=0,\ldots ,m-1, \, j\ne i.&{} \end{array} \end{aligned}$$
(12.136)

Because of (12.25) and Lemma 12.1, the waiting times for successful mutations \(i\rightarrow i\pm 1\) have exponential or degenerate limit distributions as \(N\rightarrow \infty \), since

$$\begin{aligned} P(S^\prime _{k+1}-S_k>\zeta |{\varvec{X}}_{S_k}={\varvec{e}}_i) \rightarrow \left\{ \begin{array}{ll} \exp (-\kappa _i \zeta ), &{} i\in I_{\text{ long }},\\ 0, &{} i\in I_{\text{ short }}, \end{array}\right. \end{aligned}$$
(12.137)

where \(I_{\text{ long }}\) and \(I_{\text{ short }}\) refer to those asymptotic states in (12.22) and (12.23) that are visited for a long and short time, respectively. Since, by definition, the non-asymptotic states \(i\in I_{\text{ nas }}\) in (12.20) do not contribute to the limit distribution of \(S_{\text{ appear }}\) as \(N\rightarrow \infty \), it follows from (12.136)–(12.137) that asymptotically, \(S_{\text{ appear }}\) is the total waiting time for a continuous time Markov chain with intensity matrix \({\varvec{\varSigma }}\), that starts at \({\varvec{e}}_0\), before it reaches its absorbing state \({\varvec{e}}_m\). This proves (12.135).

It remains to prove that \(S_{\text{ tunfix }}\) is asymptotically negligible. It follows from (12.26) that

$$\begin{aligned} P(\varepsilon ) = P_N(\varepsilon ) = \max _{i\in I_{\text{ as }}} P\left( S_{k}-S_{k}^\prime >\varepsilon |{\varvec{X}}_{S_{k-1}}={\varvec{e}}_i\right) = o(1) \end{aligned}$$
(12.138)

as \(N\rightarrow \infty \) for any \(\varepsilon > 0\). Write \(M=\sum _{i=0}^{m-1} M_i\), where \(M_i\) is the number of visits to \({\varvec{e}}_i\) by the Markov chain \(\{{\varvec{X}}_{S_k};\, k=0,\ldots ,M\}\) before it is stopped at step M. Let K be a large positive integer. We find that

$$\begin{aligned} \begin{array}{cc} &{} P(S_{\text{ tunfix }}> \varepsilon ) \le E\left[ \sum _{k=1}^{\text{ min }(K,M)} P(S_{k}-S_{k}^\prime>\varepsilon /K)\right] + P(M>K) \\ &{} \le KP(\varepsilon /K) + \sum _{i\in I_{\text{ nas }}} P(M_i>0) + E(M)/K\\ &{} \le 2E(M)/K \end{array} \end{aligned}$$
(12.139)

for all sufficiently large N. In the second step of (12.139) we used that

$$\begin{aligned} E(M) = \tilde{{\varvec{e}}}_0 ({\varvec{I}}-{\varvec{P}}_0)^{-1}{\varvec{1}}^T \rightarrow \tilde{{\varvec{e}}}_0 ({\varvec{I}}-{\varvec{\varPi }}_0)^{-1}{\varvec{1}}^T < \infty , \end{aligned}$$
(12.140)

where \({\varvec{P}}_0\) is a square matrix of order m that contains the first m rows and m columns of the transition matrix \({\varvec{P}}\) of the Markov chain \({\varvec{X}}_{S_k}\), so that its elements are the transition probabilities among and from the non-absorbing states. We used in (12.140) that M is the number of jumps until this Markov chain reaches its absorbing state, and therefore it has a discrete phase-type distribution (Bobbio et al. [9]); because of (12.17)–(12.18), the expected value of M must be finite. In the last step of (12.139) we used (12.138) and the definition of non-asymptotic states, which implies \(P(M_i>0)=o(1)\) for all \(i\in I_{\text{ nas }}\).
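As a numerical aside (not part of the proof), the expected number of jumps in (12.140) is obtained by solving the linear system \(({\varvec{I}}-{\varvec{P}}_0)\,{\varvec{x}}={\varvec{1}}^T\) and reading off the entry selected by \(\tilde{{\varvec{e}}}_0\). The transition probabilities below are hypothetical, chosen only to show the computation for \(m=2\):

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

# Hypothetical transition probabilities among the non-absorbing fixed
# states e_0, e_1 of a chain with m = 2 (state e_2 is absorbing);
# rows need not sum to 1, the remaining mass goes to e_2.
P0 = [[0.0, 0.9],
      [0.1, 0.0]]

# E(M) = e~_0 (I - P0)^{-1} 1^T: solve (I - P0) x = 1 and take x[0].
I_minus_P0 = [[(1.0 if i == j else 0.0) - P0[i][j] for j in range(2)]
              for i in range(2)]
x = solve(I_minus_P0, [1.0, 1.0])
EM = x[0]  # expected number of jumps when starting in e_0
print(EM)
```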

Since (12.139) holds for all \(K>0\) and \(\varepsilon >0\), we deduce \(S_{\text{ tunfix }}=o_p(1)\) by first letting \(K\rightarrow \infty \) and then \(\varepsilon \rightarrow 0\). Together with (12.134)–(12.135) and Slutsky’s Theorem (see for instance Gut [29]), this completes the proof of (12.133).    \(\square \)
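As a numerical illustration of the limit law (not part of the proof), the following sketch simulates the absorption time of a small continuous time Markov chain of the above kind and compares the Monte Carlo mean with the exact phase-type mean. The pure forward structure and the rates \(\kappa _0,\kappa _1\) are illustrative assumptions:

```python
import random

random.seed(1)

# Hypothetical rates of a pure forward chain e_0 -> e_1 -> e_2,
# where e_2 is absorbing; this is the simplest phase-type setting.
kappa = [2.0, 0.5]

def sample_absorption_time():
    """One draw of S: total holding time of the chain started at e_0
    until it reaches the absorbing state."""
    return sum(random.expovariate(k) for k in kappa)

n = 200_000
mean_S = sum(sample_absorption_time() for _ in range(n)) / n

# For a pure forward chain the phase-type mean is the sum of the
# expected holding times 1/kappa_0 + 1/kappa_1.
exact = sum(1.0 / k for k in kappa)
print(mean_S, exact)
```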

In order to motivate Theorem 12.2, we first give four lemmas. It is assumed for all of them that the regularity conditions of Theorem 12.2 hold.

Lemma 12.2

Let \(r_{ilj}\) be the probabilities defined in (12.37)–(12.40). Then

$$\begin{aligned} \begin{array}{ll} &{}r_{ilj} = O(u_j^{1-2^{-(j-l-1)}}),\\ &{}r_{ilj} = \varOmega (u_{l+2}^{1-2^{-(j-l-1)}}), \end{array} \quad i\le l \le j-2, \end{aligned}$$
(12.141)

and

$$\begin{aligned} \begin{array}{ll} &{}r_{ilj} = O(v_j^{1-2^{-(l-j-1)}}),\\ &{}r_{ilj} = \varOmega (v_{l-2}^{1-2^{-(l-j-1)}}), \end{array} \quad j+2\le l\le i, \end{aligned}$$
(12.142)

as \(N\rightarrow \infty \). The corresponding formulas for \(r_{ij}=r_{iij}\) in (12.36) are obtained by putting \(l=i\) in (12.141)–(12.142).

Proof. In order to prove (12.141), assume \(i\le l \le j-2\). Since \(r_{i,j-1,j}=1\), repeated application of the recursive formula \(r_{i,k-1,j}=R(\rho _{ikj})\sqrt{r_{ikj}u_{k+1}}\) in (12.38), for \(k=j-1,\ldots ,l+1\), leads to

$$\begin{aligned} r_{ilj} = \prod _{k=l+1}^{j-1} R(\rho _{ikj})^{2^{-(k-l-1)}}u_{k+1}^{2^{-(k-l)}}. \end{aligned}$$
(12.143)

We know from (12.48) that all \(\rho _{ilj}=O(1)\) as \(N\rightarrow \infty \). From this and the definition of the function \(R(\rho )\) in (12.41), it follows that \(R(\rho _{ilj})=\varTheta (1)\) as \(N\rightarrow \infty \), so that

$$\begin{aligned} r_{ilj} = \varTheta \left( \prod _{k=l+1}^{j-1} u_{k+1}^{2^{-(k-l)}} \right) . \end{aligned}$$
(12.144)

Then both parts of (12.141) follow by inserting the first equation of (12.46) into (12.144). The proof of (12.142) when \(j+2\le l\le i\) is analogous. Since \(r_{i,j+1,j}=1\), we use a recursion for \(k=j+1,\ldots ,l-1\) in order to arrive at the explicit formula

$$ r_{ilj} = \prod _{k=j+1}^{l-1} R(\rho _{ikj})^{2^{-(l-k-1)}}v_{k-1}^{2^{-(l-k)}}. $$

Then use (12.48) and the third equation of (12.46) to verify that \(r_{ilj}\) satisfies (12.142).    \(\square \)
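The recursions behind Lemma 12.2 are straightforward to evaluate numerically. The sketch below implements the forward recursion \(r_{i,l-1,j}=R(\rho _{ilj})\sqrt{r_{ilj}u_{l+1}}\), with \(R(\rho )=(\rho +\sqrt{\rho ^2+4})/2\) as in (12.41), starting from \(r_{i,j-1,j}=1\). The mutation rates are hypothetical, and \(\rho \) is held constant across levels purely for illustration (\(\rho =0\) gives \(R(0)=1\)):

```python
import math

def R(rho):
    """R(rho) = (rho + sqrt(rho^2 + 4)) / 2, as in (12.41)."""
    return (rho + math.sqrt(rho * rho + 4.0)) / 2.0

def r_forward(i, j, u, rho=0.0):
    """Tunneling probabilities r_{ilj} for l = j-1, ..., i via the
    recursion r_{i,l-1,j} = R(rho) * sqrt(r_{ilj} * u[l+1]),
    started from r_{i,j-1,j} = 1.  Holding rho constant across levels
    is an illustrative simplification."""
    r = {j - 1: 1.0}
    for l in range(j - 1, i, -1):
        r[l - 1] = R(rho) * math.sqrt(r[l] * u[l + 1])
    return r

# Hypothetical forward mutation rates u_1, ..., u_4, indexed by type.
u = {k: 1e-6 for k in range(1, 5)}
r = r_forward(0, 3, u)  # i = 0, j = 3, neutral case rho = 0
print(r)
```

For instance, \(r_{013}=\sqrt{u_3}=10^{-3}\) and \(r_{003}=\sqrt{u_2 r_{013}}=u^{3/4}\) here, in agreement with (12.143).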

Lemma 12.3

Let \(q_{ij}\), \(q_{ilj}\), \(r_{ij}\), and \(r_{ilj}\) be the probabilities defined in connection with (12.35)–(12.40). Consider a fixed \(i\in \{0,1,\ldots ,m-1\}\), and let F(i) and B(i) be the indices defined in (12.44). Then,

$$\begin{aligned} \begin{array}{ll} &{}q_{ilF(i)} \sim r_{ilF(i)}, l=i,i+1,\ldots ,F(i)-1,\\ &{}q_{ilB(i)} \sim r_{ilB(i)}, l=B(i)+1,\ldots ,i, \text{ if } i>0 \text{ and } \hat{\pi }_{iB(i)}>0 \end{array} \end{aligned}$$
(12.145)

as \(N\rightarrow \infty \). In particular,

$$\begin{aligned} \begin{array}{ll} &{}q_{iF(i)} \sim r_{iF(i)}, \\ &{}q_{iB(i)} \sim r_{iB(i)}, \text{ if } i>0 \text{ and } \hat{\pi }_{iB(i)}>0. \end{array} \end{aligned}$$
(12.146)

Sketch of proof. Notice that (12.146) is a direct consequence of (12.145), since \(q_{iij}=q_{ij}\) and \(r_{iij}=r_{ij}\). We will only motivate the upper part of (12.145), since the lower part is treated similarly. Consider a fixed \(i\in \{0,\ldots ,m-1\}\), and for simplicity of notation we write \(j=F(i)\). We will argue that

$$\begin{aligned} q_{ilj} \sim r_{ilj} \end{aligned}$$
(12.147)

for \(l=j-1,\ldots ,i\) by means of induction. Formula (12.147) clearly holds when \(l=j-1\), since, by definition, \(q_{i,j-1,j}=r_{i,j-1,j}=1\). As for the induction step, let \(i+1\le l \le j-1\), and suppose (12.147) has been proved for l. Then recall the recursive formula

$$\begin{aligned} r_{i,l-1,j} = R(\rho _{ilj})\sqrt{u_{l+1}r_{ilj}} \end{aligned}$$
(12.148)

from (12.38), with R defined in (12.41). If

$$\begin{aligned} q_{i,l-1,j} \sim R(\rho _{ilj})\sqrt{u_{l+1}q_{ilj}} \end{aligned}$$
(12.149)

holds as well, then (12.147) has been shown for \(l-1\), and the induction proof is completed. Without loss of generality we may assume that \(j\ge i+2\), since otherwise the induction proof of (12.147) stops after the first trivial step \(l=j-1\).

In order to motivate (12.149), we will look at what happens when the population is in fixed state i. Suppose \({\varvec{Z}}_{\tau _k}={\varvec{e}}_i\), and recall that \(\tau _{k+1}^\prime \) is the time point when the first successful mutation \(i\rightarrow i+1\) in \((\tau _k,\tau _{k+1})\) arrives. Therefore, if \({\varvec{Z}}_{\tau _{k+1}}={\varvec{e}}_j\), there is a non-empty set \(J=\{i+1,\ldots ,j-1\}\) of types that must be present among some of the descendants of the successful mutation, before a mutation \(j-1\rightarrow j\) arrives at some time point \(\tau _{k+1}^{\prime \prime }\in (\tau _{k+1}^\prime ,\tau _{k+1})\). Put \(Z_{tJ} = \max _{l\in J} Z_{tl}\). The regularity condition

$$\begin{aligned} P(\sup _{\tau _{k+1}^{\prime }<t<\tau _{k+1}^{\prime \prime }} Z_{tJ} > \varepsilon |{\varvec{Z}}_{\tau _k}={\varvec{e}}_i) \rightarrow 0 \end{aligned}$$
(12.150)

for all \(\varepsilon >0\) as \(N\rightarrow \infty \), assures that with high probability, none of the alleles in J reaches a high frequency after the successful \(i\rightarrow i+1\) mutation occurred, and before allele j first appears. We will need this condition below to verify the induction step (12.149).

The rationale for (12.150) is that fixation events \(i\rightarrow j\) will happen much more frequently than other types of fixation events \(i\rightarrow l\) with \(l\in J\), because of (12.44). We will motivate that

$$\begin{aligned} P = P(\sup _{\tau _k<t<\tau _k + a\mu _i^{-1}} Z_{tJ} > \varepsilon |{\varvec{Z}}_{\tau _k}={\varvec{e}}_i) \rightarrow 0 \end{aligned}$$
(12.151)

for any \(a>0\) and \(\varepsilon >0\) as \(N\rightarrow \infty \), with \(\mu _i\) the rate of leaving fixation state i. In Lemma 12.1 we motivated that \(\tau _{k+1}^\prime -\tau _k = O_p(\mu _i^{-1})\), and in Lemma 12.5 we will argue that \(\tau _{k+1}^{\prime \prime }-\tau _{k+1}^\prime = o_p(\mu _i^{-1})\). Since this implies \(\tau _{k+1}^{\prime \prime }-\tau _k = O_p(\mu _i^{-1})\), formula (12.150) will follow from (12.151).

In order to motivate (12.151), assume for simplicity that there are no backward mutations (the proof is analogous but more complicated if back mutations are included). If allele \(l\in J\) exceeds frequency \(\varepsilon \), we refer to this as a semi-fixation event. Let \(\lambda _{il}(\varepsilon )\) be the rate at which this happens after time \(\tau _k\), and before the next fixed state is reached. Then the rate at which a semi-fixation event happens for some \(l\in J\) is

$$\begin{aligned} \begin{aligned} \lambda _{iJ}(\varepsilon ) =&\sum \nolimits _{l\in J} \lambda _{il}(\varepsilon )\\ \sim&Nu_{i+1}\sum \nolimits _{l\in J} q_{il}\beta _{N\varepsilon }\left( \frac{s_l}{s_i}\right) \\ \le&C(\varepsilon ) \times Nu_{i+1}\sum \nolimits _{l\in J} q_{il}\beta \left( \frac{s_l}{s_i}\right) \\ \sim&C(\varepsilon ) \sum \nolimits _{l\in J} \lambda _{il}. \end{aligned} \end{aligned}$$
(12.152)

In the second step of (12.152) we introduced \(\beta _{N\varepsilon }(s)\), the probability that a single mutant with fitness s reaches frequency \(\varepsilon \), if all other individuals have fitness 1 and there are no mutations. We made use of

$$\begin{aligned} \lambda _{il}(\varepsilon ) \sim Nu_{i+1}q_{il}\beta _{N\varepsilon }\left( \frac{s_l}{s_i}\right) . \end{aligned}$$
(12.153)

This is motivated as in the proof of Lemma 12.4, in particular Eqs. (12.163), (12.164) and a variant of (12.167) for semi-fixation rather than fixation. In the third step of (12.152) we utilized that \(\beta _{N\varepsilon }(s)\) is larger than the corresponding fixation probability \(\beta (s)=\beta _N(s)\) for a population of size N. In order to quantify how much larger the fixation probability of the smaller population of size \(N\varepsilon \) is, we introduced \(C(\varepsilon )\), an upper bound of \(\beta _{N\varepsilon }(s_l/s_i)/\beta (s_l/s_i)\) that holds for all \(l\in J\). An expression for \(C(\varepsilon )\) can be derived from (12.32) if \(s_l/s_i\) is sufficiently close to 1. Indeed, we know from (12.48) that \(s_l/s_i\rightarrow 1\) as \(N\rightarrow \infty \). However, we need to sharpen this condition somewhat, to

$$\begin{aligned} s = \frac{s_l}{s_i} \ge 1 + \frac{x}{N} \end{aligned}$$
(12.154)

for all \(l\in J\) and some fixed \(x<0\). Then it follows from (12.32) that

$$ \frac{\beta _{N\varepsilon }(s)}{\beta _N(s)} = \frac{s^{-N}-1}{s^{-N\varepsilon }-1} \le \frac{(1+x/N)^{-N}-1}{(1+x/N)^{-N\varepsilon }-1} \rightarrow \frac{e^{-x}-1}{e^{-\varepsilon x}-1} =: C(\varepsilon ) $$

is a constant not depending on N. Finally, in the last step of (12.152) we assumed

$$\begin{aligned} \lambda _{il} \sim Nu_{i+1}q_{il}\beta \left( \frac{s_l}{s_i}\right) , \quad l\in J. \end{aligned}$$
(12.155)

This is motivated in the same way as Eq. (12.153), making use of (12.163)–(12.164) and (12.167).
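The constant \(C(\varepsilon )\) can be checked numerically: the finite-N ratio \((s^{-N}-1)/(s^{-N\varepsilon }-1)\) with \(s=1+x/N\) converges to \((e^{-x}-1)/(e^{-\varepsilon x}-1)\). A small sketch with arbitrary illustrative values of \(\varepsilon \) and x:

```python
import math

def ratio(N, eps, x):
    """Finite-N ratio beta_{N*eps}(s) / beta_N(s) with s = 1 + x/N,
    in the form appearing in the display above."""
    s = 1.0 + x / N
    return (s ** (-N) - 1.0) / (s ** (-N * eps) - 1.0)

def C(eps, x):
    """Limit of the ratio as N -> infinity."""
    return (math.exp(-x) - 1.0) / (math.exp(-eps * x) - 1.0)

eps, x = 0.5, -1.0  # a fixed x < 0, as in (12.154)
for N in (100, 10_000, 1_000_000):
    print(N, ratio(N, eps, x), C(eps, x))
```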

Assuming that semi-fixation events arrive according to a Poisson process with intensity \(\lambda _{iJ}(\varepsilon )\), formula (12.151) follows from (12.44) and (12.152), since

$$\begin{aligned} \begin{aligned} P&\sim 1 - \exp \left( -\lambda _{iJ}(\varepsilon ) \times \frac{a}{\mu _i}\right) \\&\le 1 - \exp \left( -C(\varepsilon ) \sum \nolimits _{l\in J} \lambda _{il} \times \frac{a}{\mu _i}\right) \\&= 1 - \exp (-C(\varepsilon )a \sum \nolimits _{l\in J} p_{il})\\&\rightarrow 1 - \exp (-C(\varepsilon )a \sum \nolimits _{l\in J} \pi _{il})\\&= 1 - \exp (-C(\varepsilon )a \sum \nolimits _{l\in J} \hat{\pi }_{il})\\&= 0 \end{aligned} \end{aligned}$$
(12.156)

as \(N\rightarrow \infty \). In the third step of (12.156) we used (12.16) to conclude that \(p_{il}=\lambda _{il}/\mu _i\), and in the fourth step we utilized (12.17). In the fifth step of (12.156) we claimed that \(\pi _{il}=\hat{\pi }_{il}\) for \(l\in J\). Although we have not given a strict proof of this, it seems reasonable in view of the definitions of \(\pi _{il}\) and \(\hat{\pi }_{il}\) in (12.17) and (12.43), together with (12.35), (12.155), and the fact that \(q_{il}\sim r_{il}\) for \(i<l<F(i)\) (which can be proved by induction with respect to l). Finally, in the last step of (12.156) we invoked (12.44), which implies \(\hat{\pi }_{il}=0\) for all \(l\in J = \{i+1,\ldots ,F(i)-1\}\).

Equation (12.150) enables us to approximate the allele frequency \(Z_{tl}\) by a branching process with mutations, in order to motivate (12.149). (A strict proof of this for a neutral model \(s_0=\cdots =s_{m-1}=1\) can be found in Theorem 2 of Durrett et al. [20].) We will look at the fate of the first \(l-1\rightarrow l\) mutation at time \(\tau \in (\tau _{k+1}^\prime ,\tau _{k+1}^{\prime \prime })\), which is a descendant of the first successful \(i\rightarrow i+1\) mutation at time \(\tau _{k+1}^\prime \), and arrives before the first \(j-1\rightarrow j\) mutation at time \(\tau _{k+1}^{\prime \prime }\). Recall that \(q=q_{i,l-1,j}\) is the probability that this l mutation gets an offspring that mutates into type j, and \(q^\prime =q_{ilj}\) is the corresponding probability that one of its descendants, an \(l\rightarrow l+1\) mutation, gets a type j offspring. Let also \(r^\prime =r_{ilj}\) be the approximation of \(q^\prime \), and write \(s=s_l/s_i\) for the ratio between the selection coefficients of alleles l and i. With this simplified notation, according to (12.149), we need to show that

$$\begin{aligned} q \sim R(\rho )\sqrt{uq^\prime } \end{aligned}$$
(12.157)

as \(N\rightarrow \infty \), where \(u=u_{l+1}\), and \(\rho = \rho _{ilj}\) is defined in (12.37), i.e.

$$\begin{aligned} s = 1 + \rho \sqrt{ur^\prime }. \end{aligned}$$
(12.158)

We make the simplifying assumption that at time \(\tau \), the population has a single type l individual, the one that mutated from type \(l-1\) at this time point, whereas all other \(N-1\) individuals have type i. (Recall that we argued in Lemma 12.1 that such an assumption is asymptotically accurate.) In order to compute the probability q of the event A that this individual gets a descendant of type j, we condition on the next time point when one individual dies and is replaced by the offspring of an individual that reproduces. Let D and R be independent indicator variables for the events that the type l individual dies and reproduces, respectively. Using the definition of the Moran process in Sect. 12.2, this gives an approximate recursive relation

$$\begin{aligned} \begin{aligned} q&= P(A)\\&= P(D=0,R=0)P(A|D=0,R=0)\\&+ P(D=0,R=1)P(A|D=0,R=1)\\&+ P(D=1,R=0)P(A|D=1,R=0)\\&+ P(D=1,R=1)P(A|D=1,R=1)\\&= \left( 1-\frac{1}{N}\right) \frac{N-1}{N-1+s} \times q\\&+ \left( 1-\frac{1}{N}\right) \frac{s}{N-1+s} \\&\times \left[ u(q^\prime +q-q^\prime q) + vq + (1-u-v)(2q-q^2)\right] \\&+ \frac{1}{N} \frac{N-1}{N-1+s} \times 0\\&+ \frac{1}{N} \frac{s}{N-1+s} \times \left[ uq^\prime + v\times 0 + (1-u-v)q\right] \end{aligned} \end{aligned}$$
(12.159)

for q, where \(v=v_{l-1}\) is the probability of a back mutation \(l\rightarrow l-1\). In the last step of (12.159) we retained the exact transition probabilities of the Moran process, but we used a branching process approximation for the probability q that the type l mutation at time \(\tau \) gets a type j descendant. This approximation relies on (12.150), and it means that descendants of the type l mutation that are alive at the same time point, have independent lines of descent after this time point. For instance, in the second term on the right hand side of (12.159), a type i individual dies and the type l individual reproduces (\(D=0\), \(R=1\)). Then there are three possibilities: First, the offspring of the type l individual mutates to \(l+1\) with probability u. Since the type l individual and its type \(l+1\) offspring have independent lines of descent, the probability is \(1-(1-q^\prime )(1-q)=q^\prime +q-q^\prime q\) that at least one of them gets a type j descendant. Second, if the offspring mutates back to \(l-1\) (with probability v), its type l parent has a probability q of getting a type j descendant. Third, if the offspring does not mutate (with probability \(1-u-v\)), there are two type l individuals, with a probability \(1-(1-q)^2 = 2q-q^2\) that at least one of them gets a type j offspring.

Equation (12.159) is quadratic in q. After dividing both sides by \(s/(N-1+s)\), some computations show that the equation simplifies to \(aq^2 + bq + c = 0\), with

$$\begin{aligned} \begin{aligned} a&= (1-u-v)\left( 1-\frac{1}{N}\right) \sim 1,\\ b&= \frac{N-1}{N}\times \frac{1-s}{s} + u(1+q^\prime -\frac{q^\prime }{N}) + v\\&\sim - \frac{\rho \sqrt{ur^\prime }}{1+\rho \sqrt{ur^\prime }} + (1+q^\prime )u + v\\&\sim - \rho \sqrt{uq^\prime },\\ c&= -uq^\prime , \end{aligned} \end{aligned}$$
(12.160)

as \(N\rightarrow \infty \). When simplifying the formula for b, we used (12.158) in the second step, the induction hypothesis (12.147) in the last step (since it implies \(q^\prime \sim r^\prime \)), and additionally we assumed in the last step that \((1+q^\prime )u+v = o(\sqrt{ur^\prime })\). In order to justify this, from the second equation of (12.46) we know that \(v=O(u)\), and since \(q^\prime \le 1\), it suffices to verify that \(u = o(\sqrt{ur^\prime })\), or equivalently that \(r^\prime =\varOmega (u)\). But this follows from (12.46), (12.141), and the fact that \(u=u_{l+1}\), since

$$ r^\prime = r_{ilj} = \varOmega \left( u_{l+2}^{1-2^{-(j-l-1)}}\right) = \varOmega \left( u^{1-2^{-(j-l-1)}}\right) = \varOmega (u), $$

where in the last step we used that \(l\le j-1\). This verifies the asymptotic approximation of b in (12.160).

To conclude, in order to prove (12.157), we notice that the only positive solution to the quadratic equation in q, with coefficients as in (12.160), is

$$ \begin{aligned} q&\sim \frac{\rho \sqrt{uq^\prime }}{2} + \sqrt{\frac{\rho ^2 uq^\prime }{4} + uq^\prime }\\&= \frac{\rho + \sqrt{\rho ^2+4}}{2}\sqrt{uq^\prime }\\&= R(\rho )\sqrt{uq^\prime }, \end{aligned} $$

where in the last step we invoked the definition of \(R(\rho )\) in (12.41). This finishes the proof of the induction step (12.149), or equivalently (12.157), and thereby the proof of (12.147).
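The closing computation can be verified numerically: with \(a=1\), \(b=-\rho \sqrt{uq^\prime }\) and \(c=-uq^\prime \), as in the asymptotic form of (12.160), the positive root of \(aq^2+bq+c=0\) equals \(R(\rho )\sqrt{uq^\prime }\). The values of u, \(q^\prime \) and \(\rho \) below are arbitrary illustrations:

```python
import math

def R(rho):
    """R(rho) = (rho + sqrt(rho^2 + 4)) / 2, as in (12.41)."""
    return (rho + math.sqrt(rho * rho + 4.0)) / 2.0

def positive_root(a, b, c):
    """Positive solution of a*q^2 + b*q + c = 0 (requires b*b - 4ac >= 0)."""
    return (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)

# Illustrative (assumed) mutation rate u, downstream tunneling
# probability q_prime, and scaled selection parameter rho.
u, q_prime, rho = 1e-6, 1e-3, 0.7
q = positive_root(1.0, -rho * math.sqrt(u * q_prime), -u * q_prime)
print(q, R(rho) * math.sqrt(u * q_prime))  # the two expressions agree
```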

We end this proof by a remark: Recall that \(r_{ij}\) in (12.36) is an approximation of \(q_{ij}\), obtained from recursion (12.38) or (12.148) when \(j>i\), and from (12.40) when \(j<i\). A more accurate (but less explicit) approximation of \(q_{ij}\) is obtained, when \(i<j\), by recursively solving the quadratic equation \(ax^2 + bx + c=0\), with respect to \(x=r_{i,l-1,j}\) for \(l=j-1,\ldots ,i+1\), and finally putting \(r_{ij}=r_{iij}\). The coefficients of this equation are defined as in (12.160), with \(r^\prime =r_{ilj}\) instead of \(q^\prime \). When \(j<i\), the improved approximation of \(q_{ij}\) is defined analogously.    \(\square \)

Lemma 12.4

Let \(\mu _i\) be the rate (12.15) at which a successful forward or backward mutation occurs in a homogeneous type i population, and let \(\hat{\mu }_i\) in (12.42) be its approximation. Define the asymptotic transition probabilities \(\pi _{ij}\) between fixed population states as in (12.17), and their approximations \(\hat{\pi }_{ij}\) as in (12.43). Then

$$\begin{aligned} \mu _i \sim \hat{\mu }_i, \quad i=0,\ldots ,m-1, \end{aligned}$$
(12.161)

as \(N\rightarrow \infty \), and

$$\begin{aligned} \pi _{ij} = \hat{\pi }_{ij}, \quad i,j=0,1,\ldots ,m. \end{aligned}$$
(12.162)

Sketch of proof. Consider a time point \(\tau _k\) when the population becomes fixed with type i, so that \({\varvec{Z}}_{\tau _k}={\varvec{e}}_i\). Denote by \(f_{ij}\) the probability that a forward mutation \(i\rightarrow i+1\), which appears at a time point later than \(\tau _k\), is the first successful mutation after \(\tau _k\), that its descendants have taken over the population by time \(\tau _{k+1}\), and that all of them by that time have type j (so that \({\varvec{Z}}_{\tau _{k+1}}={\varvec{e}}_j\)). Likewise, when \(j<i\) and \(i\ge 1\), we let \(b_{ij}\) refer to the probability that if a backward mutation \(i\rightarrow i-1\) arrives, it is successful, its descendants have taken over the population by time \(\tau _{k+1}\), and all of them have type j. For definiteness we also put \(b_{0j}=0\). We argue that

$$\begin{aligned} \lambda _{ij} \sim \left\{ \begin{array}{ll} Nu_{i+1}f_{ij}, &{} j>i,\\ Nv_{i-1}b_{ij}, &{} j<i, \end{array}\right. \end{aligned}$$
(12.163)

since the event that the population at time \(\tau _{k+1}\) has descended from more than one \(i\rightarrow i\pm 1\) mutation that occurred in the time interval \((\tau _k,\tau _{k+1})\) is asymptotically negligible.

Let \(\beta _j({\varvec{z}})\) be the probability that the descendants of a type j individual, who lives in a population with a type configuration \({\varvec{z}}\), take over the population so that it becomes homogeneous of type j. Although \(\beta _j({\varvec{z}})\) depends on the mutation rates \(u_1,\ldots ,u_m,v_0,\ldots ,v_{m-1}\) as well as the selection coefficients \(s_1,\ldots ,s_m\), this is not made explicit in the notation. The probabilities \(f_{ij}\) and \(b_{ij}\) in (12.163) can be written as a product

$$\begin{aligned} \begin{aligned} f_{ij}&= q_{ij}E\left[ \beta _j({\varvec{Z}}_{\tau ^{\prime \prime }_{k+1}})|A_j,{\varvec{Z}}_{\tau _k}={\varvec{e}}_i\right] , j>i,\\ b_{ij}&= q_{ij}E\left[ \beta _j({\varvec{Z}}_{\tau ^{\prime \prime }_{k+1}})|A_j,{\varvec{Z}}_{\tau _k}={\varvec{e}}_i\right] , j<i \end{aligned} \end{aligned}$$
(12.164)

of two terms. Recall that the first term, \(q_{ij}\), is the probability that the first successful mutation \(i\rightarrow i\pm 1\) at time \(\tau _{k+1}^\prime > \tau _k\) has a descendant that mutates into type j at some time \(\tau _{k+1}^{\prime \prime }\in (\tau _{k+1}^{\prime },\tau _{k+1})\). The second term is the probability that this mutation has spread to the rest of the population by time \(\tau _{k+1}\). The conditional expectation of this second term is with respect to variations in \({\varvec{Z}}_{\tau _{k+1}^{\prime \prime }}\), and the conditioning is with respect to \(A_{j}\), the event that the mutation at time \(\tau _{k+1}^{\prime \prime }\) is into type j.

In order to compare the transition rates in (12.163) with the approximate ones in (12.35), we notice that the latter can be written as

$$\begin{aligned} \hat{\lambda }_{ij} = \left\{ \begin{array}{ll} Nu_{i+1}\hat{f}_{ij}, &{} j>i,\\ Nv_{i-1}\hat{b}_{ij}, &{} j<i, \end{array}\right. \end{aligned}$$
(12.165)

where

$$\begin{aligned} \begin{aligned} \hat{f}_{ij} =&r_{ij}\beta (s_{j}/s_i),&j>i,\\ \hat{b}_{ij} =&r_{ij}\beta (s_{j}/s_i),&j<i, \end{aligned} \end{aligned}$$
(12.166)

\(r_{ij}\) is the approximation of \(q_{ij}\) defined in (12.36), whereas \(\beta (s_{j}/s_i)\) is the probability that a single type j individual gets fixed in a population without mutations, where all other individuals have type i.

We will argue that the probabilities in (12.166) are asymptotically accurate approximations of those in (12.164), for all pairs ij of states that dominate asymptotically, that is, those pairs for which \(j\in \{B(i),F(i)\}\). In Lemma 12.3 we motivated that \(r_{ij}\) is an asymptotically accurate approximation of \(q_{ij}\) for all such pairs of states. Likewise, we argue that \(\beta (s_{j}/s_i)\) is a good approximation of the conditional expectation in (12.164). Indeed, following the reasoning of Lemma 12.3, since none of the intermediate alleles, between i and j, will reach a high frequency before the type j mutant appears at time \(\tau _{k+1}^{\prime \prime }\), it follows that most of the other \(N-1\) individuals will have type i at this time point. Consequently,

$$\begin{aligned} E\left[ \beta _j({\varvec{Z}}_{\tau ^{\prime \prime }_{k+1}})|A_j,{\varvec{Z}}_{\tau _k}={\varvec{e}}_i\right] \sim \beta _j\left( \frac{N-1}{N}{\varvec{e}}_i + \frac{1}{N}{\varvec{e}}_j\right) \sim \beta \left( \frac{s_j}{s_i}\right) \end{aligned}$$
(12.167)

as \(N\rightarrow \infty \). In the last step of (12.167) we used that new mutations between time points \(\tau _{k+1}^{\prime \prime }\) and \(\tau _{k+1}\) can be ignored, because of the smallness (12.4)–(12.5) of the mutation rates. Since \(\beta _j\left( (N-1){\varvec{e}}_i/N + {\varvec{e}}_j/N\right) \) is the fixation probability of a single type j mutant that has selection coefficient \(s_j/s_i\) relative to the other \(N-1\) type i individuals, it is approximately equal to the corresponding fixation probability \(\beta (s_j/s_i)\) of a mutation free Moran model. It therefore follows from (12.164) and (12.166) that

$$\begin{aligned} \begin{aligned} \hat{f}_{iF(i)} \sim&f_{iF(i)}, i=0,\ldots ,m-1,\\ \hat{b}_{iB(i)} \sim&b_{iB(i)}, i=1,\ldots ,m-1 \text{ and } B(i)\ne \emptyset \end{aligned} \end{aligned}$$
(12.168)

as \(N\rightarrow \infty \).

Next we consider pairs of types ij such that \(j\notin \{B(i),F(i)\}\). We know from (12.44), (12.165) and (12.166) that \(\hat{f}_{il}=o(\hat{f}_{iF(i)})\) for all \(l>i\) such that \(l\ne F(i)\). It is therefore reasonable to assume that \(f_{il}=o(f_{iF(i)})\) as well for all \(l>i\) with \(l\ne F(i)\), although \(\hat{f}_{il}\) need not necessarily be a good approximation of \(f_{il}\) for all these l. The same argument also applies to backward mutations when \(B(i)\ne \emptyset \) and \(\hat{\pi }_{iB(i)}>0\), that is, we should have \(b_{il}=o(b_{iB(i)})\) for all \(l<i\) such that \(l\ne B(i)\).

Putting things together, it follows from (12.44), (12.163), (12.165), (12.168), and the last paragraph that the approximate rate (12.42), at which a homogeneous type i population moves to a new fixed state, satisfies

$$\begin{aligned} \begin{aligned} \hat{\mu }_i&= Nv_{i-1}\sum \nolimits _{j=0}^{i-1} \hat{b}_{ij} + Nu_{i+1}\sum \nolimits _{j=i+1}^m \hat{f}_{ij}\\&\sim 1\left( \hat{\pi }_{iB(i)}>0\right) Nv_{i-1}\hat{b}_{iB(i)} + Nu_{i+1}\hat{f}_{iF(i)}\\&\sim 1\left( \hat{\pi }_{iB(i)}>0\right) Nv_{i-1}b_{iB(i)} + Nu_{i+1}f_{iF(i)}\\&\sim Nv_{i-1}\sum \nolimits _{j=0}^{i-1} b_{ij} + Nu_{i+1}\sum \nolimits _{j=i+1}^m f_{ij}\\&\sim \mu _i, \end{aligned} \end{aligned}$$
(12.169)

as \(N\rightarrow \infty \), in agreement with (12.161). Formulas (12.16)–(12.17), (12.43)–(12.44), (12.163), (12.165), and (12.168)–(12.169) also motivate why \(\pi _{ij}\) should equal \(\hat{\pi }_{ij}\), in accordance with (12.162).    \(\square \)
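As a numerical complement (not part of the proof), the sketch below evaluates an approximate transition rate \(\hat{\lambda }_{ij}=Nu_{i+1}r_{ij}\beta (s_j/s_i)\) as in (12.165)–(12.166). The closed form \(\beta (s)=(1-1/s)/(1-s^{-N})\) used for the Moran fixation probability is consistent with the ratio appearing in (12.32), but is stated here as an assumption since (12.32) itself lies outside this excerpt; all parameter values are hypothetical:

```python
def beta(s, N):
    """Fixation probability of a single mutant with relative fitness s in a
    mutation-free Moran population of size N: (1 - 1/s) / (1 - s**(-N)),
    with the neutral limit 1/N when s = 1."""
    if s == 1.0:
        return 1.0 / N
    return (1.0 - 1.0 / s) / (1.0 - s ** (-N))

# Hypothetical population size, mutation rate u_{i+1}, tunneling
# probability r_{ij}, and selection coefficient ratio s_j / s_i.
N, u_next, r_ij, s_ratio = 10_000, 1e-7, 1e-3, 1.001

lam = N * u_next * r_ij * beta(s_ratio, N)  # approximate rate hat-lambda_{ij}
print(lam)
```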

Lemma 12.5

The regularity condition (12.47) of Theorem 12.2 implies that (12.26) holds.

Sketch of proof. Suppose \({\varvec{Z}}_{\tau _k}={\varvec{e}}_i\) and \({\varvec{Z}}_{\tau _{k+1}}={\varvec{e}}_j\) for some \(i\in I_{\text{ as }}\) and \(j\ne i\). Write

$$\begin{aligned} \tau _{k+1}-\tau ^\prime _{k+1} = \left\{ \begin{array}{ll} \sum _{l=i+1}^{j-1} \sigma _l + \sigma _{\text{ fix }} := \sigma _{\text{ tunnel }} + \sigma _{\text{ fix }}, &{} j > i,\\ \sum _{l=j+1}^{i-1} \sigma _l + \sigma _{\text{ fix }} := \sigma _{\text{ tunnel }} + \sigma _{\text{ fix }}, &{} j < i. \end{array}\right. \end{aligned}$$
(12.170)

If \(j>i\), then the successful mutation at time \(\tau _{k+1}^\prime \) is from i to \(i+1\). This type \(i+1\) mutation has a line of descent with individuals that mutate to types \(i+2,\ldots ,j\), before the descendants of the type j mutation take over the population. The first term \(\sigma _{\text{ tunnel }}=\tau _{k+1}^{\prime \prime }-\tau _{k+1}^\prime \) on the right hand side of (12.170) is the time it takes for the type \(i+1\) mutation to tunnel into type j. It is the sum of \(\sigma _l\), the time it takes for the type \(l+1\) mutation to appear after the type l mutation, for all \(l=i+1,\ldots ,j-1\). The second term \(\sigma _{\text{ fix }}=\tau _{k+1}-\tau _{k+1}^{\prime \prime }\) on the right hand side of (12.170) is the time it takes for j to get fixed after the j mutation first appears. When \(j<i\), we interpret the terms of (12.170) analogously. It follows from (12.170) that in order to prove (12.26), it suffices to show that

$$\begin{aligned} \begin{aligned} \sigma _{\text{ tunnel }}&= o_p(\mu _{\text{ min }}^{-1}),\\ \sigma _{\text{ fix }}&= o_p(\mu _{\text{ min }}^{-1}), \end{aligned} \end{aligned}$$
(12.171)

as \(N\rightarrow \infty \) for all asymptotic states \(i\in I_{\text{ as }}\). When \(j>i\), we know from (12.44) and (12.162) that with probability tending to 1, \(j=F(i)\). Following the argument from the proof of Theorem 2 of Durrett et al. [20], we have that

$$\begin{aligned} \sigma _l = O_p(q_{ilj}^{-1}). \end{aligned}$$
(12.172)

In the special case when \(l=i+1\) and \(j=i+2\), formula (12.172) can also be deduced from the proof of Theorem 12.3, by looking at \(G(x)/G(\infty )\) in (12.191). Using (12.172), we obtain the upper part of (12.171), since

$$\begin{aligned} \begin{aligned} \sigma _{\text{ tunnel }} =&\sum \nolimits _{l=i+1}^{j-1} \sigma _l\\ =&O_p\left( \sum \nolimits _{l=i+1}^{j-1} q_{ilj}^{-1}\right) \\ =&o_p(q_{iij}^{-1})\\ =&o_p(q_{ij}^{-1})\\ =&o_p(\mu _i^{-1})\\ =&o_p(\mu _{\text{ min }}^{-1}). \end{aligned} \end{aligned}$$
(12.173)

In the second step of (12.173) we used that \(q_{iij}\le q_{ilj}\) for \(i<l\), which follows from the definition of these quantities, in the third step we invoked \(q_{ij}=q_{iij}\), and in the fourth step we applied the relation

$$\begin{aligned} \mu _i = \varTheta \left( Nu_{i+1}q_{ij}\beta \left( \frac{s_j}{s_i}\right) \right) = o(q_{ij}). \end{aligned}$$
(12.174)

The first step of (12.174) is motivated as in Lemma 12.4, since \(j=F(i)\) and hence \(\pi _{ij}>0\), whereas the second step follows from (12.4) and the fact that \(\beta (s_j/s_i)\) is bounded by 1. Finally, the fourth step of (12.173) follows from the definition of \(\mu _{\text{ min }}\) in (12.24), since (12.174) applies to any \(i\in I_{\text{ as }}\). When \(j<i\), the first part of (12.171) is shown analogously.

In order to verify the second part of (12.171), we know from the motivation of Lemma 12.4 that with high probability, \(\sigma _{\text{ fix }}\) is the time it takes for descendants of the type j mutation to take over the population, ignoring the probability that descendants of other individuals first mutated into j and then some of them survived up to time \(\tau _{k+1}\) as well. We further recall from Lemma 12.4 that because of the smallness (12.4)–(12.5) of the mutation rates, right after the j mutation has arrived at time \(\tau _{k+1}^{\prime \prime }\), we may assume that the remaining \(N-1\) individuals have type i, and after that no other mutation occurs until the j allele gets fixed at time \(\tau _{k+1}\). With these assumptions, \(\sigma _{\text{ fix }}\) is the time for one single individual with selection coefficient \(s_j/s_i\) to get fixed in a two-type Moran model without mutations, where all other individuals have selection coefficient 1. From Sect. 12.5 it follows that \(E(\sigma _{\text{ fix }})\sim \alpha (s_j/s_i)\), and therefore the second part of (12.171) will be proved if we can verify that

$$ \alpha \left( \frac{s_j}{s_i}\right) = o(\mu _{\text{ min }}^{-1}) $$

holds for all \(i\in I_{\text{ as }}\) and \(j\in \{B(i),F(i)\}\) as \(N\rightarrow \infty \). This is equivalent to showing that

$$\begin{aligned} \mu _{\text{ min }}= o\left( \min _{i\in I_{\text{ as }}}\min \left[ \alpha ^{-1}\left( \frac{s_{B(i)}}{s_i}\right) ,\alpha ^{-1}\left( \frac{s_{F(i)}}{s_i}\right) \right] \right) \end{aligned}$$
(12.175)

as \(N\rightarrow \infty \), where the \(\alpha ^{-1}(s_{B(i)}/s_i)\)-term is included only when \(B(i)\ne \emptyset \) (or equivalently, when \(\pi _{iB(i)}>0\)). Using (12.44), (12.46), (12.141), (12.161), (12.168), and (12.169), we find that

$$\begin{aligned} \begin{aligned} \mu _i \sim&\hat{\mu }_i\\ =&O\left( Nu_{i+1}r_{iF(i)}\beta (s_{F(i)}/s_j)\right) \\ =&O\left( Nu_{i+1}u_{F(i)}^{1-2^{-(F(i)-i-1)}}\beta (s_{F(i)}/s_j)\right) \\ =&O\left( Nu_{F(i)}^{2-2^{-(F(i)-i-1)}}\beta (s_{F(i)}/s_j)\right) . \end{aligned} \end{aligned}$$
(12.176)

Inserting (12.176) into the definition of \(\mu _{\text{ min }}\) in (12.24), we obtain

$$ \mu _{\text{ min }}= O\left( \min _{i\in I_{\text{ long }}} Nu_{F(i)}^{2-2^{-(F(i)-i-1)}}\beta (s_{F(i)}/s_j)\right) , $$

and formula (12.175) follows, because of (12.47).    \(\square \)

Proof of Theorem 12.2. We need to establish that the limit result (12.49) of Theorem 12.2 follows from Theorem 12.1. To this end, we first need to show that all \(\hat{\lambda }_{ij}\) are good approximations of \(\lambda _{ij}\), in the sense specified by Theorem 12.2, i.e. \(\pi _{ij}=\hat{\pi }_{ij}\) and \(\hat{\mu }_i/\hat{\mu }_{\text{ min }}\rightarrow \kappa _i\) as \(N\rightarrow \infty \). But this follows from Lemma 12.4, and the definitions of \(\mu _{\text{ min }}\) and \(\hat{\mu }_{\text{ min }}\) in (12.24) and Theorem 12.2. Then it remains to check those two regularity conditions (12.18) and (12.26) of Theorem 12.1 that are not present in Theorem 12.2. But (12.18) follows from (12.44) and (12.162), since these two equations imply \(\pi _{iF(i)}>0\) for all \(i=0,\ldots ,m-1\), and (12.26) follows from Lemma 12.5.    \(\square \)

Proof of (12.109). Let

$$\begin{aligned} \theta _i = u\times E(T_m|{\varvec{Z}}_0={\varvec{e}}_i) \end{aligned}$$
(12.177)

be the standardized expected waiting time until all m mutations have appeared and spread in the population, given that it starts in fixed state i. Our goal is to find an explicit formula for \(\theta _0\), and then show that (12.109) is an asymptotically accurate approximation of this explicit formula as \(m\rightarrow \infty \).

Recall that \(\varSigma _{ij}\) in (12.107) are the elements of the intensity matrix of the Markov process that switches between fixed population states, when time has been multiplied by \(\hat{\mu }_{\text{ min }}=u\). When the population is in fixed state i, the standardized expected waiting time until the next transition is \(1/(-\varSigma _{ii})\). By conditioning on what happens at this transition, it can be seen that the standardized expected waiting times in (12.177) satisfy the recursive relation

$$\begin{aligned} \theta _i = \frac{1}{-\varSigma _{ii}} + \frac{\varSigma _{i,i-1}}{-\varSigma _{ii}}\times \theta _{i-1} + \frac{\varSigma _{i,i+1}}{-\varSigma _{ii}}\times \theta _{i+1}, \end{aligned}$$
(12.178)

for \(i=0,1,\ldots ,m-1\), assuming \(\theta _{-1}=0\) on the right hand side of (12.178) when \(i=0\), and similarly \(\theta _m=0\) when \(i=m-1\). Inserting the values of \(\varSigma _{ij}\) from (12.107) into (12.178), we can rewrite the latter equation as

$$\begin{aligned} \theta _0-\theta _1 = \frac{1}{m} =: b_0 \end{aligned}$$
(12.179)

and

$$\begin{aligned} \theta _i-\theta _{i+1} = \frac{Ci}{m-i}(\theta _{i-1}-\theta _i) + \frac{1}{m-i} =: a_i(\theta _{i-1}-\theta _i) + b_i, \end{aligned}$$
(12.180)

for \(i=1,\ldots ,m-1\), respectively. We obtain an explicit formula for \(\theta _0\) by first solving the linear recursion for \(\theta _i-\theta _{i+1}\) in (12.179)–(12.180), and then summing over i. This yields

$$\begin{aligned} \theta _0 = \sum _{i=0}^{m-1} (\theta _i-\theta _{i+1}) = \sum _{i=0}^{m-1} \sum _{k=0}^i \theta _{ik}, \end{aligned}$$
(12.181)

where

$$\begin{aligned} \theta _{ik} = b_k \prod _{j=k+1}^i a_j = \frac{\left( {\begin{array}{c}m-1\\ k\end{array}}\right) }{(m-k){\left( {\begin{array}{c}m-1\\ i\end{array}}\right) }}\times C^{i-k}. \end{aligned}$$
(12.182)
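As a quick numerical sanity check (not part of the original argument), the recursion (12.179)–(12.180) for the differences \(\theta _i-\theta _{i+1}\) can be compared with the closed form (12.181)–(12.182); the values of m and C below are illustrative choices.

```python
# Sanity check of (12.179)-(12.180) against (12.181)-(12.182).
# m and C are illustrative, not values from the paper.
from math import comb, log

def theta0_recursion(m, C):
    # d[i] = theta_i - theta_{i+1}: d_0 = 1/m and
    # d_i = a_i*d_{i-1} + b_i with a_i = C*i/(m-i), b_i = 1/(m-i)
    d = [1.0 / m]
    for i in range(1, m):
        d.append(C * i / (m - i) * d[-1] + 1.0 / (m - i))
    return sum(d)  # theta_0 = sum of the differences

def theta0_closed_form(m, C):
    # double sum (12.181) with theta_ik as in (12.182)
    total = 0.0
    for i in range(m):
        for k in range(i + 1):
            total += comb(m - 1, k) * C ** (i - k) / ((m - k) * comb(m - 1, i))
    return total

m, C = 12, 0.5
print(theta0_recursion(m, C), theta0_closed_form(m, C))  # agree

# When C = 0, theta_0 is the harmonic number, close to log(m) + gamma
print(theta0_recursion(m, 0.0), log(m) + 0.5772156649)
```

For C = 0 the recursion collapses to the harmonic sum \(\sum _{i=0}^{m-1}1/(m-i)\), matching the asymptotics \(\log (m)+\gamma \) stated below.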

Formulas (12.181)–(12.182) provide the desired explicit formula for \(\theta _0\). When \(C=0\), it is clear that

$$ \begin{aligned} \theta _0 =&\sum \nolimits _{i=0}^{m-1} \theta _{ii}\\ =&\sum \nolimits _{i=0}^{m-1} 1/(m-i)\\ \sim&\log (m) + \gamma , \end{aligned} $$

where \(\gamma \approx 0.5772\) is the Euler–Mascheroni constant. This proves the upper half of (12.109). For \(C>0\), we will show that when m gets large, the (standardized) expected waiting time until the last mutant gets fixed, \(\theta _{m-1}-\theta _m=\theta _{m-1}\), dominates the first sum in (12.181). To this end, we first look at \(\theta _{m-1}\), and rewrite this quantity as

$$\begin{aligned} \begin{aligned} \theta _{m-1} =&\sum \nolimits _{k=0}^{m-1} \theta _{m-1,k}\\ =&\tfrac{1}{\left( {\begin{array}{c}m-1\\ m-1\end{array}}\right) }\sum \nolimits _{k=0}^{m-1} \tfrac{1}{m-k}\left( {\begin{array}{c}m-1\\ k\end{array}}\right) C^{m-1-k}\\ =&(1+C)^{m-1}\sum \nolimits _{k=0}^{m-1} \tfrac{1}{m-k}\left( {\begin{array}{c}m-1\\ k\end{array}}\right) \left( \tfrac{1}{1+C}\right) ^k \left( \tfrac{C}{1+C}\right) ^{m-1-k}\\ =&(1+C)^{m-1} E\left( \tfrac{1}{m-X_{m-1}}\right) \\ =&(1+C)^{m-1} E\left( \tfrac{1}{1+Y_{m-1}}\right) , \end{aligned} \end{aligned}$$
(12.183)

where

$$ \begin{aligned} X_{m-1} \, {\mathop {\in }\limits ^\mathcal{L}}&\text{ Bin }\left( m-1,\tfrac{1}{1+C}\right) ,\\ Y_{m-1} = m-1-X_{m-1} \, {\mathop {\in }\limits ^\mathcal{L}}&\text{ Bin }\left( m-1,\tfrac{C}{1+C}\right) \end{aligned} $$

are two binomially distributed random variables. For large m, we apply the Law of Large Numbers to \(Y_{m-1}\) and find that

$$\begin{aligned} \begin{aligned} \theta _{m-1} \approx&(1+C)^{m-1} \tfrac{1}{1+E(Y_{m-1})}\\ \approx&(1+C)^{m-1} \tfrac{1}{mC/(1+C)}\\ =&(1+C)^{m}/(Cm), \end{aligned} \end{aligned}$$
(12.184)

in agreement with the lower half of (12.109). In view of (12.181), in order to finalize the proof of (12.109), we need to show that the sum of \(\theta _{m-j}-\theta _{m-j+1}\) for \(j=2,3,\ldots ,m\), is of a smaller order than (12.184). A similar argument as in (12.183) leads to

$$\begin{aligned} \begin{aligned} \theta _{m-j}-\theta _{m-j+1} =&\sum \nolimits _{k=0}^{m-j} \theta _{m-j,k}\\ =&(j-1)!(1+C)^{m-j} E\left[ \tfrac{1}{\prod _{n=1}^j (n+Y_{m-j})}\right] \\ \le&\tfrac{2}{j}(1+C)^{m-j} E\left[ \tfrac{1}{(1+Y_{m-j})(2+Y_{m-j})}\right] , \end{aligned} \end{aligned}$$
(12.185)

where

$$ Y_{m-j} \, {\mathop {\in }\limits ^\mathcal{L}}\text{ Bin }\left( m-j,\frac{C}{1+C}\right) . $$

For large m we have, by the Law of Large Numbers, that

$$\begin{aligned} \begin{aligned} \theta _{m-j}-\theta _{m-j+1} \le&\frac{2}{j}(1+C)^{m-j}\frac{1}{\left[ 1+(m-j)C/(1+C)\right] ^2}\\ \le&\left\{ \begin{array}{ll} 4(1+C)^{m/2}/m, & j> m/2,\\ (1+C)^{m-j}/\left[ m/2\times C/(1+C)\right] ^2, & 2\le j \le m/2. \end{array}\right. \end{aligned} \end{aligned}$$
(12.186)

By summing (12.186) over j, it is easy to see that

$$ \sum _{j=2}^{m} (\theta _{m-j}-\theta _{m-j+1}) \ll (1+C)^{m}/(Cm) \sim \theta _{m-1} $$

as \(m\rightarrow \infty \). Together with (12.184), this completes the derivation of the lower part of (12.109).    \(\square \)
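The identity (12.183) and the approximation (12.184) can also be checked numerically. The sketch below compares the direct sum \(\sum _k\theta _{m-1,k}\), the binomial-expectation form of (12.183), and the asymptotic value \((1+C)^m/(Cm)\); m and C are illustrative.

```python
# Numerical check of (12.183) and (12.184); m and C illustrative.
from math import comb

def theta_m1_sum(m, C):
    # direct sum over k of theta_{m-1,k} from (12.182), with i = m-1
    return sum(comb(m - 1, k) * C ** (m - 1 - k) / (m - k) for k in range(m))

def theta_m1_binomial(m, C):
    # (1+C)^{m-1} * E[1/(1+Y)], Y ~ Bin(m-1, C/(1+C)), as in (12.183)
    p = C / (1.0 + C)
    e = sum(comb(m - 1, j) * p**j * (1 - p) ** (m - 1 - j) / (1 + j)
            for j in range(m))
    return (1.0 + C) ** (m - 1) * e

m, C = 50, 0.5
exact = theta_m1_sum(m, C)
print(exact, theta_m1_binomial(m, C))   # the identity (12.183)
print(exact, (1 + C) ** m / (C * m))    # close for large m, cf. (12.184)
```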

Sketch of proof of Theorem 12.3. Our proof will parallel that of Theorem 1 in Durrett et al. [20], see also Wodarz and Komarova [66]. We first use formula (12.66) in order to deduce that the ratio between the two rates of fixation from a type 0 population satisfies \(\hat{\lambda }_{02}/\hat{\lambda }_{01} \rightarrow \infty \) as \(N\rightarrow \infty \). When \(\rho =0\) in (12.51), this is a consequence of \(\hat{\lambda }_{02}/\hat{\lambda }_{01}\sim N\sqrt{u_2}\) and the assumption \(N\sqrt{u_2}\rightarrow \infty \) on the second mutation rate \(u_2\). When \(\rho < 0\), \(\hat{\lambda }_{02}/\hat{\lambda }_{01}\) tends to infinity at an even faster rate, due to the \(\psi (\rho u_2^{1/2})\)-term of \(\hat{\lambda }_{01}\) in (12.66). In any case, it follows that condition (12.44) is satisfied, with \(F(0)=2\) and \(\hat{\pi }_{02}=1\). That is, tunneling from 0 to 2 will occur with probability tending to 1 as \(N\rightarrow \infty \) whether \(\rho =0\) or \(\rho <0\). As in the proof of Lemma 12.3 we conclude from this that the fraction \(Z_t=Z_{t1}\) of allele 1 will stay close to 0, and we may use a branching process approximation for \(Z_t\). A consequence of this approximation is that type 1 mutations arrive according to a Poisson process with intensity \(Nu_1\), and the descendants of different type 1 mutations evolve independently. Let \(0<\sigma \le \infty \) be the time it takes for the first type 2 descendant of a type 1 mutation to appear. In particular, if \(\sigma =\infty \), this type 1 mutation has no type 2 descendants. Letting \(G(x)=P(\sigma \le x)\) be the distribution function of \(\sigma \), it follows by a Poisson process thinning argument that

$$\begin{aligned} P(T_2^{\prime \prime }\ge t) \sim \exp (-Nu_1\int _0^t G(x)dx). \end{aligned}$$
(12.187)
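The thinning identity behind (12.187) can be illustrated with a small Monte Carlo sketch: if mutations arrive as a Poisson process with rate \(\lambda \) and each one, independently, produces its type 2 descendant after a delay with distribution function G, then the first such event time T satisfies \(P(T\ge t)=\exp (-\lambda \int _0^t G(x)dx)\). Here G is an illustrative exponential law, not the G derived below.

```python
# Monte Carlo illustration of the Poisson thinning identity in (12.187),
# with an illustrative exponential delay distribution G(x) = 1 - e^{-x}.
import random, math

random.seed(1)
lam, t, trials = 2.0, 1.5, 200_000

def first_event_time():
    # Poisson(lam) arrivals on [0, t]; an arrival at time x produces its
    # "type 2" event at x + sigma, with sigma ~ exponential(1)
    times = []
    x = random.expovariate(lam)
    while x < t:
        times.append(x)
        x += random.expovariate(lam)
    events = [u + random.expovariate(1.0) for u in times]
    return min(events) if events else float("inf")

p_hat = sum(first_event_time() >= t for _ in range(trials)) / trials
integral = t - (1.0 - math.exp(-t))     # int_0^t G(x)dx for this G
p_theory = math.exp(-lam * integral)
print(p_hat, p_theory)                  # agree up to Monte Carlo error
```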

We use Kolmogorov’s backward equation in order to determine G. To this end, we will first compute \(G(x+h)\) for a small number \(h>0\), by conditioning on what happens during the time interval (0, h). As in formulas (12.121)–(12.122) of Appendix B, we let \(a_{ij}(z)\) refer to the rate at which a type i individual dies and gets replaced by the offspring of a type j individual, when the number of type 1 individuals before the replacement is Nz. Since we look at the descendants of one type 1 individual, we have that \(z=Z_0=1/N\). Using a similar argument as in Eq. (12.159), it follows from this that

$$\begin{aligned} \begin{aligned}&G(x+h) = a_{00}(1/N)h \times G(x)\\ +&a_{01}(1/N)h \left[ u_2\times 1 + (1-u_2)(2G(x)-G(x)^2)\right] \\ +&a_{10}(1/N)h \times 0 + a_{11}(1/N)h \times \left[ u_2\times 1 + (1-u_2)G(x)\right] \\ +&\left[ 1-\sum \nolimits _{ij} a_{ij}(1/N) h\right] G(x) + o(h) \end{aligned} \end{aligned}$$
(12.188)

for small \(h>0\). Notice that the two \(a_{00}(1/N)\) terms cancel out in (12.188), whereas \(a_{11}(1/N)(1-G(x))u_2\times h=O(N^{-2}u_2\times h)\) is too small to have an asymptotic impact. Using formulas (12.121)–(12.122) for \(a_{01}(1/N)\) and \(a_{10}(1/N)\), it follows that (12.188) simplifies to

$$ G(x+h) = s\times h \left[ u_2 + 2G(x)-G(x)^2\right] + 1\times h \times 0 + \left[ 1-(s+1)h\right] G(x) + o(h), $$

when all asymptotically negligible terms are put into the remainder term. Letting \(h\rightarrow 0\), we find that G(x) satisfies the differential equation

$$\begin{aligned} \begin{aligned} G^\prime (x) =&-sG(x)^2 + (s-1)G(x)+su_2\\ =&-s(G(x)-r_1)(G(x)-r_2), \end{aligned} \end{aligned}$$
(12.189)

where

$$ \begin{aligned} r_1 =&(s-1)/(2s) + \sqrt{\left[ (s-1)/(2s)\right] ^2 + u_2},\\ r_2 =&(s-1)/(2s) - \sqrt{\left[ (s-1)/(2s)\right] ^2 + u_2} \end{aligned} $$

are the two roots of the quadratic equation \(-sy^2 + (s-1)y+su_2=0\). Recall from (12.51) that \(s=1+\rho \sqrt{u_2}\). We may therefore express these two roots as

$$\begin{aligned} \begin{aligned} r_1 =&\sqrt{u_2}\left( \rho + \sqrt{\rho ^2+4s^2}\right) /(2s) \sim \sqrt{u_2}\left( \rho + \sqrt{\rho ^2+4}\right) /(2s) \\ =&\sqrt{u_2}R(\rho )/s,\\ r_2 =&\sqrt{u_2}\left( \rho - \sqrt{\rho ^2+4s^2}\right) /(2s) \sim \sqrt{u_2}\left( \rho - \sqrt{\rho ^2+4}\right) /(2s), \end{aligned} \end{aligned}$$
(12.190)

where in the second step we used that \(u_2\rightarrow 0\) and \(s\rightarrow 1\) as \(N\rightarrow \infty \), and in the last step we invoked (12.41), the definition of \(R(\rho )\). Since \(r_2< 0 < r_1\), and \(G^\prime (x)\rightarrow 0\) as \(x\rightarrow \infty \), it follows from (12.189) that we must have \(G(\infty )=r_1\). Together with the other boundary condition \(G(0)=0\), this gives the solution

$$\begin{aligned} G(x) = r_1 \frac{1-e^{-(r_1-r_2)sx}}{1-\frac{r_1}{r_2}e^{-(r_1-r_2)sx}} \end{aligned}$$
(12.191)

to the differential equation (12.189), with

$$ r_1 - r_2 \sim \frac{\sqrt{u_2} \times \sqrt{\rho ^2+4}}{s} $$

and

$$\begin{aligned} -\frac{r_1}{r_2} \sim \frac{\sqrt{\rho ^2+4}+\rho }{\sqrt{\rho ^2+4}-\rho }. \end{aligned}$$
(12.192)
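That (12.191) solves the differential equation (12.189) with \(G(0)=0\) can be confirmed numerically by comparing the closed form against a direct Runge–Kutta integration of (12.189). The values of \(u_2\) and \(\rho \) below are illustrative stand-ins for a large-N setting.

```python
# Check that the closed form (12.191) solves G'(x) = -s(G-r1)(G-r2),
# G(0) = 0. The values of u2 and rho are illustrative.
from math import sqrt, exp

u2 = 1e-4
rho = -0.5
s = 1.0 + rho * sqrt(u2)                    # s = 1 + rho*sqrt(u2), (12.51)
disc = sqrt(((s - 1) / (2 * s)) ** 2 + u2)
r1 = (s - 1) / (2 * s) + disc               # roots of -s*y^2+(s-1)*y+s*u2
r2 = (s - 1) / (2 * s) - disc

def G_closed(x):
    # the solution (12.191)
    e = exp(-(r1 - r2) * s * x)
    return r1 * (1 - e) / (1 - (r1 / r2) * e)

def G_rk4(x_end, n=20000):
    # classical Runge-Kutta integration of (12.189) from G(0) = 0
    f = lambda g: -s * (g - r1) * (g - r2)
    g, h = 0.0, x_end / n
    for _ in range(n):
        k1 = f(g); k2 = f(g + h * k1 / 2)
        k3 = f(g + h * k2 / 2); k4 = f(g + h * k3)
        g += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return g

for x in (10.0, 100.0, 1000.0):
    print(x, G_closed(x), G_rk4(x))   # the two agree closely
print(G_closed(1e9), r1)              # G(infinity) = r1
```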

Putting things together, we find that

$$\begin{aligned} \begin{aligned} P\left( NR(\rho )u_1\sqrt{u_2}\times T_2^{\prime \prime } \ge t\right) \sim&P\left( Nu_1r_1s \times T_2^{\prime \prime } \ge t\right) \\ \sim&\exp \left( -Nu_1\int _0^{t/(Nu_1r_1s)} G(x)dx\right) \\ \sim&\exp \left( -\int _0^{t} h(y)dy\right) , \end{aligned} \end{aligned}$$
(12.193)

where formula (12.190) was used in the first step, (12.187) in the second step, and in the third step we changed variables \(y=Nu_1r_1 s\times x\) and introduced the hazard function \(h(x) = G\left( x/(Nu_1r_1s)\right) /(sr_1)\). If \(Nu_1\rightarrow a >0\) as \(N\rightarrow \infty \), it follows from (12.191) and the fact that \(s\rightarrow 1\) that we can rewrite the hazard function as

$$\begin{aligned} \begin{array}{l} h(x) \sim \frac{1}{sr_1}G\left( \frac{x}{sar_1}\right) \\ = \frac{1}{s} \times \frac{1-\exp \left( -\frac{r_1-r_2}{r_1} \times \frac{x}{a}\right) }{1-\frac{r_1}{r_2}\exp \left( -\frac{r_1-r_2}{r_1}\times \frac{x}{a}\right) } \sim \frac{1-\exp \left( -\frac{r_1-r_2}{r_1} \times \frac{x}{a}\right) }{1-\frac{r_1}{r_2}\exp \left( -\frac{r_1-r_2}{r_1}\times \frac{x}{a}\right) }. \end{array} \end{aligned}$$
(12.194)

We finally obtain the limit result (12.110)–(12.111) when \(a>0\) from (12.193) and (12.194), using (12.192) and the fact that

$$ \frac{r_1-r_2}{r_1} \sim \frac{2\sqrt{\rho ^2+4}}{\rho +\sqrt{\rho ^2+4}}. $$

When \(Nu_1\rightarrow 0\), one similarly shows that (12.193) holds, with \(h(x)=1\). Finally, formula (12.112) follows by integrating (12.193) with respect to t.    \(\square \)

Motivation of formula (12.114). We will motivate formula (12.114) in terms of the transition rates \(\hat{\lambda }_{ij}\) in (12.35), rather than those in (12.113) that are adjusted for tunneling and fixation of alleles.

Since we assume \(s_1=\cdots =s_{m-1}=1<s_m\) in (12.114), it follows from (12.35) that it is increasingly difficult to have backward and forward transitions over larger distances, except that it is possible for some models to have a direct forward transition to the target allele m. By this we mean that the backward and forward transition rates from any state i satisfy \(\hat{\lambda }_{i,i-1}\gg \cdots \gg \hat{\lambda }_{i0}\), and \(\hat{\lambda }_{i,i+1}\gg \cdots \gg \hat{\lambda }_{i,m-1}\) respectively, as \(N\rightarrow \infty \). For this reason, from any fixed state i, it is only possible to have competition between the two forward transitions \(i\rightarrow i+1\) and \(i\rightarrow m\) when \(0\le i\le m-2\). Since \(\gamma _i=(\hat{\lambda }_{im}/\hat{\lambda }_{i,i+1})^2\), and since the transition rates to the intermediate alleles \(i+1,\ldots ,m-1\) are of a smaller order than the transition rate to \(i+1\), it follows that (12.35) predicts a total forward rate of fixation from fixed state i of the order

$$\begin{aligned} \begin{aligned} Nu_{i+1}f_i \sim&\hat{\lambda }_{i,i+1}+\hat{\lambda }_{im}\\ =&\hat{\lambda }_{i,i+1}(1+\sqrt{\gamma _i})\\ =&Nu_{i+1}\beta \left( \tfrac{s_{i+1}}{s_i}\right) (1+\sqrt{\gamma _i})\\ =&u_{i+1}(1+\sqrt{\gamma _i}), \end{aligned} \end{aligned}$$
(12.195)

where in the last step we used that \(s_i=s_{i+1}\) and \(\beta (1)=1/N\). We will extend the argument in the proof of Theorem 3 in Durrett et al. [20], and indicate that the total forward rate of fixation from i should rather be

$$\begin{aligned} Nu_{i+1}f_i \sim \hat{\lambda }_{i,i+1}\chi \left( \frac{\gamma _i}{\beta (s_m)}\right) = u_{i+1}\chi \left( \frac{\gamma _i}{\beta (s_m)}\right) , \end{aligned}$$
(12.196)

where \(\chi (\cdot )\) is the function defined in (12.63). This will also motivate (12.114), since this formula serves the purpose of modifying the incorrect forward rate of fixation (12.195), so that it equals the adjusted one in (12.196), keeping the relative sizes of the different forward rates \(i\rightarrow j\) of fixation intact for \(j=i+1,\ldots ,m\).

The rationale for (12.196) is that type \(i+1\) mutations arrive according to a Poisson process at rate \(Nu_{i+1}\), and \(\chi /N\) is the probability that any such type \(i+1\) mutation has descendants of type \(i+1\) or m that spread to the whole population. We need to show that

$$\begin{aligned} \chi = \chi \left( \frac{\gamma _i}{\beta (s_m)}\right) . \end{aligned}$$
(12.197)

To this end, let \(X_t\) be the fraction of descendants of an \(i\rightarrow i+1\) mutation, Nt time units after this mutation appeared. We stop this process at a time point \(\tau \) when \(X_t\) reaches either of the two boundary points 0 or 1 (\(X_\tau =0\) or 1), or when a successful mutation \(i+1\rightarrow i+2\) appears before that, which is a descendant of the type \(i+1\) mutation that itself will have type m descendants who spread to the whole population, before any other type gets fixed (\(0<X_\tau <1\)). We have that \(x=X_0=1/N\), but define

$$ \bar{\beta }(s_m;x) = \bar{\beta }(x) = P(X_\tau =0|X_0=x) $$

for any value of x. This is a non-fixation probability, i.e. the probability that the descendants of Nx individuals of type \(i+1\) at time \(t=0\) neither have a successful type \(i+2\) descendant, nor take over the population before that. Since the descendants of a single type \(i+1\) mutation take over the population with probability \(1-\bar{\beta }(1/N)\), it is clear that

$$\begin{aligned} \chi = N\left[ 1-\bar{\beta }\left( \frac{1}{N}\right) \right] \sim \lim _{x\rightarrow 0} \frac{1-\bar{\beta }(x)}{x} = -\bar{\beta }^\prime (0). \end{aligned}$$
(12.198)

Durrett et al. [20] prove that it is possible to neglect the impact of further \(i\rightarrow i+1\) mutations after time \(t=0\). It follows that \(X_t\) will be a version of the Moran process of Appendix B with \(s=s_{i+1}/s_i=1\), during the time interval \((0,\tau )\), with time speeded up by a factor of N. Using (12.123)–(12.124), we find that the infinitesimal mean and variance functions of \(X_t\) are

$$\begin{aligned} \begin{aligned} M(x) =&N\times 0 = 0,\\ V(x) =&N\times 2x(1-x)/N = 2x(1-x), \end{aligned} \end{aligned}$$
(12.199)

respectively. At time t, a successful type \(i+2\) mutation arrives at rate

$$\begin{aligned} \begin{aligned} N\times NX_t \times u_{i+2}q_{i+1,m}\beta \left( \tfrac{s_m}{s_i}\right) \sim&N^2X_t \times u_{i+2}r_{i+1,m}\beta (s_m)\\ =&N^2X_t \times r_{im}^2\beta (s_m)\\ =&X_t \times (\hat{\lambda }_{im}/\hat{\lambda }_{i,i+1})^2 \beta (s_m)^{-1}\\ =&X_t \times \gamma _i\beta (s_m)^{-1}\\ =:&X_t \times \gamma ^\prime , \end{aligned} \end{aligned}$$
(12.200)

where in the second step we used \(r_{im}^2=u_{i+2}r_{i+1,m}\), which follows from (12.36), since all \(R(\rho _{ilj})=1\) when \(s_1=\cdots =s_{m-1}=1\). Then in the third step we used \(\hat{\lambda }_{im}/\hat{\lambda }_{i,i+1}=Nr_{im}\beta (s_m)\), which follows from (12.35), and in the last step we introduced the short notation \(\gamma ^\prime = \gamma _i\beta (s_m)^{-1}\). (One instance of \(\gamma ^\prime \) is presented for the boundary scenarios of Sect. 12.7.2.1, below formula (12.105).)

We will use (12.199)–(12.200) and Kolmogorov’s backward equation in order to derive a differential equation for \(\bar{\beta }(x)\). Consider a fixed \(0<x<1\), and let \(h>0\) be a small number. Then condition on what happens during time interval (0, h). When h is small, it is unlikely that the process \(X_t\) will stop because it hits any of the boundaries 0 or 1, i.e.

$$ \begin{aligned} P(\tau<h,0<X_\tau<1) =&x\gamma ^\prime h + o(h),\\ P(\tau <h,X_\tau \in \{0,1\}) =&o(h) \end{aligned} $$

as \(h\rightarrow 0\). The non-fixation probability can therefore be expressed as

$$ \begin{aligned} \bar{\beta }(x) =&x\gamma ^\prime h \times 0 + (1 - x\gamma ^\prime h)\int _0^1 \bar{\beta }(y)dP(X_h=y|X_0=x) + o(h)\\ =&(1-x\gamma ^\prime h)\left[ \bar{\beta }(x) + \tfrac{1}{2}V(x)\bar{\beta }^{\prime \prime }(x)h\right] + o(h). \end{aligned} $$

Letting \(h\rightarrow 0\), we find from (12.199) that \(\bar{\beta }(x)\) satisfies the differential equation

$$\begin{aligned} x(1-x)\bar{\beta }^{\prime \prime }(x) - x\gamma ^\prime \bar{\beta }(x) = 0. \end{aligned}$$
(12.201)

Durrett et al. [20] use a power series argument to prove that the solution of (12.201), with boundary conditions \(\bar{\beta }(0)=1\) and \(\bar{\beta }(1)=0\), is

$$\begin{aligned} \bar{\beta }(x) = \frac{\sum _{k=1}^\infty \frac{(\gamma ^\prime )^{k}}{k!(k-1)!}(1-x)^k}{\sum _{k=1}^\infty \frac{(\gamma ^\prime )^{k}}{k!(k-1)!}}. \end{aligned}$$
(12.202)

Recalling (12.63) and that \(\gamma ^\prime =\gamma _i/\beta (s_m)\), we deduce formula (12.197) from (12.198) and differentiation of (12.202) with respect to x.    \(\square \)
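The power series solution (12.202) can be verified numerically: a truncated version of the series satisfies the boundary conditions \(\bar{\beta }(0)=1\), \(\bar{\beta }(1)=0\) and the differential equation (12.201), and \(N[1-\bar{\beta }(1/N)]\) approaches \(-\bar{\beta }^\prime (0)\) as in (12.198). The value of \(\gamma ^\prime \) and the truncation order are illustrative choices.

```python
# Check that the truncated series (12.202) satisfies (12.201) and the
# boundary conditions, and illustrate the limit in (12.198).
# gp (= gamma') and the truncation order K are illustrative.
from math import factorial

gp = 2.0
K = 60
c = [gp**k / (factorial(k) * factorial(k - 1)) for k in range(1, K + 1)]
D = sum(c)

def bbeta(x):      # truncated version of bar-beta(x) in (12.202)
    return sum(ck * (1 - x) ** k for ck, k in zip(c, range(1, K + 1))) / D

def bbeta2(x):     # its second derivative, term by term
    return sum(ck * k * (k - 1) * (1 - x) ** (k - 2)
               for ck, k in zip(c, range(1, K + 1))) / D

def minus_bbeta_prime0():   # -bar-beta'(0), cf. (12.198)
    return sum(ck * k for ck, k in zip(c, range(1, K + 1))) / D

print(bbeta(0.0), bbeta(1.0))   # boundary values 1 and 0
for x in (0.2, 0.5, 0.8):
    # residual of x(1-x)b'' - x*gp*b, essentially zero
    print(x, x * (1 - x) * bbeta2(x) - x * gp * bbeta(x))
N = 10**6
print(N * (1 - bbeta(1.0 / N)), minus_bbeta_prime0())  # cf. (12.198)
```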

Copyright information

© 2018 Springer Nature Switzerland AG

Cite this paper

Hössjer, O., Bechly, G., Gauger, A. (2018). Phase-Type Distribution Approximations of the Waiting Time Until Coordinated Mutations Get Fixed in a Population. In: Silvestrov, S., Malyarenko, A., Rančić, M. (eds) Stochastic Processes and Applications. SPAS 2017. Springer Proceedings in Mathematics & Statistics, vol 271. Springer, Cham. https://doi.org/10.1007/978-3-030-02825-1_12
