Journal of Mathematical Biology, Volume 70, Issue 6, pp 1207–1247

Finding the best resolution for the Kingman–Tajima coalescent: theory and applications

  • Raazesh Sainudiin
  • Tanja Stadler
  • Amandine Véber

Abstract

Many summary statistics currently used in population genetics and in phylogenetics depend only on a rather coarse resolution of the underlying tree (the number of extant lineages, for example). Hence, for computational purposes, working directly on these resolutions appears to be much more efficient. However, this approach seems to have been overlooked in the past. In this paper, we describe six different resolutions of the Kingman–Tajima coalescent together with the corresponding Markov chains, which are essential for inference methods. Two of the resolutions are the well-known \(n\)-coalescent and the lineage death process due to Kingman. Two other resolutions were mentioned by Kingman and Tajima, but never explicitly formalized. Another two resolutions are novel, and complete the picture of a multi-resolution coalescent. For all of them, we provide the forward and backward transition probabilities, the probability of visiting a given state as well as the probability of a given realization of the full Markov chain. We also provide a description of the state-space that highlights the computational gain obtained by working with lower-resolution objects. Finally, we give several examples of summary statistics that depend on a coarser resolution of Kingman’s coalescent, on which simulations are usually based.
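As a minimal sketch of why a coarse resolution can suffice, consider the lineage death process mentioned above: it records only the number of extant lineages, yet statistics such as the tree height and the total branch length are functions of it alone. The code below assumes the standard Kingman time scaling, in which the waiting time while \(k\) lineages are extant is exponential with rate \(k(k-1)/2\). All function names are hypothetical and chosen for illustration; they are not taken from the paper.

```python
import random


def simulate_lineage_death(n, rng):
    """Return holding times [T_n, ..., T_2] of the lineage death process.

    Assumption: standard coalescent time units, so that while k lineages
    are extant the time to the next coalescence is Exp(k(k-1)/2).
    """
    times = []
    for k in range(n, 1, -1):
        rate = k * (k - 1) / 2
        times.append(rng.expovariate(rate))
    return times


def tree_height(times):
    """Time to the most recent common ancestor: the sum of all epochs."""
    return sum(times)


def total_branch_length(n, times):
    """Each epoch with k extant lineages contributes k * T_k."""
    return sum(k * t for k, t in zip(range(n, 1, -1), times))


def expected_tree_height(n):
    """Sum over k = 2..n of 2 / (k(k-1)), which telescopes to 2(1 - 1/n)."""
    return 2.0 * (1.0 - 1.0 / n)


def expected_total_branch_length(n):
    """Sum over k = 2..n of k * 2 / (k(k-1)) = 2 * sum_{j=1}^{n-1} 1/j."""
    return 2.0 * sum(1.0 / j for j in range(1, n))


if __name__ == "__main__":
    rng = random.Random(2014)
    n, reps = 10, 20000
    heights, lengths = [], []
    for _ in range(reps):
        times = simulate_lineage_death(n, rng)
        heights.append(tree_height(times))
        lengths.append(total_branch_length(n, times))
    print("mean height:", sum(heights) / reps, "expected:", expected_tree_height(n))
    print("mean length:", sum(lengths) / reps, "expected:", expected_total_branch_length(n))
```

Each replicate draws only \(n-1\) exponential variables and never builds a tree, which illustrates the kind of computational gain the abstract attributes to lower-resolution objects.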

Keywords

\(n\)-Coalescent resolutions; Tree shape statistics; Computationally efficient and statistically sufficient inference

Mathematics Subject Classification (2000)

92D15 92D20 60J10 

Notes

Acknowledgments

We are grateful to Robert C. Griffiths for his insights, comments and guidance on this project, to John Rhodes and Mike Steel for their comments on an earlier version of this manuscript, and to Mike Steel for pointing out (Kemeny and Snell 1960, Defn. 6.3.1). We also thank the referee and the associate editor for their pertinent comments, particularly on the computational aspects of this work. During the initial course of this study, R.S. was supported by a research fellowship from the Royal Commission for the Exhibition of 1851 and T.S. was supported by a PhD scholarship of the German Science Foundation and a summer studentship of the Allan Wilson Centre. A.V. was supported by the ANR project MANEGE (ANR-09-BLAN-0215) and T.S. by the Swiss National Science Foundation. R.S. and A.V. were supported in part by the chaire Modélisation Mathématique et Biodiversité of Veolia Environnement-École Polytechnique-Museum National d’Histoire Naturelle-Fondation X.

References

  1. Aldous DJ (2001) Stochastic models and descriptive statistics for phylogenetic trees, from Yule to today. Stat Sci 16(1):23–34
  2. Bahlo M, Griffiths R (1996) Inference from gene trees in a subdivided population. Theor Pop Biol 57:79–95
  3. Beaumont M, Zhang W, Balding D (2002) Approximate Bayesian computation in population genetics. Genetics 162:2025–2035
  4. Beaumont M, Robert C, Marin JM, Cornuet J (2009) Adaptivity for ABC algorithms: the ABC–PMC scheme. Biometrika 96(4):983–990
  5. Birkner M, Blath J (2008) Computing likelihoods for coalescents with multiple collisions in the infinitely many sites model. J Math Biol 57:435–465
  6. Colless DH (1982) Review of phylogenetics: the theory and practice of phylogenetic systematics. Syst Zool 31:100–104
  7. Del Moral P (2004) Feynman–Kac formulae: genealogical and interacting particle systems with applications. Springer, New York
  8. Doucet A, Johansen AM (2009) A tutorial on particle filtering and smoothing: fifteen years later. In: Crisan D, Rozovsky B (eds) The Oxford handbook of nonlinear filtering. Oxford University Press, Oxford
  9. Etheridge AM (2011) Some mathematical models from population genetics. Lecture notes in mathematics 2012. Springer, Berlin
  10. Fisher R (1930) The genetical theory of natural selection. Clarendon, Oxford
  11. Ford D, Matsen E, Stadler T (2009) A method for investigating relative timing information on phylogenetic trees. Syst Biol 58(2):167–183
  12. Fu YX (1995) Statistical properties of segregating sites. Theor Pop Biol 48:172–197
  13. Griffiths R, Tavaré S (1994) Ancestral inference in population genetics. Stat Sci 9:307–319
  14. Griffiths R, Tavaré S (1996) Markov chain inference methods in population genetics. Math Comput Model 23:141–158
  15. Iorio M, Griffiths R (2004) Importance sampling on coalescent histories I. Adv Appl Prob 36:417–433
  16. Kemeny J, Snell J (1960) Finite Markov chains. D. van Nostrand Company Inc, Princeton
  17. Kendall DG (1975) Some problems in mathematical genealogy. In: Gani J (ed) Perspectives in probability and statistics. Academic Press, New York, pp 325–345
  18. Kingman JFC (1982a) The coalescent. Stoch Proc Appl 13:235–248
  19. Kingman JFC (1982b) On the genealogy of large populations. J Appl Probab 19:27–43
  20. Kolmogorov A (1942) Sur l’estimation statistique des paramètres de la loi de Gauss. Bull Acad Sci URSS Ser Math 6:3–32
  21. Le Cam L (1964) Sufficiency and approximate sufficiency. Ann Math Stat 35:1419–1455
  22. Leuenberger C, Wegmann D (2009) Bayesian computation and model selection without likelihoods. Genetics 184:243–252
  23. Marin JM, Pudlo P, Robert CP, Ryder RJ (2012) Approximate Bayesian computational methods. Stat Comput 22(6):1167–1180. doi:10.1007/s11222-011-9288-2
  24. Marjoram P, Molitor J, Plagnol V, Tavaré S (2003) Markov chain Monte Carlo without likelihoods. Proc Natl Acad Sci USA 100:15324–15328
  25. McKenzie A, Steel M (2000) Distribution of cherries for two models of trees. Math Biosci 164:81–92
  26. Pritchard J, Seielstad M, Perez-Lezaun A, Feldman M (1999) Population growth of human Y chromosomes: a study of Y chromosome microsatellites. Mol Biol Evol 16:1791–1798
  27. Ralph P, Coop G (2013) The geography of recent genetic ancestry across Europe. PLoS Biol 11(5):e1001555
  28. Sackin MJ (1975) “Good” and “bad” phenograms. Syst Zool 21:225–226
  29. Sainudiin R, Thornton K, Harlow J, Booth J, Stillman M, Yoshida R, Griffiths R, McVean G, Donnelly P (2011) Experiments with the site frequency spectrum. Bull Math Biol 73(4):829–872
  30. Semple C, Steel M (2003) Phylogenetics. Oxford University Press, Oxford
  31. Sisson S, Fan Y, Tanaka M (2007) Sequential Monte Carlo without likelihoods. Proc Natl Acad Sci USA 104:1760–1765
  32. Slatkin M (2002) A vectorized method of importance sampling with applications to models of mutation and migration. Theor Pop Biol 62:339–348
  33. Stephens M, Donnelly P (2000) Inference in molecular population genetics. J R Stat Soc B 62:605–655
  34. Tajima F (1983) Evolutionary relationship of DNA sequences in finite populations. Genetics 105:437–460
  35. Tajima F (1989) Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123:585–595
  36. Tavaré S (1983) Line-of-descent and genealogical processes, and their applications in population genetics models. Theor Pop Biol 26:119–164
  37. Watterson GA (1975) On the number of segregating sites in genetical models without recombination. Theor Pop Biol 7:256–276
  38. Weiss G, von Haeseler A (1998) Inference of population history using a likelihood approach. Genetics 149:1539–1546
  39. Wright S (1931) Evolution in Mendelian populations. Genetics 16:97–159

Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Raazesh Sainudiin (1)
  • Tanja Stadler (2)
  • Amandine Véber (3)
  1. Biomathematics Research Centre and School of Mathematics and Statistics, University of Canterbury, Christchurch, New Zealand
  2. Institut f. Integrative Biologie, ETH Zürich, Zürich, Switzerland
  3. Centre de Mathématiques Appliquées, École Polytechnique, Palaiseau Cedex, France
