# Finding the best resolution for the Kingman–Tajima coalescent: theory and applications

## Abstract

Many summary statistics currently used in population genetics and in phylogenetics depend only on a rather coarse resolution of the underlying tree (the number of extant lineages, for example). Hence, for computational purposes, working directly on these resolutions appears to be much more efficient. However, this approach seems to have been overlooked in the past. In this paper, we describe six different resolutions of the Kingman–Tajima coalescent together with the corresponding Markov chains, which are essential for inference methods. Two of the resolutions are the well-known \(n\)-coalescent and the lineage death process due to Kingman. Two other resolutions were mentioned by Kingman and Tajima, but never explicitly formalized. Another two resolutions are novel, and complete the picture of a multi-resolution coalescent. For all of them, we provide the forward and backward transition probabilities, the probability of visiting a given state as well as the probability of a given realization of the full Markov chain. We also provide a description of the state-space that highlights the computational gain obtained by working with lower-resolution objects. Finally, we give several examples of summary statistics that depend on a coarser resolution of Kingman’s coalescent, on which simulations are usually based.
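The computational gain mentioned above can be illustrated with the two well-known resolutions named in the abstract. The following is a minimal sketch, not the authors' implementation: it assumes the standard time scaling in which \(k\) lineages coalesce at rate \(k(k-1)/2\), and the function names `bell_number`, `simulate_lineage_death` and `expected_tmrca` are hypothetical. It compares the number of states of the \(n\)-coalescent (set partitions of \(\{1,\dots,n\}\), counted by the Bell numbers) with the \(n\) states of the lineage death process, and evaluates a statistic that depends only on the number of extant lineages.

```python
import random


def bell_number(n):
    """Number of set partitions of {1, ..., n}, computed with the Bell triangle.

    This counts the states of the n-coalescent, whose states are partitions
    of the sample labels.
    """
    row = [1]
    for _ in range(n - 1):
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[-1]


def simulate_lineage_death(n, rng):
    """Simulate waiting times while there are n, n-1, ..., 2 lineages.

    Assumption: standard coalescent time units, so the waiting time with
    k lineages is exponential with rate k(k-1)/2.
    """
    times = []
    for k in range(n, 1, -1):
        rate = k * (k - 1) / 2
        times.append(rng.expovariate(rate))
    return times


def expected_tmrca(n):
    """Expected time to the most recent common ancestor: sum of 2 / (k(k-1))."""
    return sum(2.0 / (k * (k - 1)) for k in range(2, n + 1))


if __name__ == "__main__":
    n = 10
    rng = random.Random(2024)
    print("states of the n-coalescent:", bell_number(n))
    print("states of the lineage death process:", n)
    print("simulated TMRCA:", sum(simulate_lineage_death(n, rng)))
    print("expected TMRCA:", expected_tmrca(n))
```

For a statistic such as the time to the most recent common ancestor, the lineage death process already carries all the information needed, so simulating set partitions would add cost without changing the result.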

## Keywords

- \(n\)-Coalescent resolutions
- Tree shape statistics
- Computationally efficient and statistically sufficient inference

## Mathematics Subject Classification (2000)

92D15, 92D20, 60J10

## Notes

### Acknowledgments

We are grateful to Robert C. Griffiths for his insights, comments and guidance on this project, to John Rhodes and Mike Steel for their comments on an earlier version of this manuscript, and to Mike Steel for pointing out Kemeny and Snell (1960, Defn. 6.3.1). We also thank the referee and the associate editor for their pertinent comments, particularly on the computational aspects of this work. During the initial course of this study, R.S. was supported by a research fellowship from the Royal Commission for the Exhibition of 1851, and T.S. was supported by a PhD scholarship of the German Science Foundation and a summer studentship of the Allan Wilson Centre. A.V. was supported by the ANR project MANEGE (ANR-09-BLAN-0215) and T.S. by the Swiss National Science Foundation. R.S. and A.V. were supported in part by the chaire Modélisation Mathématique et Biodiversité of Veolia Environnement-École Polytechnique-Museum National d’Histoire Naturelle-Fondation X.

## References

- Aldous DJ (2001) Stochastic models and descriptive statistics for phylogenetic trees, from Yule to today. Stat Sci 16(1):23–34
- Bahlo M, Griffiths R (1996) Inference from gene trees in a subdivided population. Theor Pop Biol 57:79–95
- Beaumont M, Zhang W, Balding D (2002) Approximate Bayesian computation in population genetics. Genetics 162:2025–2035
- Beaumont M, Robert C, Marin JM, Cornuet J (2009) Adaptivity for ABC algorithms: the ABC–PMC scheme. Biometrika 96(4):983–990
- Birkner M, Blath J (2008) Computing likelihoods for coalescents with multiple collisions in the infinitely many sites model. J Math Biol 57:435–465
- Colless DH (1982) Review of phylogenetics: the theory and practice of phylogenetic systematics. Syst Zool 31:100–104
- Del Moral P (2004) Feynman–Kac formulae: genealogical and interacting particle systems with applications. Springer, New York
- Doucet A, Johansen AM (2009) A tutorial on particle filtering and smoothing: fifteen years later. In: Crisan D, Rozovsky B (eds) The Oxford handbook of nonlinear filtering. Oxford University Press, Oxford
- Etheridge AM (2011) Some mathematical models from population genetics. Lecture notes in mathematics 2012. Springer, Berlin
- Fisher R (1930) The genetical theory of natural selection. Clarendon, Oxford
- Ford D, Matsen E, Stadler T (2009) A method for investigating relative timing information on phylogenetic trees. Syst Biol 58(2):167–183
- Fu YX (1995) Statistical properties of segregating sites. Theor Pop Biol 48:172–197
- Griffiths R, Tavaré S (1994) Ancestral inference in population genetics. Stat Sci 9:307–319
- Griffiths R, Tavaré S (1996) Markov chain inference methods in population genetics. Math Comput Model 23:141–158
- Iorio M, Griffiths R (2004) Importance sampling on coalescent histories I. Adv Appl Prob 36:417–433
- Kemeny J, Snell J (1960) Finite Markov chains. D. van Nostrand Company Inc, Princeton
- Kendall DG (1975) Some problems in mathematical genealogy. In: Gani J (ed) Perspectives in probability and statistics. Academic Press, New York, pp 325–345
- Kingman JFC (1982a) The coalescent. Stoch Proc Appl 13:235–248
- Kingman JFC (1982b) On the genealogy of large populations. J Appl Probab 19:27–43
- Kolmogorov A (1942) Sur l’estimation statistique des paramètres de la loi de Gauss. Bull Acad Sci URSS Ser Math 6:3–32
- Le Cam L (1964) Sufficiency and approximate sufficiency. Ann Math Stat 35:1419–1455
- Leuenberger C, Wegmann D (2009) Bayesian computation and model selection without likelihoods. Genetics 184:243–252
- Marin JM, Pudlo P, Robert CP, Ryder RJ (2012) Approximate Bayesian computational methods. Stat Comput 22(6):1167–1180. doi:10.1007/s11222-011-9288-2
- Marjoram P, Molitor J, Plagnol V, Tavaré S (2003) Markov chain Monte Carlo without likelihoods. Proc Natl Acad Sci USA 100:15324–15328
- McKenzie A, Steel M (2000) Distribution of cherries for two models of trees. Math Biosci 164:81–92
- Pritchard J, Seielstad M, Perez-Lezaun A, Feldman M (1999) Population growth of human Y chromosomes: a study of Y chromosome microsatellites. Mol Biol Evol 16:1791–1798
- Ralph P, Coop G (2013) The geography of recent genetic ancestry across Europe. PLoS Biol 11(5):e1001555
- Sackin MJ (1975) “Good” and “bad” phenograms. Syst Zool 21:225–226
- Sainudiin R, Thornton K, Harlow J, Booth J, Stillman M, Yoshida R, Griffiths R, McVean G, Donnelly P (2011) Experiments with the site frequency spectrum. Bull Math Biol 73(4):829–872
- Semple C, Steel M (2003) Phylogenetics. Oxford University Press, Oxford
- Sisson S, Fan Y, Tanaka M (2007) Sequential Monte Carlo without likelihoods. Proc Natl Acad Sci USA 104:1760–1765
- Slatkin M (2002) A vectorized method of importance sampling with applications to models of mutation and migration. Theor Pop Biol 62:339–348
- Stephens M, Donnelly P (2000) Inference in molecular population genetics. J R Stat Soc B 62:605–655
- Tajima F (1983) Evolutionary relationship of DNA sequences in finite populations. Genetics 105:437–460
- Tajima F (1989) Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123:585–595
- Tavaré S (1983) Line-of-descent and genealogical processes, and their applications in population genetics models. Theor Pop Biol 26:119–164
- Watterson GA (1975) On the number of segregating sites in genetical models without recombination. Theor Pop Biol 7:256–276
- Weiss G, von Haeseler A (1998) Inference of population history using a likelihood approach. Genetics 149:1539–1546
- Wright S (1931) Evolution in Mendelian populations. Genetics 16:97–159