Experiments with the Site Frequency Spectrum
Evaluating the likelihood function of parameters in highly structured population genetic models from extant deoxyribonucleic acid (DNA) sequences is computationally prohibitive. In such cases, one may approximately infer the parameters from summary statistics of the data, such as the site frequency spectrum (SFS) or linear combinations of it. Such methods are known as approximate likelihood or approximate Bayesian computation. Using a controlled lumped Markov chain and computational commutative algebraic methods, we compute the exact likelihood of the SFS, and of many classical linear combinations of it, at a non-recombining locus that is neutrally evolving under the infinitely-many-sites mutation model. Using a partially ordered graph of coalescent experiments around the SFS, we provide a decision-theoretic framework for approximate sufficiency. We also extend a family of classical SFS-based hypothesis tests of standard neutrality at a non-recombining locus to more powerful versions that condition on the topological information provided by the SFS.
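As a concrete illustration of the summary statistics named above, the following minimal sketch tabulates the SFS from a small binary haplotype matrix and evaluates a few classical linear combinations of it: the number of segregating sites, the average pairwise differences, Watterson's estimator, and the Fay and Wu estimator. It only computes the statistics; it is not the exact-likelihood computation described in the abstract. The function names, the 0/1 encoding (1 for the derived allele), and the toy data are assumptions made for this example.

```python
from fractions import Fraction
from math import comb


def site_frequency_spectrum(haplotypes):
    """Return [x_1, ..., x_{n-1}], where x_i counts the sites at which
    the derived allele (coded 1) appears in exactly i of the n samples."""
    n = len(haplotypes)
    num_sites = len(haplotypes[0])
    sfs = [0] * (n - 1)
    for j in range(num_sites):
        count = sum(row[j] for row in haplotypes)
        if 0 < count < n:  # skip sites that do not segregate in the sample
            sfs[count - 1] += 1
    return sfs


def linear_combination(sfs, weights):
    """Weighted sum of the SFS entries."""
    return sum(w * x for w, x in zip(weights, sfs))


def classical_summaries(sfs):
    """A few classical statistics, each a linear combination of the SFS."""
    n = len(sfs) + 1
    pairs = comb(n, 2)
    a_n = sum(Fraction(1, i) for i in range(1, n))  # harmonic number H_{n-1}
    idx = range(1, n)
    return {
        # number of segregating sites
        "S": linear_combination(sfs, [1] * (n - 1)),
        # average number of pairwise differences
        "pi": linear_combination(sfs, [Fraction(i * (n - i), pairs) for i in idx]),
        # Watterson's estimator S / a_n
        "theta_W": linear_combination(sfs, [1 / a_n] * (n - 1)),
        # Fay and Wu's estimator
        "theta_H": linear_combination(sfs, [Fraction(i * i, pairs) for i in idx]),
    }


# Toy data (hypothetical): 4 sampled sequences, 5 sites, last site monomorphic.
haplotypes = [
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 1, 0],
]
sfs = site_frequency_spectrum(haplotypes)
summaries = classical_summaries(sfs)
print(sfs)        # [2, 2, 0]
print(summaries)
```

Exact rational arithmetic via `Fraction` is used so that the weights of each linear combination can be compared without floating-point noise; for large samples, floats would be the more practical choice.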
Keywords: Controlled lumped coalescent; Population genetic Markov bases