Journal of Statistical Physics, Volume 142, Issue 6, pp 1302–1316

Slicing and Dicing the Genome: A Statistical Physics Approach to Population Genetics

  • Yosef E. Maruvka
  • Nadav M. Shnerb
  • Sorin Solomon
  • Gur Yaari
  • David A. Kessler


The inference of past demographic parameters from current genetic polymorphism is a fundamental problem in population genetics. The standard techniques rely on a reconstruction of the gene genealogy, a cumbersome process that can be applied only to small numbers of sequences. We present a method that instead compares the total number of haplotypes (distinct sequences) with the model prediction. By chopping the DNA sequence into pieces, we condense the immense information hidden in sequence space into a single function: the number of haplotypes versus subsequence size. The details of this curve are robust to statistical fluctuations and are seen to reflect the process parameters. This procedure allows for a clear visualization of the quality of the fit and, crucially, its numerical complexity grows only linearly with the number of sequences. Our procedure is tested against both simulated data and empirical mtDNA data from China, and provides excellent fits in both cases.
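The haplotype-versus-subsequence-size curve described above can be illustrated with a minimal sketch. It assumes aligned sequences of equal length, non-overlapping windows, and an average over windows of the same size; the abstract does not specify the chopping scheme, so these choices, the function name `haplotype_curve`, and the toy data are all hypothetical.

```python
from typing import Dict, List, Sequence


def haplotype_curve(sequences: Sequence[str], sizes: Sequence[int]) -> Dict[int, float]:
    """Mean number of distinct haplotypes per window, for each window size.

    Assumption (not from the paper): windows are contiguous and
    non-overlapping, and counts are averaged over all full windows.
    """
    if not sequences:
        raise ValueError("need at least one sequence")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to equal length")

    curve: Dict[int, float] = {}
    for size in sizes:
        if not 0 < size <= length:
            raise ValueError(f"window size {size} outside 1..{length}")
        counts: List[int] = []
        for start in range(0, length - size + 1, size):
            # A haplotype is a distinct subsequence within this window;
            # a set collects them in one pass over the sequences.
            distinct = {s[start:start + size] for s in sequences}
            counts.append(len(distinct))
        curve[size] = sum(counts) / len(counts)
    return curve


# Hypothetical toy alignment of four sequences, four sites each.
toy = ["AACG", "AACT", "AGCT", "AGCT"]
print(haplotype_curve(toy, [1, 2, 4]))  # {1: 1.5, 2: 2.0, 4: 3.0}
```

Each window is processed in a single pass over the sequences, so the work for a fixed set of window sizes scales linearly with the number of sequences, in line with the scaling stated in the abstract. Comparing such a curve with a model prediction is the fitting step, which this sketch does not attempt.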


Keywords: Galton-Watson theory; Haplotype statistics; Population genetics





Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  • Yosef E. Maruvka (1)
  • Nadav M. Shnerb (1)
  • Sorin Solomon (2)
  • Gur Yaari (3)
  • David A. Kessler (1)

  1. Department of Physics, Bar-Ilan University, Ramat-Gan, Israel
  2. Racah Institute of Physics, Hebrew University of Jerusalem, Jerusalem, Israel
  3. Department of Ecology and Evolutionary Biology, Yale University, New Haven, USA
