Decoding Coalescent Hidden Markov Models in Linear Time

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8394)


In many areas of computational biology, hidden Markov models (HMMs) have been used to model local genomic features. In particular, coalescent HMMs have been used to infer ancient population sizes, migration rates, divergence times, and other parameters such as mutation and recombination rates. As more loci, sequences, and hidden states are added to the model, however, the runtime of coalescent HMMs can quickly become prohibitive. Here we present a new algorithm that reduces the runtime of coalescent HMMs from quadratic in the number of hidden time states to linear, without making any additional approximations. Our algorithm can be incorporated into various coalescent HMMs, including the popular method PSMC for inferring variable effective population sizes. We implement this algorithm to speed up our demographic inference method diCal, which is equivalent to PSMC when applied to a sample of two haplotypes. We demonstrate that the linear-time method can reconstruct a population size change history more accurately than the quadratic-time method, given similar computational resources. We also apply the method to data from the 1000 Genomes Project, inferring a high-resolution history of size changes in the European population.
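As a rough illustration of why the quadratic-to-linear distinction matters, the sketch below compares one generic HMM forward update, which sums over every pair of hidden states, with an update that exploits structure in the transition matrix. It is not the algorithm of this paper, whose details the abstract does not give. It assumes a deliberately simple toy transition matrix in which the chain stays in its current state with probability 1 - r and otherwise redraws its state from a fixed distribution pi. All function and variable names are hypothetical.

```python
def forward_step_quadratic(f, A, e):
    """Generic forward update with a dense transition matrix A: O(n^2)."""
    n = len(f)
    return [e[j] * sum(f[i] * A[i][j] for i in range(n)) for j in range(n)]


def forward_step_linear(f, r, pi, e):
    """Same update when A[i][j] = (1 - r) * [i == j] + r * pi[j]: O(n).

    The sum over source states collapses to one shared total, so each
    destination state costs constant work.
    """
    total = sum(f)  # computed once and reused for every destination state
    return [e[j] * ((1.0 - r) * f[j] + r * pi[j] * total)
            for j in range(len(f))]


def toy_transition_matrix(r, pi):
    """Build the dense matrix for the assumed toy model (checking only)."""
    n = len(pi)
    return [[(1.0 - r) * (i == j) + r * pi[j] for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    r = 0.1                   # assumed probability of redrawing the state
    pi = [0.5, 0.3, 0.2]      # assumed redraw distribution over 3 states
    f = [0.2, 0.5, 0.3]       # forward probabilities at the previous locus
    e = [0.9, 0.1, 0.4]       # emission probabilities at the current locus
    A = toy_transition_matrix(r, pi)
    print(forward_step_quadratic(f, A, e))
    print(forward_step_linear(f, r, pi, e))
```

Both functions return the same vector for this toy model; the second never materializes the n-by-n matrix. Coalescent HMM transitions are more intricate than this toy case, so the sketch only conveys the general idea that structured transitions can remove the pairwise sum.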


Keywords: Demographic inference; effective population size; coalescent with recombination; expectation-maximization; augmented hidden Markov model; human migration out of Africa





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  1. Department of Mathematics, University of California, Berkeley, USA
  2. Computer Science Division, University of California, Berkeley, USA
  3. Department of Statistics, University of California, Berkeley, USA
  4. Department of Integrative Biology, University of California, Berkeley, USA
