Accurate Computation of Likelihoods in the Coalescent with Recombination Via Parsimony

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4955)

Abstract

Understanding the variation of recombination rates across a given genome is crucial for disease gene mapping and for detecting signatures of selection, to name just a couple of applications. A widely used method of estimating recombination rates is the maximum likelihood approach, and the problem of accurately computing likelihoods in the coalescent with recombination has received much attention in the past. A variety of sampling and approximation methods have been proposed, but no single method seems to perform consistently better than the rest, and there is still great value in developing better statistical methods for accurately computing likelihoods. So far, with the exception of some two-locus models, it has remained unknown exactly how the true likelihood behaves as a function of model parameters, or how close estimated likelihoods are to the true likelihood. In this paper, we develop a deterministic, parsimony-based method of accurately computing the likelihood for multi-locus input data of moderate size. We first find the set of all ancestral configurations (ACs) that occur in evolutionary histories with at most k crossover recombinations. Then, we compute the likelihood by summing over all evolutionary histories that can be constructed using only the ACs in that set. We allow for an arbitrary number of crossover, coalescent, and mutation events in a history, as long as the transitions stay within that restricted set of ACs. For given parameter values, by gradually increasing the bound k until the likelihood stabilizes, we can obtain an accurate estimate of the likelihood. At least for moderate crossover rates, the algorithm-based method described here opens up new opportunities for testing and fine-tuning statistical methods for computing likelihoods.
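
The final step of the abstract, raising the bound k until the likelihood stabilizes, can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: `stabilized_likelihood`, its tolerance and limits, and especially `toy_likelihood`, which is a truncated Poisson sum used only as a stand-in that increases with k. It is not the restricted-AC likelihood computation of the paper, and the stopping rule shown (relative change between consecutive bounds) is one plausible choice, not one stated in the abstract.

```python
from math import exp, factorial
from typing import Callable, Tuple


def stabilized_likelihood(
    likelihood_at_bound: Callable[[int], float],
    k_start: int = 0,
    k_max: int = 50,
    rel_tol: float = 1e-6,
) -> Tuple[int, float]:
    """Raise the crossover bound k until successive estimates agree.

    likelihood_at_bound(k) is a hypothetical callable standing in for the
    likelihood summed over histories restricted to the ACs reachable with
    at most k crossover recombinations.
    """
    previous = likelihood_at_bound(k_start)
    for k in range(k_start + 1, k_max + 1):
        current = likelihood_at_bound(k)
        # Stop when the relative change between consecutive bounds is small.
        if abs(current - previous) <= rel_tol * max(abs(current), abs(previous)):
            return k, current
        previous = current
    raise RuntimeError(f"likelihood did not stabilize for k <= {k_max}")


def toy_likelihood(k: int, rate: float = 2.0) -> float:
    """Toy stand-in only: a truncated Poisson sum that approaches 1 as k grows."""
    return sum(exp(-rate) * rate**j / factorial(j) for j in range(k + 1))


# Usage: find the first bound at which the toy estimate stops changing.
k, value = stabilized_likelihood(toy_likelihood)
print(f"stabilized at k={k}, estimate={value:.8f}")
```

Agreement between two consecutive bounds does not by itself prove convergence, so in practice one might require agreement over several consecutive values of k before accepting the estimate.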


Editor information

Martin Vingron, Limsoon Wong

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lyngsø, R.B., Song, Y.S., Hein, J. (2008). Accurate Computation of Likelihoods in the Coalescent with Recombination Via Parsimony. In: Vingron, M., Wong, L. (eds) Research in Computational Molecular Biology. RECOMB 2008. Lecture Notes in Computer Science, vol 4955. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78839-3_41

  • DOI: https://doi.org/10.1007/978-3-540-78839-3_41

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-78838-6

  • Online ISBN: 978-3-540-78839-3

  • eBook Packages: Computer Science, Computer Science (R0)
