Probability, Statistics, and Computational Science

Evolutionary Genomics

Part of the book series: Methods in Molecular Biology ((MIMB,volume 855))

Abstract

In this chapter, we review basic concepts from probability theory and computational statistics that are fundamental to evolutionary genomics. We provide a very basic introduction to statistical modeling and discuss general principles, including maximum likelihood and Bayesian inference. Markov chains, hidden Markov models, and Bayesian network models are introduced in more detail as they occur frequently and in many variations in genomics applications. In particular, we discuss efficient inference algorithms and methods for learning these models from partially observed data. Several simple examples are given throughout the text, some of which point to models that are discussed in more detail in subsequent chapters.

Author information

Corresponding author

Correspondence to Niko Beerenwinkel.

Copyright information

© 2012 Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Beerenwinkel, N., Siebourg, J. (2012). Probability, Statistics, and Computational Science. In: Anisimova, M. (eds) Evolutionary Genomics. Methods in Molecular Biology, vol 855. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-61779-582-4_3

  • DOI: https://doi.org/10.1007/978-1-61779-582-4_3

  • Publisher Name: Humana Press, Totowa, NJ

  • Print ISBN: 978-1-61779-581-7

  • Online ISBN: 978-1-61779-582-4

  • eBook Packages: Springer Protocols
