Probability, Statistics, and Computational Science

  • Niko Beerenwinkel
  • Juliane Siebourg
Part of the Methods in Molecular Biology book series (MIMB, volume 855)


In this chapter, we review basic concepts from probability theory and computational statistics that are fundamental to evolutionary genomics. We provide a very basic introduction to statistical modeling and discuss general principles, including maximum likelihood and Bayesian inference. Markov chains, hidden Markov models, and Bayesian network models are introduced in more detail as they occur frequently and in many variations in genomics applications. In particular, we discuss efficient inference algorithms and methods for learning these models from partially observed data. Several simple examples are given throughout the text, some of which point to models that are discussed in more detail in subsequent chapters.

Key words

Bayesian inference; Bayesian networks; Dynamic programming; Expectation maximization algorithm; Hidden Markov models; Markov chains; Maximum likelihood; Statistical models



Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  1. Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
