Abstract
In this chapter, we review basic concepts from probability theory and computational statistics that are fundamental to evolutionary genomics. We provide a very basic introduction to statistical modeling and discuss general principles, including maximum likelihood and Bayesian inference. Markov chains, hidden Markov models, and Bayesian network models are introduced in more detail as they occur frequently and in many variations in genomics applications. In particular, we discuss efficient inference algorithms and methods for learning these models from partially observed data. Several simple examples are given throughout the text, some of which point to models that are discussed in more detail in subsequent chapters.
Copyright information
© 2012 Springer Science+Business Media, LLC
About this protocol
Cite this protocol
Beerenwinkel, N., Siebourg, J. (2012). Probability, Statistics, and Computational Science. In: Anisimova, M. (eds) Evolutionary Genomics. Methods in Molecular Biology, vol 855. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-61779-582-4_3
DOI: https://doi.org/10.1007/978-1-61779-582-4_3
Publisher Name: Humana Press, Totowa, NJ
Print ISBN: 978-1-61779-581-7
Online ISBN: 978-1-61779-582-4
eBook Packages: Springer Protocols