Modeling Sequence Evolution

  • Protocol
Bioinformatics

Part of the book series: Methods in Molecular Biology™ (MIMB, volume 452)

Abstract

DNA and amino acid sequences contain information about both the phylogenetic relationships among species and the evolutionary processes that caused the sequences to diverge. Mathematical and statistical methods try to detect this information to determine how and why DNA and protein molecules work the way they do. This chapter describes some of the most widely used models of the evolution of biological sequences. It first focuses on models of single nucleotide and amino acid replacement rates. It then discusses the modeling of evolution at the level of genes and protein modules. The chapter concludes with speculations about the future use of molecular evolution studies based on genomic and proteomic data.

Copyright information

© 2008 Humana Press, a part of Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Liò, P., Bishop, M. (2008). Modeling Sequence Evolution. In: Keith, J.M. (ed.) Bioinformatics. Methods in Molecular Biology™, vol 452. Humana Press. https://doi.org/10.1007/978-1-60327-159-2_13

  • DOI: https://doi.org/10.1007/978-1-60327-159-2_13

  • Publisher Name: Humana Press

  • Print ISBN: 978-1-58829-707-5

  • Online ISBN: 978-1-60327-159-2

  • eBook Packages: Springer Protocols
