A Collection of Amino Acid Replacement Matrices Derived from Clusters of Orthologs

Journal of Molecular Evolution

Abstract

Sequence divergence among orthologous proteins was characterized with 34 amino acid replacement matrices, sequence context analysis, and a phylogenetic tree. The model was trained on very large datasets of aligned protein sequences drawn from 15 organisms, including protists, plants, Dictyostelium, fungi, and animals. Comparative tests against models currently used in phylogenetics, namely JTT+Γ±F and WAG+Γ±F, were run on a test dataset of 380 multiple alignments containing protein sequences from all five of these major taxonomic groups. The tests indicate that our model should be preferred over the JTT+Γ±F and WAG+Γ±F models on datasets similar to the test dataset. The strong performance of our model of orthologous protein sequence divergence can be attributed to its ability to match amino acid equilibrium frequencies more closely to the compositions found in alignment columns.
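
To make the role of equilibrium frequencies concrete, the sketch below shows the standard "+F" convention for a time-reversible replacement model: the off-diagonal rates are the symmetric exchangeabilities multiplied by the target frequencies, so swapping in frequencies counted from the data changes the equilibrium of the process. It is a toy illustration and not the authors' procedure. The three-letter alphabet, the exchangeability values, both frequency vectors, and the helper names `rate_matrix` and `stationary_residual` are all assumptions made for this example.

```python
# Toy illustration of the "+F" convention in a time-reversible replacement model.
# The 3-letter alphabet and every number below are made up for illustration.

def rate_matrix(exch, freqs):
    """Build Q with Q[i][j] = exch[i][j] * freqs[j] for i != j.

    Each row sums to zero, and Q is scaled so that the expected rate at
    equilibrium is one replacement per unit time.
    """
    n = len(freqs)
    q = [[exch[i][j] * freqs[j] if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    for i in range(n):
        q[i][i] = -sum(q[i])  # diagonal makes the row sum to zero
    scale = -sum(freqs[i] * q[i][i] for i in range(n))
    return [[x / scale for x in row] for row in q]

def stationary_residual(q, freqs):
    """Largest absolute entry of the row vector freqs * Q.

    The value is (numerically) zero exactly when freqs is the equilibrium
    distribution of the process described by Q.
    """
    n = len(freqs)
    return max(abs(sum(freqs[i] * q[i][j] for i in range(n)))
               for j in range(n))

# Hypothetical symmetric exchangeabilities.
EXCH = [[0.0, 2.0, 0.5],
        [2.0, 0.0, 1.0],
        [0.5, 1.0, 0.0]]
MODEL_FREQS = [0.5, 0.3, 0.2]  # frequencies that ship with the matrix
DATA_FREQS = [0.2, 0.2, 0.6]   # frequencies counted from an alignment ("+F")

q_model = rate_matrix(EXCH, MODEL_FREQS)
q_plus_f = rate_matrix(EXCH, DATA_FREQS)

print("model Q, model freqs:", stationary_residual(q_model, MODEL_FREQS))
print("model Q, data freqs: ", stationary_residual(q_model, DATA_FREQS))
print("+F Q, data freqs:    ", stationary_residual(q_plus_f, DATA_FREQS))
```

In this toy setting the unadjusted matrix is not at equilibrium under the data frequencies, while the "+F" matrix is. A single "+F" adjustment still uses one frequency vector for a whole alignment. A collection of matrices can offer a different equilibrium distribution for different kinds of alignment columns, which is the property the abstract credits for the model's performance.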

Acknowledgments

We thank T. Hwa for guidance and encouragement and R. Konecny of the W.M. Keck Laboratory for Integrated Biology II for helping to keep the processors running. R.O. was supported in part by a grant from the NIH to W.F.L. (GM-062350), grants from the NSF to T.H. (DMR-9971456, DMR-0211308, MCB-0083704), and the Center for Theoretical Biological Physics.

Author information

Corresponding author

Correspondence to William F. Loomis.

Additional information

[Reviewing Editor: Dr. Martin Kreitman]

About this article

Cite this article

Olsen, R., Loomis, W.F. A Collection of Amino Acid Replacement Matrices Derived from Clusters of Orthologs. J Mol Evol 61, 659–665 (2005). https://doi.org/10.1007/s00239-005-0060-0

  • DOI: https://doi.org/10.1007/s00239-005-0060-0
