Journal of Molecular Evolution, Volume 42, Issue 2, pp 313–320

Probabilistic reconstruction of ancestral protein sequences

  • Jeffrey M. Koshi
  • Richard A. Goldstein


Using a maximum-likelihood formalism, we have developed a method for reconstructing the sequences of ancestral proteins. Our approach allows the calculation not only of the most probable ancestral sequence but also of the probability of any amino acid at any given node in the evolutionary tree. Because we consider evolution at the amino acid level, we are better able to include the effects of evolutionary pressure and to take advantage of structural information about the protein through mutation matrices that depend on secondary structure and surface accessibility. The computational complexity of this method scales linearly with the number of homologous proteins used to reconstruct the ancestral sequence.
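To make the idea of a per-node amino acid probability concrete, the following is a minimal sketch, not the authors' implementation. It applies the standard pruning recursion for likelihoods on a tree (Felsenstein 1981) to a single alignment column and returns the posterior probability of each amino acid at the root. Several things here are assumptions made for illustration: the toy substitution model in `transition_prob` stands in for the structure-dependent mutation matrices described above, the equilibrium frequencies are uniform, and the tree encoding and all function names are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def transition_prob(a, b, t, pi):
    """Toy model (an assumption, not the paper's matrices):
    P(a -> b | t) = exp(-t) * [a == b] + (1 - exp(-t)) * pi[b]."""
    decay = math.exp(-t)
    return decay * (1.0 if a == b else 0.0) + (1.0 - decay) * pi[b]

def conditional_likelihood(node, pi):
    """Return L[a] = P(observed residues below node | state a at node).

    A leaf is a one-letter residue string; an internal node is a list of
    (child, branch_length) pairs. Each node is visited exactly once."""
    n = len(pi)
    if isinstance(node, str):
        observed = AMINO_ACIDS.index(node)
        return [1.0 if a == observed else 0.0 for a in range(n)]
    result = [1.0] * n
    for child, t in node:
        below = conditional_likelihood(child, pi)
        for a in range(n):
            # Sum over the child's possible states, then multiply across children.
            result[a] *= sum(transition_prob(a, b, t, pi) * below[b]
                             for b in range(n))
    return result

def root_posterior(tree, pi):
    """Posterior probability of each amino acid at the root (Bayes' rule)."""
    likelihood = conditional_likelihood(tree, pi)
    weighted = [p * l for p, l in zip(pi, likelihood)]
    total = sum(weighted)
    return [w / total for w in weighted]

# One alignment column: two close relatives carry A, a more distant one carries S.
pi = [1.0 / 20] * 20
tree = [([("A", 0.1), ("A", 0.1)], 0.2), ("S", 0.3)]
posterior = root_posterior(tree, pi)
best = AMINO_ACIDS[posterior.index(max(posterior))]
```

Because every node is processed once, the work in this sketch grows linearly with the number of leaves for a fixed alphabet, which matches the scaling stated above. For a time-reversible model such as this toy one, the same routine could in principle give the posterior at another internal node by re-rooting the tree there.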

Key words

Bayesian statistics; Evolutionary reconstruction; Homologous sequences; Protein evolution; Maximum likelihood

Abbreviations

  • ML: maximum likelihood
  • MP: maximum parsimony
  • PAM: point-accepted mutations



Copyright information

© Springer-Verlag New York Inc. 1996

Authors and Affiliations

  • Jeffrey M. Koshi (1)
  • Richard A. Goldstein (1, 2)
  1. Biophysics Research Division, University of Michigan, Ann Arbor, USA
  2. Department of Chemistry, University of Michigan, Ann Arbor, USA
