Probabilistic ancestral sequences and multiple alignments

  • Gaston H. Gonnet
  • Steven A. Benner
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1097)


An evolutionary configuration (EC) is a set of aligned sequences of characters (possibly representing amino acids, DNA, RNA or natural language). We define the probability of an EC based on a given phylogenetic tree, and give an algorithm to compute this probability efficiently. From these probabilities, we can compute the most likely sequence at any place in the phylogenetic tree, or its probability profile. The probability profile at the root of the tree is called the probabilistic ancestral sequence. Because the probability of an EC can be computed, dynamic programming can be used to find an alignment between the sequences of two subtrees. This gives an algorithm for computing multiple alignments. These multiple alignments are maximum likelihood, and are a compatible generalization of two-sequence alignments.
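As a rough illustration of the first two ideas in the abstract, the sketch below computes the probability of a small gap-free EC on a fixed binary tree and the probability profile at its root. It is not the authors' algorithm: it uses the standard bottom-up recursion over the tree (often called pruning), a Jukes-Cantor-style substitution model over a DNA alphabet, and independent alignment columns, all of which are assumptions made here for brevity. The names `transition`, `conditional`, `root_profile` and `ec_log_probability`, the tree encoding, and the branch lengths are hypothetical.

```python
import math

ALPHABET = "ACGT"  # assumption: DNA characters only, no gap symbol


def transition(d):
    """Substitution probabilities over a branch of length d (assumed model)."""
    k = len(ALPHABET)
    same = 1.0 / k + (1.0 - 1.0 / k) * math.exp(-k * d / (k - 1))
    diff = (1.0 - same) / (k - 1)
    return [[same if x == y else diff for y in range(k)] for x in range(k)]


def conditional(node, column):
    """P(characters observed below node | node carries character x), for each x.

    A leaf is a sequence name (str); an internal node is the tuple
    (left_child, left_branch_length, right_child, right_branch_length).
    """
    if isinstance(node, str):  # leaf: the observed character is certain
        return [1.0 if c == column[node] else 0.0 for c in ALPHABET]
    left, d_left, right, d_right = node
    result = [1.0] * len(ALPHABET)
    for child, d in ((left, d_left), (right, d_right)):
        below = conditional(child, column)
        m = transition(d)
        for x in range(len(ALPHABET)):
            result[x] *= sum(m[x][y] * below[y] for y in range(len(ALPHABET)))
    return result


def root_profile(tree, column, freqs):
    """Return (column probability, posterior profile of the root character)."""
    joint = [f * l for f, l in zip(freqs, conditional(tree, column))]
    total = sum(joint)
    return total, [j / total for j in joint]


def ec_log_probability(tree, sequences, freqs):
    """Log-probability of an aligned set, treating columns as independent."""
    length = len(next(iter(sequences.values())))
    logp = 0.0
    for i in range(length):
        column = {name: seq[i] for name, seq in sequences.items()}
        p, _ = root_profile(tree, column, freqs)
        logp += math.log(p)
    return logp


if __name__ == "__main__":
    # Hypothetical three-leaf tree and aligned sequences.
    tree = (("a", 0.1, "b", 0.1), 0.2, "c", 0.3)
    seqs = {"a": "ACGT", "b": "ACGA", "c": "ACCT"}
    uniform = [0.25] * 4
    print("log P(EC) =", ec_log_probability(tree, seqs, uniform))
    for i in range(4):
        column = {name: seq[i] for name, seq in seqs.items()}
        _, profile = root_profile(tree, column, uniform)
        print(i, [round(p, 3) for p in profile])
```

Each internal node is visited once per column, so the cost grows linearly with the number of sequences and columns. Handling insertions and deletions, or scoring candidate alignments of two subtrees by dynamic programming, would need a gap model that this sketch leaves out.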





Copyright information

© Springer-Verlag Berlin Heidelberg 1996

Authors and Affiliations

  • Gaston H. Gonnet (1)
  • Steven A. Benner (1)
  1. Informatik, Organic Chemistry, E.T.H. Zurich, Switzerland
