A survey of multiple sequence comparison methods

Bulletin of Mathematical Biology

Abstract

Multiple sequence comparison refers to the search for similarity in three or more sequences. This article presents a survey of the exhaustive (optimal) and heuristic (possibly sub-optimal) methods developed for the comparison of multiple macromolecular sequences. Emphasis is given to the different approaches of the heuristic methods. Four distance measures derived from information engineering and genetic studies are introduced for the comparison between two alignments of sequences. The use of entropy, which plays a central role in information theory as a measure of information, choice and uncertainty, is proposed as a simple measure for evaluating the optimality of an alignment in the absence of any a priori knowledge about the structures of the sequences being compared. This article also gives two examples of comparison between alternative alignments of the same set of 5S RNAs as obtained by several different heuristic methods.
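As an illustration of the entropy idea only, the sketch below sums the Shannon entropy of each alignment column and compares two alternative alignments of the same toy sequences. The abstract does not state the article's exact formulation, so the scoring rule (sum of column entropies, with the gap character treated as an ordinary symbol), the function names and the example sequences are all assumptions made here for illustration.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy, in bits, of the symbols found in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def alignment_entropy(rows):
    """Sum of column entropies over an alignment given as equal-length rows.

    Assumption: the gap character '-' is counted like any other symbol.
    """
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all aligned rows must have the same length")
    return sum(column_entropy(column) for column in zip(*rows))


# Hypothetical toy data: two alternative alignments of the same three sequences
# (ACGT, ACGTA and ACTA); only the gap placement in the third row differs.
alignment_a = ["ACGT-", "ACGTA", "AC-TA"]
alignment_b = ["ACGT-", "ACGTA", "-ACTA"]

if __name__ == "__main__":
    for name, rows in (("a", alignment_a), ("b", alignment_b)):
        print(name, round(alignment_entropy(rows), 3))
```

Under this assumed scoring rule, the alignment with the lower total entropy has more uniform columns and would be preferred when no structural knowledge is available to judge the alternatives.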

Cite this article

Chan, S.C., Wong, A.K.C. & Chiu, D.K.Y. A survey of multiple sequence comparison methods. Bltn Mathcal Biology 54, 563–598 (1992). https://doi.org/10.1007/BF02459635

  • DOI: https://doi.org/10.1007/BF02459635
