
A multiple sequence comparison method

Bulletin of Mathematical Biology

Abstract

This article presents a new method for the comparison of multiple macromolecular sequences. It is based on a hierarchical sequence synthesis procedure that does not require any a priori knowledge of the molecular structure of the sequences or the phylogenetic relations among the sequences. It differs from existing methods in that it is capable of: (i) generating a statistical-structural model of the sequences through a synthesis process that detects homologous groups of the sequences, and (ii) aligning the sequences while the taxonomic tree of the sequences is being constructed, in a single phase. It produces superior results when compared with some existing methods.




About this article

Cite this article

Wong, A.K.C., Chan, S.C. & Chiu, D.K.Y. A multiple sequence comparison method. Bltn Mathcal Biology 55, 465–486 (1993). https://doi.org/10.1007/BF02460892


  • DOI: https://doi.org/10.1007/BF02460892
