
Sequence similarity (‘Homology’) searching for molecular biologists

Bulletin of Mathematical Biology

Abstract

Major types of sequence similarity searching (often, and incorrectly, called ‘homology’ searching) are reviewed, and examples of each are presented. The features and limitations of each type of program are discussed, as are individual implementations of each type. Two pairs of sequences are used as examples to show how implementations of each type differ in their results and in how they present them. Both local and global alignment programs are examined. The programs reviewed run on many different computer architectures, ranging from laboratory computers such as the IBM PC, through minicomputers such as the VAX, to large mainframes such as the DEC-10/20 series.
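The abstract contrasts global and local alignment programs without stating how they differ computationally. As an illustration only, the following sketch shows the two classic dynamic-programming recurrences (Needleman-Wunsch style for global, Smith-Waterman style for local). It is not a reconstruction of any program reviewed in the paper. The function names `global_score` and `local_score` and the scoring scheme (match +1, mismatch -1, linear gap penalty -2) are assumptions chosen for brevity, and only the optimal score is computed, not the alignment itself.

```python
def _pair_score(x: str, y: str, match: int, mismatch: int) -> int:
    """Score for aligning residue x against residue y (assumed: no substitution matrix)."""
    return match if x == y else mismatch


def global_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Optimal global (Needleman-Wunsch style) alignment score with a linear gap penalty."""
    # prev[j] holds the best score for a[:i-1] versus b[:j] (a[:0] before the loop);
    # the boundary row charges a gap for every leading residue of b.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)  # boundary column: leading residues of a vs gaps
        for j in range(1, len(b) + 1):
            cur[j] = max(
                prev[j - 1] + _pair_score(a[i - 1], b[j - 1], match, mismatch),  # a[i-1] with b[j-1]
                prev[j] + gap,     # a[i-1] opposite a gap
                cur[j - 1] + gap,  # b[j-1] opposite a gap
            )
        prev = cur
    # Both sequences must be used end to end, so the answer is the final cell.
    return prev[-1]


def local_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Best local (Smith-Waterman style) alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)  # zero boundary row: an alignment may start anywhere
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)  # zero boundary column for the same reason
        for j in range(1, len(b) + 1):
            cur[j] = max(
                0,  # floor at zero: discard any prefix that scores negatively
                prev[j - 1] + _pair_score(a[i - 1], b[j - 1], match, mismatch),
                prev[j] + gap,
                cur[j - 1] + gap,
            )
            best = max(best, cur[j])  # an alignment may end in any cell
        prev = cur
    return best


if __name__ == "__main__":
    print(global_score("ACGT", "TTACGTTT"))  # -4
    print(local_score("ACGT", "TTACGTTT"))   # 4
```

For the pair `ACGT` / `TTACGTTT`, the global score is -4 because the four extra residues of the longer sequence must each be placed opposite a gap, while the local score is 4 for the shared `ACGT` segment alone. Under these assumptions the local recurrence differs from the global one only in its zero boundaries, the zero floor in each cell, and taking the best cell anywhere in the matrix rather than the final corner.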




Cite this article

Davison, D. Sequence similarity (‘Homology’) searching for molecular biologists. Bull. Math. Biol. 47, 437–474 (1985). https://doi.org/10.1007/BF02460006
