Abstract
The major types of sequence similarity searching (often, and incorrectly, called ‘homology’ searching) are reviewed, and examples of each are presented. The features and limitations of each type of program, and of individual implementations of each type, are discussed. Two pairs of sequences are used as examples to show how implementations of each type differ in their results and in how they present them. Both local and global alignment programs are examined. The programs reviewed run on many different types of computer architecture, from laboratory computers such as the IBM PC, through minicomputers such as the VAX, to large mainframe computers such as the DEC-10/20 series.
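The distinction between global and local alignment can be illustrated with a minimal sketch. It is not taken from any of the programs reviewed. It assumes a simple scoring scheme (match +1, mismatch -1, linear gap penalty -1), and the function names `global_score` and `local_score` are hypothetical. The global score follows the Needleman-Wunsch style of dynamic programming, in which both sequences are aligned end to end. The local score follows the Smith-Waterman style, in which cell values are not allowed to fall below zero, so the best-matching pair of subsequences is found.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best end-to-end (global) alignment score; scoring values are assumed."""
    # Row 0: aligning a prefix of b against nothing costs one gap per character.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # Column 0: prefix of a aligned against nothing.
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                            prev[j] + gap,       # gap in b
                            curr[j - 1] + gap))  # gap in a
        prev = curr
    return prev[-1]


def local_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best subsequence-to-subsequence (local) alignment score."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # Clamping at zero lets an alignment start afresh at any cell.
            cell = max(0, prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


if __name__ == "__main__":
    # The shared block ACGT sits at opposite ends of the two sequences.
    x, y = "TTTTACGT", "ACGTCCCC"
    print("global:", global_score(x, y))
    print("local: ", local_score(x, y))
```

With these example sequences the local score rewards the shared block, whereas the global score is pulled down by the unrelated flanking regions. This is the kind of difference in results that the two classes of program can show on the same pair of sequences.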
Cite this article
Davison, D. Sequence similarity (‘Homology’) searching for molecular biologists. Bulletin of Mathematical Biology 47, 437–474 (1985). https://doi.org/10.1007/BF02460006
DOI: https://doi.org/10.1007/BF02460006