Sequence similarity (‘Homology’) searching for molecular biologists

Davison, Dan

doi:10.1007/BF02460006

Published: July 1985

Volume 47, pages 437–474, (1985)

Bulletin of Mathematical Biology

Abstract

Major types of sequence similarity searching (often, and incorrectly, called ‘homology’ searching) are reviewed, and examples of each are presented. The features and limitations of each type of program, and of individual implementations of each type, are discussed. Two pairs of sequences are used as examples to show how implementations of each type differ in their results and in how they present them. Both local and global alignment programs are examined. The programs reviewed run on many types of computer architecture, from laboratory computers such as the IBM PC and minicomputers such as the VAX to large mainframe computers such as the DEC-10/20 series.
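The abstract distinguishes global from local alignment programs. As a minimal sketch (not the author's code, and not any of the programs reviewed in the paper), the two classic dynamic-programming formulations can be compared on score alone: a Needleman–Wunsch style global score, which must align both sequences end to end, and a Smith–Waterman style local score, which may report only the best-matching segments. The scoring values `MATCH`, `MISMATCH` and `GAP`, the function names and the example strings are hypothetical choices for illustration, and the sketch assumes a simple linear gap penalty.

```python
# Hypothetical scoring scheme for illustration only (not taken from the paper).
MATCH = 1
MISMATCH = -1
GAP = -2  # linear penalty charged per gap position


def score(a: str, b: str) -> int:
    """Substitution score for a pair of residues."""
    return MATCH if a == b else MISMATCH


def global_score(s: str, t: str) -> int:
    """Needleman-Wunsch style global alignment score with linear gaps.

    Every residue of both sequences must be aligned, so end gaps are penalized.
    """
    n, m = len(s), len(t)
    # prev[j] = best score for aligning the first i-1 residues of s with t[:j]
    prev = [j * GAP for j in range(m + 1)]
    for i in range(1, n + 1):
        cur = [i * GAP] + [0] * m
        for j in range(1, m + 1):
            cur[j] = max(
                prev[j - 1] + score(s[i - 1], t[j - 1]),  # pair s[i-1] with t[j-1]
                prev[j] + GAP,  # s[i-1] against a gap
                cur[j - 1] + GAP,  # t[j-1] against a gap
            )
        prev = cur
    return prev[m]


def local_score(s: str, t: str) -> int:
    """Smith-Waterman style local alignment score with linear gaps.

    Cell scores are floored at zero so an alignment may start anywhere,
    and the best cell anywhere in the matrix is reported.
    """
    n, m = len(s), len(t)
    prev = [0] * (m + 1)
    best = 0
    for i in range(1, n + 1):
        cur = [0] * (m + 1)
        for j in range(1, m + 1):
            cur[j] = max(
                0,
                prev[j - 1] + score(s[i - 1], t[j - 1]),
                prev[j] + GAP,
                cur[j - 1] + GAP,
            )
            best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    # A short shared core embedded in unrelated flanking sequence.
    x = "TTTTGATTACATTTT"
    y = "CCGATTACACC"
    print("global:", global_score(x, y))
    print("local: ", local_score(x, y))
```

Because the local score is floored at zero and taken from the best cell anywhere in the matrix, it can never be lower than the global score for the same pair. The difference is largest when a short shared core sits inside unrelated flanking sequence, as in the example strings, where the global score is dragged down by end gaps and mismatches that the local score ignores.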

Author information

Dan Davison
Present address: Department of Biochemical and Biophysical Sciences, Science and Research No. 1, University of Houston, University Park, 77004, Houston, TX, U.S.A.

Authors and Affiliations

Graduate Program in Genetics, State University of New York at Stony Brook, 11794, Stony Brook, NY, U.S.A.
Dan Davison

About this article

Cite this article

Davison, D. Sequence similarity (‘Homology’) searching for molecular biologists. Bulletin of Mathematical Biology 47, 437–474 (1985). https://doi.org/10.1007/BF02460006

Received: 06 June 1985
Issue Date: July 1985
DOI: https://doi.org/10.1007/BF02460006

Similar content being viewed by others

Longest Common Substring with Approximately k Mismatches

Siamese Neural Networks: An Overview

Classical Molecular Dynamics in a Nutshell
