
A nonlinear measure of subalignment similarity and its significance levels

Bulletin of Mathematical Biology

Abstract

A new measure of subalignment similarity is introduced. Specifically, similarity s(l, c) is defined as the logarithm to the base p of the probability of finding c or fewer mismatches in a subalignment of length l, where p is the probability of a match. Previous algorithms cannot use this measure to find locally optimal subalignments because, unlike Needleman-Wunsch and Sellers similarities, this measure is nonlinear. A new pattern recognition algorithm is described for finding all locally optimal subalignments of two nucleotide sequences. The DD algorithm can use s(l, c) or any other reasonable similarity function to assess the relative interest of subalignments. The DD algorithm searches only the diagonal graph, which lacks insertions and deletions. This search strategy greatly decreases the computation time and does not require an arbitrary choice of gap cost. The paths of the resulting DD graph usually draw attention to likely locations for insertions and deletions. A heuristic formula is derived for estimating significance levels for s(l, c) in the context of the lengths of the two aligned sequences. The DD algorithm has been used to find interesting subalignments between the nucleotide sequences for human and murine interleukin 2.
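
As an illustration of the definition of s(l, c) given above, the following minimal Python sketch evaluates it as a binomial tail probability. It is not taken from the paper. It assumes that positions in a subalignment match independently with probability p, and the function name `similarity` and the default p = 0.25 (four equally frequent nucleotides) are hypothetical choices made only for this example.

```python
import math


def similarity(l, c, p=0.25):
    """Sketch of s(l, c): log base p of P(c or fewer mismatches in length l).

    Assumption (not from the source): positions match independently,
    each with probability p, so the mismatch count is binomial.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if not 0 <= c <= l:
        raise ValueError("c must satisfy 0 <= c <= l")
    q = 1.0 - p  # probability of a mismatch at one position
    # Probability of observing at most c mismatches among l positions.
    tail = sum(math.comb(l, k) * q**k * p ** (l - k) for k in range(c + 1))
    # Change of base: log_p(x) = ln(x) / ln(p).
    return math.log(tail) / math.log(p)


if __name__ == "__main__":
    for mismatches in range(0, 4):
        print(mismatches, round(similarity(20, mismatches), 3))
```

Under these assumptions a subalignment with no mismatches scores exactly its own length, since log_p(p^l) = l, and each additional mismatch lowers the score by an amount that depends on both l and c. That dependence is the nonlinearity the abstract refers to.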




Cite this article

Altschul, S.F., Erickson, B.W. A nonlinear measure of subalignment similarity and its significance levels. Bltn Mathcal Biology 48, 617–632 (1986). https://doi.org/10.1007/BF02462327


  • DOI: https://doi.org/10.1007/BF02462327
