A nonlinear measure of subalignment similarity and its significance levels

Altschul, Stephen F.; Erickson, Bruce W.

doi:10.1007/BF02462327

Published: September 1986

Volume 48, pages 617–632, (1986)

Bulletin of Mathematical Biology

Stephen F. Altschul¹,² & Bruce W. Erickson¹

Abstract

A new measure of subalignment similarity is introduced. Specifically, similarity s(l, c) is defined as the logarithm to the base p of the probability of finding c or fewer mismatches in a subalignment of length l, where p is the probability of a match. Previous algorithms cannot use this measure to find locally optimal subalignments because, unlike the Needleman-Wunsch and Sellers similarities, this measure is nonlinear. A new pattern recognition algorithm is described for finding all locally optimal subalignments of two nucleotide sequences. The DD algorithm can use s(l, c) or any other reasonable similarity function to assess the relative interest of subalignments. The DD algorithm searches only the diagonal graph, which lacks insertions and deletions. This search strategy greatly decreases the computation time and does not require an arbitrary choice of gap cost. The paths of the resulting DD graph usually draw attention to likely locations for insertions and deletions. A heuristic formula is derived for estimating significance levels for s(l, c) in the context of the lengths of the two aligned sequences. The DD algorithm has been used to find interesting subalignments between the nucleotide sequences for human and murine interleukin 2.
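
The definition of s(l, c) given in the abstract can be illustrated with a short Python sketch. It is not taken from the article: it assumes that the positions of a subalignment match independently with probability p, so that the number of mismatches follows a binomial distribution, and the function name `similarity` is hypothetical. Under that assumption a perfect subalignment of length l scores exactly l, and each tolerated mismatch lowers the score by a varying amount, which is the sense in which the measure is nonlinear.

```python
import math


def similarity(l, c, p):
    """Sketch of s(l, c): log base p of P(c or fewer mismatches in l positions).

    Assumption (not stated in the abstract): positions match independently
    with probability p, so the mismatch count is Binomial(l, 1 - p).
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if not 0 <= c <= l:
        raise ValueError("c must satisfy 0 <= c <= l")
    log_p, log_q = math.log(p), math.log(1.0 - p)
    # Natural log of each binomial term C(l, k) * q^k * p^(l - k), k = 0..c.
    log_terms = [
        math.lgamma(l + 1) - math.lgamma(k + 1) - math.lgamma(l - k + 1)
        + k * log_q + (l - k) * log_p
        for k in range(c + 1)
    ]
    # Log-sum-exp keeps the tail probability from underflowing for long l.
    peak = max(log_terms)
    log_tail = peak + math.log(sum(math.exp(t - peak) for t in log_terms))
    # Change of base: log_p(x) = ln(x) / ln(p).
    return log_tail / log_p


if __name__ == "__main__":
    p = 0.25  # illustrative match probability for equiprobable nucleotides
    for l, c in [(10, 0), (10, 1), (20, 2)]:
        print(l, c, round(similarity(l, c, p), 3))
```

With p = 0.25, doubling both the length and the mismatch count from (10, 1) to (20, 2) does not double the score in this sketch, so the value cannot be accumulated position by position in the way a linear match/mismatch score can.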


Literature

Altschul, S. F. and B. W. Erickson. 1985. “Significance of Nucleotide Sequence Alignments: A Method for Random Sequence Permutation that Preserves Dinucleotide and Codon Usage.” Molec. Biol. Evol. 2, 526–538.
— and —. 1986a. “Optimal Sequence Alignment Using Affine Gap Costs.” Bull. math. Biol. 48, 603–616.
— and —. 1986b. “Locally Optimal Subalignments Using Nonlinear Similarity Functions.” Bull. math. Biol. 48, 633–660.
Arratia, R. and M. S. Waterman. 1985. “Critical Phenomena in Sequence Matching.” Ann. Prob. 13, 1236–1249.
—, L. Gordon and M. S. Waterman. 1986. “An Extreme Value Theory for Sequence Matching.” Ann. Stat. 14, 971–993.
Erickson, B. W. and P. H. Sellers. 1983. “Recognition of Patterns in Genetic Sequences.” In Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison, D. Sankoff and J. B. Kruskal (Eds), pp. 55–91. Reading, MA: Addison-Wesley.
—, L. T. May and P. B. Sehgal. 1984. “Internal Duplication in Human Alpha-1 and Beta-1 Interferon.” Proc. natn. Acad. Sci. U.S.A. 81, 7171–7175.
Fitch, W. M. 1983. “Calculating the Expected Frequencies of Potential Secondary Structure in Nucleic Acids as a Function of Stem Length, Loop Size, Base Composition and Nearest-neighbor Frequencies.” Nucl. Acids Res. 11, 4655–4663.
Fujita, T., C. Takaoka, H. Matsui and T. Taniguchi. 1983. “Structure of the Human Interleukin 2 Gene.” Proc. natn. Acad. Sci. U.S.A. 80, 7437–7441.
Gordon, L., M. F. Schilling and M. S. Waterman. 1986. “An Extreme Value Theory for Long Head Runs.” Prob. theor. Rel. 72, 279–287.
Gotoh, O. 1982. “An Improved Algorithm for Matching Biological Sequences.” J. molec. Biol. 162, 705–708.
Gumbel, E. J. 1962. “Statistical Theory of Extreme Values (Main Results).” In Contributions to Order Statistics, A. E. Sarhan and B. G. Greenberg (Eds), pp. 56–93. New York: Wiley.
Karlin, S. and G. Ghandour. 1985. “Comparative Statistics for DNA and Protein Sequences: Single Sequence Analysis.” Proc. natn. Acad. Sci. U.S.A. 82, 5800–5804.
Kruskal, J. B. and D. Sankoff. 1983. “An Anthology of Algorithms and Concepts for Sequence Comparison.” In Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison, D. Sankoff and J. B. Kruskal (Eds), pp. 265–310. Reading, MA: Addison-Wesley.
Lewis, R. V. and B. W. Erickson. 1986. “Evolution of Proenkephalin and Prodynorphin.” Am. Zool., in press.
Lipman, D. J., W. J. Wilbur, T. F. Smith and M. S. Waterman. 1984. “On the Statistical Significance of Nucleic-acid Similarities.” Nucl. Acids Res. 12, 215–226.
Litman, G. W., L. Berger, K. Murphy, R. Litman, K. Hinds, C. L. Jahn and B. W. Erickson. 1983. “Complete Nucleotide Sequence of an Immunoglobulin VH Gene Homologue from Caiman, a Phylogenetically Ancient Reptile.” Nature 303, 349–352.
————, F. Podlaski, K. Hinds, C. L. Jahn, G. Dingerkus and B. W. Erickson. 1984. “Phylogenetic Diversification of Immunoglobulin V_H Genes.” Dev. Comp. Immunol. 8, 499–514.
—, K. Murphy, L. Berger, R. Litman, K. Hinds and B. W. Erickson. 1985a. “Complete Nucleotide Sequence of Three VH Genes in Caiman, a Phylogenetically Ancient Reptile: Evolutionary Diversification in Coding Segments and Variation in the Structure and Organization of Recombination Elements.” Proc. natn. Acad. Sci. U.S.A. 82, 844–848.
————— and —. 1985b. “Immunoglobulin VH Gene Structure and Diversity in Heterodontus, a Phylogenetically Primitive Shark.” Proc. natn. Acad. Sci. U.S.A. 82, 2082–2086.
Needleman, S. B. and C. D. Wunsch. 1970. “A General Method Applicable to the Search for Similarities in the Amino Acid Sequences of Two Proteins.” J. molec. Biol. 48, 443–453.
Sellers, P. H. 1974. “On the Theory and Computation of Evolutionary Distances.” SIAM J. appl. Math. 26, 787–793.
—. 1980. “The Theory and Computation of Evolutionary Distances: Pattern Recognition.” J. Algorithms 1, 359–373.
—. 1984. “Pattern Recognition in Genetic Sequences by Mismatch Density.” Bull. math. Biol. 46, 501–514.
Shaw, M. W., R. A. Lamb, D. J. Briedis, B. W. Erickson and P. W. Choppin. 1982. “Complete Nucleotide Sequence of the Neuraminidase Gene of Influenza B Virus.” Proc. natn. Acad. Sci. U.S.A. 79, 6817–6821.
Smith, T. F. and M. S. Waterman. 1981. “Comparison of Biosequences.” Adv. appl. Math. 2, 482–489.
—— and C. Burks. 1985. “The Statistical Distribution of Nucleic Acid Similarities.” Nucl. Acids Res. 13, 645–656.
—— and J. R. Sadler. 1983. “Statistical Characterization of Nucleic Acid Sequence Functional Domains.” Nucl. Acids Res. 11, 2205–2220.
Swartz, M. N., T. A. Trautner and A. Kornberg. 1962. “Enzymatic Synthesis of Deoxyribonucleic Acid. XI. Further Studies on Nearest Neighbor Base Sequences in Deoxyribonucleic Acids.” J. biol. Chem. 237, 1961–1967.
Taniguchi, T., H. Matsui, T. Fujita, C. Takaoka, N. Kashima, R. Yoshimoto and J. Hamuro. 1983. “Structure and Expression of a Cloned cDNA for Human Interleukin-2.” Nature 302, 305–310.
Waterman, M. S. 1984. “Efficient Sequence Alignment Algorithms.” J. theor. Biol. 108, 333–337.
—, T. F. Smith and W. A. Beyer. 1976. “Some Biological Sequence Metrics.” Adv. Math. 20, 367–387.
Yokota, T., N. Arai, F. Lee, D. Rennick, T. Mosmann and K. Arai. 1985. “Use of a cDNA Expression Vector for Isolation of Mouse Interleukin 2 cDNA Clones: Expression of T-cell Growth-factor Activity After Transfection of Monkey Cells.” Proc. natn. Acad. Sci. U.S.A. 82, 68–72.

Author information

Authors and Affiliations

The Rockefeller University, 10021, New York, NY, U.S.A.
Stephen F. Altschul & Bruce W. Erickson
Department of Applied Mathematics, Massachusetts Institute of Technology, 02139, Cambridge, MA, U.S.A.
Stephen F. Altschul


About this article

Cite this article

Altschul, S.F., Erickson, B.W. A nonlinear measure of subalignment similarity and its significance levels. Bulletin of Mathematical Biology 48, 617–632 (1986). https://doi.org/10.1007/BF02462327

Received: 07 October 1985
Revised: 09 June 1986
Issue Date: September 1986
DOI: https://doi.org/10.1007/BF02462327
