Significance levels for biological sequence comparison using non-linear similarity functions

Altschul, Stephen F.; Erickson, Bruce W.

doi:10.1007/BF02459979

Published: January 1988

Volume 50, pages 77–92, (1988)

Bulletin of Mathematical Biology

Abstract

A class of non-linear similarity functions s₁ has been proposed for comparing subalignments of biological sequences. The distribution of maximal s₁-similarities is well approximated by the extreme value distribution. The significance levels of s₁ are studied for a variety of nucleotide frequency distributions as well as for several matrices of amino acid substitution costs. Also, the significance levels of s₁ are explored for comparing three biological sequences. Several previously described subalignments of bovine proenkephalin and porcine prodynorphin are shown to be highly significant.
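As a rough illustration of how an extreme value approximation yields a significance level, the sketch below fits a Gumbel (type I extreme value) distribution to a sample of maximal similarity scores by the method of moments and then evaluates the upper-tail probability of an observed score. It is not the authors' procedure: the function names, the moment-based fit, and any scores passed in are assumptions made for illustration, and the abstract gives neither parameter values nor the definition of s₁.

```python
import math

# Euler-Mascheroni constant, used in the mean of a Gumbel distribution.
EULER_GAMMA = 0.5772156649015329


def fit_gumbel_by_moments(samples):
    """Estimate Gumbel (location, scale) from a sample of maxima.

    Hypothetical helper: uses mean = location + gamma * scale and
    variance = (pi * scale)^2 / 6. The paper may fit differently.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = sum(samples) / n
    variance = sum((x - mean) ** 2 for x in samples) / (n - 1)
    scale = math.sqrt(6.0 * variance) / math.pi
    location = mean - EULER_GAMMA * scale
    return location, scale


def gumbel_tail_probability(score, location, scale):
    """Return P(X >= score) for X ~ Gumbel(location, scale)."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    z = (score - location) / scale
    if z < -700.0:
        # exp(-z) would overflow; the tail probability is 1 to machine precision.
        return 1.0
    # 1 - exp(-exp(-z)), computed with expm1 to keep precision for tiny tails.
    return -math.expm1(-math.exp(-z))
```

In use, the sample would be the maximal similarities obtained from many pairs of random sequences with the chosen composition, and the returned tail probability would serve as the estimated significance level of the similarity observed for the real sequences.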

Author information

Authors and Affiliations

Mathematical Research Branch, NIDDK, National Institutes of Health, 20892, Bethesda, MD, USA
Stephen F. Altschul
Department of Chemistry, The University of North Carolina, 27599, Chapel Hill, NC, USA
Bruce W. Erickson

About this article

Cite this article

Altschul, S.F., Erickson, B.W. Significance levels for biological sequence comparison using non-linear similarity functions. Bltn Mathcal Biology 50, 77–92 (1988). https://doi.org/10.1007/BF02459979

Received: 02 June 1987
Revised: 03 November 1987
Issue Date: January 1988
DOI: https://doi.org/10.1007/BF02459979
