
Significance levels for biological sequence comparison using non-linear similarity functions

Bulletin of Mathematical Biology

Abstract

A class of non-linear similarity functions s₁ has been proposed for comparing subalignments of biological sequences. The distribution of maximal s₁-similarities is well approximated by the extreme value distribution. The significance levels of s₁ are studied for a variety of nucleotide frequency distributions as well as for several matrices of amino acid substitution costs. Also, the significance levels of s₁ are explored for comparing three biological sequences. Several previously described subalignments of bovine proenkephalin and porcine prodynorphin are shown to be highly significant.
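As a rough illustration of how an extreme value approximation can yield a significance level, the sketch below evaluates the upper tail of a type I extreme value (Gumbel) distribution and estimates its two parameters from a sample of maximal similarities by the method of moments. It is not the authors' procedure: the function names, the moment-based fit, and the idea of obtaining the sample from randomized sequences are assumptions made here for illustration only.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def gumbel_tail_probability(score, location, scale):
    """Return P(X >= score) for X ~ Gumbel(location, scale)."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    z = (score - location) / scale
    if z < -700:
        # exp(-z) would overflow; the tail probability is 1 to machine precision.
        return 1.0
    # 1 - exp(-exp(-z)), written with expm1 to stay accurate in the far tail.
    return -math.expm1(-math.exp(-z))


def fit_gumbel_moments(maxima):
    """Method-of-moments estimate of (location, scale) from observed maxima.

    Uses mean = location + gamma * scale and variance = (pi * scale)^2 / 6.
    """
    n = len(maxima)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(maxima) / n
    variance = sum((m - mean) ** 2 for m in maxima) / (n - 1)
    scale = math.sqrt(6.0 * variance) / math.pi
    location = mean - EULER_GAMMA * scale
    return location, scale
```

Under these assumptions, one would pass the maximal similarities obtained from a set of randomized sequence pairs to fit_gumbel_moments and then call gumbel_tail_probability with the similarity observed for the real sequences; the returned tail probability plays the role of a significance level.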




Cite this article

Altschul, S.F., Erickson, B.W. Significance levels for biological sequence comparison using non-linear similarity functions. Bulletin of Mathematical Biology 50, 77–92 (1988). https://doi.org/10.1007/BF02459979


  • DOI: https://doi.org/10.1007/BF02459979
