
Statistical significance and extremal ensemble of gapped local hybrid alignment

Biological Evolution and Statistical Physics

Part of the book series: Lecture Notes in Physics (LNP, volume 585)

Abstract

A “semi-probabilistic” alignment algorithm that combines ideas from Smith-Waterman and probabilistic alignment is proposed and studied in detail. The score statistics of this “hybrid” algorithm are predicted to be of the universal Gumbel form, with the key Gumbel parameter λ taking on a fixed asymptotic value for a wide variety of scoring parameters. We have also characterized the “extremal ensemble”, i.e., the collection of sequence pairs exhibiting the similarities to which a given scoring system is most sensitive. Based on this extremal ensemble, a simple recipe is given for computing the “relative entropy” and, from it, the correction to λ due to finite sequence length. This allows us to assign p-values to the alignment results for arbitrary scoring parameters and gap costs. The predictions compare well with direct numerical simulations for a broad range of sequence lengths and various choices of the substitution scores and affine gap parameters.
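As a rough illustration of how a Gumbel score distribution translates into a p-value, the sketch below uses the standard Karlin-Altschul form P(S ≥ s) = 1 − exp(−K·m·n·e^(−λs)). It is not taken from the chapter: the function name `gumbel_p_value`, the prefactor `K`, and all numerical values are assumptions chosen for illustration, and the finite-length correction to λ mentioned above is not modeled.

```python
import math


def gumbel_p_value(score, lam, K, m, n):
    """Probability of seeing a score >= `score` by chance under a Gumbel law.

    lam  : Gumbel decay parameter (lambda); assumed known for the scoring system
    K    : Gumbel prefactor; hypothetical value supplied by the caller
    m, n : lengths of the two sequences being compared
    """
    # Expected number of chance alignments scoring at least `score`.
    expected = K * m * n * math.exp(-lam * score)
    # 1 - exp(-expected), computed with expm1 to stay accurate when expected is tiny.
    return -math.expm1(-expected)


if __name__ == "__main__":
    # Illustrative numbers only; they are not results from the chapter.
    for s in (10.0, 20.0, 30.0):
        print(s, gumbel_p_value(s, lam=1.0, K=0.1, m=300, n=300))
```

For high scores the expected count is small and the p-value is approximately equal to it, so the p-value decays exponentially in the score at rate λ. This is why an accurate value of λ matters most for significance estimates.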




Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Yu, YK., Bundschuh, R., Hwa, T. (2002). Statistical significance and extremal ensemble of gapped local hybrid alignment. In: Lässig, M., Valleriani, A. (eds) Biological Evolution and Statistical Physics. Lecture Notes in Physics, vol 585. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45692-9_1


  • DOI: https://doi.org/10.1007/3-540-45692-9_1


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43188-6

  • Online ISBN: 978-3-540-45692-6

  • eBook Packages: Springer Book Archive
