Evaluating the Statistical Significance of Multiple Distinct Local Alignments

  • Chapter
Theoretical and Computational Methods in Genome Research

Abstract

A comparison of two sequences may uncover multiple regions of local similarity. While the significance of each local alignment may be evaluated independently, sometimes a combined assessment is appropriate. This paper discusses a variety of statistical and algorithmic issues that such an assessment presents.

Copyright information

© 1997 Springer Science+Business Media New York

About this chapter

Cite this chapter

Altschul, S.F. (1997). Evaluating the Statistical Significance of Multiple Distinct Local Alignments. In: Suhai, S. (ed.) Theoretical and Computational Methods in Genome Research. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-5903-0_1

  • DOI: https://doi.org/10.1007/978-1-4615-5903-0_1

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4613-7708-5

  • Online ISBN: 978-1-4615-5903-0

  • eBook Packages: Springer Book Archive
