WABI 2009: Algorithms in Bioinformatics, pp 233-245

A General Framework for Local Pairwise Alignment Statistics with Gaps

  • Pasi Rastas
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5724)

Abstract

We present a novel dynamic programming framework for computing tight upper bounds on the p-values of gapped local alignments in pseudo-polynomial time. Our algorithms are fast and simple and, unlike most earlier solutions, require no curve fitting by sampling. Moreover, our methods do not suffer from so-called edge effects, a by-product of the common practice used to compute p-values. They also make it possible to compute very small p-values, which are needed when comparing sequences against large databases. Our experiments indicate that accurate estimates of small p-values are difficult to obtain by curve fitting.
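To illustrate what a pseudo-polynomial dynamic program for local-score p-values can look like, here is a minimal sketch of the simplest case only: the ungapped local score of a single sequence of n i.i.d. integer letter-pair scores (one diagonal, no gaps). It is not the gapped framework of this paper, and the function name `local_score_pvalue` and its dictionary input format are hypothetical. The sketch uses the Lindley recursion H_k = max(0, H_{k-1} + X_k), whose running maximum equals the local score, and tracks the exact distribution of H_k over the states 0, ..., t-1, absorbing all mass that reaches the threshold t. The running time is O(n · t · s), where s is the number of distinct scores: polynomial in the value of t rather than in its bit length, hence pseudo-polynomial.

```python
from fractions import Fraction


def local_score_pvalue(score_dist, n, t):
    """Exact P(local score >= t) for n i.i.d. integer scores (ungapped, one diagonal).

    score_dist: hypothetical input format, a dict {integer score: probability}.
    The local score is the best segment sum (the empty segment scores 0); it
    equals max_k H_k for the Lindley recursion H_0 = 0, H_k = max(0, H_{k-1} + X_k).
    """
    if t <= 0:
        return Fraction(1)  # the empty segment already scores 0 >= t
    # dist[h] = P(H_k = h and H has not yet reached t), for h = 0..t-1
    dist = [Fraction(0)] * t
    dist[0] = Fraction(1)
    hit = Fraction(0)  # probability mass absorbed once H reaches t
    for _ in range(n):
        new = [Fraction(0)] * t
        for h, p in enumerate(dist):
            if p == 0:
                continue
            for x, q in score_dist.items():
                nh = max(0, h + x)
                if nh >= t:
                    hit += p * q  # threshold reached: absorb this mass
                else:
                    new[nh] += p * q
        dist = new
    return hit
```

For example, `local_score_pvalue({1: Fraction(1, 4), -1: Fraction(3, 4)}, 100, 8)` returns the exact tail probability as a rational number. Exact rationals avoid rounding error far out in the tail, but their denominators grow quickly; for long sequences a floating-point or log-space variant of the same recursion is the practical choice.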

Keywords

Null Model, Local Alignment, Alignment Score, Letter Pair, Partial Score
These keywords were added by machine and not by the authors.

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Pasi Rastas (1)
  1. Department of Computer Science & HIIT Basic Research Unit, University of Helsinki, Finland