WABI 2009: Algorithms in Bioinformatics, pp 233-245
A General Framework for Local Pairwise Alignment Statistics with Gaps
Abstract
We present a novel dynamic programming framework for computing tight upper bounds on the p-values of gapped local alignments in pseudo-polynomial time. Our algorithms are fast and simple and, unlike most earlier solutions, require no curve fitting by sampling. Moreover, the new methods do not suffer from the so-called edge effects, a by-product of the common practice used to compute p-values. They also make it possible to reach the very small p-values that are needed when comparing sequences against large databases. Based on our experiments, accurate estimates of such small p-values are difficult to obtain by curve fitting.
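To make the sampling baseline mentioned in the abstract concrete, the following is a minimal sketch of estimating a p-value by direct simulation. It is not the framework of the paper. The scoring scheme (match +1, mismatch -1, linear gap penalty -2), the uniform i.i.d. null model over `ACGT`, and the function names `local_score` and `empirical_p_value` are all assumptions chosen for illustration.

```python
import random

def local_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best gapped local alignment score (Smith-Waterman, linear gap penalty).

    The scoring values are illustrative assumptions, not taken from the paper.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            s = match if x == y else mismatch
            # A local alignment may restart anywhere, hence the floor at 0.
            cell = max(0, prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
            cur.append(cell)
            best = max(best, cell)
        prev = cur
    return best

def empirical_p_value(threshold, length, samples, alphabet="ACGT", seed=0):
    """Fraction of random sequence pairs whose local score reaches threshold.

    Null model (assumed): both sequences are i.i.d. uniform over `alphabet`.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        a = rng.choices(alphabet, k=length)
        b = rng.choices(alphabet, k=length)
        if local_score(a, b) >= threshold:
            hits += 1
    return hits / samples
```

With N samples, the smallest nonzero value this estimator can return is 1/N, so p-values at the scale needed for large database searches cannot be reached by counting alone. Sampling-based approaches therefore fit a distribution to the simulated scores and extrapolate into the tail, which is the curve-fitting step the abstract refers to.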
Keywords
Null Model; Local Alignment; Alignment Score; Letter Pair; Partial Score