Abstract
We present a novel dynamic programming framework for computing tight upper bounds on the p-values of gapped local alignments in pseudo-polynomial time. Our algorithms are fast and simple and, unlike most earlier solutions, require no curve fitting by sampling. Moreover, our methods do not suffer from the so-called edge effects, a by-product of the common practice used to compute p-values. They also make it possible to estimate very small p-values, which are needed when comparing sequences against large databases. Our experiments indicate that accurate estimates of small p-values are difficult to obtain by curve fitting.
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Rastas, P. (2009). A General Framework for Local Pairwise Alignment Statistics with Gaps. In: Salzberg, S.L., Warnow, T. (eds) Algorithms in Bioinformatics. WABI 2009. Lecture Notes in Computer Science, vol 5724. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04241-6_20
DOI: https://doi.org/10.1007/978-3-642-04241-6_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04240-9
Online ISBN: 978-3-642-04241-6
eBook Packages: Computer Science (R0)