Abstract
This paper describes a general method for controlling the running time of similarity search algorithms. Our method can be used in conjunction with the seed-and-extend paradigm employed by many search algorithms, including BLAST. We introduce the concept of a seed tree, and provide a seed tree-pruning algorithm that affects the specificity in a predictable manner. The algorithm uses a single parameter to control the speed of the similarity search. The parameter enables us to reach arbitrary levels between the exponential increases in running time that are typical of seed-and-extend methods.
Research supported by NSERC grant 250391-02.
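To illustrate why variable-length seeds can reach levels between the exponential steps mentioned in the abstract, here is a minimal sketch. It is not the paper's seed tree-pruning algorithm. It assumes uniform, independent random DNA sequences, represents a seed set as the leaves of a tree over the alphabet {A, C, G, T}, and defines a random hit as both sequences spelling the same leaf word at a given pair of positions. The function names and the rule for choosing which leaves to extend are hypothetical.

```python
from fractions import Fraction
from itertools import product

ALPHABET = "ACGT"


def full_seed_set(k):
    """All words of length k: the leaves of a complete tree of depth k."""
    return {"".join(p) for p in product(ALPHABET, repeat=k)}


def extend_leaves(seeds, count):
    """Replace `count` leaves with their four children.

    Hypothetical selection rule: leaves are taken in sorted order. A real
    method would choose them by some statistical criterion.
    """
    seeds = set(seeds)
    for leaf in sorted(seeds)[:count]:
        seeds.remove(leaf)
        seeds.update(leaf + c for c in ALPHABET)
    return seeds


def random_hit_probability(seeds):
    """Chance that a fixed pair of positions yields a seed hit by accident.

    Assumes uniform i.i.d. sequences and a prefix-free seed set, so at most
    one word can match at a position. Each word w contributes
    (1/4)^|w| * (1/4)^|w|.
    """
    return sum((Fraction(1, 4 ** (2 * len(w))) for w in seeds), Fraction(0))


if __name__ == "__main__":
    base = full_seed_set(2)
    for count in (0, 4, 8, 12, 16):
        seeds = extend_leaves(base, count)
        print(count, len(seeds), random_hit_probability(seeds))
```

Under these assumptions, fixed-length seeds only allow random-hit rates of 4^-k, so each step in k changes the expected number of spurious extensions fourfold. Extending only some of the leaves yields rates in between, and the number of extended leaves plays the role of a single tuning parameter in this toy model.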
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Csűrös, M. (2004). Performing Local Similarity Searches with Variable Length Seeds. In: Sahinalp, S.C., Muthukrishnan, S., Dogrusoz, U. (eds) Combinatorial Pattern Matching. CPM 2004. Lecture Notes in Computer Science, vol 3109. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27801-6_28
DOI: https://doi.org/10.1007/978-3-540-27801-6_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22341-2
Online ISBN: 978-3-540-27801-6
eBook Packages: Springer Book Archive