Performing Local Similarity Searches with Variable Length Seeds

Csűrös, Miklós

doi:10.1007/978-3-540-27801-6_28

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3109)

Included in the following conference series:

Annual Symposium on Combinatorial Pattern Matching

Abstract

This paper describes a general method for controlling the running time of similarity search algorithms. Our method can be used in conjunction with the seed-and-extend paradigm employed by many search algorithms, including BLAST. We introduce the concept of a seed tree, and provide a seed tree-pruning algorithm that affects the specificity in a predictable manner. The algorithm uses a single parameter to control the speed of the similarity search. The parameter enables us to reach arbitrary levels between the exponential increases in running time that are typical of seed-and-extend methods.
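
The idea of reaching levels between the exponential steps can be illustrated with a small calculation. The sketch below is not the paper's seed tree-pruning algorithm, which the abstract does not spell out. It assumes exact-match seeds over DNA, a uniform i.i.d. background model, and a toy seed tree whose leaves are all words of one length except for a chosen subset that is extended by one more letter. The names `seed_tree_leaves` and `random_hit_probability` are hypothetical. Under those assumptions, the chance that two random positions share a seed is the sum of 4^(-2|w|) over the leaves w, so mixing leaf depths gives values strictly between those of two consecutive fixed lengths.

```python
from fractions import Fraction
from itertools import product

ALPHABET = "ACGT"


def seed_tree_leaves(base_len, extended):
    """Leaves of a toy seed tree (illustrative assumption, not the paper's method).

    Every word of length base_len is a leaf, except the words listed in
    `extended`, which are replaced by their four one-letter extensions.
    """
    leaves = []
    for letters in product(ALPHABET, repeat=base_len):
        word = "".join(letters)
        if word in extended:
            leaves.extend(word + c for c in ALPHABET)
        else:
            leaves.append(word)
    return leaves


def random_hit_probability(leaves):
    """Chance that two independent uniform random positions start with the same leaf."""
    q = Fraction(1, len(ALPHABET))
    # A given leaf w occurs at a position with probability q**len(w);
    # it must occur in both sequences, hence the factor 2 in the exponent.
    return sum(q ** (2 * len(w)) for w in leaves)


all_words = sorted("".join(t) for t in product(ALPHABET, repeat=2))

fixed_short = random_hit_probability(seed_tree_leaves(2, set()))
mixed = random_hit_probability(seed_tree_leaves(2, set(all_words[:8])))
fixed_long = random_hit_probability(seed_tree_leaves(2, set(all_words)))

print(fixed_short, mixed, fixed_long)  # 1/16 5/128 1/64
```

Extending half of the length-2 seeds gives a random-hit probability of 5/128, between the 1/16 of fixed length 2 and the 1/64 of fixed length 3. A single parameter, such as the number of extended leaves in this toy model, can therefore move the expected number of random hits in small steps rather than by a factor of four.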

Research supported by NSERC grant 250391-02.

Author information

Authors and Affiliations

Département d’informatique et de recherche opérationnelle, Université de Montréal, C.P. 6128 succ. Centre-Ville, Montréal, Québec, H3C 3J7, Canada
Miklós Csűrös

Editor information

Editors and Affiliations

School of Computing Science, Simon Fraser University, 8888 University Drive, V5A 1S6, Burnaby, BC, Canada
Suleyman Cenk Sahinalp
Google Inc., 76 9th Av, 4th Fl., 10011, New York, NY
S. Muthukrishnan
Tom Sawyer Software, 94612, Oakland, CA, USA
Ugur Dogrusoz

Cite this paper

Csűrös, M. (2004). Performing Local Similarity Searches with Variable Length Seeds. In: Sahinalp, S.C., Muthukrishnan, S., Dogrusoz, U. (eds) Combinatorial Pattern Matching. CPM 2004. Lecture Notes in Computer Science, vol 3109. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27801-6_28

DOI: https://doi.org/10.1007/978-3-540-27801-6_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22341-2
Online ISBN: 978-3-540-27801-6
eBook Packages: Springer Book Archive