Performing Local Similarity Searches with Variable Length Seeds

  • Conference paper
Combinatorial Pattern Matching (CPM 2004)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3109)

Abstract

This paper describes a general method for controlling the running time of similarity search algorithms. Our method can be used in conjunction with the seed-and-extend paradigm employed by many search algorithms, including BLAST. We introduce the concept of a seed tree, and provide a seed tree-pruning algorithm that affects the specificity in a predictable manner. The algorithm uses a single parameter to control the speed of the similarity search. The parameter enables us to reach arbitrary levels between the exponential increases in running time that are typical of seed-and-extend methods.
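
The "exponential increases" mentioned in the abstract can be illustrated with a small calculation. The sketch below is not the paper's seed tree-pruning algorithm. It assumes exact-match seeds over the DNA alphabet, independent and uniformly distributed bases, and a seed set formed by the leaves of a 4-ary trie; the function names (all_kmers, expand, random_hit_probability) are hypothetical. Under those assumptions, the chance that two random positions share a seed is the sum of 4^(-2·length) over the seeds, so moving from all k-mers to all (k+1)-mers divides the random hit rate by 4, while extending only some of the k-mers gives intermediate values.

```python
from fractions import Fraction
from itertools import product

ALPHABET = "ACGT"


def all_kmers(k):
    """Return every exact-match seed of length k over the DNA alphabet."""
    return ["".join(p) for p in product(ALPHABET, repeat=k)]


def expand(seeds, m):
    """Replace the first m seeds by their four one-letter extensions.

    In trie terms, m leaves become internal nodes with four children each,
    so the seed set stays prefix-free.
    """
    head, tail = seeds[:m], seeds[m:]
    return [s + c for s in head for c in ALPHABET] + tail


def random_hit_probability(seeds):
    """Probability that two independent uniform random positions share a seed.

    Assumes a prefix-free seed set, so at most one seed can match at a
    position and the per-seed probabilities simply add up.
    """
    return sum(Fraction(1, len(ALPHABET) ** (2 * len(s))) for s in seeds)


if __name__ == "__main__":
    base = all_kmers(2)
    for m in (0, 8, 16):
        # m = 0 keeps all 2-mers, m = 16 yields all 3-mers
        print(m, random_hit_probability(expand(base, m)))
```

With all 2-mers the random hit probability is 1/16, with all 3-mers it is 1/64, and extending only 8 of the 16 two-letter seeds gives 5/128, a level that no fixed seed length reaches. Since the work done in the extension phase grows with the number of random hits, a tunable mix of seed lengths is one way to picture running times between the fixed-length steps.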

Research supported by NSERC grant 250391-02.

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Csűrös, M. (2004). Performing Local Similarity Searches with Variable Length Seeds. In: Sahinalp, S.C., Muthukrishnan, S., Dogrusoz, U. (eds) Combinatorial Pattern Matching. CPM 2004. Lecture Notes in Computer Science, vol 3109. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27801-6_28

  • DOI: https://doi.org/10.1007/978-3-540-27801-6_28

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22341-2

  • Online ISBN: 978-3-540-27801-6

  • eBook Packages: Springer Book Archive