
Fast Computation of Good Multiple Spaced Seeds

  • Conference paper
Algorithms in Bioinformatics (WABI 2007)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4645)


Abstract

Homology search finds similar segments between two biological sequences, such as DNA or protein sequences. A significant fraction of the world's computing power is dedicated to performing such tasks. The introduction of optimal spaced seeds by Ma et al. increased both the sensitivity and the speed of homology search, and spaced seeds have been adopted by many alignment programs, such as BLAST. With the further improvement provided by multiple spaced seeds in PatternHunterII, the sensitivity of dynamic programming is approached at BLASTn speed. Although computing optimal multiple spaced seeds was proved to be NP-hard, we show that, from a practical point of view, computing good ones can be very efficient. We give a simple heuristic algorithm that computes good multiple seeds in polynomial time. Computing sensitivity is not required. When we allow the sensitivity to be computed for a few seeds, we obtain better multiple seeds than previous ones in much shorter time.
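The abstract contrasts the authors' heuristic with approaches that must compute sensitivity. For readers unfamiliar with that quantity, here is a minimal sketch that computes it exactly under commonly used assumptions. A similarity region has length N, and each position is independently a match with probability p. A seed is written as a string of '1' (required match) and '0' (don't care); this notation is an assumption. Sensitivity is the probability that at least one seed hits the region. The code is a generic dynamic program over the last w - 1 region positions, where w is the longest seed length. It is not the algorithm of this paper, and the function names `sensitivity` and `sensitivity_brute_force` are hypothetical. Its cost grows roughly as N * 2^(w-1) * k for k seeds, which suggests why evaluating sensitivity for many candidate seed sets is expensive.

```python
from itertools import product


def seed_mask(seed: str) -> int:
    """Bit mask of a seed's match positions; bit 0 is the seed's last position."""
    length = len(seed)
    return sum(1 << (length - 1 - j) for j, c in enumerate(seed) if c == "1")


def sensitivity(seeds: list[str], region_len: int, p: float) -> float:
    """Exact probability that at least one seed hits a random region.

    Assumed model: each region position is independently a match (bit 1)
    with probability p. The DP state holds the last (w - 1) positions,
    where w is the longest seed length. Only probability mass that has
    not yet been hit is carried forward.
    """
    w = max(len(s) for s in seeds)
    masks = [(seed_mask(s), len(s)) for s in seeds]
    keep = (1 << (w - 1)) - 1  # bits kept in the state after each step
    not_hit = {0: 1.0}  # state -> probability of "no hit so far"
    for i in range(region_len):
        nxt: dict[int, float] = {}
        for state, prob in not_hit.items():
            for bit, q in ((1, p), (0, 1.0 - p)):
                window = (state << 1) | bit  # bit 0 = current position i
                # A seed of a given length can only end at i once i >= length - 1.
                hit = any(
                    i >= length - 1 and (window & m) == m for m, length in masks
                )
                if not hit:
                    key = window & keep
                    nxt[key] = nxt.get(key, 0.0) + prob * q
        not_hit = nxt
    return 1.0 - sum(not_hit.values())


def sensitivity_brute_force(seeds: list[str], region_len: int, p: float) -> float:
    """Reference check: enumerate all 2**region_len regions (small inputs only)."""
    total = 0.0
    for bits in product("01", repeat=region_len):
        r = "".join(bits)
        hit = any(
            all(r[k + j] == "1" for j, c in enumerate(s) if c == "1")
            for s in seeds
            for k in range(region_len - len(s) + 1)
        )
        if hit:
            ones = r.count("1")
            total += p**ones * (1 - p) ** (region_len - ones)
    return total


print(sensitivity(["11"], 3, 0.5))  # 0.375
```

For the single seed `11` on a region of length 3 with p = 0.5, the result 0.375 matches inclusion-exclusion (0.25 + 0.25 - 0.125). The brute-force function enumerates every region and is intended only for checking the dynamic program on small inputs.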

References

  1. Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.: Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990)

  2. Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.J.: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997)

  3. Brejova, B., Brown, D., Vinar, T.: Optimal spaced seeds for homologous coding regions. J. Bioinf. and Comput. Biol. 1, 595–610 (2004)

  4. Buhler, J., Keich, U., Sun, Y.: Designing seeds for similarity search in genomic DNA. In: Proc. of RECOMB 2003, pp. 67–75. ACM Press, New York (2003)

  5. Burkhardt, S., Kärkkäinen, J.: Better filtering with gapped q-grams. In: Amir, A., Landau, G.M. (eds.) CPM 2001. LNCS, vol. 2089, pp. 73–85. Springer, Heidelberg (2001)

  6. Choi, K.P., Zhang, L.: Sensitivity analysis and efficient method for identifying optimal spaced seeds. J. Comput. Sys. Sci. 68, 22–40 (2004)

  7. Choi, K.P., Zeng, F., Zhang, L.: Good Spaced Seeds for Homology Search. Bioinformatics 20, 1053–1059 (2004)

  8. Golub, G.H., Van Loan, C.F.: Matrix Computations, 3rd edn. Johns Hopkins Univ. Press, Baltimore (1996)

  9. Ilie, L., Ilie, S.: Long spaced seeds for finding similarities between biological sequences. In: Proc. of BIOCOMP 2007 (to appear)

  10. Karp, R., Rabin, M.O.: Efficient randomized pattern-matching algorithms. IBM J. Res. Develop. 31, 249–260 (1987)

  11. Keich, U., Li, M., Ma, B., Tromp, J.: On spaced seeds for similarity search. Discrete Appl. Math. 138, 253–263 (2004)

  12. Kisman, D., Li, M., Ma, B., Wang, L.: tPatternHunter: Gapped, fast and sensitive translated homology search. Bioinformatics 21, 542–544 (2005)

  13. Kong, Y.: Generalized correlation functions and their applications in selection of optimal multiple spaced seeds for homology search. J. Comput. Biol. (to appear)

  14. Kucherov, G., Noe, L., Ponty, Y.: Estimating seed sensitivity on homogeneous alignments. In: Proc. of BIBE 2004, Taiwan, pp. 387–394 (2004)

  15. Li, M.: personal communication

  16. Li, M., Ma, B., Kisman, D., Tromp, J.: PatternHunter II: highly sensitive and fast homology search. J. Bioinf. and Comput. Biol. 2, 417–440 (2004)

  17. Lipman, D.J., Pearson, W.R.: Rapid and sensitive protein similarity searches. Science 227, 1435–1441 (1985)

  18. Li, M., Ma, B., Zhang, L.: Superiority and complexity of spaced seeds. In: Proc. of SODA 2006. SIAM, pp. 444–453 (2006)

  19. Ma, B.: personal communication

  20. Ma, B., Tromp, J., Li, M.: PatternHunter: faster and more sensitive homology search. Bioinformatics 18, 440–445 (2002)

  21. Needleman, S.B., Wunsch, C.D.: A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48, 443–453 (1970)

  22. Ning, Z., Cox, A.J., Mullikin, J.C.: SSAHA: A fast search method for large DNA databases. Genome Res. 11, 1725–1729 (2001)

  23. Noé, L., Kucherov, G.: Yass: enhancing the sensitivity of DNA similarity search. Nucleic Acids Res. 33, 540–543 (2005)

  24. Pevzner, P., Waterman, M.S.: Multiple filtration and approximate pattern matching. Algorithmica 13, 135–154 (1995)

  25. Preparata, F.P., Zhang, L., Choi, K.P.: Quick, practical selection of effective seeds for homology search. J. Comput. Biol. 12, 1137–1152 (2005)

  26. Smith, T.F., Waterman, M.S.: Identification of common molecular subsequences. J. Mol. Biol. 147, 195–197 (1981)

  27. Sun, Y., Buhler, J.: Designing multiple simultaneous seeds for DNA similarity search. In: Proc. of RECOMB 2004, pp. 76–85. ACM Press, New York (2004)

  28. Xu, J., Brown, D., Li, M., Ma, B.: Optimizing multiple spaced seeds for homology search. In: Sahinalp, S.C., Muthukrishnan, S.M., Dogrusoz, U. (eds.) CPM 2004. LNCS, vol. 3109, pp. 47–58. Springer, Heidelberg (2004)

  29. Yang, I.-H., Wang, S.-H., Chen, H.-H., Huang, P.-H., Chao, K.-M.: Efficient methods for generating optimal single and multiple spaced seeds. In: Proc. of IEEE 4th Symp. on Bioinformatics and Bioengineering, Taiwan, pp. 411–418. IEEE Computer Society Press, Los Alamitos (2004)

  30. Zhang, Z., Schwartz, S., Wagner, L., Miller, W.: A greedy algorithm for aligning DNA sequences. J. Comput. Biol. 7, 203–214 (2000)

Editor information

Raffaele Giancarlo, Sridhar Hannenhalli

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ilie, L., Ilie, S. (2007). Fast Computation of Good Multiple Spaced Seeds. In: Giancarlo, R., Hannenhalli, S. (eds) Algorithms in Bioinformatics. WABI 2007. Lecture Notes in Computer Science, vol 4645. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74126-8_32

  • DOI: https://doi.org/10.1007/978-3-540-74126-8_32

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-74125-1

  • Online ISBN: 978-3-540-74126-8

  • eBook Packages: Computer Science, Computer Science (R0)
