
A Unifying Framework for Seed Sensitivity and Its Application to Subset Seeds

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 3692)

Abstract

We propose a general approach to computing seed sensitivity that can be applied to different definitions of seeds. It treats separately three components of the seed sensitivity problem (a set of target alignments, an associated probability distribution, and a seed model), which are specified by distinct finite automata. The approach is then applied to a new concept of subset seeds, for which we propose an efficient automaton construction. Experimental results confirm that sensitive subset seeds can be efficiently designed using our approach and can then be used in similarity search, producing better results than ordinary spaced seeds.
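
To make the notion of seed sensitivity concrete, here is a minimal sketch for the simplest case only: an ordinary spaced seed, with alignment columns drawn independently as matches with a fixed probability (a Bernoulli model). It is not the paper's construction. It folds the target alignments, the probability distribution, and the seed model into a single dynamic program over the last few columns, rather than using separate automata, and it does not cover subset seeds. The function name `spaced_seed_sensitivity` and its parameters are hypothetical, chosen for illustration.

```python
from collections import defaultdict


def spaced_seed_sensitivity(seed, length, p_match):
    """Probability that a spaced seed hits a random gapless alignment.

    seed:    string over {'1', '0'}; '1' requires a match, '0' is a don't-care.
    length:  number of alignment columns.
    p_match: probability that a column is a match (assumed i.i.d. columns).
    """
    span = len(seed)
    if span == 0 or set(seed) - {"0", "1"}:
        raise ValueError("seed must be a non-empty string of '0' and '1'")

    required = int(seed, 2)               # bitmask of positions that must match
    state_mask = (1 << (span - 1)) - 1    # keeps the last span-1 columns

    # no_hit[state] = probability of having read the columns so far,
    # ending in `state` (last span-1 columns as bits), with no seed hit yet.
    no_hit = {0: 1.0}
    for column in range(length):
        nxt = defaultdict(float)
        window_is_full = column >= span - 1
        for state, prob in no_hit.items():
            for bit, p in ((1, p_match), (0, 1.0 - p_match)):
                window = (state << 1) | bit
                if window_is_full and (window & required) == required:
                    continue  # the seed hits here: drop this probability mass
                nxt[window & state_mask] += prob * p
        no_hit = nxt

    # Sensitivity is the complement of the probability of never being hit.
    return 1.0 - sum(no_hit.values())
```

For example, `spaced_seed_sensitivity("1101", 64, 0.7)` would estimate how often the seed `1101` hits a 64-column alignment with 70% identity under these assumptions. The state space grows exponentially with the seed span, which is one reason more compact automaton constructions matter in practice.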




Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kucherov, G., Noé, L., Roytberg, M. (2005). A Unifying Framework for Seed Sensitivity and Its Application to Subset Seeds. In: Casadio, R., Myers, G. (eds) Algorithms in Bioinformatics. WABI 2005. Lecture Notes in Computer Science, vol 3692. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11557067_21


  • DOI: https://doi.org/10.1007/11557067_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29008-7

  • Online ISBN: 978-3-540-31812-5

  • eBook Packages: Computer Science, Computer Science (R0)
