A Unifying Framework for Seed Sensitivity and Its Application to Subset Seeds

  • Gregory Kucherov
  • Laurent Noé
  • Mikhail Roytberg
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3692)

Abstract

We propose a general approach to computing seed sensitivity that can be applied to different definitions of seeds. It treats separately three components of the seed sensitivity problem (a set of target alignments, an associated probability distribution, and a seed model), each specified by a distinct finite automaton. The approach is then applied to a new concept of subset seeds, for which we propose an efficient automaton construction. Experimental results confirm that sensitive subset seeds can be designed efficiently using our approach and can then be used in similarity search, producing better results than ordinary spaced seeds.
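To make the notion of seed sensitivity concrete, the following is a minimal sketch, not the automaton construction of the paper. It assumes the simplest setting: an ordinary spaced seed written as a string of `#` (must match) and `-` (don't care), target alignments of a fixed length, and a Bernoulli model in which each alignment position is a match with independent probability `p`. The function names `seed_hits` and `sensitivity` are hypothetical. The states of the dynamic program are the last `span - 1` alignment positions, which play the role of a (non-minimized) seed automaton.

```python
from collections import defaultdict


def seed_hits(seed, window):
    """True if every '#' position of the seed faces a match (1) in the window."""
    return all(w == 1 for s, w in zip(seed, window) if s == "#")


def sensitivity(seed, length, p):
    """Probability that a random alignment of the given length contains
    at least one seed hit, assuming independent matches with probability p."""
    span = len(seed)
    # no_hit maps "last span-1 positions seen" -> probability of reaching
    # that state without any seed hit so far.
    no_hit = {(): 1.0}
    for _ in range(length):
        nxt = defaultdict(float)
        for suffix, prob in no_hit.items():
            for bit, q in ((1, p), (0, 1.0 - p)):
                window = suffix + (bit,)
                if len(window) == span and seed_hits(seed, window):
                    continue  # a hit: this probability mass leaves the no-hit states
                state = window[-(span - 1):] if span > 1 else ()
                nxt[state] += prob * q
        no_hit = nxt
    return 1.0 - sum(no_hit.values())


if __name__ == "__main__":
    # Compare a contiguous seed and a spaced seed of the same weight (3).
    for s in ("###", "##-#"):
        print(s, sensitivity(s, 20, 0.7))
```

The number of states grows exponentially with the seed span in this naive form; it is meant only to illustrate how an alignment set, a probability distribution, and a seed model combine into a single sensitivity value.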



Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Gregory Kucherov (1)
  • Laurent Noé (1)
  • Mikhail Roytberg (2)

  1. INRIA/LORIA, Villers-lès-Nancy, France
  2. Institute of Mathematical Problems in Biology, Pushchino, Moscow Region, Russia
