Abstract
We propose a general approach to computing seed sensitivity that can be applied to different definitions of seeds. It treats separately three components of the seed sensitivity problem (a set of target alignments, an associated probability distribution, and a seed model), which are specified by distinct finite automata. The approach is then applied to a new concept of subset seeds, for which we propose an efficient automaton construction. Experimental results confirm that sensitive subset seeds can be designed efficiently using our approach and can then be used in similarity search, producing better results than ordinary spaced seeds.
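To make the notion of seed sensitivity concrete, the following is a minimal sketch and not the automaton construction of the paper. It assumes the simplest setting: the target alignments are all binary strings of length n (1 for a match, 0 for a mismatch), the probability distribution is Bernoulli with match probability p, and the seed is an ordinary spaced seed written with `#` for a required match and `-` for a don't-care position. The function names `seed_hits` and `sensitivity` are hypothetical. The states of the dynamic program are the last span-1 alignment symbols, which plays the role of a naive seed automaton.

```python
from itertools import product


def seed_hits(seed: str, window: tuple) -> bool:
    """Return True if every '#' position of the seed faces a match (1)."""
    return all(bit == 1 for sym, bit in zip(seed, window) if sym == "#")


def sensitivity(seed: str, n: int, p: float) -> float:
    """Probability that `seed` hits a random alignment of length n.

    Assumption: alignment positions are independent, each a match with
    probability p (Bernoulli model).
    """
    span = len(seed)
    if n < span:
        return 0.0
    k = span - 1

    # Distribution over the first k symbols; no hit is possible yet.
    states = {}
    for w in product((0, 1), repeat=k):
        prob = 1.0
        for b in w:
            prob *= p if b else 1.0 - p
        states[w] = prob

    hit = 0.0  # probability mass already absorbed in the "hit" state
    for _ in range(n - k):
        nxt = {}
        for w, prob in states.items():
            for b, pb in ((1, p), (0, 1.0 - p)):
                window = w + (b,)
                if seed_hits(seed, window):
                    hit += prob * pb  # absorbing: the seed has matched
                else:
                    key = window[1:]  # keep only the last k symbols
                    nxt[key] = nxt.get(key, 0.0) + prob * pb
        states = nxt
    return hit


if __name__ == "__main__":
    # Compare a contiguous and a spaced seed of the same weight.
    print(sensitivity("###", 20, 0.7))
    print(sensitivity("##-#", 20, 0.7))
```

The number of states in this sketch grows as 2^(span-1), so it is only practical for short seeds; it illustrates the quantity being computed rather than an efficient way to compute it.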
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kucherov, G., Noé, L., Roytberg, M. (2005). A Unifying Framework for Seed Sensitivity and Its Application to Subset Seeds. In: Casadio, R., Myers, G. (eds) Algorithms in Bioinformatics. WABI 2005. Lecture Notes in Computer Science, vol 3692. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11557067_21
DOI: https://doi.org/10.1007/11557067_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29008-7
Online ISBN: 978-3-540-31812-5
eBook Packages: Computer Science, Computer Science (R0)