Fast Computation of Good Multiple Spaced Seeds
Homology search finds similar segments between two biological sequences, such as DNA or protein sequences. A significant fraction of computing power in the world is dedicated to performing such tasks. The introduction of optimal spaced seeds by Ma et al. has increased both the sensitivity and the speed of homology search and it has been adopted by many alignment programs such as BLAST. With the further improvement provided by multiple spaced seeds in PatternHunterII, the sensitivity of dynamic programming is approached at BLASTn speed. Whereas computing optimal multiple spaced seeds was proved to be NP-hard, we show that, from practical point of view, computing good ones can be very efficient. We give a simple heuristic algorithm which computes good multiple seeds in polynomial time. Computing sensitivity is not required. When allowing the computation of the sensitivity for few seeds, we obtain better multiple seeds than previous ones in much shorter time.
Keywordshomology search multiple spaced seeds sensitivity string overlaps PatternHunterII BLAST
Unable to display preview. Download preview PDF.
- 1.Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.: Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990)Google Scholar
- 5.Burkhardt, S., Kärkkäinen, J.: Better filtering with gapped q-grams. In: Amir, A., Landau, G.M. (eds.) CPM 2001. LNCS, vol. 2089, pp. 73–85. Springer, Heidelberg (2001)Google Scholar
- 9.Ilie, L., Ilie, S.: Long spaced seeds for finding similarities between biological sequences. In: Proc. of BIOCOMP 2007 (to appear)Google Scholar
- 13.Kong, Y.: Generalized correlation functions and their applications in selection of optimal multiple spaced seeds for homology search. J. Comput. Biol. (to appear)Google Scholar
- 14.Kucherov, G., Noe, L., Ponty, Y.: Estimating seed sensitivity on homogeneous alignments. In: Proc. of BIBE 2004, Taiwan, pp. 387–394 (2004)Google Scholar
- 15.Li, M.: personal communicationGoogle Scholar
- 18.Li, M., Ma, B., Zhang, L.: Superiority and complexity of spaced seeds. In: Proc. of SODA 2006. SIAM, pp. 444–453 (2006)Google Scholar
- 19.Ma, B.: personal communicationGoogle Scholar
- 28.Xu, J., Brown, D., Li, M., Ma, B.: Optimizing multiple spaced seeds for homology search. In: Sahinalp, S.C., Muthukrishnan, S.M., Dogrusoz, U. (eds.) CPM 2004. LNCS, vol. 3109, pp. 47–58. Springer, Heidelberg (2004)Google Scholar