Abstract
Homology search finds similar segments between two biological sequences, such as DNA or protein sequences. A significant fraction of the world's computing power is dedicated to performing such tasks. The introduction of optimal spaced seeds by Ma et al. has increased both the sensitivity and the speed of homology search, and spaced seeds have been adopted by many alignment programs such as BLAST. With the further improvement provided by multiple spaced seeds in PatternHunterII, the sensitivity of dynamic programming is approached at BLASTn speed. Whereas computing optimal multiple spaced seeds was proved to be NP-hard, we show that, from a practical point of view, computing good ones can be very efficient. We give a simple heuristic algorithm which computes good multiple seeds in polynomial time, without requiring any sensitivity computation. When the sensitivity is computed for a few seeds, we obtain better multiple seeds than previous ones in much shorter time.
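To make the terms "spaced seed" and "sensitivity" concrete, the following is a minimal sketch and not the heuristic algorithm of the paper. It assumes the common notation in which a seed is a string of 1s (positions that must match) and 0s (positions that are ignored), and an alignment is a string of 1s (matches) and 0s (mismatches). Sensitivity is estimated here by random sampling rather than computed exactly; the Bernoulli model, the alignment length 64, the similarity 0.7, and all function names are illustrative assumptions.

```python
import random

def seed_hits(seed, alignment):
    """Return True if the spaced seed hits the alignment at some offset.

    seed: string of '1' (match required) and '0' (don't care).
    alignment: string of '1' (match) and '0' (mismatch).
    """
    care = [i for i, c in enumerate(seed) if c == "1"]
    for start in range(len(alignment) - len(seed) + 1):
        if all(alignment[start + i] == "1" for i in care):
            return True
    return False

def multi_seed_hits(seeds, alignment):
    """A multiple seed hits if at least one of its seeds hits."""
    return any(seed_hits(s, alignment) for s in seeds)

def estimate_sensitivity(seeds, length=64, similarity=0.7, trials=2000, rng_seed=0):
    """Monte Carlo estimate of the probability that the seed set hits a
    random alignment whose positions match independently with the given
    probability (an assumed Bernoulli model)."""
    rng = random.Random(rng_seed)
    hits = 0
    for _ in range(trials):
        alignment = "".join(
            "1" if rng.random() < similarity else "0" for _ in range(length)
        )
        hits += multi_seed_hits(seeds, alignment)
    return hits / trials

if __name__ == "__main__":
    contiguous = "1" * 11                # 11 consecutive matches
    spaced = "111010010100110111"        # PatternHunter's weight-11 seed
    print("contiguous:", estimate_sensitivity([contiguous]))
    print("spaced:    ", estimate_sensitivity([spaced]))
    print("both:      ", estimate_sensitivity([contiguous, spaced]))
```

Because the random alignments depend only on `rng_seed`, the same sample is used for every seed set, so adding a seed to a set can never lower the estimate. Exact sensitivity computation is more expensive than this sampling sketch, which is why a method that needs few or no sensitivity evaluations is attractive.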
References
Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.: Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990)
Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.J.: Gapped Blast and Psi-Blast: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997)
Brejova, B., Brown, D., Vinar, T.: Optimal spaced seeds for homologous coding regions. J. Bioinf. and Comput. Biol. 1, 595–610 (2004)
Buhler, J., Keich, U., Sun, Y.: Designing seeds for similarity search in genomic DNA. In: Proc. of RECOMB 2003, pp. 67–75. ACM Press, New York (2003)
Burkhardt, S., Kärkkäinen, J.: Better filtering with gapped q-grams. In: Amir, A., Landau, G.M. (eds.) CPM 2001. LNCS, vol. 2089, pp. 73–85. Springer, Heidelberg (2001)
Choi, K.P., Zhang, L.: Sensitivity analysis and efficient method for identifying optimal spaced seeds. J. Comput. Sys. Sci. 68, 22–40 (2004)
Choi, K.P., Zeng, F., Zhang, L.: Good Spaced Seeds for Homology Search. Bioinformatics 20, 1053–1059 (2004)
Golub, G.H., Van Loan, C.F.: Matrix Computations, 3rd edn. Johns Hopkins Univ. Press, Baltimore (1996)
Ilie, L., Ilie, S.: Long spaced seeds for finding similarities between biological sequences. In: Proc. of BIOCOMP 2007 (to appear)
Karp, R., Rabin, M.O.: Efficient randomized pattern-matching algorithms. IBM J. Res. Develop. 31, 249–260 (1987)
Keich, U., Li, M., Ma, B., Tromp, J.: On spaced seeds for similarity search. Discrete Appl. Math. 138(3), 253–263 (2004)
Kisman, D., Li, M., Ma, B., Wang, L.: tPatternHunter: Gapped, fast and sensitive translated homology search. Bioinformatics 21, 542–544 (2005)
Kong, Y.: Generalized correlation functions and their applications in selection of optimal multiple spaced seeds for homology search. J. Comput. Biol. (to appear)
Kucherov, G., Noé, L., Ponty, Y.: Estimating seed sensitivity on homogeneous alignments. In: Proc. of BIBE 2004, Taiwan, pp. 387–394 (2004)
Li, M.: personal communication
Li, M., Ma, B., Kisman, D., Tromp, J.: PatternHunterII: highly sensitive and fast homology search. J. Bioinformatics and Comput. Biol. 2, 417–440 (2004)
Lipman, D.J., Pearson, W.R.: Rapid and sensitive protein similarity searches. Science 227, 1435–1441 (1985)
Li, M., Ma, B., Zhang, L.: Superiority and complexity of spaced seeds. In: Proc. of SODA 2006. SIAM, pp. 444–453 (2006)
Ma, B.: personal communication
Ma, B., Tromp, J., Li, M.: PatternHunter: faster and more sensitive homology search. Bioinformatics 18, 440–445 (2002)
Needleman, S.B., Wunsch, C.D.: A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48, 443–453 (1970)
Ning, Z., Cox, A.J., Mullikin, J.C.: SSAHA: A fast search method for large DNA databases. Genome Res. 11, 1725–1729 (2001)
Noé, L., Kucherov, G.: Yass: enhancing the sensitivity of DNA similarity search. Nucleic Acids Res. 33, 540–543 (2005)
Pevzner, P., Waterman, M.S.: Multiple filtration and approximate pattern matching. Algorithmica 13, 135–154 (1995)
Preparata, F.P., Zhang, L., Choi, K.P.: Quick, practical selection of effective seeds for homology search. J. Comput. Biol. 12, 1137–1152 (2005)
Smith, T.F., Waterman, M.S.: Identification of common molecular subsequences. J. Mol. Biol. 147, 195–197 (1981)
Sun, Y., Buhler, J.: Designing multiple simultaneous seeds for DNA similarity search. In: Proc. of RECOMB 2004, pp. 76–85. ACM Press, New York (2004)
Xu, J., Brown, D., Li, M., Ma, B.: Optimizing multiple spaced seeds for homology search. In: Sahinalp, S.C., Muthukrishnan, S.M., Dogrusoz, U. (eds.) CPM 2004. LNCS, vol. 3109, pp. 47–58. Springer, Heidelberg (2004)
Yang, I.-H., Wang, S.-H., Chen, H.-H., Huang, P.-H., Chao, K.-M.: Efficient methods for generating optimal single and multiple spaced seeds. In: Proc. of IEEE 4th Symp. on Bioinformatics and Bioengineering, Taiwan, pp. 411–418. IEEE Computer Society Press, Los Alamitos (2004)
Zhang, Z., Schwartz, S., Wagner, L., Miller, W.: A greedy algorithm for aligning DNA sequences. J. Comput. Biol. 7, 203–214 (2000)
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ilie, L., Ilie, S. (2007). Fast Computation of Good Multiple Spaced Seeds. In: Giancarlo, R., Hannenhalli, S. (eds) Algorithms in Bioinformatics. WABI 2007. Lecture Notes in Computer Science, vol 4645. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74126-8_32
DOI: https://doi.org/10.1007/978-3-540-74126-8_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74125-1
Online ISBN: 978-3-540-74126-8
eBook Packages: Computer Science (R0)