Fast Computation of Good Multiple Spaced Seeds

Ilie, Lucian; Ilie, Silvana

doi:10.1007/978-3-540-74126-8_32


Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4645)

Included in the following conference series:

International Workshop on Algorithms in Bioinformatics


Abstract

Homology search finds similar segments between two biological sequences, such as DNA or protein sequences. A significant fraction of the world's computing power is dedicated to such tasks. The introduction of optimal spaced seeds by Ma et al. has increased both the sensitivity and the speed of homology search and has been adopted by many alignment programs such as BLAST. With the further improvement provided by multiple spaced seeds in PatternHunterII, the sensitivity of dynamic programming is approached at BLASTn speed. Whereas computing optimal multiple spaced seeds was proved to be NP-hard, we show that, from a practical point of view, computing good ones can be very efficient. We give a simple heuristic algorithm that computes good multiple seeds in polynomial time without requiring any sensitivity computation. When the sensitivity is computed for a few seeds, we obtain better multiple seeds than previous ones in a much shorter time.
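
To make the terms in the abstract concrete, the following is a minimal sketch of what it means for a spaced seed, or a set of multiple spaced seeds, to hit an alignment, and of sensitivity as the probability of a hit. It is not the heuristic algorithm of the paper, which this page does not reproduce. The notation is an assumption: a seed is a string in which '1' marks a position that must match and '0' a don't-care position, and an alignment is a string in which '1' is a match and '0' a mismatch. All function names are hypothetical, and the sensitivity routine is a brute-force enumeration that is exponential in the alignment length, so it is only usable for very short alignments.

```python
from itertools import product


def weight(seed: str) -> int:
    """Number of required-match positions ('1') in the seed."""
    return seed.count("1")


def seed_hits(seed: str, alignment: str) -> bool:
    """True if the seed hits the alignment at some offset.

    A hit at offset `start` means every '1' position of the seed
    lines up with a match ('1') in the alignment.
    """
    required = [i for i, c in enumerate(seed) if c == "1"]
    for start in range(len(alignment) - len(seed) + 1):
        if all(alignment[start + i] == "1" for i in required):
            return True
    return False


def multiple_seed_hits(seeds: list[str], alignment: str) -> bool:
    """A multiple seed hits if at least one of its seeds hits."""
    return any(seed_hits(s, alignment) for s in seeds)


def sensitivity(seeds: list[str], length: int, p: float) -> float:
    """Probability that a random alignment of the given length is hit.

    Assumes each position is independently a match with probability p.
    Enumerates all 2**length alignments (illustrative only).
    """
    total = 0.0
    for bits in product("01", repeat=length):
        alignment = "".join(bits)
        if multiple_seed_hits(seeds, alignment):
            k = alignment.count("1")
            total += p**k * (1 - p) ** (length - k)
    return total


if __name__ == "__main__":
    alignment = "1101011"
    # Both seeds have weight 3, but only the spaced one hits here.
    print(seed_hits("111", alignment))   # False
    print(seed_hits("1101", alignment))  # True
    print(sensitivity(["111"], 8, 0.7), sensitivity(["1101"], 8, 0.7))
```

The small example shows a spaced seed of weight 3 hitting an alignment that the contiguous seed of the same weight misses. The exhaustive enumeration in `sensitivity` also suggests why a method that avoids sensitivity computation is attractive.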


References

Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.: Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990)
Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.J.: Gapped Blast and Psi-Blast: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997)
Brejova, B., Brown, D., Vinar, T.: Optimal spaced seeds for homologous coding regions. J. Bioinf. and Comput. Biol. 1, 595–610 (2004)
Buhler, J., Keich, U., Sun, Y.: Designing seeds for similarity search in genomic DNA. In: Proc. of RECOMB 2003, pp. 67–75. ACM Press, New York (2003)
Burkhardt, S., Kärkkäinen, J.: Better filtering with gapped q-grams. In: Amir, A., Landau, G.M. (eds.) CPM 2001. LNCS, vol. 2089, pp. 73–85. Springer, Heidelberg (2001)
Choi, K.P., Zhang, L.: Sensitivity analysis and efficient method for identifying optimal spaced seeds. J. Comput. Sys. Sci. 68, 22–40 (2004)
Choi, K.P., Zeng, F., Zhang, L.: Good Spaced Seeds for Homology Search. Bioinformatics 20, 1053–1059 (2004)
Golub, G.H., Van Loan, C.F.: Matrix Computations, 3rd edn. Johns Hopkins Univ. Press, Baltimore (1996)
Ilie, L., Ilie, S.: Long spaced seeds for finding similarities between biological sequences. In: Proc. of BIOCOMP 2007 (to appear)
Karp, R., Rabin, M.O.: Efficient randomized pattern-matching algorithms. IBM J. Res. Develop. 31, 249–260 (1987)
Keich, U., Li, M., Ma, B., Tromp, J.: On spaced seeds for similarity search. Discrete Appl. Math. 138(3), 253–263 (2004)
Kisman, D., Li, M., Ma, B., Wang, L.: tPatternHunter: Gapped, fast and sensitive translated homology search. Bioinformatics 21, 542–544 (2005)
Kong, Y.: Generalized correlation functions and their applications in selection of optimal multiple spaced seeds for homology search. J. Comput. Biol. (to appear)
Kucherov, G., Noé, L., Ponty, Y.: Estimating seed sensitivity on homogeneous alignments. In: Proc. of BIBE 2004, Taiwan, pp. 387–394 (2004)
Li, M.: personal communication
Li, M., Ma, B., Kisman, D., Tromp, J.: PatternHunterII: highly sensitive and fast homology search. J. Bioinformatics and Comput. Biol. 2, 417–440 (2004)
Lipman, D.J., Pearson, W.R.: Rapid and sensitive protein similarity searches. Science 227, 1435–1441 (1985)
Li, M., Ma, B., Zhang, L.: Superiority and complexity of spaced seeds. In: Proc. of SODA 2006, pp. 444–453. SIAM (2006)
Ma, B.: personal communication
Ma, B., Tromp, J., Li, M.: PatternHunter: faster and more sensitive homology search. Bioinformatics 18, 440–445 (2002)
Needleman, S.B., Wunsch, C.D.: A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48, 443–453 (1970)
Ning, Z., Cox, A.J., Mullikin, J.C.: SSAHA: A fast search method for large DNA databases. Genome Res. 11, 1725–1729 (2001)
Noé, L., Kucherov, G.: Yass: enhancing the sensitivity of DNA similarity search. Nucleic Acids Res. 33, 540–543 (2005)
Pevzner, P., Waterman, M.S.: Multiple filtration and approximate pattern matching. Algorithmica 13, 135–154 (1995)
Preparata, F.P., Zhang, L., Choi, K.P.: Quick, practical selection of effective seeds for homology search. J. Comput. Biol. 12, 137–152 (2005)
Smith, T.F., Waterman, M.S.: Identification of common molecular subsequences. J. Mol. Biol. 147, 195–197 (1981)
Sun, Y., Buhler, J.: Designing multiple simultaneous seeds for DNA similarity search. In: Proc. of RECOMB 2004, pp. 76–85. ACM Press, New York (2004)
Xu, J., Brown, D., Li, M., Ma, B.: Optimizing multiple spaced seeds for homology search. In: Sahinalp, S.C., Muthukrishnan, S.M., Dogrusoz, U. (eds.) CPM 2004. LNCS, vol. 3109, pp. 47–58. Springer, Heidelberg (2004)
Yang, I.-H., Wang, S.-H., Chen, H.-H., Huang, P.-H., Chao, K.-M.: Efficient methods for generating optimal single and multiple spaced seeds. In: Proc. of IEEE 4th Symp. on Bioinformatics and Bioengineering, Taiwan, pp. 411–418. IEEE Computer Society Press, Los Alamitos (2004)
Zhang, Z., Schwartz, S., Wagner, L., Miller, W.: A greedy algorithm for aligning DNA sequences. J. Comput. Biol. 7, 203–214 (2000)


Author information

Authors and Affiliations

Department of Computer Science, University of Western Ontario, N6A 5B7, London, Ontario, Canada
Lucian Ilie
Numerical Analysis, Centre for Mathematical Sciences, Lund University, Box 118, SE-221 00 Lund, Sweden
Silvana Ilie


Editor information

Raffaele Giancarlo, Sridhar Hannenhalli


About this paper

Cite this paper

Ilie, L., Ilie, S. (2007). Fast Computation of Good Multiple Spaced Seeds. In: Giancarlo, R., Hannenhalli, S. (eds) Algorithms in Bioinformatics. WABI 2007. Lecture Notes in Computer Science, vol 4645. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74126-8_32


DOI: https://doi.org/10.1007/978-3-540-74126-8_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74125-1
Online ISBN: 978-3-540-74126-8
eBook Packages: Computer Science, Computer Science (R0)
