Efficient Seeding Techniques for Protein Similarity Search

Roytberg, Mikhail; Gambin, Anna; Noé, Laurent; Lasota, Sławomir; Furletova, Eugenia; Szczurek, Ewa; Kucherov, Gregory

doi:10.1007/978-3-540-70600-7_36

Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 13)

Abstract

We apply the concept of subset seeds proposed in [1] to similarity search in protein sequences. The main question studied is the design of efficient seed alphabets for constructing seeds with optimal sensitivity/selectivity trade-offs. We propose several design methods and use them to construct several alphabets. We then analyze seeds built over those alphabets and compare them with the standard Blastp seeding method [2,3], as well as with the family of vector seeds proposed in [4]. While the formalism of subset seeds is less expressive (but less costly to implement) than the accumulative principle used in Blastp and vector seeds, our seeds show similar or even better performance than Blastp on Bernoulli models of proteins compatible with the common BLOSUM62 matrix.
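The contrast between the two hit criteria mentioned above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The seed letters `#`, `@` and `_`, the residue groupings, the toy scoring function and all function names are assumptions introduced here for illustration only; they are not the alphabets designed in the paper, and the toy scores are not BLOSUM62 values. The sketch assumes that each seed letter groups amino acids into classes and that an aligned pair matches a letter when both residues are identical or fall in the same class.

```python
# Hypothetical seed alphabet (illustrative only, not taken from the paper).
# Each seed letter maps to a list of residue groups; two residues match the
# letter if they are identical or belong to the same group.
SEED_LETTERS = {
    "#": [],                                  # identical residues only
    "@": ["ILMV", "FWY", "KR", "DE", "ST"],   # assumed grouping of similar residues
    "_": ["ACDEFGHIKLMNPQRSTVWY"],            # joker: any pair of residues matches
}


def letter_matches(letter: str, a: str, b: str) -> bool:
    """Return True if the aligned residue pair (a, b) is accepted by the seed letter."""
    if a == b:
        return True
    return any(a in group and b in group for group in SEED_LETTERS[letter])


def subset_seed_hit(seed: str, x: str, y: str) -> bool:
    """Subset-seed criterion: every position must match its seed letter independently."""
    if not (len(seed) == len(x) == len(y)):
        raise ValueError("seed and both words must have the same length")
    return all(letter_matches(s, a, b) for s, a, b in zip(seed, x, y))


def toy_score(a: str, b: str) -> int:
    """Toy pair score (an assumption, not BLOSUM62): 2 identical, 1 same group, -1 otherwise."""
    if a == b:
        return 2
    return 1 if letter_matches("@", a, b) else -1


def cumulative_hit(x: str, y: str, threshold: int) -> bool:
    """Accumulative criterion: the summed pair scores must reach the threshold."""
    if len(x) != len(y):
        raise ValueError("both words must have the same length")
    return sum(toy_score(a, b) for a, b in zip(x, y)) >= threshold


if __name__ == "__main__":
    print(subset_seed_hit("#@_", "ALK", "AVW"))   # each position is checked separately
    print(subset_seed_hit("###", "AAK", "AAW"))   # one mismatch rejects the hit
    print(cumulative_hit("AAK", "AAW", 3))        # good positions offset the bad one
```

In this sketch a single non-matching position rejects a subset-seed hit, whereas under the accumulative criterion high-scoring positions can compensate for a poor one. Because each position is tested independently, a subset seed of this grouped form can in principle be checked by direct indexing on class labels, which is one way to read the abstract's remark that the formalism is less costly to implement.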

References

1. Kucherov, G., Noé, L., Roytberg, M.: A unifying framework for seed sensitivity and its application to subset seeds. JBCB 4(2), 553–570 (2006)
2. Altschul, S., Gish, W., Miller, W., Myers, E., Lipman, D.: Basic Local Alignment Search Tool. Journal of Molecular Biology 215, 403–410 (1990)
3. Altschul, S., et al.: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25(17), 3389–3402 (1997)
4. Brown, D.: Optimizing multiple seed for protein homology search. IEEE/ACM TCBB 2(1), 29–38 (2004) (earlier version in WABI 2004)
5. Ma, B., Tromp, J., Li, M.: PatternHunter: Faster and more sensitive homology search. Bioinformatics 18(3), 440–445 (2002)
6. Li, M., Ma, B., Kisman, D., Tromp, J.: PatternHunter II: Highly sensitive and fast homology search. JBCB 2(3), 417–439 (2004) (earlier version in GIW 2003)
7. Brejova, B., Brown, D., Vinar, T.: Vector seeds: an extension to spaced seeds. Journal of Computer and System Sciences 70(3), 364–380 (2005)
8. Noé, L., Kucherov, G.: YASS: enhancing the sensitivity of DNA similarity search. Nucleic Acids Res. 33, W540–W543 (2005)
9. Mak, D., Gelfand, Y., Benson, G.: Indel seeds for homology search. Bioinformatics 22(14), e341–e349 (2006)
10. Csürös, M., Ma, B.: Rapid homology search with neighbor seeds. Algorithmica 48(2), 187–202 (2007)
11. Zhou, L., Stanton, J., Florea, L.: Universal seeds for cDNA-to-genome comparison. BMC Bioinformatics 9(36) (2008)
12. Sun, Y., Buhler, J.: Designing multiple simultaneous seeds for DNA similarity search. In: RECOMB, pp. 76–84 (2004)
13. Kucherov, G., Noé, L., Roytberg, M.: Multi-seed lossless filtration. In: Sahinalp, S.C., Muthukrishnan, S.M., Dogrusoz, U. (eds.) CPM 2004. LNCS, vol. 3109, pp. 297–310. Springer, Heidelberg (2004)
14. Yang, I.H., et al.: Efficient methods for generating optimal single and multiple spaced seeds. In: IEEE BIBE, pp. 411–416 (2004)
15. Xu, J., Brown, D., Li, M., Ma, B.: Optimizing multiple spaced seeds for homology search. In: Sahinalp, S.C., Muthukrishnan, S.M., Dogrusoz, U. (eds.) CPM 2004. LNCS, vol. 3109, pp. 47–58. Springer, Heidelberg (2004)
16. Kisman, D., Li, M., Ma, B., Wang, L.: tPatternHunter: gapped, fast and sensitive translated homology search. Bioinformatics 21(4), 542–544 (2005)
17. Peterlongo, P., et al.: Protein similarity search with subset seeds on a dedicated reconfigurable hardware. In: PBC. LNCS, vol. 4967 (2007)
18. Noé, L., Kucherov, G.: Improved hit criteria for DNA local alignment. BMC Bioinformatics 5(149) (2004)
19. Keich, U., Li, M., Ma, B., Tromp, J.: On spaced seeds for similarity search. Discrete Applied Mathematics 138(3), 253–263 (2004) (earlier version in 2002)
20. Li, T., Fan, K., Wang, J., Wang, W.: Reduction of protein sequence complexity by residue grouping. Journal of Protein Engineering 16, 323–330 (2003)
21. Murphy, L., Wallqvist, A., Levy, R.: Simplified amino acid alphabets for protein fold recognition and implications for folding. J. of Prot. Eng. 13, 149–152 (2000)
22. Henikoff, S., Henikoff, J.: Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. USA 89, 10915–10919 (1992)
23. Henikoff, S., Henikoff, J.: Automated assembly of protein blocks for database searching. Nucleic Acids Res. 19(23), 6565–6572 (1991)
24. Buhler, J., Keich, U., Sun, Y.: Designing seeds for similarity search in genomic DNA. In: RECOMB, pp. 67–75 (2003)
25. Ilie, L., Ilie, S.: Long spaced seeds for finding similarities between biological sequences. In: BIOCOMP, pp. 3–8 (2007)

Author information

Authors and Affiliations

Institute of Mathematical Problems in Biology, Pushchino, Moscow Region, 142290, Russia
Mikhail Roytberg & Eugenia Furletova
Institute of Informatics, Warsaw University, Banacha 2, 02-097, Poland
Anna Gambin & Sławomir Lasota
LIFL/CNRS/INRIA, Bât. M3, Campus Scientifique, 59655, Villeneuve d’Ascq Cédex, France
Laurent Noé & Gregory Kucherov
Max Planck Institute for Molecular Genetics, Computational Molecular Biology, Ihnestr. 73, 14195, Berlin, Germany
Ewa Szczurek

Editor information

Mourad Elloumi, Josef Küng, Michal Linial, Robert F. Murphy, Kristan Schneider, Cristian Toma

About this paper

Cite this paper

Roytberg, M. et al. (2008). Efficient Seeding Techniques for Protein Similarity Search. In: Elloumi, M., Küng, J., Linial, M., Murphy, R.F., Schneider, K., Toma, C. (eds) Bioinformatics Research and Development. BIRD 2008. Communications in Computer and Information Science, vol 13. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70600-7_36

DOI: https://doi.org/10.1007/978-3-540-70600-7_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-70598-7
Online ISBN: 978-3-540-70600-7
eBook Packages: Computer Science, Computer Science (R0)
