Skip to main content

Efficient Seeding Techniques for Protein Similarity Search

  • Conference paper

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 13))

Abstract

We apply the concept of subset seeds proposed in ? to similarity search in protein sequences. The main question studied is the design of efficient seed alphabets to construct seeds with optimal sensitivity/selectivity trade-offs. We propose several different design methods and use them to construct several alphabets. We then perform an analysis of seeds built over those alphabet and compare them with the standard Blastp seeding method [2,3], as well as with the family of vector seeds proposed in [4]. While the formalism of subset seed is less expressive (but less costly to implement) than the accumulative principle used in Blastp and vector seeds, our seeds show a similar or even better performance than Blastp on Bernoulli models of proteins compatible with the common BLOSUM62 matrix.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   89.00
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Kucherov, G., Noé, L., Roytberg, M.: A unifying framework for seed sensitivity and its application to subset seeds. JBCB 4(2), 553–570 (2006)

    Google Scholar 

  2. Altschul, S., Gish, W., Miller, W., Myers, E., Lipman, D.: Basic Local Alignment Search Tool. Journal of Molecular Biology 215, 403–410 (1990)

    Google Scholar 

  3. Altschul, S., et al.: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25(17), 3389–3402 (1997)

    Article  Google Scholar 

  4. Brown, D.: Optimizing multiple seed for protein homology search. IEEE/ACM TCBB 2(1), 29–38 (2004) (earlier version in WABI 2004)

    Google Scholar 

  5. Ma, B., Tromp, J., Li, M.: PatternHunter: Faster and more sensitive homology search. Bioinformatics 18(3), 440–445 (2002)

    Article  Google Scholar 

  6. Li, M., Ma, B., Kisman, D., Tromp, J.: PatternHunter II: Highly sensitive and fast homology search. JBCB 2(3), 417–439 (2004) (earlier version in GIW 2003)

    Google Scholar 

  7. Brejova, B., Brown, D., Vinar, T.: Vector seeds: an extension to spaced seeds. Journal of Computer and System Sciences 70(3), 364–380 (2005)

    Article  MATH  MathSciNet  Google Scholar 

  8. Noé, L., Kucherov, G.: YASS: enhancing the sensitivity of DNA similarity search. Nucleic Acid Res. 33, W540–W543 (2005)

    Article  Google Scholar 

  9. Mak, D., Gelfand, Y., Benson, G.: Indel seeds for homology search. Bioinformatics 22(14), e341–e349 (2006)

    Article  Google Scholar 

  10. Csürös, M., Ma, B.: Rapid homology search with neighbor seeds. Algorithmica 48(2), 187–202 (2007)

    Article  MATH  MathSciNet  Google Scholar 

  11. Zhou, L., Stanton, J., Florea, L.: Universal seeds for cDNA-to-genome comparison. BMC Bioinformatics 9(36) (2008)

    Google Scholar 

  12. Sun, Y., Buhler, J.: Designing multiple simultaneous seeds for DNA similarity search. In: RECOMB, pp. 76–84 (2004)

    Google Scholar 

  13. Kucherov, G., Noé, L., Roytberg, M.: Multi-seed lossless filtration. In: Sahinalp, S.C., Muthukrishnan, S.M., Dogrusoz, U. (eds.) CPM 2004. LNCS, vol. 3109, pp. 297–310. Springer, Heidelberg (2004)

    Google Scholar 

  14. Yang, I.H., et al.: Efficient methods for generating optimal single and multiple spaced seeds. In: IEEE BIBE, pp. 411–416 (2004)

    Google Scholar 

  15. Xu, J., Brown, D., Li, M., Ma, B.: Optimizing multiple spaced seeds for homology search. In: Sahinalp, S.C., Muthukrishnan, S.M., Dogrusoz, U. (eds.) CPM 2004. LNCS, vol. 3109, pp. 47–58. Springer, Heidelberg (2004)

    Google Scholar 

  16. Kisman, D., Li, M., Ma, B., Wang, L.: tPatternHunter: gapped, fast and sensitive translated homology search. Bioinformatics 21(4), 542–544 (2005)

    Article  Google Scholar 

  17. Peterlongo, P., et al.: Protein similarity search with subset seeds on a dedicated reconfigurable hardware. In: PBC. LNCS, vol. 4967 (2007)

    Google Scholar 

  18. Noé, L., Kucherov, G.: Improved hit criteria for DNA local alignment. BMC Bioinformatics 5(149) (2004)

    Google Scholar 

  19. Keich, U., Li, M., Ma, B., Tromp, J.: On spaced seeds for similarity search. Discrete Applied Mathematics 138(3), 253–263 (2004) (earlier version in 2002)

    Article  MATH  MathSciNet  Google Scholar 

  20. Li, T., Fan, K., Wang, J., Wang, W.: Reduction of protein sequence complexity by residue grouping. Journal of Protein Engineering 16, 323–330 (2003)

    Article  Google Scholar 

  21. Murphy, L., Wallqvist, A., Levy, R.: Simplified amino acid alphabets for protein fold recognition and implications for folding. J. of Prot. Eng. 13, 149–152 (2000)

    Article  Google Scholar 

  22. Henikoff, S., Henikoff, J.: Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. USA 89, 10915–10919 (1992)

    Article  Google Scholar 

  23. Henikoff, S., Henikoff, J.: Automated assembly of protein blocks for database searching. Nucleic Acids Res. 19(23), 6565–6572 (1991)

    Article  Google Scholar 

  24. Buhler, J., Keich, U., Sun, Y.: Designing seeds for similarity search in genomic DNA. In: RECOMB, pp. 67–75 (2003)

    Google Scholar 

  25. Ilie, L., Ilie, S.: Long spaced seeds for finding similarities between biological sequences. In: BIOCOMP, pp. 3–8 (2007)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Mourad Elloumi Josef Küng Michal Linial Robert F. Murphy Kristan Schneider Cristian Toma

Rights and permissions

Reprints and permissions

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Roytberg, M. et al. (2008). Efficient Seeding Techniques for Protein Similarity Search. In: Elloumi, M., Küng, J., Linial, M., Murphy, R.F., Schneider, K., Toma, C. (eds) Bioinformatics Research and Development. BIRD 2008. Communications in Computer and Information Science, vol 13. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70600-7_36

Download citation

  • DOI: https://doi.org/10.1007/978-3-540-70600-7_36

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-70598-7

  • Online ISBN: 978-3-540-70600-7

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics