Efficient Seeding Techniques for Protein Similarity Search

  • Mikhail Roytberg
  • Anna Gambin
  • Laurent Noé
  • Sławomir Lasota
  • Eugenia Furletova
  • Ewa Szczurek
  • Gregory Kucherov
Conference paper

DOI: 10.1007/978-3-540-70600-7_36

Part of the Communications in Computer and Information Science book series (CCIS, volume 13)
Cite this paper as:
Roytberg M. et al. (2008) Efficient Seeding Techniques for Protein Similarity Search. In: Elloumi M., Küng J., Linial M., Murphy R.F., Schneider K., Toma C. (eds) Bioinformatics Research and Development. Communications in Computer and Information Science, vol 13. Springer, Berlin, Heidelberg

Abstract

We apply the concept of subset seeds proposed in [1] to similarity search in protein sequences. The main question studied is the design of efficient seed alphabets to construct seeds with optimal sensitivity/selectivity trade-offs. We propose several design methods and use them to construct several alphabets. We then analyze seeds built over those alphabets and compare them with the standard Blastp seeding method [2,3], as well as with the family of vector seeds proposed in [4]. While the formalism of subset seeds is less expressive (but less costly to implement) than the accumulative principle used in Blastp and vector seeds, our seeds show similar or even better performance than Blastp on Bernoulli models of proteins compatible with the common BLOSUM62 matrix.
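
As a rough illustration of the subset-seed idea mentioned in the abstract, the sketch below treats each seed letter as a set of amino-acid pairs it accepts, and reports where a seed hits a gapless alignment. The grouping `GROUPS`, the three seed letters `#`, `@`, `_`, and the function names are all assumptions made for this example; they are not the alphabets or seeds designed in the paper.

```python
# Hypothetical partition of the 20 amino acids, chosen for illustration only.
GROUPS = ["LIVM", "FYW", "KRH", "DE", "NQ", "ST", "AG", "C", "P"]
CLASS_OF = {aa: i for i, group in enumerate(GROUPS) for aa in group}


def letter_accepts(letter, a, b):
    """Tell whether a seed letter accepts the aligned residue pair (a, b)."""
    if letter == "#":   # accepts identical residues only
        return a == b
    if letter == "@":   # accepts residues from the same (hypothetical) group
        return CLASS_OF[a] == CLASS_OF[b]
    if letter == "_":   # accepts any pair ("don't care" position)
        return True
    raise ValueError(f"unknown seed letter: {letter!r}")


def seed_hits(seed, s, t):
    """Return the offsets at which `seed` hits the gapless alignment of s and t."""
    if len(s) != len(t):
        raise ValueError("sequences must have equal length")
    return [
        i
        for i in range(len(s) - len(seed) + 1)
        if all(letter_accepts(c, s[i + j], t[i + j]) for j, c in enumerate(seed))
    ]


if __name__ == "__main__":
    print(seed_hits("#@_#", "MKLDE", "MKVQE"))
```

Because each letter is checked independently of the others, a hit can be tested position by position without accumulating a score, which is one way to read the abstract's remark that subset seeds are less expressive but cheaper to implement than the accumulative principle.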

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Mikhail Roytberg (1)
  • Anna Gambin (2)
  • Laurent Noé (3)
  • Sławomir Lasota (2)
  • Eugenia Furletova (1)
  • Ewa Szczurek (4)
  • Gregory Kucherov (3)
  1. Institute of Mathematical Problems in Biology, Pushchino, Moscow Region, Russia
  2. Institute of Informatics, Warsaw University, Poland
  3. LIFL/CNRS/INRIA, Villeneuve d’Ascq Cédex, France
  4. Max Planck Institute for Molecular Genetics, Computational Molecular Biology, Berlin, Germany
