Efficiently Querying Protein Sequences with the Proteinus Index

  • Conference paper
Advances in Bioinformatics and Computational Biology (BSB 2011)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 6832)

Abstract

Finding similarities in protein sequences is a core problem in bioinformatics. It is the first step in the functional characterization of novel protein sequences, and is also used in protein evolution studies and in predicting biological structure. In this paper, we propose Proteinus, a new index for similarity search over protein sequences. Proteinus represents protein sequences with a reduced amino acid alphabet, stores the index persistently on disk, and supports range queries. Performance tests with real-world protein sequences showed that the Proteinus index was very efficient: compared with the BLASTP tool, it provided a performance gain of 45% to 93% for range query processing.
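
To make the two ideas named in the abstract concrete, the following is a minimal sketch of mapping sequences to a reduced amino acid alphabet and answering a range query (all sequences within a given distance of the query). It is not the Proteinus index: the residue grouping in `GROUPS`, the use of edit distance, and the linear scan in `range_query` are all assumptions made for illustration, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical grouping of the 20 standard amino acids into 6 classes.
# Chosen for illustration only; the paper's actual grouping is not given here.
GROUPS: Dict[str, str] = {
    "A": "AVLIMC",  # hydrophobic
    "F": "FWYH",    # aromatic
    "S": "STNQ",    # polar, uncharged
    "K": "KR",      # positively charged
    "D": "DE",      # negatively charged
    "G": "GP",      # glycine and proline
}

# Invert the grouping: each residue maps to its group representative.
REDUCE: Dict[str, str] = {
    aa: rep for rep, members in GROUPS.items() for aa in members
}


def reduce_sequence(seq: str) -> str:
    """Rewrite a protein sequence over the reduced alphabet.

    Residues outside the 20 standard amino acids become 'X'.
    """
    return "".join(REDUCE.get(aa, "X") for aa in seq.upper())


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def range_query(query: str, database: Dict[str, str],
                radius: int) -> List[Tuple[str, int]]:
    """Return (id, distance) for every sequence whose reduced form lies
    within `radius` of the reduced query.

    This is a brute-force linear scan; an index exists to avoid exactly
    this scan.
    """
    reduced_query = reduce_sequence(query)
    hits = []
    for seq_id, seq in database.items():
        dist = edit_distance(reduced_query, reduce_sequence(seq))
        if dist <= radius:
            hits.append((seq_id, dist))
    return hits


if __name__ == "__main__":
    db = {"p1": "MKVLE", "p2": "LRIID", "p3": "GGGGGGGG"}
    print(reduce_sequence("MKVLE"))
    print(range_query("MKVLD", db, radius=1))
```

Under this grouping, `p1` and `p2` differ at every position in the full alphabet but collapse to the same reduced string, so both match the query. This shows how a reduced alphabet trades specificity for tolerance to conservative substitutions.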

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

da Louza, F.A., Ciferri, R.R., de Aguiar Ciferri, C.D. (2011). Efficiently Querying Protein Sequences with the Proteinus Index. In: Norberto de Souza, O., Telles, G.P., Palakal, M. (eds) Advances in Bioinformatics and Computational Biology. BSB 2011. Lecture Notes in Computer Science, vol 6832. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22825-4_8

  • DOI: https://doi.org/10.1007/978-3-642-22825-4_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-22824-7

  • Online ISBN: 978-3-642-22825-4

  • eBook Packages: Computer Science, Computer Science (R0)
