Abstract
Finding similarities in protein sequences is a core problem in bioinformatics. It is the first step in the functional characterization of novel protein sequences, and it is also used in studies of protein evolution and in the prediction of biological structures. In this paper, we propose Proteinus, a new index for similarity search over protein sequences. Proteinus represents protein sequences using a reduced amino acid alphabet, stores the index persistently on disk, and supports range queries. Performance tests with real-world protein sequences showed that the Proteinus index was very efficient: compared with the BLASTP tool, Proteinus achieved performance gains ranging from 45% to 93% in range query processing.
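The abstract names three ingredients (a reduced amino acid alphabet, an index over the reduced sequences, and range queries) but does not describe how Proteinus combines them. The following is a minimal Python sketch of the general idea under stated assumptions, not the Proteinus algorithm. The six-symbol residue grouping, the q-gram inverted index, the use of edit distance as the similarity measure, and the names `reduce_seq`, `ReducedQgramIndex` and `range_query` are all hypothetical choices made for illustration. A range query here returns every indexed sequence whose reduced form lies within edit distance `r` of the reduced query. Candidates are pruned with a length filter and the q-gram lemma (each edit destroys at most `q` q-grams), and only the survivors are verified by dynamic programming. The sketch keeps everything in memory and omits the on-disk persistence that Proteinus provides.

```python
from collections import Counter, defaultdict

# Hypothetical reduced alphabet grouping residues by broad physicochemical class.
# Illustrative assumption only; NOT the grouping used by Proteinus.
GROUPS = {
    "a": "AVLIMC",  # aliphatic / hydrophobic
    "r": "FWYH",    # aromatic
    "p": "STNQ",    # polar, uncharged
    "k": "KR",      # positively charged
    "d": "DE",      # negatively charged
    "g": "GP",      # special backbone conformations
}
REDUCE = {aa: sym for sym, members in GROUPS.items() for aa in members}


def reduce_seq(seq: str) -> str:
    """Map a protein sequence onto the reduced alphabet ('x' for unknown residues)."""
    return "".join(REDUCE.get(aa, "x") for aa in seq.upper())


def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (cs != ct)))
        prev = cur
    return prev[-1]


def qgrams(s: str, q: int) -> Counter:
    """Multiset of the overlapping q-grams of s."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


class ReducedQgramIndex:
    """Hypothetical in-memory q-gram index over reduced sequences (no persistence)."""

    def __init__(self, q: int = 3):
        self.q = q
        self.seqs = {}                    # id -> reduced sequence
        self.counts = {}                  # id -> q-gram multiset
        self.postings = defaultdict(set)  # q-gram -> ids of sequences containing it

    def add(self, seq_id: str, seq: str) -> None:
        red = reduce_seq(seq)
        self.seqs[seq_id] = red
        self.counts[seq_id] = qgrams(red, self.q)
        for g in self.counts[seq_id]:
            self.postings[g].add(seq_id)

    def range_query(self, query: str, r: int) -> list:
        """Ids of sequences within edit distance r of the query, both reduced."""
        red_q = reduce_seq(query)
        q_counts = qgrams(red_q, self.q)
        # Inverted lists give the sequences sharing at least one q-gram with the query.
        candidates = set()
        for g in q_counts:
            candidates |= self.postings.get(g, set())
        hits = []
        for seq_id, red in self.seqs.items():
            if abs(len(red) - len(red_q)) > r:
                continue  # each edit changes the length by at most 1
            # q-gram lemma: within distance r implies at least this many shared q-grams.
            threshold = max(len(red_q), len(red)) - self.q + 1 - r * self.q
            if threshold > 0:
                if seq_id not in candidates:
                    continue  # shares no q-gram, so it cannot reach a positive threshold
                if sum((q_counts & self.counts[seq_id]).values()) < threshold:
                    continue
            if edit_distance(red_q, red) <= r:  # verify surviving candidates exactly
                hits.append(seq_id)
        return sorted(hits)
```

For example, after indexing `"MKTAYIAKQR"` and `"MRTAYLAKQK"`, the query `"MKSAFVAKNR"` matches both with `r = 0`, because the conservative substitutions (K/R, T/S, Y/F, I/L/V, Q/N) collapse to the same reduced symbols. This illustrates why a reduced alphabet can help similarity search, while the q-gram filter keeps most sequences from reaching the quadratic verification step.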
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
da Louza, F.A., Ciferri, R.R., de Aguiar Ciferri, C.D. (2011). Efficiently Querying Protein Sequences with the Proteinus Index. In: Norberto de Souza, O., Telles, G.P., Palakal, M. (eds) Advances in Bioinformatics and Computational Biology. BSB 2011. Lecture Notes in Computer Science, vol 6832. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22825-4_8
DOI: https://doi.org/10.1007/978-3-642-22825-4_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22824-7
Online ISBN: 978-3-642-22825-4
eBook Packages: Computer Science, Computer Science (R0)