Abstract
Protein structure analysis is one of the most important research problems of the post-genomic era, and protein research needs faster and more accurate data structures for querying 3-D structures. This paper proposes a new data structure for indexing protein 3-D structures. For strings, there are many efficient indexing structures such as suffix trees, but designing comparably sophisticated data structures for 3-D structures like proteins has been considered very difficult. Our index structure is based on the suffix tree and is called the geometric suffix tree. Using the geometric suffix tree built for a set of protein structures, we can search for all of their substructures whose RMSDs (root mean square deviations) or URMSDs (unit-vector root mean square deviations) to a given query 3-D structure are no larger than a given bound. Although there are O(N^2) substructures, our data structure requires only O(N) space, where N is the sum of the lengths of the proteins in the set. We propose an O(N^2) construction algorithm for it, whereas a naive algorithm would require O(N^3) time. We also propose an efficient search algorithm. Computational experiments demonstrate the practicality of our data structure: when applied to a protein structure database, the construction time of the geometric suffix tree is in practice almost linear in the size of the database.
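To make the query problem concrete, the following is a minimal sketch of the brute-force baseline that an index such as the geometric suffix tree is meant to improve on. It is not the paper's algorithm. It assumes each protein is given as a list of (x, y, z) coordinates (for example, one point per residue), that RMSD is minimized over rigid motions (rotation and translation), and that only substructures of the same length as the query are compared. The names rmsd and naive_search are hypothetical and chosen for illustration only.

```python
import math


def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def _center(points):
    """Translate points so that their centroid is the origin."""
    n = len(points)
    c = [sum(p[i] for p in points) / n for i in range(3)]
    return [(p[0] - c[0], p[1] - c[1], p[2] - c[2]) for p in points]


def _sym_eigvals(a):
    """Eigenvalues (descending) of a symmetric 3x3 matrix, closed form."""
    p1 = a[0][1] ** 2 + a[0][2] ** 2 + a[1][2] ** 2
    if p1 == 0.0:  # already diagonal
        return sorted((a[0][0], a[1][1], a[2][2]), reverse=True)
    q = (a[0][0] + a[1][1] + a[2][2]) / 3
    p2 = sum((a[i][i] - q) ** 2 for i in range(3)) + 2 * p1
    p = math.sqrt(p2 / 6)
    b = [[(a[i][j] - (q if i == j else 0.0)) / p for j in range(3)]
         for i in range(3)]
    r = max(-1.0, min(1.0, _det3(b) / 2))  # clamp against rounding error
    phi = math.acos(r) / 3
    e1 = q + 2 * p * math.cos(phi)
    e3 = q + 2 * p * math.cos(phi + 2 * math.pi / 3)
    return [e1, 3 * q - e1 - e3, e3]


def rmsd(a, b):
    """Minimum RMSD between two equal-length 3-D point lists."""
    if len(a) != len(b) or not a:
        raise ValueError("point lists must be non-empty and of equal length")
    a, b = _center(a), _center(b)
    # Covariance matrix H = sum_i a_i b_i^T, then K = H^T H.
    h = [[sum(p[i] * q[j] for p, q in zip(a, b)) for j in range(3)]
         for i in range(3)]
    k = [[sum(h[m][i] * h[m][j] for m in range(3)) for j in range(3)]
         for i in range(3)]
    # Singular values of H are the square roots of the eigenvalues of K.
    s = [math.sqrt(max(e, 0.0)) for e in _sym_eigvals(k)]
    if _det3(h) < 0:  # exclude reflections: only proper rotations allowed
        s[2] = -s[2]
    g = sum(x * x for p in a + b for x in p)
    return math.sqrt(max(0.0, (g - 2 * sum(s)) / len(a)))


def naive_search(chains, query, bound):
    """Scan every window of len(query) in every chain; keep RMSD <= bound."""
    m = len(query)
    hits = []
    for c, chain in enumerate(chains):
        for start in range(len(chain) - m + 1):
            d = rmsd(chain[start:start + m], query)
            if d <= bound:
                hits.append((c, start, d))
    return hits
```

Calling naive_search(chains, query, bound) returns (chain index, start position, RMSD) triples. Each query scans every window of every chain, so the work per query grows with the total database length times the query length; an index structure aims to avoid exactly this full scan.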
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Shibuya, T. (2006). Geometric Suffix Tree: A New Index Structure for Protein 3-D Structures. In: Lewenstein, M., Valiente, G. (eds) Combinatorial Pattern Matching. CPM 2006. Lecture Notes in Computer Science, vol 4009. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11780441_9
DOI: https://doi.org/10.1007/11780441_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-35455-0
Online ISBN: 978-3-540-35461-1
eBook Packages: Computer Science (R0)