Geometric Suffix Tree: A New Index Structure for Protein 3-D Structures

  • Tetsuo Shibuya
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4009)


Protein structure analysis is one of the most important research issues in the post-genomic era, and faster, more accurate query data structures for protein 3-D structures are highly desirable. This paper proposes a new data structure for indexing protein 3-D structures. For strings, there are many efficient indexing structures such as suffix trees, but designing comparably sophisticated data structures for 3-D structures like proteins has been considered very difficult. Our index structure is based on suffix trees and is called the geometric suffix tree. By using the geometric suffix tree for a set of protein structures, we can search for all of their substructures whose RMSDs (root mean square deviations) or URMSDs (unit-vector root mean square deviations) to a given query 3-D structure are not larger than a given bound. Though there are O(N^2) substructures, our data structure requires only O(N) space, where N is the sum of the lengths of the proteins in the set. We propose an O(N^2) construction algorithm for it, whereas a naive algorithm would require O(N^3) time. Moreover, we propose an efficient search algorithm. We also present computational experiments that demonstrate the practicality of our data structure. The experiments show that, when applied to a protein structure database, the construction time of the geometric suffix tree is in practice almost linear in the size of the database.
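The query described in the abstract (report every substructure whose RMSD to the query is within a bound) can be illustrated with a brute-force baseline. The sketch below is not the geometric suffix tree and not the paper's search algorithm. It is only the naive scan that such an index is meant to speed up, written under several assumptions: each protein is given as an array of 3-D coordinates (for example, one point per residue), the RMSD is minimized over rotation and translation with the standard SVD-based superposition, and the names `rmsd` and `naive_search` are hypothetical.

```python
import numpy as np


def rmsd(P, Q):
    """Minimum RMSD between two equal-length 3-D point sets over all
    rigid motions (rotation plus translation), via SVD-based superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (n, 3)")
    # Remove the optimal translation by centering both point sets.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # Singular values of the 3x3 cross-covariance matrix.
    U, S, Vt = np.linalg.svd(Pc.T @ Qc)
    # Allow proper rotations only: if the best orthogonal map is a
    # reflection, the smallest singular value enters with a minus sign.
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[-1] = -S[-1]
    # sum |R p_i - q_i|^2 at the optimum = |P|^2 + |Q|^2 - 2 * sum(S).
    total = (Pc ** 2).sum() + (Qc ** 2).sum() - 2.0 * S.sum()
    # Clamp tiny negative values caused by floating-point cancellation.
    return float(np.sqrt(max(total / len(P), 0.0)))


def naive_search(chains, query, bound):
    """Scan every contiguous window of len(query) points in every chain and
    return (chain_index, start, rmsd) for windows with rmsd <= bound."""
    query = np.asarray(query, dtype=float)
    m = len(query)
    hits = []
    for cid, chain in enumerate(chains):
        chain = np.asarray(chain, dtype=float)
        for start in range(len(chain) - m + 1):
            d = rmsd(chain[start:start + m], query)
            if d <= bound:
                hits.append((cid, start, d))
    return hits
```

Each RMSD evaluation here costs time linear in the query length, and the scan repeats it for every window of every chain, so the work per query grows with both the database size and the query length.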


Keywords: Singular Value Decomposition; Index Structure; Outgoing Edge; Construction Algorithm; Suffix Tree
These keywords were added by machine and not by the authors.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Tetsuo Shibuya (1)
  1. Human Genome Center, Institute of Medical Science, University of Tokyo, Minato-ku, Tokyo, Japan
