Geometric Suffix Tree: A New Index Structure for Protein 3-D Structures

  • Conference paper
Combinatorial Pattern Matching (CPM 2006)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4009)

Abstract

Protein structure analysis is one of the most important research issues in the post-genomic era, and faster and more accurate query data structures for 3-D structures are highly desired for research on proteins. This paper proposes a new data structure for indexing protein 3-D structures. For strings, there are many efficient indexing structures such as suffix trees, but it has been considered very difficult to design comparably sophisticated data structures for 3-D structures like proteins. Our index structure is based on suffix trees and is called the geometric suffix tree. By using the geometric suffix tree for a set of protein structures, we can search for all of their substructures whose RMSDs (root mean square deviations) or URMSDs (unit-vector root mean square deviations) to a given query 3-D structure are not larger than a given bound. Though there are O(N^2) substructures, our data structure requires only O(N) space, where N is the sum of the lengths of the proteins in the set. We propose an O(N^2) construction algorithm for it, while a naive algorithm would require O(N^3) time. Moreover, we propose an efficient search algorithm. We also present computational experiments that demonstrate the practicality of our data structure. The experiments show that, when applied to a protein structure database, the construction time of the geometric suffix tree is in practice almost linear in the size of the database.
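To make the query in the abstract concrete, the following is a minimal sketch of the brute-force baseline that such an index is meant to improve on. It is not the geometric suffix tree and not the paper's algorithm. It assumes each protein is given as a list of 3-D points (for example, one point per residue), and it slides the query over every contiguous window, reporting those whose RMSD after optimal rigid superposition is within the bound. The names `rmsd`, `naive_search`, and `_largest_eigenvalue` are hypothetical. The superposition step uses the quaternion-based closed form, one of several standard least-squares fitting methods, with a small Jacobi eigenvalue routine so the sketch needs only the standard library.

```python
import math


def _largest_eigenvalue(m, sweeps=50):
    """Largest eigenvalue of a small symmetric matrix (cyclic Jacobi rotations)."""
    a = [row[:] for row in m]
    n = len(a)
    scale = sum(v * v for row in a for v in row)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= 1e-28 * scale:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Angle that zeroes a[p][q], reduced to |theta| <= pi/4.
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                if theta > math.pi / 4:
                    theta -= math.pi / 2
                elif theta < -math.pi / 4:
                    theta += math.pi / 2
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return max(a[i][i] for i in range(n))


def rmsd(p, q):
    """Minimum RMSD between equal-length point lists over rotation and translation."""
    n = len(p)
    cp = [sum(v[k] for v in p) / n for k in range(3)]
    cq = [sum(v[k] for v in q) / n for k in range(3)]
    x = [[v[k] - cp[k] for k in range(3)] for v in p]  # centered copy of p
    y = [[v[k] - cq[k] for k in range(3)] for v in q]  # centered copy of q
    s = [[sum(x[i][r] * y[i][c] for i in range(n)) for c in range(3)]
         for r in range(3)]
    g = sum(c * c for v in x for c in v) + sum(c * c for v in y for c in v)
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    # Symmetric 4x4 matrix whose largest eigenvalue is the best achievable
    # correlation between the two centered point sets under a proper rotation.
    horn = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    lam = _largest_eigenvalue(horn)
    return math.sqrt(max(0.0, (g - 2.0 * lam) / n))


def naive_search(chains, query, bound):
    """Scan every window of len(query) points; return (chain, start, rmsd) hits."""
    m = len(query)
    hits = []
    for c, chain in enumerate(chains):
        for start in range(len(chain) - m + 1):
            d = rmsd(chain[start:start + m], query)
            if d <= bound:
                hits.append((c, start, d))
    return hits
```

Calling `naive_search(chains, query, bound)` returns a list of `(chain index, start offset, RMSD)` tuples. Each query costs time proportional to the total database length multiplied by the query length, since every window is superposed from scratch, which is the kind of repeated work an index over the substructures aims to avoid.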

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Shibuya, T. (2006). Geometric Suffix Tree: A New Index Structure for Protein 3-D Structures. In: Lewenstein, M., Valiente, G. (eds) Combinatorial Pattern Matching. CPM 2006. Lecture Notes in Computer Science, vol 4009. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11780441_9

  • DOI: https://doi.org/10.1007/11780441_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-35455-0

  • Online ISBN: 978-3-540-35461-1

  • eBook Packages: Computer Science (R0)
