Searching Protein 3-D Structures in Linear Time

  • Conference paper
Research in Computational Molecular Biology (RECOMB 2009)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 5541)

Abstract

Finding similar structures in 3-D structure databases of proteins is becoming an increasingly important issue in post-genomic molecular biology. To compare the 3-D structures of two molecules, biologists mostly use the RMSD (root mean square deviation) as the similarity measure. We propose new algorithms, fast both in theory and in practice, for a fundamental problem: given a structure database of chain molecules (such as proteins), find all the substructures whose RMSD to the query is within a given constant threshold. We first propose a breakthrough linear-expected-time algorithm for the problem, whereas the previous best-known time complexity was O(N log m), where N is the database size and m is the query size. For the expected-time analysis, we propose to use the random-walk model (or the ideal chain model) as the model of average protein structures. We furthermore propose a series of preprocessing algorithms that enable faster queries. We checked the performance of our linear-expected-time algorithm through computational experiments over the whole PDB database. According to the experiments, our algorithm is 3.6 to 28 times faster than previously known algorithms for ordinary queries. Moreover, the experimental results support the validity of our theoretical analyses.
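
As a rough illustration of the random-walk (ideal chain) model mentioned in the abstract, the following Python sketch generates a 3-D chain whose bonds have a fixed length and independent, uniformly random directions. This is the textbook ideal chain, not necessarily the exact formulation used in the paper; the function names, the unit bond length, and the seed are assumptions made for this example only. For such a chain of n bonds of length b, the mean squared end-to-end distance is n·b², which the second function estimates by sampling.

```python
import math
import random


def random_walk_chain(n_steps, bond_length=1.0, rng=None):
    """Return n_steps + 1 points of a 3-D ideal chain starting at the origin.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if rng is None:
        rng = random.Random()
    points = [(0.0, 0.0, 0.0)]
    for _ in range(n_steps):
        # Uniform direction on the unit sphere:
        # z uniform in [-1, 1], azimuth uniform in [0, 2*pi).
        z = rng.uniform(-1.0, 1.0)
        phi = rng.uniform(0.0, 2.0 * math.pi)
        r = math.sqrt(max(0.0, 1.0 - z * z))
        x0, y0, z0 = points[-1]
        points.append((
            x0 + bond_length * r * math.cos(phi),
            y0 + bond_length * r * math.sin(phi),
            z0 + bond_length * z,
        ))
    return points


def mean_squared_end_to_end(n_steps, trials, bond_length=1.0, seed=0):
    """Estimate the mean squared end-to-end distance over many sampled chains."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        x, y, z = random_walk_chain(n_steps, bond_length, rng)[-1]
        total += x * x + y * y + z * z
    return total / trials


if __name__ == "__main__":
    # For an ideal chain the estimate should be close to n_steps * bond_length**2.
    print(mean_squared_end_to_end(n_steps=20, trials=2000))
```

Sampling chains this way gives a simple source of synthetic "average" structures on which an expected-time claim can be checked empirically, under the assumption that the ideal chain is an adequate stand-in for real protein backbones.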

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Shibuya, T. (2009). Searching Protein 3-D Structures in Linear Time. In: Batzoglou, S. (eds) Research in Computational Molecular Biology. RECOMB 2009. Lecture Notes in Computer Science, vol 5541. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02008-7_1

  • DOI: https://doi.org/10.1007/978-3-642-02008-7_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-02007-0

  • Online ISBN: 978-3-642-02008-7

  • eBook Packages: Computer Science, Computer Science (R0)
