Abstract
Finding similar structures in 3-D structure databases of proteins is becoming an increasingly important issue in post-genomic molecular biology. To compare the 3-D structures of two molecules, biologists mostly use the RMSD (root mean square deviation) as the similarity measure. We propose new algorithms, fast both in theory and in practice, for the fundamental problem of finding all substructures of the structures in a database of chain molecules (such as proteins) whose RMSDs to the query are within a given constant threshold. We first propose a breakthrough linear-expected-time algorithm for the problem; the previous best-known time complexity was O(N log m), where N is the database size and m is the query size. For the expected-time analysis, we propose using the random-walk model (also called the ideal chain model) as the model of average protein structures. We furthermore propose a series of preprocessing algorithms that enable faster queries. We evaluated the performance of our linear-expected-time algorithm through computational experiments over the whole PDB database. In these experiments, our algorithm was 3.6 to 28 times faster than previously known algorithms for ordinary queries. Moreover, the experimental results support the validity of our theoretical analyses.
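To make the problem statement concrete, the following is a minimal brute-force baseline, not the linear-expected-time algorithm proposed in the paper. It assumes each structure is given as a list of (x, y, z) coordinates along the chain, computes the minimum RMSD over rotations and translations from the singular values of the cross-covariance matrix (the approach of the Kabsch references), and tests every length-m window. The function names `rmsd` and `naive_search` are illustrative and do not come from the paper.

```python
import math

def centered(points):
    """Translate a list of (x, y, z) points so that their centroid is the origin."""
    n = len(points)
    c = [sum(p[k] for p in points) / n for k in range(3)]
    return [(p[0] - c[0], p[1] - c[1], p[2] - c[2]) for p in points]

def det3(a):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))

def sym3_eigenvalues(a):
    """Eigenvalues of a symmetric 3x3 matrix, largest first (trigonometric closed form)."""
    off = a[0][1] ** 2 + a[0][2] ** 2 + a[1][2] ** 2
    if off == 0.0:  # already diagonal
        return sorted((a[0][0], a[1][1], a[2][2]), reverse=True)
    q = (a[0][0] + a[1][1] + a[2][2]) / 3.0
    p = math.sqrt((sum((a[i][i] - q) ** 2 for i in range(3)) + 2.0 * off) / 6.0)
    b = [[(a[i][j] - (q if i == j else 0.0)) / p for j in range(3)] for i in range(3)]
    r = max(-1.0, min(1.0, det3(b) / 2.0))  # clamp against rounding error
    phi = math.acos(r) / 3.0
    e1 = q + 2.0 * p * math.cos(phi)
    e3 = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    return [e1, 3.0 * q - e1 - e3, e3]

def rmsd(p, q):
    """Minimum RMSD between two equal-length point lists over rotations and translations."""
    m = len(p)
    p, q = centered(p), centered(q)
    # Cross-covariance matrix H = sum_i p_i q_i^T.
    h = [[sum(a[i] * b[j] for a, b in zip(p, q)) for j in range(3)] for i in range(3)]
    # Singular values of H are the square roots of the eigenvalues of H^T H.
    hth = [[sum(h[k][i] * h[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
    s = [math.sqrt(max(e, 0.0)) for e in sym3_eigenvalues(hth)]
    # Allow proper rotations only: negate the smallest singular value if det(H) < 0.
    if det3(h) < 0.0:
        s[2] = -s[2]
    g = sum(x * x for pt in p for x in pt) + sum(x * x for pt in q for x in pt)
    return math.sqrt(max(g - 2.0 * sum(s), 0.0) / m)

def naive_search(chain, query, threshold):
    """Start indices i such that RMSD(chain[i:i+m], query) <= threshold."""
    m = len(query)
    return [i for i in range(len(chain) - m + 1)
            if rmsd(chain[i:i + m], query) <= threshold]
```

Each window costs O(m) here, so scanning a database of total size N takes O(Nm) time; this is the cost that the O(N log m) and linear-expected-time algorithms mentioned in the abstract improve on. Squaring H loses some precision for nearly planar windows, which is acceptable for a sketch but not for production use.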
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Shibuya, T. (2009). Searching Protein 3-D Structures in Linear Time. In: Batzoglou, S. (ed.) Research in Computational Molecular Biology. RECOMB 2009. Lecture Notes in Computer Science, vol 5541. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02008-7_1
DOI: https://doi.org/10.1007/978-3-642-02008-7_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02007-0
Online ISBN: 978-3-642-02008-7
eBook Packages: Computer Science, Computer Science (R0)