ISBRA 2017: Bioinformatics Research and Applications, pp. 108-119
Construction of Protein Backbone Fragments Libraries on Large Protein Sets Using a Randomized Spectral Clustering Algorithm
Abstract
Protein fragment libraries play an important role in a wide variety of structural biology applications. In this work, we use a spectral clustering algorithm to analyze sets of fixed-length protein backbone fragments derived from the continuously growing Protein Data Bank (PDB) and to construct protein fragment libraries from them. We incorporate the rank-revealing randomized singular value decomposition (R3SVD) algorithm into spectral clustering so that the dominant eigenvectors of the fragment affinity matrix can be approximated rapidly, which enables the clustering algorithm to handle large-scale fragment sample sets. Compared with the widely used protein fragment libraries developed by Kolodny et al., the fragments in our new libraries better represent diverse protein structures in the PDB. Moreover, by using much larger fragment sample sets, we also generate libraries of longer fragments, up to 20 residues in length. Our fragment libraries are available at http://hpcr.cs.odu.edu/frag/.
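The pipeline summarized in the abstract (a fragment affinity matrix, its dominant eigenvectors approximated by a randomized SVD, then clustering) can be sketched as below. This is a minimal illustration under our own assumptions, not the authors' implementation: fragments are assumed to be (L, 3) arrays of C-alpha coordinates; similarity is assumed to be a Gaussian kernel of the RMSD after Kabsch superposition, with a hypothetical width `sigma` and the diagonal kept at A_ii = 1 for simplicity; the eigenvector step is a plain fixed-rank randomized range finder with subspace iteration in the style of Halko et al. [21], not the rank-revealing R3SVD of [19]; and the embedding follows Ng, Jordan, and Weiss [17]. All function names, the medoid-like representative rule, and the synthetic demo data are hypothetical.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two (L, 3) fragments after optimal superposition (Kabsch)."""
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    U, _, Vt = np.linalg.svd(P.T @ Q)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # -1 would mean a reflection; undo it
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T      # proper rotation mapping P onto Q
    return float(np.sqrt(((P @ R.T - Q) ** 2).sum() / len(P)))


def affinity_matrix(frags, sigma):
    """Dense Gaussian affinity A_ij = exp(-rmsd_ij^2 / (2 sigma^2)); O(n^2) pairs."""
    n = len(frags)
    A = np.ones((n, n))                          # rmsd_ii = 0, so A_ii = 1
    for i in range(n):
        for j in range(i + 1, n):
            r = kabsch_rmsd(frags[i], frags[j])
            A[i, j] = A[j, i] = np.exp(-r * r / (2.0 * sigma * sigma))
    return A


def randomized_top_eigvecs(M, k, oversample=10, n_power=2, seed=0):
    """Approximate top-k eigenvectors of a symmetric PSD matrix M: randomized
    range finder + subspace (power) iteration + Rayleigh-Ritz projection."""
    rng = np.random.default_rng(seed)
    n = M.shape[0]
    omega = rng.standard_normal((n, min(n, k + oversample)))  # Gaussian test matrix
    Q, _ = np.linalg.qr(M @ omega)               # orthonormal basis for range(M @ omega)
    for _ in range(n_power):
        Q, _ = np.linalg.qr(M @ Q)               # sharpen toward the dominant subspace
    w, V = np.linalg.eigh(Q.T @ M @ Q)           # small projected eigenproblem
    return Q @ V[:, np.argsort(w)[::-1][:k]]     # lift the k largest Ritz vectors back


def kmeans(X, k, n_iter=100):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centers[None]) ** 2).sum(axis=-1).argmin(axis=1)
        new = np.array([X[labels == c].mean(axis=0) if np.any(labels == c)
                        else centers[c] for c in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def spectral_fragment_clusters(frags, k, sigma=1.0):
    """Ng-Jordan-Weiss spectral clustering of fragments; returns labels and one
    hypothetical representative (medoid-like) index per cluster."""
    A = affinity_matrix(frags, sigma)
    d = 1.0 / np.sqrt(A.sum(axis=1))
    S = d[:, None] * A * d[None, :]              # D^-1/2 A D^-1/2, eigenvalues in [-1, 1]
    U = randomized_top_eigvecs(S + np.eye(len(A)), k)  # +I makes it PSD, same eigenvectors
    U /= np.linalg.norm(U, axis=1, keepdims=True)     # project rows onto the unit sphere
    labels = kmeans(U, k)
    reps = []
    for c in range(k):
        members = np.flatnonzero(labels == c)
        if members.size:
            within = A[np.ix_(members, members)].sum(axis=1)
            reps.append(int(members[np.argmax(within)]))
    return labels, reps


# Toy demo on synthetic data (not PDB fragments): noisy, randomly rotated copies
# of an idealised helix-like and an extended-like 5-residue C-alpha trace.
rng = np.random.default_rng(1)
t = np.arange(5)
helix = np.c_[2.3 * np.cos(np.radians(100 * t)), 2.3 * np.sin(np.radians(100 * t)), 1.5 * t]
strand = np.c_[3.3 * t, 0.9 * (-1.0) ** t, np.zeros(5)]


def random_copy(X):
    R, _ = np.linalg.qr(rng.standard_normal((3, 3)))
    R *= np.sign(np.linalg.det(R))               # flip to det = +1 (a proper rotation)
    return X @ R.T + rng.normal(scale=0.2, size=X.shape) + rng.normal(size=3)


frags = [random_copy(helix) for _ in range(15)] + [random_copy(strand) for _ in range(15)]
labels, reps = spectral_fragment_clusters(frags, k=2)
print(labels, reps)
```

The shift by the identity is a design choice for the sketch: because the normalized affinity matrix has eigenvalues in [-1, 1], adding I yields a positive semidefinite matrix whose largest singular vectors coincide with the wanted eigenvectors, so a randomized SVD-style range finder applies directly. Note that the dense O(n^2) affinity construction here would itself become a bottleneck on very large fragment sets; the sketch does not address that.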
Acknowledgements
Y. Li acknowledges support from the National Science Foundation through Grant No. CCF-1066471. W. Elhefnawy acknowledges support from the Old Dominion University Modeling and Simulation Fellowship.
References
- 1. Munoz, V., Serrano, L.: Local versus nonlocal interactions in protein folding and stability – an experimentalist’s point of view. Fold. Des. 1(4), R71–R77 (1996)
- 2. Chikenji, G., Fujitsuka, Y., Takada, S.: Shaping up the protein folding funnel by local interaction: lesson from a structure prediction study. Proc. Natl. Acad. Sci. 103(9), 3141–3146 (2006)
- 3. Simons, K.T., Kooperberg, C., Huang, E., Baker, D.: Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. J. Mol. Biol. 268, 209–225 (1997)
- 4. de Oliveira, S.H.P., Shi, J., Deane, C.M.: Building a better fragment library for de novo protein structure prediction. PLoS ONE 10(4), e0123998 (2015)
- 5. Rata, I., Li, Y., Jakobsson, E.: Backbone Statistical Potential from Local Sequence-Structure Interactions in Protein Loops. J. Phys. Chem. B 114(5), 1859–1869 (2010)
- 6. Li, Y., Rata, I., Jakobsson, E.: Sampling multiple scoring functions can improve protein loop structure prediction accuracy. J. Chem. Inf. Model. 51(7), 1656–1666 (2011)
- 7. Li, Y.: Conformational sampling in template-free protein loop structure modeling: an overview. Comput. Struct. Biotechnol. J. 5(6), e201302003 (2013)
- 8. Di Maio, F., Shavlik, J., Phillips, G.: A probabilistic approach to protein backbone tracing in electron density maps. Bioinformatics 22(14), 81–89 (2006)
- 9. Terwilliger, T.C.: Automated main-chain model building by template matching and iterative fragment extension. Acta Crystallogr. D Biol. Crystallogr. 59(1), 38–44 (2003)
- 10. Budowski-Tal, I., Nov, Y., Kolodny, R.: FragBag, an accurate representation of protein structure, retrieves structural neighbors from the entire PDB quickly and accurately. Proc. Natl. Acad. Sci. 107, 3481–3486 (2010)
- 11. Keasar, C., Kolodny, R.: Using protein fragments for searching and data-mining protein databases. In: Proceedings of the AAAI Workshop on Artificial Intelligence and Robotics Methods in Computational Biology (2013)
- 12. Kolodny, R., Koehl, P., Guibas, L., Levitt, M.: Small Libraries of Protein Fragments Model Native Protein Structures Accurately. J. Mol. Biol. 323, 297–307 (2002)
- 13. Denise, C.: Structural genomics exploring the 3D protein landscape. Simbios (2010)
- 14. Shi, J., Malik, J.: Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 888–905 (2000)
- 15. Wang, G.L., Dunbrack, R.L.: PISCES: a protein sequence culling server. Bioinformatics 19, 1589–1591 (2003)
- 16. von Luxburg, U.: A tutorial on spectral clustering. Stat. Comput. 17(4), 395–416 (2007)
- 17. Ng, A.Y., Jordan, M.I., Weiss, Y.: On spectral clustering: analysis and an algorithm. Adv. Neural. Inf. Process. Syst. 14, 849–856 (2001)
- 18. Ji, H., Weinberg, S., Li, Y.: A revisit of block power methods for finite state markov chain applications. arXiv:1610.08881 (2016)
- 19. Ji, H., Yu, W., Li, Y.: A rank revealing randomized singular value decomposition (R3SVD) algorithm for low-rank matrix approximations. arXiv:1605.08134 (2016)
- 20. Gu, Y., Yu, W., Li, Y.: Efficient randomized algorithms for adaptive low-rank factorizations of large matrices. arXiv:1606.09402 (2016)
- 21. Halko, N., Martinsson, P.G., Tropp, J.A.: Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions. SIAM Rev. 53(2), 217–288 (2011)
- 22. Chiang, Y.S., Gelfand, T.I., Kister, A.E., Gelfand, I.M.: New classification of supersecondary structures of sandwich-like proteins uncovers strict patterns of strand assemblage. Proteins 68(4), 915–921 (2007)
- 23. Elhefnawy, W., Chen, L., Han, Y., Li, Y.: ICOSA: a distance-dependent, orientation-specific coarse-grain contact potential for protein structure modeling. J. Mol. Biol. 427(15), 2562–2576 (2015)
- 24. Li, Y., Liu, H., Rata, I., Jakobsson, E.: Building a knowledge-based statistical potential by capturing high-order inter-residue interactions and its applications in protein secondary structure assessment. J. Chem. Inf. Model. 53(2), 500–508 (2013)