Construction of Protein Backbone Fragments Libraries on Large Protein Sets Using a Randomized Spectral Clustering Algorithm

  • Wessam Elhefnawy
  • Min Li
  • Jianxin Wang
  • Yaohang Li
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10330)

Abstract

Protein fragment libraries play an important role in a wide variety of structural biology applications. In this work, we use a spectral clustering algorithm to analyze fixed-length protein backbone fragment sets derived from the continuously growing Protein Data Bank (PDB) and construct libraries of protein fragments. Incorporating the rank-revealing randomized singular value decomposition algorithm into spectral clustering to rapidly approximate the dominant eigenvectors of the fragment affinity matrix enables the clustering algorithm to handle large-scale fragment sample sets. Compared to the widely used protein fragment libraries developed by Kolodny et al., the fragments in our new libraries better represent the diverse protein structures in the PDB. Moreover, using much larger fragment sample sets, we also generate libraries of longer fragments, with lengths of up to 20 residues. Our fragment libraries can be found at http://hpcr.cs.odu.edu/frag/.
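
The general idea of pairing spectral clustering with a randomized eigenvector approximation can be illustrated with a minimal sketch. It is not the implementation used in this work: it substitutes a basic fixed-rank randomized range finder for the rank-revealing randomized SVD, and it assumes that each fragment is given as a flat feature vector compared by Euclidean distance through a Gaussian kernel. The function names and the parameters `sigma`, `oversample`, and `n_iter` are hypothetical choices made for illustration only.

```python
import numpy as np

def randomized_top_eigvecs(A, k, oversample=10, n_iter=2, seed=0):
    """Approximate the k dominant eigenvectors of a symmetric matrix A
    using a fixed-rank randomized range finder (illustrative stand-in)."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    p = min(n, k + oversample)
    # Sample the range of A with a random Gaussian test matrix.
    Q, _ = np.linalg.qr(A @ rng.standard_normal((n, p)))
    for _ in range(n_iter):
        # Power iterations emphasize the dominant part of the spectrum.
        Q, _ = np.linalg.qr(A @ Q)
    B = Q.T @ A @ Q                      # small p x p projected matrix
    w, V = np.linalg.eigh((B + B.T) / 2)
    top = np.argsort(w)[::-1][:k]        # largest eigenvalues first
    return Q @ V[:, top]

def kmeans(X, k, n_iter=50):
    """Plain Lloyd's k-means with deterministic farthest-point seeding."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def spectral_cluster(X, k, sigma=1.0):
    """Cluster the rows of X (assumed fragment feature vectors) into k groups."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0)
    W = np.exp(-d2 / (2 * sigma ** 2))   # Gaussian affinity (assumed kernel)
    inv_sqrt_deg = 1.0 / np.sqrt(W.sum(axis=1))
    L = W * inv_sqrt_deg[:, None] * inv_sqrt_deg[None, :]  # D^-1/2 W D^-1/2
    U = randomized_top_eigvecs(L, k)
    U = U / np.linalg.norm(U, axis=1, keepdims=True)       # row-normalize
    return kmeans(U, k)
```

Calling `spectral_cluster(X, k)` on an array with one row per fragment returns one cluster label per row. In a library-construction setting, one representative per cluster (for example, the member closest to the cluster mean) could then serve as a library fragment, although that selection step is an assumption here and is not shown.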

Notes

Acknowledgements

Y. Li acknowledges support from the National Science Foundation through Grant No. CCF-1066471. W. Elhefnawy acknowledges support from the Old Dominion University Modeling and Simulation Fellowship.

References

  1. Munoz, V., Serrano, L.: Local versus nonlocal interactions in protein folding and stability – an experimentalist’s point of view. Fold. Des. 1(4), R71–R77 (1996)
  2. Chikenji, G., Fujitsuka, Y., Takada, S.: Shaping up the protein folding funnel by local interaction: lesson from a structure prediction study. Proc. Natl. Acad. Sci. 103(9), 3141–3146 (2006)
  3. Simons, K.T., Kooperberg, C., Huang, E., Baker, D.: Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. J. Mol. Biol. 268, 209–225 (1997)
  4. de Oliveira, S.H.P., Shi, J., Deane, C.M.: Building a better fragment library for de novo protein structure prediction. PLoS ONE 10(4), e0123998 (2015)
  5. Rata, I., Li, Y., Jakobsson, E.: Backbone Statistical Potential from Local Sequence-Structure Interactions in Protein Loops. J. Phys. Chem. B 114(5), 1859–1869 (2010)
  6. Li, Y., Rata, I., Jakobsson, E.: Sampling multiple scoring functions can improve protein loop structure prediction accuracy. J. Chem. Inf. Model. 51(7), 1656–1666 (2011)
  7. Li, Y.: Conformational sampling in template-free protein loop structure modeling: an overview. Comput. Struct. Biotechnol. J. 5(6), e201302003 (2013)
  8. Di Maio, F., Shavlik, J., Phillips, G.: A probabilistic approach to protein backbone tracing in electron density maps. Bioinformatics 22(14), 81–89 (2006)
  9. Terwilliger, T.C.: Automated main-chain model building by template matching and iterative fragment extension. Acta Crystallogr. D Biol. Crystallogr. 59(1), 38–44 (2003)
  10. Budowski-Tal, I., Nov, Y., Kolodny, R.: FragBag, an accurate representation of protein structure, retrieves structural neighbors from the entire PDB quickly and accurately. Proc. Natl. Acad. Sci. 107, 3481–3486 (2010)
  11. Keasar, C., Kolodny, R.: Using protein fragments for searching and data-mining protein databases. In: Proceedings of AAAI workshop of Artificial Intelligence and Robotics Methods in Computational Biology (2013)
  12. Kolodny, R., Koehl, P., Guibas, L., Levitt, M.: Small Libraries of Protein Fragments Model Native Protein Structures Accurately. J. Mol. Biol. 323, 297–307 (2002)
  13. Denise, C.: Structural GENOMICS exploring the 3D protein landscape. Simbios (2010)
  14. Shi, J., Malik, J.: Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 888–905 (2000)
  15. Wang, G.L., Dunbrack, R.L.: PISCES: a protein sequence culling server. Bioinformatics 19, 1589–1591 (2003)
  16. von Luxburg, U.: A tutorial on spectral clustering. Stat. Comput. 17(4), 395–416 (2007)
  17. Ng, A.Y., Jordan, M.I., Weiss, Y.: On spectral clustering: analysis and an algorithm. Adv. Neural. Inf. Process. Syst. 14, 849–856 (2001)
  18. Ji, H., Weinberg, S., Li, Y.: A revisit of block power methods for finite state Markov chain applications. arXiv:1610.08881 (2016)
  19. Ji, H., Yu, W., Li, Y.: A rank revealing randomized singular value decomposition (R3SVD) algorithm for low-rank matrix approximations. arXiv:1605.08134 (2016)
  20. Gu, Y., Yu, W., Li, Y.: Efficient randomized algorithms for adaptive low-rank factorizations of large matrices. arXiv:1606.09402 (2016)
  21. Halko, N., Martinsson, P.G., Tropp, J.A.: Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions. SIAM Rev. 53(2), 217–288 (2011)
  22. Chiang, Y.S., Gelfand, T.I., Kister, A.E., Gelfand, I.M.: New classification of supersecondary structures of sandwich-like proteins uncovers strict patterns of strand assemblage. Proteins 68(4), 915–921 (2007)
  23. Elhefnawy, W., Chen, L., Han, Y., Li, Y.: ICOSA: a distance-dependent, orientation-specific coarse-grain contact potential for protein structure modeling. J. Mol. Biol. 427(15), 2562–2576 (2015)
  24. Li, Y., Liu, H., Rata, I., Jakobsson, E.: Building a knowledge-based statistical potential by capturing high-order inter-residue interactions and its applications in protein secondary structure assessment. J. Chem. Inf. Model. 53(2), 500–508 (2013)

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Wessam Elhefnawy (1)
  • Min Li (2)
  • Jianxin Wang (2)
  • Yaohang Li (1)

  1. Department of Computer Science, Old Dominion University, Norfolk, USA
  2. Department of Computer Science, School of Information Science and Engineering, Central South University, Changsha, China
