Learning Organizations of Protein Energy Landscapes: An Application on Decoy Selection in Template-Free Protein Structure Prediction

  • Nasrin Akhter
  • Liban Hassan
  • Zahra Rajabi
  • Daniel Barbará
  • Amarda Shehu
Part of the Methods in Molecular Biology book series (MIMB, volume 1958)


The protein energy landscape, which lifts the protein structure space by associating energies with structures, has been useful in improving our understanding of the relationship between structure, dynamics, and function. Currently, however, it is challenging to automatically extract and utilize the underlying organization of an energy landscape to link the structural states it houses to biological activity. In this chapter, we first report on two computational approaches that extract such an organization: one that ignores energies and operates directly in the structure space, and another that operates on the energy landscape associated with the structure space. We then describe two complementary approaches, one based on unsupervised learning and another based on supervised learning. Both approaches utilize the extracted organization to address the problem of decoy selection in template-free protein structure prediction. The presented results make the case that learning organizations of protein energy landscapes advances our ability to link structures to biological activity.
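The energy-landscape organization described above can be illustrated with a minimal sketch, assuming each decoy is given as a feature vector with a precomputed energy. The sketch builds an undirected k-nearest-neighbor graph over the decoys, assigns every decoy to a basin by following steepest descent on that graph down to a local energy minimum, and then applies a simple selection rule that returns the minima of the largest basins. The function names `knn_graph`, `assign_basins`, and `select_decoys`, the use of Euclidean distance in place of a structural dissimilarity such as RMSD after optimal superposition, and the ranking rule are all hypothetical illustrations, not the chapter's published method.

```python
import math
from collections import Counter
from typing import List, Sequence


def knn_graph(features: Sequence[Sequence[float]], k: int) -> List[List[int]]:
    """Undirected k-nearest-neighbor graph over decoys.

    Assumption: Euclidean distance on feature vectors stands in for a
    structural dissimilarity such as RMSD after optimal superposition.
    """
    n = len(features)
    adj = [set() for _ in range(n)]
    for i in range(n):
        # Rank the other decoys by (distance, index) so ties break deterministically.
        ranked = sorted(
            (math.dist(features[i], features[j]), j) for j in range(n) if j != i
        )
        for _, j in ranked[:k]:
            adj[i].add(j)
            adj[j].add(i)  # symmetrize: keep an edge if either endpoint selects it
    return [sorted(nb) for nb in adj]


def assign_basins(energies: Sequence[float], adj: List[List[int]]) -> List[int]:
    """Label each decoy with the index of the local minimum reached by steepest descent."""
    n = len(energies)

    def order(j: int):
        # Total order: lower energy first, index breaks ties, so descent cannot cycle.
        return (energies[j], j)

    # Each decoy points to the lowest-ordered node among itself and its neighbors.
    down = [min([i, *adj[i]], key=order) for i in range(n)]
    basin: List[int] = [-1] * n
    for i in range(n):
        path, j = [], i
        # Walk downhill until reaching a local minimum or an already-labeled decoy.
        while basin[j] == -1 and down[j] != j:
            path.append(j)
            j = down[j]
        root = basin[j] if basin[j] != -1 else j
        basin[j] = root
        for p in path:
            basin[p] = root
    return basin


def select_decoys(energies: Sequence[float], basin: List[int], top: int) -> List[int]:
    """Hypothetical rule: return basin minima, larger basins first, ties by lower energy."""
    sizes = Counter(basin)
    minima = sorted(sizes, key=lambda m: (-sizes[m], energies[m]))
    return minima[:top]


if __name__ == "__main__":
    feats = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,)]
    energies = [-2.0, -1.0, -4.0, 0.5, -3.0, -1.0]
    adj = knn_graph(feats, k=2)
    basin = assign_basins(energies, adj)
    print(basin)                                  # [2, 2, 2, 4, 4, 4]
    print(select_decoys(energies, basin, top=2))  # [2, 4]
```

Ordering nodes by (energy, index) makes every descent step strictly decreasing, so each decoy reaches exactly one minimum even when energies tie; basins are therefore a partition of the decoy set, which is what a basin-level selection rule needs.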

Key words

Protein structure space, Energy landscape, Nearest neighbor graph, Communities, Basins, Community detection, Basin finding, Unsupervised and supervised learning, Decoy selection, Template-free protein structure prediction


  1. 1.
    Boehr DD, Wright PE (2008) How do proteins interact? Science 320(5882):1429–1430CrossRefGoogle Scholar
  2. 2.
    Maximova T, Moffatt R, Ma B, Nussinov R, Shehu A (2016) Principles and overview of sampling methods for modeling macromolecular structure and dynamics. PLoS Comp Biol 12(4):e1004619CrossRefGoogle Scholar
  3. 3.
    Leaver-Fay A et al (2011) ROSETTA3: an object-oriented software suite for the simulation and design of macromolecules. Methods Enzymol 487:545–574CrossRefGoogle Scholar
  4. 4.
    Xu D, Zhang Y (2012) Ab initio protein structure assembly using continuous structure fragments and optimized knowledge-based force field. Proteins: Struct Funct Bioinf 80(7):1715–1735. Scholar
  5. 5.
    Olson B, Shehu A (2013) Multi-objective stochastic search for sampling local minima in the protein energy surface. In: ACM conference on bioinformatics, computational biology (BCB), Washington, DC, pp 430–439Google Scholar
  6. 6.
    Clausen R, Shehu A (2014) A multiscale hybrid evolutionary algorithm to obtain sample-based representations of multi-basin protein energy landscapes. In: ACM conference on bioinformatics, computational biology (BCB), Newport Beach, CA, pp 269–278Google Scholar
  7. 7.
    Shehu A, Plaku E (2016) A survey of computational treatments of biomolecules by robotics-inspired methods modeling equilibrium structure and dynamics. J Artif Intell Res 597:509–572CrossRefGoogle Scholar
  8. 8.
    Shehu A, Clementi C, Kavraki LE (2007) Sampling conformation space to model equilibrium fluctuations in proteins. Algorithmica 48(4):303–327CrossRefGoogle Scholar
  9. 9.
    Okazaki K, Koga N, Takada S, Onuchic JN, Wolynes PG (2006) Multiple-basin energy landscapes for large-amplitude conformational motions of proteins: structure-based molecular dynamics simulations. Proc Natl Acad Sci U S A 103(32):11844–11849CrossRefGoogle Scholar
  10. 10.
    Boehr DD, Nussinov R, Wright PE (2009) The role of dynamic conformational ensembles in biomolecular recognition. Nat Chem Biol 5(11):789–796CrossRefGoogle Scholar
  11. 11.
    Nussinov R, Wolynes PG (2014) A second molecular biology revolution? The energy landscapes of biomolecular function. Phys Chem Chem Phys 16(14):6321–6322CrossRefGoogle Scholar
  12. 12.
    Frauenfelder H, Sligar SG, Wolynes PG (1991) The energy landscapes and motion on proteins. Science 254(5038):1598–1603CrossRefGoogle Scholar
  13. 13.
    Bryngelson JD, Onuchic JN, Socci ND, Wolynes PG (1995) Funnels, pathways, and the energy landscape of protein folding: a synthesis. Proteins Struct Funct Genet 21(3):167–195CrossRefGoogle Scholar
  14. 14.
    Shehu A (2015) A review of evolutionary algorithms for computing functional conformations of protein molecules. In: Zhang W (ed) Computer-aided drug discovery, Springer methods in pharmacology and toxicology seriesGoogle Scholar
  15. 15.
    Samoilenko S (2008) Fitness landscapes of complex systems: insights and implications on managing a conflict environment of organizations. Complex Organ 10(4):38–45Google Scholar
  16. 16.
    Kryshtafovych A, Fidelis K, Tramontano A (2011) Evaluation of model quality predictions in CASP9. Proteins 79(Suppl 10):91–106CrossRefGoogle Scholar
  17. 17.
    Kryshtafovych A, Barbato A, Fidelis K, Monastyrskyy B, Schwede T, Tramon- tano A (2014) Assessment of the assessment: evaluation of the model quality estimates in CASP10. Proteins 82(Suppl 2):112–126CrossRefGoogle Scholar
  18. 18.
    Moult J, Fidelis K, Kryshtafovych A, Schwede T, Tramontano A (2014) Critical assessment of methods of protein structure prediction (CASP)—round X. Proteins: Struct Funct Bioinf 82:109–115CrossRefGoogle Scholar
  19. 19.
    Moult J, Fidelis K, Kryshtafovych A, Schwede T, Tramontano A (2018) Critical assessment of methods of protein structure prediction (CASP)—round XII. Proteins 86(Suppl 1):7–15. Scholar
  20. 20.
    Uziela K, Wallner B (2016) Proq2: estimation of model accuracy implemented in rosetta. Bioinformatics 32(9):1411–1413CrossRefGoogle Scholar
  21. 21.
    Liu T, Wang Y, Eickholt J, Wang Z (2016) Benchmarking deep networks for predicting residue-specific quality of individual protein models in casp11. Sci Rep 6(19):301Google Scholar
  22. 22.
    Ginalski K, Elofsson A, Fischer D, Rychlewski L (2003) 3D-Jury: a simple approach to improve protein structure predictions. Bioinformatics 19(8):1015–1018CrossRefGoogle Scholar
  23. 23.
    Wallner B, Elofsson A (2006) Identification of correct regions in protein models using structural, alignment, and consensus information. Protein Sci 15(4):900–913CrossRefGoogle Scholar
  24. 24.
    Lorenzen S, Zhang Y (2007) Identification of near-native structures by clustering protein docking conformations. Proteins 68(1):187–194CrossRefGoogle Scholar
  25. 25.
    Zhang Y, Skolnick J (2004) Spicker: a clustering approach to identify near-native protein folds. J Comput Chem 25(6):865–871CrossRefGoogle Scholar
  26. 26.
    Molloy K, Saleh S, Shehu A (2013) Probabilistic search and energy guidance for biased decoy sampling in ab-initio protein structure prediction. IEEE/ACM Trans Bioinf Comput Biol 10(5):1162–1175CrossRefGoogle Scholar
  27. 27.
    Shehu A (2013) Probabilistic search and optimization for protein energy land- scapes. In: Aluru S, Singh A (eds) Handbook of computational molecular biology, Chapman & Hall/CRC Computer & Information Science SeriesBoca RatonGoogle Scholar
  28. 28.
    Guan W, Ozakin A, Gray A, et al (2011) Learning protein folding energy functions. In: International conference data mining. IEEE, pp 1062–1067Google Scholar
  29. 29.
    Jing X, Wang K, Lu R, Dong Q (2016) Sorting protein decoys by machine-learning-to-rank. Sci Rep 6(31):571Google Scholar
  30. 30.
    He Z, Alazmi M, Zhang J, Xu D (2013) Protein structural model selection by combining consensus and single scoring methods. PLoS One 8(9):e74006CrossRefGoogle Scholar
  31. 31.
    Pawlowski M, Kozlowski L, Kloczkowski A (2016) Mqapsingle: a quasi single-model approach for estimation of the quality of individual protein structure models. Proteins 84(8):1021–1028CrossRefGoogle Scholar
  32. 32.
    Cao R, Wang Z, Wang Y, Cheng J (2014) Smoq: a tool for predicting the absolute residue-specific quality of a single protein model with support vector machines. BMC Bioinform 15(1):120CrossRefGoogle Scholar
  33. 33.
    Nguyen SP, Shang Y, Xu D (2014) Dl-pro: a novel deep learning method for protein model quality assessment. In: International conference on neural networks (IJCNN). IEEE, pp 2071–2078Google Scholar
  34. 34.
    Manavalan B, Lee J, Lee J (2014) Random forest-based protein model quality assessment (rfmqa) using structural features and potential energy terms. PLoS One 9(9):e106542CrossRefGoogle Scholar
  35. 35.
    Chatterjee S, Ghosh S, Vishveshwara S (2013) Network properties of decoys and casp predicted models: a comparison with native protein structures. Mol BioSyst 9(7):1774–1788CrossRefGoogle Scholar
  36. 36.
    Mirzaei S, Sidi T, Keasar C, Crivelli S (2016) Purely structural protein scoring functions using support vector machine and ensemble learning. In: IEEE/ACM transactions on computational biology and bioinformaticsGoogle Scholar
  37. 37.
    Kabsch W (1976) A solution for the best rotation to relate two sets of vectors. Acta Cryst A32:922–923CrossRefGoogle Scholar
  38. 38.
    Yang Z, Algesheimer R, Tessone CJ (2016) A comparative analysis of community detection algorithms on artificial networks. Sci Rep 6(30):750Google Scholar
  39. 39.
    Cazals F, Dreyfus T (2017) The structural bioinformatics library: modeling in biomolecular science and beyond. Bioinformatics 33(7):997–1004PubMedGoogle Scholar
  40. 40.
    Zhou ZH (2012) Ensemble methods: foundations and algorithms. CRC Press, Boca RatonCrossRefGoogle Scholar
  41. 41.
    Friedman JH (2001) Greedy function approximation: a gradient boosting machine. Ann Stat 29(5):1189–1232CrossRefGoogle Scholar
  42. 42.
    Chen T, Guestrin C (2016) Xgboost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 785–794Google Scholar
  43. 43.
    Tomek I (1976) Two modifications of CNN. IEEE Trans Syst Man Cybernet 6:769–772Google Scholar
  44. 44.
    Akhter N, Shehu A (2017) From extraction of local structures of protein energy landscapes to improved decoy selection in template-free protein structure prediction. Molecules 23(1):216CrossRefGoogle Scholar
  45. 45.
    Berman HM, Henrick K, Nakamura H (2003) Announcing the worldwide Protein Data Bank. Nat Struct Biol 10(12):980–980CrossRefGoogle Scholar
  46. 46.
    Yang J, Leskovec J (2012) Defining and evaluating network communities based on ground-truth. In: International conference on data mining (ICDM), pp 745–754Google Scholar
  47. 47.
    Bastian M, Heymann S, Jacomy M (2009) Gephi: an open source software for exploring and manipulating networks. In: International AAAI conference on weblogs and social media. AAS, pp 361–362Google Scholar
  48. 48.
    Jacomy M, Venturini T, Heymann S, Bastian M (2014) ForceAtlas2, a continuous graph layout algorithm for handy network visualization designed for the Gephi software. PLoS One 9(6):e98679CrossRefGoogle Scholar

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  • Nasrin Akhter (1)
  • Liban Hassan (1)
  • Zahra Rajabi (1)
  • Daniel Barbará (1)
  • Amarda Shehu (1), corresponding author
  1. Department of Computer Science, George Mason University, Fairfax, USA
