Journal of Computer-Aided Molecular Design

Volume 31, Issue 10, pp 943–958

Convex-PL: a novel knowledge-based potential for protein-ligand interactions deduced from structural databases using convex optimization

  • Maria Kadukova
  • Sergei Grudinin


We present a novel optimization approach to train a free-shape, distance-dependent protein-ligand scoring function called Convex-PL. We do not impose any functional form on the scoring function. Instead, we expand it in a polynomial basis and deduce the expansion coefficients from the structural knowledge base using a convex formulation of the optimization problem. Also, to build the training set, we do not generate false poses with molecular docking packages; instead, we apply constant-RMSD rigid-body deformations to the ligands inside the binding pockets. This makes the obtained scoring function generally applicable to scoring structural ensembles generated with different docking methods. We assess the Convex-PL scoring function using data from the D3R Grand Challenge 2 submissions and the docking test of the CASF 2013 study. We demonstrate that our results outperform those of the other 20 methods previously assessed in CASF 2013. The method is available at
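The abstract combines two ingredients: a scoring function expanded in a polynomial basis whose coefficients are fitted by convex optimization, and training decoys made by constant-RMSD rigid-body moves of the native ligand. The NumPy sketch below illustrates both ideas on toy random coordinates. It is not the authors' Convex-PL implementation, and every concrete choice is an assumption made for illustration: a single atom-pair type (real knowledge-based potentials typically use many), a Legendre basis on [0, R_CUT), the hypothetical constants R_CUT and DEGREE, a squared-hinge loss with an L2 penalty asking each native pose to score at least one unit below its decoys, plain gradient descent as the solver, and the hypothetical helpers rotation_matrix, constant_rmsd_perturbation, pair_features, objective and fit_coefficients.

```python
# Hypothetical sketch; not the authors' Convex-PL implementation.
import numpy as np

R_CUT = 8.0  # hypothetical distance cutoff (angstroms)
DEGREE = 6   # hypothetical highest Legendre degree in the basis


def rotation_matrix(axis, angle):
    """Rodrigues' formula: rotation by `angle` (radians) about the unit vector `axis`."""
    ux, uy, uz = axis
    k = np.array([[0.0, -uz, uy], [uz, 0.0, -ux], [-uy, ux, 0.0]])
    return np.eye(3) + np.sin(angle) * k + (1.0 - np.cos(angle)) * (k @ k)


def constant_rmsd_perturbation(coords, target_rmsd, rng, rot_fraction=0.5):
    """Rigidly move `coords` (N x 3) so that the RMSD to the input equals target_rmsd.

    For a rotation about the centroid plus a translation t the cross term vanishes,
    so RMSD^2 = (rotational part) + |t|^2; rot_fraction (0..1) splits target_rmsd^2.
    """
    centroid = coords.mean(axis=0)
    x = coords - centroid
    axis = rng.normal(size=3)
    axis /= np.linalg.norm(axis)
    perp = x - np.outer(x @ axis, axis)  # atom offsets perpendicular to the axis
    m = np.mean(np.sum(perp ** 2, axis=1))
    # rotating by theta adds 2 * (1 - cos(theta)) * m to RMSD^2, at most 4 * m
    rot_sq = min(rot_fraction * target_rmsd ** 2, 4.0 * m)
    cos_theta = 1.0 - rot_sq / (2.0 * m) if m > 0 else 1.0
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
    t_dir = rng.normal(size=3)
    t_dir /= np.linalg.norm(t_dir)
    t = t_dir * np.sqrt(max(target_rmsd ** 2 - rot_sq, 0.0))
    return x @ rotation_matrix(axis, theta).T + centroid + t


def pair_features(protein, ligand):
    """phi_k = sum over protein-ligand pairs with r < R_CUT of P_k(2 r / R_CUT - 1)."""
    diff = protein[:, None, :] - ligand[None, :, :]
    r = np.linalg.norm(diff, axis=-1).ravel()
    r = r[r < R_CUT]
    return np.polynomial.legendre.legvander(2.0 * r / R_CUT - 1.0, DEGREE).sum(axis=0)


def objective(w, D, lam):
    """Convex loss: mean squared hinge on the score gaps plus an L2 penalty."""
    slack = np.maximum(0.0, 1.0 - D @ w)
    return np.mean(slack ** 2) + 0.5 * lam * (w @ w)


def fit_coefficients(D, lam=1e-3, n_iter=2000):
    """Gradient descent with step 1/L; row i of D is phi_decoy - phi_native.

    The pose energy is E = w . phi, so a positive D @ w means the native scores lower.
    """
    n = D.shape[0]
    lipschitz = 2.0 * np.linalg.norm(D, 2) ** 2 / n + lam  # gradient Lipschitz constant
    w = np.zeros(D.shape[1])
    for _ in range(n_iter):
        slack = np.maximum(0.0, 1.0 - D @ w)
        grad = -2.0 * (D.T @ slack) / n + lam * w
        w -= grad / lipschitz
    return w


# Toy demo: random "pockets" and "ligands", 20 decoys at 2 A RMSD per complex.
rng = np.random.default_rng(0)
D_rows = []
for _ in range(5):
    protein = rng.uniform(-6.0, 6.0, size=(60, 3))
    ligand = rng.normal(scale=1.5, size=(12, 3))
    phi_native = pair_features(protein, ligand)
    for _ in range(20):
        decoy = constant_rmsd_perturbation(ligand, 2.0, rng)
        D_rows.append(pair_features(protein, decoy) - phi_native)
D = np.array(D_rows)
w = fit_coefficients(D)
print(objective(np.zeros_like(w), D, 1e-3), objective(w, D, 1e-3))
```

The first printed value is 1.0, because every slack equals 1 at w = 0; gradient descent with step 1/L on a smooth convex function never increases the objective, so the second value is no larger. The learned free-shape pair potential can then be evaluated at any distance r below R_CUT as `np.polynomial.legendre.legval(2 * r / R_CUT - 1, w)`. Because the objective is convex, any local minimum the solver reaches is global; a production fit would use real complexes, many atom-pair types, and a dedicated solver.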


Machine learning, Molecular docking, Protein-ligand interactions, Scoring function, Knowledge-based potential



The authors thank Georgy Cheremovskiy from the Moscow Institute of Physics and Technology for the initial development of the potential, and Georgy Derevyanko from Concordia University, who proposed the initial formulation of the optimization problem. The authors also thank Valentin Gordeliy from IBS Grenoble, and Vladimir Chupin and Petr Popov from MIPT Moscow, for fruitful discussions during this work. This work was partially supported by RSF research project 14-14-00995.

Supplementary material

Supplementary material 1: 10822_2017_68_MOESM1_ESM.pdf (PDF, 235 KB)



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. LJK, University of Grenoble Alpes, Grenoble, France
  2. LJK, CNRS, Grenoble, France
  3. Inria, Grenoble, France
  4. Moscow Institute of Physics and Technology, Dolgoprudniy, Russia
