Journal of Computer-Aided Molecular Design, Volume 33, Issue 11, pp 943–953

Nonparametric chemical descriptors for the calculation of ligand-biopolymer affinities with machine-learning scoring functions

  • Edelmiro Moman
  • Maria A. Grishina
  • Vladimir A. Potemkin


The computational prediction of ligand-biopolymer affinities is a crucial endeavor in modern drug discovery, and one that still poses major challenges. The choice of computational method is often a trade-off between accuracy and speed, with the mathematical devices known as scoring functions being the fastest option. One of the many shortcomings of scoring functions is that they are not universally applicable to every molecular system, largely because they rely on atom type perception and/or parametrization. This article proposes the use of nonparametric Model of Effective Radii of Atoms (MERA) descriptors, which can be readily computed for the entire Periodic Table, and demonstrates that, in combination with machine learning algorithms, they can yield competitive performance and chemically meaningful insights.


Keywords: Chemical descriptors, Nonparametric descriptors, MERA, Scoring function, Machine learning



This work was supported by the Government of the Russian Federation (Act 211, contract 02.A03.21.0011) and by the Ministry of Education and Science of the Russian Federation (Grants 4.8298.2017/8.9, 4.7279.2017/8.9).

Supplementary material

Supplementary file 1: 10822_2019_248_MOESM1_ESM.pdf (PDF, 1316 kB)



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. South Ural State University, Chelyabinsk, Russian Federation
