Journal of Computer-Aided Molecular Design

, Volume 33, Issue 1, pp 71–82 | Cite as

Mathematical deep learning for pose and binding affinity prediction and ranking in D3R Grand Challenges

  • Duc Duy Nguyen
  • Zixuan Cang
  • Kedi Wu
  • Menglun Wang
  • Yin Cao
  • Guo-Wei WeiEmail author


Advanced mathematics, such as multiscale weighted colored subgraph and element specific persistent homology, and machine learning including deep neural networks were integrated to construct mathematical deep learning models for pose and binding affinity prediction and ranking in the last two D3R Grand Challenges in computer-aided drug design and discovery. D3R Grand Challenge 2 focused on the pose prediction, binding affinity ranking and free energy prediction for Farnesoid X receptor ligands. Our models obtained the top place in absolute free energy prediction for free energy set 1 in stage 2. The latest competition, D3R Grand Challenge 3 (GC3), is considered as the most difficult challenge so far. It has five subchallenges involving Cathepsin S and five other kinase targets, namely VEGFR2, JAK2, p38-α, TIE2, and ABL1. There is a total of 26 official competitive tasks for GC3. Our predictions were ranked 1st in 10 out of these 26 tasks.


Drug design Pose prediction Binding affinity Machine learning Algebraic topology Graph theory 



This work was supported in part by NSF Grants IIS-1302285, DMS-1721024 and DMS-1761320 and MSU Center for Mathematical Molecular Biosciences Initiative.

Supplementary material

10822_2018_146_MOESM1_ESM.xlsx (14 kb)
Supplementary material 1 (XLSX 14 KB)


  1. 1.
    Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE (2000) The protein data bank. Nucl Acids Res 28(1):35–242Google Scholar
  2. 2.
    Liu Z, Su M, Han L, Liu J, Yang Q, Li Y, Wang R (2017) Forging the basis for developing protein–ligand interaction scoring functions. Acc Chem Res 50(2):302–309Google Scholar
  3. 3.
    Ahmed A, Smith RD, Clark JJ, Dunbar JB Jr, Carlson HA (2014) Recent improvements to binding moad: a resource for protein–ligand binding affinities and structures. Nucl Acids Res 43(D1):D465–D469Google Scholar
  4. 4.
    Kroemer RT (2007) Structure-based drug design: docking and scoring. Curr Protein Pept Sci 8(4):312–328Google Scholar
  5. 5.
    Leach AR, Shoichet BK, Peishoff CE (2006) Prediction of protein–ligand interactions. docking and scoring: successes and gaps. J Med Chem 49:5851–5855Google Scholar
  6. 6.
    Novikov FN, Zeifman AA, Stroganov OV, Stroylov VS, Kulkov V, Chilov GG (2011) CSAR scoring challenge reveals the need for new concepts in estimating protein–ligand binding affinity. J Chem Inform Model 51:2090–2096Google Scholar
  7. 7.
    Wang R, Lu Y, Wang S (2003) Comparative evaluation of 11 scoring functions for molecular docking. J Med Chem 46:2287–2303Google Scholar
  8. 8.
    Liu J, Wang R (2015) Classification of current scoring functions. J Chem Inform Model 55(3):475–482Google Scholar
  9. 9.
    Ortiz AR, Pisabarro MT, Gago F, Wade RC (1995) Prediction of drug binding affinities by comparative binding energy analysis. J Med Chem 38:2681–2691Google Scholar
  10. 10.
    Yin S, Biedermannova L, Vondrasek J, Dokholyan NV (2008) Medusascore: an acurate force field-based scoring function for virtual drug screening. J Chem Inform Model 48:1656–1662Google Scholar
  11. 11.
    Zheng Z, Wang T, Li P, Merz KM Jr (2015) KECSA-movable type implicit solvation model (KMTISM). J Chem Theor Comput 11:667–682Google Scholar
  12. 12.
    Muegge I, Martin Y (1999) A general and fast scoring function for protein–ligand interactions: a simplified potential approach. J Med Chem 42(5):791–804Google Scholar
  13. 13.
    Velec HFG, Gohlke H, Klebe G (2005) Knowledge-based scoring function derived from small molecule crystal data with superior recognition rate of near-native ligand poses and better affinity prediction. J Med Chem 48:6296–6303Google Scholar
  14. 14.
    Huang SY, Zou X (2006) An iterative knowledge-based scoring function to predict protein–ligand interactions: I. derivation of interaction potentials. J Comput Chem 27:1865–1875Google Scholar
  15. 15.
    Wang R, Lai L, Wang S (2002) Further development and validation of empirical scoring functions for structural based binding affinity prediction. J Comput Aided Mol Des 16:11–26Google Scholar
  16. 16.
    Verkhivker G, Appelt K, Freer ST, Villafranca JE (1995) Empirical free energy calculations of ligand–protein crystallographic complexes. I. Knowledge based ligand-protein interaction potentials applied to the prediction of human immunodeficiency virus protease binding affinity. Protein Eng 8:677–691Google Scholar
  17. 17.
    Eldridge MD, Murray CW, Auton TR, Paolini GV, Mee RP (1997) Empirical scoring functions: I. The development of a fast empirical scoring function to estimate the binding affinity of ligands in receptor complexes. J Comput Aided Mol Des 11:425–445Google Scholar
  18. 18.
    Baum B, Muley L, Smolinski M, Heine A, Hangauer D, Klebe G (2010) Non-additivity of functional group contributions in protein–ligand binding: a comprehensive study by crystallography and isothermal titration calorimetry. J Mol Biol 397(4):1042–1054Google Scholar
  19. 19.
    Li H, Leung K-S, Wong M-H, Ballester PJ (2014) Substituting random forest for multiple linear regression improves binding affinity prediction of scoring functions: cyscore as a case study. BMC Bioinform 15(1):1Google Scholar
  20. 20.
    Nguyen DD, Xiao T, Wang ML, Wei GW (2017) Rigidity strengthening: a mechanism for protein–ligand binding. J Chem Inform Model 57:1715–1721Google Scholar
  21. 21.
    Cang ZX, Wei, GW (2018) “Integration of element specific persistent homology and machine learning for protein–ligand binding affinity prediction. Int J Numer Methods Biomed Eng.
  22. 22.
    Cang ZX, Wei GW (2017) TopologyNet: topology based deep convolutional and multi-task neural networks for biomolecular property predictions. PLoS Comput Biol 13(7):e1005690. Google Scholar
  23. 23.
    Cang ZX, Mu L, Wei GW (2018) Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening. PLoS Comput Biol 14(1):e1005929. Google Scholar
  24. 24.
    Bramer D, Wei G-W (2018) Multiscale weighted colored graphs for protein flexibility and rigidity analysis. J Chem Phys 148(5):054103Google Scholar
  25. 25.
    Kaczynski T, Mischaikow K, Mrozek M (2004) Computational homology. Springer, New YorkGoogle Scholar
  26. 26.
    Edelsbrunner H, Letscher D, Zomorodian A (2001) Topological persistence and simplification. Discrete Comput Geom 28:511–533Google Scholar
  27. 27.
    Zomorodian A, Carlsson G (2005) Computing persistent homology. Discrete Comput Geom 33:249–274Google Scholar
  28. 28.
    Frosini P, Landi C (1999) Size theory as a topological tool for computer vision. Pattern Recognit Image Anal 9(4):596–603Google Scholar
  29. 29.
    Kasson PM, Zomorodian A, Park S, Singhal N, Guibas LJ, Pande VS (2007) Persistent voids a new structural metric for membrane fusion. Bioinformatics 23:1753–1759Google Scholar
  30. 30.
    Gameiro M, Hiraoka Y, Izumi S, Kramar M, Mischaikow K, Nanda V (2014) Topological measurement of protein compressibility via persistence diagrams. Japn J Ind Appl Math 32:1–17Google Scholar
  31. 31.
    Dabaghian Y, Mémoli F, Frank L, Carlsson G (2012) A topological paradigm for hippocampal spatial map formation using persistent homology. PLoS Comput Biol 8(8):e1002581Google Scholar
  32. 32.
    Xia KL, Wei GW (2014) Persistent homology analysis of protein structure, flexibility and folding. Int J Numer Methods Biomed Eng 30:814–844Google Scholar
  33. 33.
    Xia KL, Feng X, Tong YY, Wei GW (2015) Persistent homology for the quantitative prediction of fullerene stability. J Comput Chem 36:408–422Google Scholar
  34. 34.
    Wang B, Wei GW (2016) Object-oriented persistent homology. J Comput Phys 305:276–299Google Scholar
  35. 35.
    Liu B, Wang B, Zhao R, Tong Y, Wei GW (2017) ESES: software for Eulerian solvent excluded surface. J Comput Chem 38:446–466Google Scholar
  36. 36.
    Xia KL, Wei GW (2015) Persistent topology for cryo-EM data analysis. Int J Numer Methods Biomed Eng 31:e02719Google Scholar
  37. 37.
    Cang ZX, Mu L, Wu K, Opron K, Xia K, Wei G-W (2015) A topological approach to protein classification. Mol Based Math Biol 3:140–162Google Scholar
  38. 38.
    Cang ZX, Wei GW (2017) Analysis and prediction of protein folding energy changes upon mutation by element specific persistent homology. Bioinformatics 33:3549–3557Google Scholar
  39. 39.
    Wu K, Wei G-W (2018) Quantitative toxicity prediction using topology based multitask deep neural networks. J Chem Inform Model.
  40. 40.
    Wu K, Zhao Z, Wang R, Wei G-W (2017) Topp-s: persistent homology based multi-task deep neural networks for simultaneous predictions of partition coefficient and aqueous solubility. arXiv preprint arXiv:1801.01558
  41. 41.
    Sastry GM, Adzhigirey M, Day T, Annabhimoju R, Sherman W (2013) Protein and ligand preparation: parameters, protocols, and influence on virtual screening enrichments. J Comput Aided Mol Des 27:221–234Google Scholar
  42. 42.
    Trott O, Olson AJ (2010) AutoDock vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J Comput Chem 31(2):455–461Google Scholar
  43. 43.
    Morris GM, Huey R, Lindstrom W, Sanner MF, Belew RK, Goodsell DS, Olson AJ (2009) Autodock4 and autodocktools4: automated docking with selective receptor flexibility. J Comput Chem 30(16):2785–2791Google Scholar
  44. 44.
    Bell J, Cao Y, Gunn J, Day T, Gallicchio E, Zhou Z, Levy R, Farid R (2012) Primex and the Schrödinger computational chemistry suite of programs. Int Tables Crystallogr F18:534–538Google Scholar
  45. 45.
    Ye Z, Baumgartner MP, Wingert BM, Camacho CJ (2016) Optimal strategies for virtual screening of induced-fit and flexible target in the 2015 D3R Grand Challenge. J Comput Aided Mol Des 30(9):695–706Google Scholar
  46. 46.
    Jones G, Willett P, Glen RC, Leach AR, Taylor R (1997) Development and validation of a genetic algorithm for flexible docking. J Mol Biol 267(3):727–748Google Scholar
  47. 47.
    Friesner RA, Banks JL, Murphy RB, Halgren TA, Klicic JJ, Mainz DT, Repasky MP, Knoll EH, Shelley M, Perry JK, Shaw DE, Francis P, Shenkin PS (2004) Glide: a new approach for rapid, accurate docking and scoring. 1. Method and assessment of docking accuracy. J Med Chem 47:1739Google Scholar
  48. 48.
    O’Boyle NM, Banck M, James CA, Morley C, Vandermeersch T, Hutchison GR (2011) Open babel: an open chemical toolbox. J Cheminform 3(1):1Google Scholar
  49. 49.
    Schrödinger LLC (2017) Schrödinger release 2017-4. Schrödinger LLC, New YorkGoogle Scholar
  50. 50.
    Dixon SL, Smondyrev AM, Knoll EH, Rao SN, Shaw DE, Friesner RA (2006) Phase: a new engine for pharmacophore perception, 3d qsar model development, and 3d database screening: 1. Methodology and preliminary results. J Comput Aided Mol Des 20(10–11):647–671Google Scholar
  51. 51.
    Dixon SL, Smondyrev AM, Rao SN (2006) Phase: a novel approach to pharmacophore modeling and 3d database searching. Chem Biol Drug Des 67(5):370–372Google Scholar
  52. 52.
    Jacobson MP, Pincus DL, Rapp CS, Day TJ, Honig B, Shaw DE, Friesner RA (2004) A hierarchical approach to all-atom protein loop prediction. Proteins Struct Funct Bioinform 55(2):351–367Google Scholar
  53. 53.
    Jacobson MP, Friesner RA, Xiang Z, Honig B (2002) On the role of the crystal environment in determining protein side-chain conformations. J Mol Biol 320(3):597–608Google Scholar
  54. 54.
    Farid R, Day T, Friesner RA, Pearlstein RA (2006) New insights about herg blockade obtained from protein modeling, potential energy mapping, and docking studies. Bioorg Med Chem 14(9):3160–3173Google Scholar
  55. 55.
    Sherman W, Day T, Jacobson MP, Friesner RA, Farid R (2006) Novel procedure for modeling ligand/receptor induced fit effects. J Med Chem 49(2):534–553Google Scholar
  56. 56.
    Sherman W, Beard HS, Farid R (2006) Use of an induced fit receptor structure in virtual screening. Chem Biol Drug Des 67(1):83–84Google Scholar
  57. 57.
    Borgatti SP (2005) Centrality and network flow. Soc Netw 27(1):55–71Google Scholar
  58. 58.
    Freeman LC (1978) Centrality in social networks conceptual clarification. Soc Netw 1(3):215–239Google Scholar
  59. 59.
    Bavelas A (1950) Communication patterns in task-oriented groups. J Acoust Soc Am 22(6):725–730Google Scholar
  60. 60.
    Dekker A (2005) Conceptual distance in social network analysis. J Soc Struct 6Google Scholar
  61. 61.
    Edelsbrunner H (1992) Weighted alpha shapes. Technical report. University of Illinois, ChampaignGoogle Scholar
  62. 62.
    Nguyen DD, Wei GW (2018) Multiscale weighted colored algebraic graphs for biomolecules (to be submitted)Google Scholar

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Duc Duy Nguyen
    • 1
  • Zixuan Cang
    • 1
  • Kedi Wu
    • 1
  • Menglun Wang
    • 1
  • Yin Cao
    • 1
  • Guo-Wei Wei
    • 1
    • 2
    • 3
    Email author
  1. 1.Department of MathematicsMichigan State UniversityEast Lansing USA
  2. 2.Department of Electrical and Computer EngineeringMichigan State UniversityEast Lansing USA
  3. 3.Department of Biochemistry and Molecular BiologyMichigan State University East LansingUSA

Personalised recommendations