A consistent description of HYdrogen bond and DEhydration energies in protein–ligand complexes: methods behind the HYDE scoring function


Estimating the free energy of binding is a key problem in structure-based design. We developed the scoring function HYDE, which is based on a consistent description of HYdrogen bond and DEhydration energies in protein–ligand complexes. Because HYDE is not calibrated on experimental binding affinity data or protein–ligand complexes, it is applicable to all types of protein targets. Its atom-based score is visualized with an intuitive coloring scheme, which facilitates the analysis of protein–ligand complexes during lead optimization. In this paper, we revise several aspects of the former version of HYDE, which was described in detail previously. The revised version has already been validated in large-scale redocking and screening experiments performed in the course of the Docking and Scoring Symposium at the 241st ACS National Meeting. In this study, we additionally evaluate its ability to predict binding affinities. On the PDBbind 2007 core set, HYDE achieves a correlation coefficient of 0.62 between the experimental binding constants and the predicted binding energy, the second-best result on this dataset in a comparison with 17 other well-established scoring functions. Further, we show that the performance of HYDE in large-scale redocking on the Astex diverse set and in virtual screening on the DUD dataset is comparable to the best methods in this field.
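To make the affinity-prediction metric concrete, the following minimal sketch shows how a correlation coefficient between experimental and predicted values could be computed. It assumes the Pearson coefficient, which the abstract does not specify, and it is not the authors' evaluation code. The function name `pearson_r` and the toy numbers are hypothetical and are not taken from the paper.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy values, NOT data from the paper.
# experimental_pk: negative log of a binding constant (higher = tighter binding)
# predicted_energy: a predicted binding energy (more negative = tighter binding)
experimental_pk = [4.0, 5.5, 6.0, 7.2, 8.1]
predicted_energy = [-20.0, -24.0, -31.0, -30.0, -41.0]

# Assumed sign convention: negate the energies so that a positive
# coefficient means the prediction ranks complexes like the experiment.
r = pearson_r(experimental_pk, [-e for e in predicted_energy])
print(f"Pearson r on toy data: {r:.2f}")
```

The energies are negated only so that agreement between prediction and experiment shows up as a positive coefficient. Whether the reported 0.62 uses this convention is an assumption here.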





We thank Hans Briem and Kristin Beyer of Bayer Pharma AG and Jürgen Albrecht of Bayer CropScience AG for many fruitful discussions and a successful cooperation. We also thank Holger Claussen, Marcus Gastreich and Christian Lemmen of BioSolveIT GmbH for their ongoing support during the development of HYDE, particularly for the meticulous testing and analysis of HYDE and the resulting valuable feedback. The HYDE project was funded by Bayer CropScience AG and Bayer Pharma AG.

Author information



Corresponding author

Correspondence to Matthias Rarey.

Electronic supplementary material

Supplementary material 1 (PDF 183 kb)

About this article

Cite this article

Schneider, N., Lange, G., Hindle, S. et al. A consistent description of HYdrogen bond and DEhydration energies in protein–ligand complexes: methods behind the HYDE scoring function. J Comput Aided Mol Des 27, 15–29 (2013). https://doi.org/10.1007/s10822-012-9626-2


  • Protein–ligand interactions
  • Desolvation
  • Binding affinity
  • Virtual screening
  • Lead optimization
  • Docking