Journal of Computer-Aided Molecular Design, Volume 32, Issue 4, pp 511–528

Protein–ligand interfaces are polarized: discovery of a strong trend for intermolecular hydrogen bonds to favor donors on the protein side with implications for predicting and designing ligand complexes

  • Sebastian Raschka
  • Alex J. Wolf
  • Joseph Bemister-Buffington
  • Leslie A. Kuhn


Understanding how proteins encode ligand specificity is fascinating and similar in importance to deciphering the genetic code. For protein–ligand recognition, the combination of an almost infinite variety of interfacial shapes and patterns of chemical groups makes the problem especially challenging. Here we analyze data across non-homologous proteins in complex with small biological ligands to address observations made in our inhibitor discovery projects: that proteins favor donating H-bonds to ligands and avoid using groups with both H-bond donor and acceptor capacity. The resulting clear and significant chemical group matching preferences elucidate the code for protein-native ligand binding, similar to the dominant patterns found in nucleic acid base-pairing. On average, 90% of the keto and carboxylate oxygens occurring in the biological ligands formed direct H-bonds to the protein. A two-fold preference was found for protein atoms to act as H-bond donors and ligand atoms to act as acceptors, and 76% of all intermolecular H-bonds involved an amine donor. Together, the tight chemical and geometric constraints associated with satisfying donor groups generate a hydrogen-bonding lock that can be matched only by ligands bearing the right acceptor-rich key. Measuring an index of H-bond preference based on the observed chemical trends proved sufficient to predict other protein–ligand complexes and can be used to guide molecular design. The resulting Hbind and Protein Recognition Index software packages are being made available for rigorously defining intermolecular H-bonds and measuring the extent to which H-bonding patterns in a given complex match the preference key.
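The two headline statistics above (the protein-to-ligand donor ratio and the share of H-bonds donated by a given chemical group) are simple tallies over a list of intermolecular H-bonds. The following is a minimal sketch of that bookkeeping only. It is not the Hbind or Protein Recognition Index implementation, the record layout (`HBond`, `donor_side`, `donor_group`) is a hypothetical one chosen for illustration, and the toy records are made up rather than taken from the paper's dataset.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class HBond(NamedTuple):
    """One intermolecular H-bond (hypothetical record layout)."""
    donor_side: str   # "protein" or "ligand": which molecule donates the hydrogen
    donor_group: str  # chemical group of the donor, e.g. "amine" or "hydroxyl"


def donor_side_ratio(hbonds: Iterable[HBond]) -> float:
    """Return (# protein-donated H-bonds) / (# ligand-donated H-bonds)."""
    counts = Counter(hb.donor_side for hb in hbonds)
    if counts["ligand"] == 0:
        raise ValueError("no ligand-donated H-bonds; ratio is undefined")
    return counts["protein"] / counts["ligand"]


def donor_group_fraction(hbonds: Iterable[HBond], group: str) -> float:
    """Return the fraction of all H-bonds whose donor belongs to `group`."""
    records = list(hbonds)
    if not records:
        raise ValueError("empty H-bond list")
    return sum(hb.donor_group == group for hb in records) / len(records)


# Toy, made-up records used only to exercise the functions.
toy_hbonds = [
    HBond("protein", "amine"),
    HBond("protein", "amine"),
    HBond("protein", "hydroxyl"),
    HBond("protein", "amine"),
    HBond("ligand", "hydroxyl"),
    HBond("ligand", "amine"),
]

ratio = donor_side_ratio(toy_hbonds)                        # 4 protein / 2 ligand
amine_fraction = donor_group_fraction(toy_hbonds, "amine")  # 4 of 6 records
```

In practice the records would come from an H-bond detection step applied to each complex. The tallies say nothing about how H-bonds are geometrically defined, which is the part the abstract attributes to Hbind.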


Keywords: Interaction patterns; Drug design; Protein–ligand recognition; Specificity determinants; Ligand optimization; Lipinski’s Rule of 5





Abbreviations

CATH: Class Architecture Topology Homologous superfamily
H-bonds: Hydrogen bonds
MMFF: Merck Molecular Force Field
PDB: Protein Data Bank
PRI: Protein Recognition Index



This research was supported by funding from the Great Lakes Fishery Commission (Project ID: 2015_KUH_54031). We gratefully acknowledge OpenEye Scientific Software (Santa Fe, NM) for providing academic licenses for the use of their QUACPAC (molcharge) and OEChem software. We also thank the following lab graduates for their contributions to this research: Dr. Maria Zavodszky (now at GE Global Research Center), who observed that hydroxyl-rich ligands tended to result in false positives in screening, Dr. Amy Cayemberg McQuade (now at Carroll University) for carrying out the statistical analysis of protein-water-ligand hydrogen-bond bridges, and Dr. Jeffrey VanVoorst (now at Veritas Technologies, LLC) for developing the non-homologous dataset of 136 protein-small molecule complexes analyzed here. We thank Dr. Michael Feig (Michigan State University) for discussions on the biological basis for the prevalence of oxygen versus nitrogen in natural ligands and also appreciate the data he provided on the atomic composition of metabolites in Mycoplasma genitalium.

Supplementary material

Supplementary material 1: 10822_2018_105_MOESM1_ESM.xlsx (XLSX, 45 KB)
Supplementary material 2: 10822_2018_105_MOESM2_ESM.txt (TXT, 0 KB)
Supplementary material 3: 10822_2018_105_MOESM3_ESM.mol2 (MOL2, 572 KB)
Supplementary material 4: 10822_2018_105_MOESM4_ESM.txt (TXT, 188 KB)
Supplementary material 5: 10822_2018_105_MOESM5_ESM.pdf (PDF, 348 KB)
Supplementary material 6: 10822_2018_105_MOESM6_ESM.xlsx (XLSX, 42 KB)
Supplementary material 7: 10822_2018_105_MOESM7_ESM.xlsx (XLSX, 48 KB)



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Protein Structural Analysis and Design Lab, Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, USA
  2. Department of Computer Science and Engineering, Michigan State University, East Lansing, USA
