Protein–ligand interfaces are polarized: discovery of a strong trend for intermolecular hydrogen bonds to favor donors on the protein side with implications for predicting and designing ligand complexes

Journal of Computer-Aided Molecular Design

Abstract

Understanding how proteins encode ligand specificity is fascinating and similar in importance to deciphering the genetic code. For protein–ligand recognition, the combination of an almost infinite variety of interfacial shapes and patterns of chemical groups makes the problem especially challenging. Here we analyze data across non-homologous proteins in complex with small biological ligands to address observations made in our inhibitor discovery projects: that proteins favor donating H-bonds to ligands and avoid using groups with both H-bond donor and acceptor capacity. The resulting clear and significant chemical group matching preferences elucidate the code for protein-native ligand binding, similar to the dominant patterns found in nucleic acid base-pairing. On average, 90% of the keto and carboxylate oxygens occurring in the biological ligands formed direct H-bonds to the protein. A two-fold preference was found for protein atoms to act as H-bond donors and ligand atoms to act as acceptors, and 76% of all intermolecular H-bonds involved an amine donor. Together, the tight chemical and geometric constraints associated with satisfying donor groups generate a hydrogen-bonding lock that can be matched only by ligands bearing the right acceptor-rich key. Measuring an index of H-bond preference based on the observed chemical trends proved sufficient to predict other protein–ligand complexes and can be used to guide molecular design. The resulting Hbind and Protein Recognition Index software packages are being made available for rigorously defining intermolecular H-bonds and measuring the extent to which H-bonding patterns in a given complex match the preference key.
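The donor-side statistics quoted in the abstract (the ratio of protein-donated to ligand-donated H-bonds, and the share of H-bonds with an amine donor) are simple tallies once each intermolecular H-bond has been assigned a donor. The following is a minimal Python sketch of that bookkeeping only. The record layout, the function names, and the six example bonds are hypothetical and chosen for illustration; they are not Hbind output, not the Protein Recognition Index calculation, and not data from the study.

```python
from collections import Counter

# Hypothetical H-bond records: (side that donates, chemical group of the donor).
# Illustrative values only; not taken from the paper or from Hbind output.
hbonds = [
    ("protein", "amine"),
    ("protein", "amine"),
    ("protein", "hydroxyl"),
    ("ligand", "hydroxyl"),
    ("protein", "amine"),
    ("ligand", "amine"),
]


def donor_side_ratio(bonds):
    """Count H-bonds donated by each side and return (protein, ligand, ratio)."""
    counts = Counter(side for side, _group in bonds)
    protein, ligand = counts["protein"], counts["ligand"]
    # Avoid division by zero when the ligand donates no H-bonds.
    ratio = protein / ligand if ligand else float("inf")
    return protein, ligand, ratio


def amine_donor_fraction(bonds):
    """Fraction of all H-bonds whose donor group is an amine."""
    if not bonds:
        return 0.0
    return sum(1 for _side, group in bonds if group == "amine") / len(bonds)


protein, ligand, ratio = donor_side_ratio(hbonds)
print(f"protein donors: {protein}, ligand donors: {ligand}, ratio: {ratio:.2f}")
print(f"amine donor fraction: {amine_donor_fraction(hbonds):.2f}")
```

In practice, the donor assignment itself is the hard part, since it depends on protonation states and geometric criteria. This sketch assumes that step has already been done.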

Abbreviations

3D: Three-dimensional

CATH: Class Architecture Topology Homologous superfamily

H-bonds: Hydrogen bonds

MMFF94: Merck Molecular Force Field

PDB: Protein Data Bank

PRI: Protein Recognition Index


Acknowledgements

This research was supported by funding from the Great Lakes Fishery Commission (Project ID: 2015_KUH_54031). We gratefully acknowledge OpenEye Scientific Software (Santa Fe, NM) for providing academic licenses for the use of their QUACPAC (molcharge) and OEChem software. We also thank the following lab graduates for their contributions to this research: Dr. Maria Zavodszky (now at GE Global Research Center), who observed that hydroxyl-rich ligands tended to result in false positives in screening, Dr. Amy Cayemberg McQuade (now at Carroll University) for carrying out the statistical analysis of protein-water-ligand hydrogen-bond bridges, and Dr. Jeffrey VanVoorst (now at Veritas Technologies, LLC) for developing the non-homologous dataset of 136 protein-small molecule complexes analyzed here. We thank Dr. Michael Feig (Michigan State University) for discussions on the biological basis for the prevalence of oxygen versus nitrogen in natural ligands and also appreciate the data he provided on the atomic composition of metabolites in Mycoplasma genitalium.

Author information

Corresponding author

Correspondence to Leslie A. Kuhn.

Cite this article

Raschka, S., Wolf, A.J., Bemister-Buffington, J. et al. Protein–ligand interfaces are polarized: discovery of a strong trend for intermolecular hydrogen bonds to favor donors on the protein side with implications for predicting and designing ligand complexes. J Comput Aided Mol Des 32, 511–528 (2018). https://doi.org/10.1007/s10822-018-0105-2

  • DOI: https://doi.org/10.1007/s10822-018-0105-2
