Biomacromolecular quantitative structure–activity relationship (BioQSAR): a proof-of-concept study on the modeling, prediction and interpretation of protein–protein binding affinity

Journal of Computer-Aided Molecular Design

Abstract

Quantitative structure–activity relationship (QSAR) is a regression modeling methodology that quantitatively establishes a statistical correlation between structural features and observed behavior for a series of congeneric molecules. It has been widely used to evaluate the activity, toxicity and properties of small-molecule compounds such as drugs, toxicants and surfactants. Surprisingly, this useful technique has found only very limited application to biomacromolecules, even though solved atomic-resolution 3D structures of proteins, nucleic acids and their complexes have accumulated rapidly in recent decades. Here, we present a proof-of-concept paradigm for the modeling, prediction and interpretation of the binding affinity of 144 sequence-nonredundant, structure-available and affinity-known protein complexes (Kastritis et al. Protein Sci 20:482–491, 2011) using a biomacromolecular QSAR (BioQSAR) scheme. We demonstrate that the modeling performance and predictive power of BioQSAR are comparable to, or even better than, those of traditional knowledge-based strategies, mechanism-type methods and empirical scoring algorithms, and that BioQSAR offers additional features over these traditional methods, such as adaptability, interpretability, deep validation and high efficiency. The BioQSAR scheme could be readily modified to infer the biological behavior and functions of other biomacromolecules, provided that their X-ray crystal structures, NMR conformational ensembles or computationally modeled structures are available.
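
As a rough illustration of the general QSAR idea described above (regressing an observed quantity on structure-derived descriptors, then checking the model by cross-validation), the following minimal sketch may help. It does not reproduce the BioQSAR scheme: the descriptors and affinities are synthetic, the three-descriptor layout is hypothetical, and ordinary least squares is used only as a simple stand-in regressor. The sample size of 144 merely echoes the size of the data set mentioned in the abstract.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns [intercept, w1, w2, ...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    """Apply a fitted coefficient vector to a descriptor matrix."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef

def r_squared(y_true, y_pred):
    """Coefficient of determination relative to the mean of y_true."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

def loo_q_squared(X, y):
    """Leave-one-out cross-validated q^2: each sample is predicted
    by a model fitted on all the other samples."""
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i
        coef = fit_linear(X[mask], y[mask])
        preds[i] = predict(coef, X[i:i + 1])[0]
    return r_squared(y, preds)

# Synthetic stand-in data (assumption): 144 "complexes", 3 hypothetical descriptors.
rng = np.random.default_rng(0)
n_complexes, n_descriptors = 144, 3
X = rng.normal(size=(n_complexes, n_descriptors))
true_w = np.array([1.5, -0.8, 0.3])
y = X @ true_w + 2.0 + rng.normal(scale=0.5, size=n_complexes)

coef = fit_linear(X, y)
r2 = r_squared(y, predict(coef, X))   # goodness of fit on the training data
q2 = loo_q_squared(X, y)              # internal predictive ability
print(f"fitted r2 = {r2:.3f}, leave-one-out q2 = {q2:.3f}")
```

The gap between the fitted r² and the cross-validated q² gives a first indication of overfitting. A high q² alone is not sufficient evidence of predictive power, so an external test set is normally used as well.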

References

  1. Hansch C, Fujita T (1964) ρ-σ-π analysis. A method for correlation of biological activity and chemical structure. J Am Chem Soc 86:1616–1626

  2. Katritzky AR, Lobanov VS, Karelson M (1995) QSPR: the correlation and quantitative prediction of chemical and physical properties from structure. Chem Soc Rev 24:279–287

  3. Siraki AG, Chevaldina T, Moridani MY, O’Brien PJ (2004) Quantitative structure–toxicity relationships by accelerated cytotoxicity mechanism screening. Curr Opin Drug Discov Devel 7:118–125

  4. Mozrzymas A, Rózycka-Roszak B (2010) Prediction of critical micelle concentration of nonionic surfactants by a quantitative structure–property relationship. Comb Chem High Throughput Screen 13:39–44

  5. Fourches D, Pu D, Tassa C, Weissleder R, Shaw SY, Mumper RJ, Tropsha A (2010) Quantitative nanostructure–activity relationship modeling. ACS Nano 4:5703–5712

  6. Natesan S, Wang T, Lukacova V, Bartus V, Khandelwal A, Subramaniam R, Balaz S (2012) Cellular quantitative structure–activity relationship (Cell-QSAR): conceptual dissection of receptor binding and intracellular disposition in antifilarial activities of Selwood antimycins. J Med Chem 55:3699–3712

  7. Martin E, Mukherjee P, Sullivan D, Jansen J (2011) Profile-QSAR: a novel meta-QSAR method that combines activities across the kinase family to accurately predict affinity, selectivity, and cellular activity. J Chem Inf Model 51:1942–1956

  8. Winkler DA (2002) The role of quantitative structure–activity relationships (QSAR) in biomolecular discovery. Brief Bioinform 3:73–86

  9. Zhou P, Tian F, Wu Y, Li Z, Shang Z (2008) Quantitative sequence–activity model (QSAM): applying QSAR strategy to model and predict bioactivity and function of peptides, proteins and nucleic acids. Curr Comput Aided Drug Des 4:311–321

  10. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE (2000) The protein data bank. Nucleic Acids Res 28:235–242

  11. Concu R, Podda G, González-Díaz H (2009) In quantitative structure-property relationships from bio-molecular to social networks. Nova Science Publisher, New York

  12. González-Díaz H, Vilar S, Santana L, Uriarte E (2007) Medicinal chemistry and bioinformatics — current trends in drugs discovery with networks topological indices. Curr Top Med Chem 7:1025–1039

  13. González-Díaz H, Prado-Prado F, Perez-Montoto LG, Duardo-Sanchez A, Lopez-Diaz A (2009) QSAR models for proteins of parasitic organisms, plants and human guests: theory, applications, legal protection, taxes, and regulatory issues. Curr Proteomics 6:214–227

  14. Munteanu CR, González-Díaz H, Magalhaes AL (2008) Enzymes/non-enzymes classification model complexity based on composition, sequence, 3D and topological indices. J Theor Biol 254:476–482

  15. González-Díaz H, Agüero-Chapin G, Varona J, Molina R, Delogu G, Santana L, Uriarte E, Gianni P (2007) 2D-RNAcoupling numbers: a new computational chemistry approach to link secondary structure topology with biological function. J Comput Chem 28:1049–1056

  16. Munteanu CR, Vázquez JM, Dorado J, Pazos-Sierra A, Sánchez-González A, Prado-Prado FJ, González-Díaz H (2009) Complex network spectral moments for ATCUN motif DNA cleavage: first predictive study on proteins of human pathogen parasites. J Proteome Res 8:5219–5228

  17. Neuvirth H, Raz R, Schreiber G (2004) ProMate: a structure based prediction program to identify the location of protein–protein binding sites. J Mol Biol 338:181–199

  18. Tian F, Lv Y, Yang L (2012) Structure-based prediction of protein–protein binding affinity with consideration of allosteric effect. Amino Acids 43:531–543

  19. Heuser P, Schomburg D (2007) Combination of scoring schemes for protein docking. BMC Bioinformatics 8:279

  20. Kastritis PL, Bonvin AM (2010) Are scoring functions in protein–protein docking ready to predict interactomes? Clues from a novel binding affinity benchmark. J Proteome Res 9:2216–2225

  21. Kastritis PL, Moal IH, Hwang H, Weng Z, Bates PA, Bonvin AM, Janin J (2011) A structure-based benchmark for protein–protein binding affinity. Protein Sci 20:482–491

  22. Park C, Marqusee S (2004) Analysis of the stability of multimeric proteins by effective ΔG and effective m-values. Protein Sci 13:2553–2558

  23. Li H, Robertson AD, Jensen JH (2005) Very fast empirical prediction and rationalization of protein pKa values. Proteins 61:704–721

  24. Word JM, Lovell SC, Richardson JS, Richardson DC (1999) Asparagine and glutamine: using hydrogen atom contacts in the choice of side-chain amide orientation. J Mol Biol 285:1735–1747

  25. Krivov GG, Shapovalov MV, Dunbrack RL (2009) Improved prediction of protein side-chain conformations with SCWRL4. Proteins 77:778–795

  26. Zhou P, Zou J, Tian F, Shang Z (2009) Fluorine bonding: how does it work in protein–ligand interactions? J Chem Inf Model 49:2344–2355

  27. Tian F, Lv Y, Zhou P, Yang L (2011) Characterization of PDZ domain–peptide interactions using an integrated protocol of QM/MM, PB/SA, and CFEA analyses. J Comput Aided Mol Des 25:947–958

  28. Zhou P, Tian F, Ren Y, Shang Z (2010) Systematic classification and analysis of themes in protein–DNA recognition. J Chem Inf Model 50:1476–1488

  29. Siggers TW, Silkov A, Honig B (2005) Structural alignment of protein–DNA interfaces: insights into the determinants of binding specificity. J Mol Biol 345:1027–1045

  30. McDonald IK, Thornton JM (1994) Satisfying hydrogen bonding potential in proteins. J Mol Biol 238:777–793

  31. Word JM, Lovell SC, LaBean TH, Taylor HC, Zalis ME, Presley BK, Richardson JS, Richardson DC (1999) Visualizing and quantifying molecular goodness-of-fit: small-probe contact dots with explicit hydrogen atoms. J Mol Biol 285:1711–1733

  32. Wold S, Sjostrom M, Eriksson L (2001) PLS-regression: a basic tool of chemometrics. Chemometr Intell Lab Syst 58:109–130

  33. Stanton DT (2012) QSAR and QSPR model interpretation using partial least squares (PLS) analysis. Curr Comput Aided Drug Des 8:107–127

  34. Cortes C, Vapnik V (1995) Support vector networks. Mach Learn 20:273–293

  35. Zhou P, Xiang C, Wu Y, Shang Z (2010) Gaussian process: an alternative approach for QSAM modeling of peptides. Amino Acids 38:199–212

  36. Rasmussen CE, Williams CKI (2006) Gaussian processes for machine learning. MIT Press, Cambridge

  37. Ren Y, Chen X, Feng M, Wang Q, Zhou P (2011) Gaussian process: a promising approach for the modeling and prediction of peptide binding affinity to MHC proteins. Protein Pept Lett 18:670–678

  38. Obrezanova O, Csanyi G, Gola JMR, Segall MD (2007) Gaussian processes: a method for automatic QSAR modeling of ADME properties. J Chem Inf Model 47:1847–1857

  39. Wolfe P (1969) Convergence conditions for ascent methods. SIAM Rev 11:226–235

  40. Zhou P, Tian F, Lv F, Shang Z (2009) Comprehensive comparison of eight statistical modelling methods used in quantitative structure–retention relationship studies for liquid chromatographic retention times of peptides generated by protease digestion of the Escherichia coli proteome. J Chromatogr A 1216:3107–3116

  41. Cho SJ, Hermsmeier MA (2002) Genetic algorithm guided selection: variable selection and subset selection. J Chem Inf Comput Sci 42:927–936

  42. Zhou P, Tian F, Chen X, Shang Z (2008) Modeling and prediction of binding affinities between the human amphiphysin SH3 domain and its peptide ligands using genetic algorithm-Gaussian processes. Biopolymers (Pept Sci) 90:792–802

  43. Tian F, Yang L, Lv F, Yang Q, Zhou P (2009) In silico quantitative prediction of peptides binding affinity to human MHC molecule: an intuitive quantitative structure–activity relationship approach. Amino Acids 36:535–554

  44. Golbraikh A, Tropsha A (2002) Beware of q²! J Mol Graph Model 20:269–276

  45. Baroni M, Clementi S, Cruciani G, Kettaneh-Wold N, Wold S (1993) D-optimal designs in QSAR. Quant Struct Act Relat 12:225–231

  46. Filzmoser P, Liebmann B, Varmuza K (2009) Repeated double cross validation. J Chemometr 23:160–171

  47. Tian F, Zhang C, Fan X, Yang X, Wang X, Liang H (2010) Predicting the flexibility profile of ribosomal RNAs. Mol Inf 29:707–715

  48. Ren Y, Wu B, Pan Y, Lv F, Kong X, Luo X, Li Y, Yang Q (2011) Characterization of the binding profile of peptide to transporter associated with antigen processing (TAP) using Gaussian process regression. Comput Biol Med 41:865–870

  49. He P, Wu W, Wang HD, Yang K, Liao KL, Zhang W (2010) Toward quantitative characterization of the binding profile between the human amphiphysin-1 SH3 domain and its peptide ligands. Amino Acids 38:1209–1218

  50. Acharya KR, Lloyd MD (2005) The advantages and limitations of protein crystal structures. Trends Pharm Sci 26:10–14

  51. Dominguez C, Boelens R, Bonvin AM (2003) HADDOCK: a protein–protein docking approach based on biochemical or biophysical information. J Am Chem Soc 125:1731–1737

  52. Chen R, Li L, Weng Z (2003) ZDOCK: an initial-stage protein-docking algorithm. Proteins 52:80–87

  53. Zhang C, Liu S, Zhu Q, Zhou Y (2005) A knowledge-based energy function for protein–ligand, protein–protein, and protein–DNA complexes. J Med Chem 48:2325–2335

  54. Jorgensen WL, Maxwell DS, Tirado-Rives J (1996) Development and testing of the OPLS all-atom force field on conformational energetics and properties of organic liquids. J Am Chem Soc 118:11225–11236

  55. Ponder JW, Richards FM (1987) An efficient Newton-like method for molecular mechanics energy minimization of large molecules. J Comput Chem 8:1016–1024

  56. Qiu D, Shenkin PS, Hollinger FP, Still WC (1997) The GB/SA continuum model for solvation. A fast analytical method for the calculation of approximate Born radii. J Phys Chem 101:3005–3014

  57. Almlöf M, Brandsdal BO, Aqvist J (2004) Binding affinity prediction with different force fields: examination of the linear interaction energy method. J Comput Chem 25:1242–1254

  58. Khoruzhii O, Donchev AG, Galkin N, Illarionov A, Olevanov M, Ozrin V, Queen C, Tarasov V (2008) Application of a polarizable force field to calculations of relative protein–ligand binding affinities. Proc Natl Acad Sci USA 105:10378–10383

  59. Liu S, Zhang C, Zhou H, Zhou Y (2004) A physical reference state unifies the structure-derived potential of mean force for protein folding and binding. Proteins 56:93–101

  60. Zhou H, Zhou Y (2002) Distance-scaled, finite ideal-gas reference state improves structure-derived potentials of mean force for structure selection and stability prediction. Protein Sci 11:2714–2726

  61. Biela A, Sielaff F, Terwesten F, Heine A, Steinmetzer T, Klebe G (2012) Ligand binding stepwise disrupts water network in thrombin: enthalpic and entropic changes reveal classical hydrophobic effect. J Med Chem 55:6094–6110

  62. Freire E (2009) ITC: affinity is not everything. Eur Pharm Rev 14:44–47

  63. Moreira IS, Fernandes PA, Ramos MJ (2007) Hot spots: a review of the protein–protein interface determinant amino-acid residues. Proteins 68:803–812

  64. Kortemme T, Baker D (2002) A simple physical model for binding energy hot spots in protein–protein complexes. Proc Natl Acad Sci USA 99:14116–14121

  65. Ofran Y, Rost B (2007) Protein–protein interaction hotspots carved into sequences. PLoS Comput Biol 3:e119

  66. Xu D, Tsai CJ, Nussinov R (1997) Hydrogen bonds and salt bridges across protein–protein interfaces. Protein Eng 10:999–1012

  67. Lo Conte L, Chothia C, Janin J (1999) The atomic structure of protein–protein recognition sites. J Mol Biol 285:2177–2198

  68. Petsalaki E, Russell RB (2008) Peptide-mediated interactions in biological systems: new discoveries and applications. Curr Opin Biotech 19:344–350

  69. Tsai CJ, Nussinov R (1997) Hydrophobic folding units at protein–protein interfaces: implications to protein folding and to protein–protein association. Protein Sci 6:1426–1437

  70. Young L, Jernigan RL, Covell DG (1994) A role for surface hydrophobicity in protein–protein recognition. Protein Sci 3:717–729

  71. Tuffery P, Derreumaux P (2012) Flexibility and binding affinity in protein–ligand, protein–protein and multi-component protein interactions: limitations of current computational approaches. J R Soc Interface 9:20–33

  72. Burnett JC, Kellogg GE, Abraham DJ (2000) Computational methodology for estimating changes in free energies of biomolecular association upon mutation. The importance of bound water in dimer-tetramer assembly for beta 37 mutant hemoglobins. Biochemistry 39:1622–1633

  73. Jiang L, Kuhlman B, Kortemme T, Baker D (2005) A “solvated rotamer” approach to modeling water-mediated hydrogen bonds at protein–protein interfaces. Proteins 58:893–904

  74. Kumar S, Nussinov R (2002) Close-range electrostatic interactions in proteins. ChemBioChem 3:604–617

  75. Missimer JH, Steinmetz MO, Baron R, Winkler FK, Kammerer RA, Daura X, van Gunsteren WF (2007) Configurational entropy elucidates the role of salt-bridge networks in protein thermostability. Protein Sci 16:1349–1359

  76. Kumar S, Wolfson HJ, Nussinov R (2001) Protein flexibility and electrostatic interactions. IBM J Res Dev 45:499–512

  77. Marqusee S, Sauer RT (1994) Contributions of a hydrogen bond/salt bridge network to the stability of secondary and tertiary structure in lambda repressor. Protein Sci 3:2217–2225

  78. Zhou P, Tian F, Shang Z (2009) 2D depiction of nonbonding interactions for protein complexes. J Comput Chem 30:940–951

Acknowledgments

This work was supported by the National Natural Science Foundation of China (No. 31200993), the Fundamental Research Funds for the Central Universities (No. ZYGX2012J111), the Young Teacher Doctoral Discipline Fund of Ministry of Education of China (No. 20120185120025) and the scientific research funds of UESTC.

Author information

Correspondence to Peng Zhou or Jian Huang.

About this article

Cite this article

Zhou, P., Wang, C., Tian, F. et al. Biomacromolecular quantitative structure–activity relationship (BioQSAR): a proof-of-concept study on the modeling, prediction and interpretation of protein–protein binding affinity. J Comput Aided Mol Des 27, 67–78 (2013). https://doi.org/10.1007/s10822-012-9625-3
