The continuous molecular fields approach to building 3D-QSAR models

Journal of Computer-Aided Molecular Design

Abstract

The continuous molecular fields (CMF) approach describes molecular fields with continuous functions instead of the finite sets of molecular descriptors (such as interaction energies computed at grid nodes) commonly used for this purpose. These functions can be encapsulated into kernels and combined with kernel-based machine learning algorithms to provide a variety of novel methods for building classification and regression structure–activity models, visualizing chemical datasets, and conducting virtual screening. In this article, the CMF approach is applied to building 3D-QSAR models for 8 datasets using five types of molecular fields (electrostatic, steric, hydrophobic, hydrogen-bond acceptor, and hydrogen-bond donor), the linear convolution molecular kernel with the contribution of each atom approximated by a single isotropic Gaussian function, and kernel ridge regression as the data analysis technique. Even in this simplest form, the CMF approach is shown to provide predictive performance comparable to or better than that of state-of-the-art 3D-QSAR methods.
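As a rough illustration of the idea summarized above, the following sketch shows one possible reading of a linear convolution kernel built from single isotropic Gaussians per atom, combined with kernel ridge regression. It is not the authors' implementation. It assumes that each molecule's field is a sum of terms w·exp(−α|r − r_atom|²), so that the overlap integral of two fields has the closed form (π/2α)^(3/2) Σ w_a w_b exp(−(α/2)|r_a − r_b|²). The function names, the width `alpha`, the ridge parameter `lam`, the per-atom weights (standing in for, e.g., partial charges of an electrostatic field), and the toy coordinates and activities are all hypothetical.

```python
import math

def cmf_kernel(mol_a, mol_b, alpha=0.3):
    """Overlap integral of two fields, each a sum of isotropic Gaussians.

    Each molecule is a list of (x, y, z, weight) tuples (assumed format).
    """
    prefactor = (math.pi / (2.0 * alpha)) ** 1.5
    total = 0.0
    for xa, ya, za, wa in mol_a:
        for xb, yb, zb, wb in mol_b:
            d2 = (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
            total += wa * wb * math.exp(-0.5 * alpha * d2)
    return prefactor * total

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def krr_fit(mols, y, lam=1e-3, alpha=0.3):
    """Kernel ridge regression: solve (K + lam * I) c = y for the coefficients c."""
    n = len(mols)
    k = [[cmf_kernel(mols[i], mols[j], alpha) for j in range(n)] for i in range(n)]
    for i in range(n):
        k[i][i] += lam
    return solve(k, y)

def krr_predict(mols, coeffs, query, alpha=0.3):
    """Predict an activity as the coefficient-weighted sum of kernels to the training set."""
    return sum(c * cmf_kernel(m, query, alpha) for m, c in zip(mols, coeffs))

# Toy, made-up data: pre-aligned "molecules" with per-atom weights and activities.
training_set = [
    [(0.0, 0.0, 0.0, 0.4), (1.2, 0.0, 0.0, -0.4)],
    [(0.0, 0.0, 0.0, 0.1), (1.2, 0.0, 0.0, -0.1), (0.0, 1.5, 0.0, 0.3)],
    [(0.0, 0.0, 0.0, -0.2), (0.0, 0.0, 1.1, 0.5)],
]
activities = [5.0, 6.2, 4.1]

coeffs = krr_fit(training_set, activities)
query = [(0.0, 0.0, 0.0, 0.3), (1.2, 0.0, 0.0, -0.3)]
prediction = krr_predict(training_set, coeffs, query)
```

In this sketch a single field type is used. Several field types could be handled, for instance, by summing one such kernel per field with its own weights and width, though how the fields are actually combined and how the hyperparameters are tuned is not specified in the abstract. The sketch also presupposes that the molecules have already been aligned in a common coordinate frame.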

Acknowledgments

The authors thank Prof. Yu. A. Ustynyuk for stimulating discussion and advice. The authors also thank Prof. A. Varnek and Dr. G. Marcou for valuable comments regarding the developed approach. This work was supported by the Russian Foundation for Basic Research (Grant 13-07-00511).

Author information

Corresponding author

Correspondence to Igor I. Baskin.

About this article

Cite this article

Baskin, I.I., Zhokhova, N.I. The continuous molecular fields approach to building 3D-QSAR models. J Comput Aided Mol Des 27, 427–442 (2013). https://doi.org/10.1007/s10822-013-9656-4

  • DOI: https://doi.org/10.1007/s10822-013-9656-4
