
Continuous Molecular Fields Approach Applied to Structure-Activity Modeling

Chapter in: Application of Computational Techniques in Pharmacy and Medicine

Abstract

The Method of Continuous Molecular Fields is a universal approach to predicting various properties of chemical compounds, in which molecules are represented by continuous fields (electrostatic and steric fields, electron density functions, etc.). The essence of the approach is statistical analysis of functional molecular data through the joint use of kernel machine learning methods and special kernels that compare molecules by computing overlap integrals of their molecular fields. It is an alternative to traditional methods of building 3D "structure-activity" and "structure-property" models, which rely on fixed sets of molecular descriptors. This chapter describes the methodology of the approach and then its application to building regression 3D-QSAR models and to virtual screening based on one-class classification models. It closes by outlining the main directions for further development of the approach.
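To make the idea of a field-overlap kernel concrete, here is a minimal sketch in Python. It rests on assumptions that the abstract does not state: each molecular field is modelled as a sum of atom-centred Gaussians with a shared exponent `alpha`, the molecules are already aligned, and kernel ridge regression stands in for the kernel method. The function names, the toy coordinates, the weights, and the activity values are all hypothetical. Under these assumptions, the overlap integral of two fields has a closed form: each pair of atoms contributes w_i * w_j * (pi / (2 * alpha))^(3/2) * exp(-alpha / 2 * |r_i - r_j|^2).

```python
import numpy as np


def field_overlap_kernel(coords_a, weights_a, coords_b, weights_b, alpha=0.3):
    """Overlap integral of two fields, each a sum of atom-centred Gaussians.

    Assumed field model: f(r) = sum_i w_i * exp(-alpha * |r - r_i|^2).
    """
    diff = coords_a[:, None, :] - coords_b[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    # Closed-form integral over 3D space of a product of two equal-width Gaussians.
    prefactor = (np.pi / (2.0 * alpha)) ** 1.5
    pair_overlap = prefactor * np.exp(-0.5 * alpha * sq_dist)
    return float(weights_a @ pair_overlap @ weights_b)


def gram_matrix(mols_a, mols_b, alpha=0.3):
    """Kernel values between every molecule in mols_a and every one in mols_b."""
    return np.array([
        [field_overlap_kernel(ca, wa, cb, wb, alpha) for cb, wb in mols_b]
        for ca, wa in mols_a
    ])


def fit_kernel_ridge(K, y, lam=1e-3):
    """Dual coefficients of kernel ridge regression: solve (K + lam * I) c = y."""
    return np.linalg.solve(K + lam * np.eye(len(y)), y)


# Hypothetical, pre-aligned toy molecules: (atom coordinates, per-atom field weights).
train = [
    (np.array([[0.0, 0.0, 0.0], [1.2, 0.0, 0.0]]), np.array([0.4, -0.4])),
    (np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]]), np.array([0.2, -0.2])),
    (np.array([[0.0, 0.0, 0.0], [1.2, 0.0, 0.0], [0.0, 1.2, 0.0]]),
     np.array([0.3, -0.5, 0.2])),
]
y = np.array([5.1, 4.3, 6.0])  # made-up activity values, for illustration only

K = gram_matrix(train, train)
coef = fit_kernel_ridge(K, y)

query = (np.array([[0.0, 0.0, 0.0], [1.3, 0.0, 0.0]]), np.array([0.3, -0.3]))
pred = gram_matrix([query], train) @ coef  # predicted activity for the query
```

Because the kernel is an inner product of square-integrable functions, the Gram matrix is symmetric and positive semidefinite, so it can in principle be passed to any kernel method that accepts a precomputed kernel, including the one-class classifiers mentioned in the abstract.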



Acknowledgments

The authors thank Prof. Yu. A. Ustynyuk for stimulating discussion and advice. The authors also thank Prof. A. Varnek and Dr. G. Marcou for valuable comments regarding the developed approach. This work was supported by the Russian Foundation for Basic Research (Grant 13-07-00511).

Author information

Corresponding author

Correspondence to Igor I. Baskin.

Copyright information

© 2014 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Baskin, I., Zhokhova, N. (2014). Continuous Molecular Fields Approach Applied to Structure-Activity Modeling. In: Gorb, L., Kuz'min, V., Muratov, E. (eds) Application of Computational Techniques in Pharmacy and Medicine. Challenges and Advances in Computational Chemistry and Physics, vol 17. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-9257-8_13
