Abstract
The Method of Continuous Molecular Fields is a universal approach to predicting various properties of chemical compounds, in which molecules are represented by continuous fields (such as electrostatic and steric fields, electron density functions, etc.). The essence of the approach is the statistical analysis of functional molecular data by combining kernel machine learning methods with special kernels that compare molecules by computing overlap integrals of their molecular fields. This approach is an alternative to traditional methods of building 3D “structure-activity” and “structure-property” models, which rely on fixed sets of molecular descriptors. This chapter describes the methodology of the approach and then applies it to building regression 3D-QSAR models and to virtual screening based on one-class classification models. The main directions for further development of the approach are outlined at the end of the chapter.
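The idea of a kernel built from overlap integrals of molecular fields can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not state: each field is modelled as a sum of atom-centred Gaussians with a shared width parameter `alpha`, the molecules are already aligned in a common frame, and plain kernel ridge regression stands in for the kernel method. The function names, the toy data and the values of `alpha` and `lam` are hypothetical.

```python
import numpy as np

def field_overlap(coords_a, weights_a, coords_b, weights_b, alpha=0.3):
    """Overlap integral of two fields f(r) = sum_i w_i * exp(-alpha * |r - r_i|^2).

    For Gaussians of equal width the integral over all space is analytic:
    (pi / (2 * alpha))**1.5 * exp(-alpha / 2 * |r_i - r_j|^2) for each atom pair.
    """
    diff = coords_a[:, None, :] - coords_b[None, :, :]   # pairwise displacement vectors
    sq_dist = np.sum(diff ** 2, axis=-1)                 # squared inter-atomic distances
    prefactor = (np.pi / (2.0 * alpha)) ** 1.5
    pair_terms = prefactor * np.exp(-0.5 * alpha * sq_dist)
    return float(weights_a @ pair_terms @ weights_b)     # weighted sum over all pairs

def gram_matrix(molecules, alpha=0.3):
    """Kernel (Gram) matrix whose entries are field overlaps between molecules."""
    n = len(molecules)
    K = np.empty((n, n))
    for i, (coords_i, weights_i) in enumerate(molecules):
        for j, (coords_j, weights_j) in enumerate(molecules):
            K[i, j] = field_overlap(coords_i, weights_i, coords_j, weights_j, alpha)
    return K

def fit_kernel_ridge(K, y, lam=1e-3):
    """Dual coefficients of kernel ridge regression: solve (K + lam * I) c = y."""
    return np.linalg.solve(K + lam * np.eye(len(y)), y)

# Toy data (hypothetical): 5 "molecules" with 4 atoms each; the weights play
# the role of, for example, partial charges for an electrostatic field.
rng = np.random.default_rng(0)
molecules = [(rng.normal(size=(4, 3)), rng.normal(size=4)) for _ in range(5)]
y = rng.normal(size=5)

K = gram_matrix(molecules)
coef = fit_kernel_ridge(K, y)
fitted = K @ coef   # fitted values for the training molecules
```

Because every entry of `K` is an inner product of two square-integrable functions, the matrix is symmetric and positive semidefinite, which is what allows it to be passed to kernel methods. Several field types could be combined by summing their Gram matrices, again as an assumption of this sketch.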
Acknowledgments
The authors thank Prof. Yu. A. Ustynyuk for stimulating discussion and advice. The authors also thank Prof. A. Varnek and Dr. G. Marcou for valuable comments regarding the developed approach. This work was supported by the Russian Foundation for Basic Research (Grant 13-07-00511).
Copyright information
© 2014 Springer Science+Business Media Dordrecht
About this chapter
Cite this chapter
Baskin, I., Zhokhova, N. (2014). Continuous Molecular Fields Approach Applied to Structure-Activity Modeling. In: Gorb, L., Kuz'min, V., Muratov, E. (eds) Application of Computational Techniques in Pharmacy and Medicine. Challenges and Advances in Computational Chemistry and Physics, vol 17. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-9257-8_13
DOI: https://doi.org/10.1007/978-94-017-9257-8_13
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-017-9256-1
Online ISBN: 978-94-017-9257-8
eBook Packages: Chemistry and Materials Science, Chemistry and Material Science (R0)


