Abstract
An automated PLS engine, WB-PLS, was applied to 1632 QSAR series, each with at least 25 compounds, extracted from WOMBAT (WOrld of Molecular BioAcTivity). For each series, WB-PLS extracts a single Y variable together with pre-computed X variables from a table. The table contained 2D descriptors, the drug-like MDL 320 keys as implemented in the Mesa A&C Fingerprint module, and in-house generated topological-pharmacophore SMARTS counts and fingerprints. Each descriptor type was treated as a block, with or without scaling. Cross-validation, a variable importance on projections (VIP) threshold of 0.8, and q² ≥ 0.3 were used as criteria for model significance. Among cross-validation methods, leave-one-in-seven-out (CV7) is a better measure of model significance than leave-one-out (which measures redundancy) or leave-half-out (which is too restrictive). SMARTS counts overlap with 2D descriptors (both are more quantitative in nature), whereas MDL keys overlap with the in-house fingerprints (both are more qualitative). SMARTS counts are the most effective of the four descriptor systems. At the level of individual descriptors, size-related descriptors and topological indices (in the 2D property space), together with branched SMARTS, aromatic and ring atom types, and halogens, were found to be most relevant according to the VIP criterion.
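The significance criteria named in the abstract (q² under seven-group cross-validation, and VIP) can be illustrated with a minimal sketch. This is not the WB-PLS engine: it is a plain single-Y PLS (NIPALS) written in pure Python, and the function names (`pls1_fit`, `pls1_predict`, `vip`, `q2_cv`), the interleaved assignment of compounds to groups, and the synthetic data are all assumptions made for illustration. Here q² is taken as 1 − PRESS/SS, with SS the sum of squares of Y about its mean.

```python
import math

def mean(v):
    return sum(v) / len(v)

def pls1_fit(X, y, n_comp):
    """Single-Y PLS by NIPALS on column-centered (unscaled) data."""
    n, p = len(X), len(X[0])
    xm = [mean([row[j] for row in X]) for j in range(p)]
    ym = mean(y)
    E = [[row[j] - xm[j] for j in range(p)] for row in X]   # X residuals
    f = [v - ym for v in y]                                  # y residuals
    W, P, C, ssy = [], [], [], []
    for _ in range(n_comp):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:          # nothing left to explain
            break
        w = [v / norm for v in w]                            # unit-length weights
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(v * v for v in t)
        pl = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        c = sum(f[i] * t[i] for i in range(n)) / tt
        E = [[E[i][j] - t[i] * pl[j] for j in range(p)] for i in range(n)]
        f = [f[i] - c * t[i] for i in range(n)]
        W.append(w); P.append(pl); C.append(c)
        ssy.append(c * c * tt)    # Y sum of squares explained by this component
    return {"W": W, "P": P, "C": C, "ssy": ssy, "xm": xm, "ym": ym}

def pls1_predict(model, x):
    """Predict y for one compound by deflating its centered descriptor vector."""
    e = [xj - mj for xj, mj in zip(x, model["xm"])]
    yhat = model["ym"]
    for w, pl, c in zip(model["W"], model["P"], model["C"]):
        t = sum(ej * wj for ej, wj in zip(e, w))
        yhat += c * t
        e = [ej - t * pj for ej, pj in zip(e, pl)]
    return yhat

def vip(model):
    """VIP per descriptor; with unit-length weights the squared VIPs sum to p."""
    p = len(model["xm"])
    total = sum(model["ssy"])
    return [math.sqrt(p * sum(s * w[j] ** 2
                              for s, w in zip(model["ssy"], model["W"])) / total)
            for j in range(p)]

def q2_cv(X, y, n_comp, groups=7):
    """q2 = 1 - PRESS/SS with each of `groups` interleaved groups left out once."""
    n = len(y)
    ym = mean(y)
    ss = sum((v - ym) ** 2 for v in y)
    press = 0.0
    for g in range(groups):
        train = [i for i in range(n) if i % groups != g]
        test = [i for i in range(n) if i % groups == g]
        model = pls1_fit([X[i] for i in train], [y[i] for i in train], n_comp)
        press += sum((y[i] - pls1_predict(model, X[i])) ** 2 for i in test)
    return 1.0 - press / ss

# Synthetic series of 28 "compounds" with 3 descriptors (illustrative only).
X = [[float(i), float((i * 5) % 9), float((i * 3) % 4)] for i in range(28)]
y = [2.0 * r[0] + 0.5 * r[1] + ((i * 7) % 11 - 5) * 0.1 for i, r in enumerate(X)]

q2 = q2_cv(X, y, n_comp=2)
vips = vip(pls1_fit(X, y, n_comp=2))
significant = q2 >= 0.3                                # q2 threshold from the abstract
relevant = [j for j, v in enumerate(vips) if v > 0.8]  # VIP threshold from the abstract
print(round(q2, 3), [round(v, 2) for v in vips], significant, relevant)
```

Setting `groups` to the number of compounds gives leave-one-out, and `groups=2` gives a leave-half-out split, so the same function can be used to compare the three schemes discussed above.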
References
C. Hansch T. Fujita (1964) J. Am. Chem. Soc., 86 1616
S.M. Free Jr. J.W. Wilson (1964) J. Med. Chem., 7 395
R. Todeschini V. Consonni (2000) Handbook of Molecular Descriptors Wiley-VCH Weinheim
C. Hansch A. Leo (1995) Exploring QSAR. Fundamentals and Applications in Chemistry and Biology ACS Publishers Washington, DC
D.J. Livingstone (2000) J. Chem. Inf. Comput. Sci., 40 195
Kubinyi, H., unpublished results.
Leo, A. and Weininger, D., CMR3. Daylight Chemical Information Systems, Santa Fe, New Mexico, http://www.daylight.com/, 1995.
A. Leo (1993) Chem. Rev., 93 1281
Leo, A. and Weininger, D., CLOGP 4.0. Daylight Chemical Information Systems, Santa Fe, New Mexico, http://www.daylight.com/, 2001.
http://www.qsar.org/resource/software/htm, accessed in June 2002.
Y. Ran N. Jain S.H. Yalkowsky (2001) J. Chem. Inf. Comput. Sci., 41 1208
D.J. Livingstone M.G. Ford J.J. Huuskonen D.W. Salt (2001) J. Comput.-Aided Mol. Design 15 741
R.C. Glen (1994) J. Comput.-Aided Mol. Design 8 457
J. Hinze H.H. Jaffe (1962) J. Am. Chem. Soc., 84 540
J. Hinze M.A. Whitehead H.H. Jaffe (1963) J. Am. Chem. Soc., 85 148
J. Gasteiger M. Marsili (1980) Tetrahedron 36 3219
Hansch et al. (2003) J. Chem. Inf. Comput. Sci., 43 120
O.A. Raevsky V.Yu. Grigor’ev D. Kireev N.S. Zefirov (1992) Quant. Struct.-Act. Relat., 11 49
HYBOT. TimTec Inc., Moscow, Russia, http://www.timtec.net/software/hybotplus.htm, 1998.
A.M. Zissimos M.H. Abraham M.C. Barker K.J. Box K.Y. Tam (2002) J. Chem. Soc. Perkin 2 3 470
L.B. Kier L.H. Hall (1999) Molecular Structure Description: The Electrotopological State Academic Press New York
T.I. Oprea (2000) J. Comput.-Aided Mol. Design 14 251
A.T. Balaban (1998) SAR QSAR Environ. Res., 8 1
L.B. Kier L.H. Hall (1986) Molecular Connectivity in Structure-Activity Analysis John Wiley New York
An analysis [26] using over 200 topological indices on over 1000 diverse structures revealed that these descriptors are grouped in 18 clusters that can be related to size, bond information, and molecular complexity (among other properties).
S.C. Basak A.T. Balaban G.D. Grunwald B.D. Gute (2000) J. Chem. Inf. Comput. Sci., 40 891
R.D. Cramer III D.E. Patterson J.D. Bunce (1988) J. Am. Chem. Soc., 110 5959
P.J. Goodford (1985) J. Med. Chem., 28 849
Wold, S., Johansson, E. and Cocchi, M., In Kubinyi, H. (Ed), 3D QSAR in Drug Design: Theory, Methods and Applications, ESCOM, Leiden, 1993, pp. 523-550.
H. Kubinyi (Ed.) (1993) 3D QSAR in Drug Design: Theory, Methods and Applications ESCOM Leiden
H. Kubinyi G. Folkers Y.C. Martin (1998) 3D QSAR in Drug Design, Vol. 2. Ligand Protein Interactions and Molecular Similarity Kluwer/ESCOM Dordrecht
H. Kubinyi G. Folkers Y.C. Martin (1998) 3D QSAR in Drug Design, Vol. 3. Recent Advances Kluwer/ESCOM Dordrecht
Cramer III, R.D. and Wold, S.B., US pat. 5025388 (1991). (CAN 115:135113).
S.H. Unger C. Hansch (1973) J. Med. Chem., 16 745
D.C. Whitley M.G. Ford D.J. Livingstone (2000) J. Chem. Inf. Comput. Sci., 40 1160
M.M.C. Ferreira C.A. Montanari A.C. Gaudio (2002) Quimica Nova 25 439
O. Nicolotti V.J. Gillet P.J. Fleming D.V.S. Green (2002) J. Med. Chem., 45 5069
A. Golbraikh M. Shen Z. Xiao Y.-D. Xiao K.-H. Lee A. Tropsha (2003) J. Comput.-Aided Mol. Design, 17 241
D. Weininger (1988) J. Chem. Inf. Comput. Sci., 28 31
WB-PLS 1.0, developed at Sunset Molecular Discovery LLC, Santa Fe, New Mexico, http://www.sunsetmolecular.com/, 2004.
WOMBAT database, Sunset Molecular Discovery LLC, Santa Fe, New Mexico, http://www.sunsetmolecular.com/, 2004.
J.L. Durant B.A. Leland D.R. Henry J.G. Nourse (2002) J. Chem. Inf. Comput. Sci., 42 1273
SMARTS, Daylight Chemical Information Systems, Santa Fe, New Mexico, http://www.daylight.com/dayhtml/doc/theory.smarts.html; online SMARTS tutorial: http://www.daylight.com/dayhtml/doc/theory.smarts.html, 2004.
C.A. Lipinski F. Lombardo B.W. Dominy P.J. Feeney (1997) Adv. Drug Delivery Rev., 23 3
MacCuish J. and MacCuish N., Measures Software, Mesa Analytics and Computing LLC, Santa Fe, New Mexico.
G. Schneider W. Neidhart T. Giller G. Schmidt (1999) Angew. Chem. Int. Ed. Engl. 38 2894
E. Byvatov U. Fechner J. Sadowski G. Schneider (2003) J. Chem. Inf. Comput. Sci., 43 1882
Daylight Toolkit v4.81, Daylight Chemical Information Systems, Santa Fe, New Mexico, http://www.daylight.com/, 2003.
OEChem v1.2, Openeye Scientific Software, Santa Fe, New Mexico, http://www.eyesopen.com/, 2004.
S. Wold A. Ruhe H. Wold W.J. Dunn III (1984) SIAM J. Sci. Stat. Comput., 5 735
J. Trygg (2001) Parsimonious Multivariate Models Umetrics Academy Umeå
A. Höskuldsson (1988) J. Chemometr., 2 211
R.D. Cramer J.D. Bunce D.E. Patterson I.E. Frank (1988) Quant. Struct.-Act. Relat., 7 18
S. Wold (1978) Technometrics 20 397
Statistical parameters are described in the SIMCA user manual; the software is available from Umetrics, Umeå, Sweden, web site: http://www.umetrics.com/.
L. Eriksson E. Johansson N. Kettaneh-Wold S. Wold (2001) Multi- and Megavariate Data Analysis. Principles and Applications Umetrics Academy Umeå
E. Zhu R.M. Barnes (1995) J. Chemometr. 9 363
These figures are available from the authors upon request.
T.I. Oprea J. Gottfries (2001) J. Comb. Chem., 3 157
T.I. Oprea (2002) J. Braz. Chem. Soc., 13 811
C. Hansch D. Hoekman A. Leo D. Weininger C.D. Selassie (2002) Chem. Rev., 102 783
By default, for cross-validation the SIMCA-P software divides the original data into 7 groups; see the user manual or the document http://www.umetrics.com/download/KB/Multivariate%20FAQ.pdf, 2004.
Cite this article
Olah, M., Bologa, C. & Oprea, T.I. An automated PLS search for biologically relevant QSAR descriptors. J Comput Aided Mol Des 18, 437–449 (2004). https://doi.org/10.1007/s10822-004-4060-8
DOI: https://doi.org/10.1007/s10822-004-4060-8