Abstract
We have found that molecular shape and electrostatics, in conjunction with 2D structural fingerprints, are important variables in discriminating classes of active and inactive compounds. This paper explores how to select these variables and identify their relative importance in quantitative structure–activity relationship (QSAR) analysis. We show the use of these variables in a form of similarity searching with respect to the crystal structure of a known bound ligand. This analysis is then validated through k-fold cross-validation of enrichments using several common classifiers. Additionally, we show an effective methodology for using the variables in hypothesis generation, that is, when the crystal structure of a bound ligand is not known.
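As a rough illustration of what similarity searching against a known ligand and measuring enrichment can look like, the following minimal Python sketch ranks a toy library by Tanimoto similarity to a query fingerprint and computes an enrichment factor for the top-ranked fraction. It is not the authors' procedure: the fingerprints are hypothetical sets of on-bit indices, only a 2D-style fingerprint is used (no shape or electrostatic terms), the function names are invented for this example, and the k-fold cross-validation step is not shown.

```python
from typing import List, Sequence, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rank_by_similarity(query: Set[int], library: Sequence[Set[int]]) -> List[int]:
    """Return library indices ordered from most to least similar to the query."""
    scores = [tanimoto(query, fp) for fp in library]
    return sorted(range(len(library)), key=lambda i: scores[i], reverse=True)


def enrichment_factor(ranked: Sequence[int],
                      is_active: Sequence[bool],
                      top_fraction: float) -> float:
    """Ratio of the active rate in the top-ranked fraction to the overall active rate."""
    n_total = len(ranked)
    n_top = max(1, round(n_total * top_fraction))
    actives_total = sum(is_active)
    if actives_total == 0:
        return 0.0
    actives_top = sum(is_active[i] for i in ranked[:n_top])
    return (actives_top / n_top) / (actives_total / n_total)


# Hypothetical toy data: the query stands in for the known bound ligand.
query_fp = {1, 2, 3, 4}
library_fps = [{1, 2, 3, 4}, {1, 2, 3}, {7, 8}, {1, 9}, {5, 6}, {2, 3, 4, 5}]
activity = [True, True, False, False, False, False]

ranking = rank_by_similarity(query_fp, library_fps)
ef_top_half = enrichment_factor(ranking, activity, top_fraction=0.5)
print(ranking, ef_top_half)
```

An enrichment factor above 1 means actives are concentrated near the top of the ranking relative to random selection. In a cross-validated setting, the same measurement would be repeated on each held-out fold.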
Cite this article
Nicholls, A., MacCuish, N.E. & MacCuish, J.D. Variable selection and model validation of 2D and 3D molecular descriptors. J Comput Aided Mol Des 18, 451–474 (2004). https://doi.org/10.1007/s10822-004-5202-8
DOI: https://doi.org/10.1007/s10822-004-5202-8