Abstract
Quantitative structure–activity relationship (QSAR) methodology aims to explore the relationship between molecular structures and experimental endpoints, producing a model for the prediction of new data; the predictive performance of the model must be checked by external validation. Clearly, the quality of the chemical structure information and of the experimental endpoints, as well as the statistical parameters used to verify external predictivity, have a strong influence on QSAR model reliability. Here, we emphasize the importance of these three aspects by analyzing our models of estrogen receptor binders from the Endocrine Disruptor Knowledge Base (EDKB) database. Endocrine disrupting chemicals, which mimic or antagonize endogenous hormones such as estrogens, are a hot topic in environmental and toxicological sciences. QSAR is of great value in predicting estrogenic activity and exploring the interactions between the estrogen receptor and its ligands. We subjected our previously published model to additional external validation on new EDKB chemicals. Having found some errors in the 3D molecular conformations used, we developed a new model using the same data set with corrected structures, the same method (ordinary least squares regression, OLS) and DRAGON descriptors. The new model, based on some different descriptors, is more predictive on external prediction sets. Three different formulas for calculating the correlation coefficient for the external prediction set \((Q^{2}_{\rm EXT})\) were compared, and the results indicated that the new proposal of Consonni et al. gave more reasonable results, consistent with the conclusions drawn from the regression line, Williams plot and root mean square error (RMSE) values.
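The abstract does not reproduce the three \(Q^{2}_{\rm EXT}\) formulas. The sketch below assumes they are the three variants commonly labelled Q²F1, Q²F2 and Q²F3 in the QSAR literature, which differ only in the reference sum of squares placed in the denominator. The function name `q2_ext` and the toy numbers are illustrative and are not taken from the paper.

```python
from statistics import fmean


def q2_ext(y_train, y_ext, y_pred_ext):
    """Return (Q2_F1, Q2_F2, Q2_F3) for an external prediction set.

    Assumed definitions (not quoted from the abstract):
      F1: external residuals vs. external deviations from the TRAINING mean
      F2: external residuals vs. external deviations from the EXTERNAL mean
      F3: per-object external residuals vs. per-object training variance
    """
    # Prediction error sum of squares over the external set
    press = sum((y - p) ** 2 for y, p in zip(y_ext, y_pred_ext))

    mean_tr = fmean(y_train)
    mean_ext = fmean(y_ext)

    ss_ext_about_tr = sum((y - mean_tr) ** 2 for y in y_ext)
    ss_ext_about_ext = sum((y - mean_ext) ** 2 for y in y_ext)
    ss_tr = sum((y - mean_tr) ** 2 for y in y_train)

    q2_f1 = 1 - press / ss_ext_about_tr
    q2_f2 = 1 - press / ss_ext_about_ext
    q2_f3 = 1 - (press / len(y_ext)) / (ss_tr / len(y_train))
    return q2_f1, q2_f2, q2_f3


# Toy data: the external set lies in a narrow range far from the training mean.
y_train = [1.0, 2.0, 3.0, 4.0, 5.0]
y_ext = [4.0, 5.0]
y_pred_ext = [4.5, 4.5]
f1, f2, f3 = q2_ext(y_train, y_ext, y_pred_ext)
print(f"Q2_F1={f1:.3f}  Q2_F2={f2:.3f}  Q2_F3={f3:.3f}")
```

With identical predictions, the three values diverge on this toy set because F1 and F2 depend on how the external responses are distributed, whereas F3 normalizes by the training-set variance only.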
Finally, the importance of reliable endpoint values is highlighted by comparing the classification assignments of EDKB with those of another database of estrogen receptor binders (METI): we found that 16.1% of the assignments for the common compounds were opposite (20 of 124 common compounds). To determine the correct assignments for these inconsistent compounds, we predicted them, as a blind external set, with our regression models and compared the results with the two databases. The results indicated that most of the predictions were consistent with METI. Furthermore, we built a kNN classification model using the 104 consistent compounds to predict the inconsistent ones, and most of these predictions were also in agreement with the METI database.
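As a rough illustration of the last step, the following is a minimal k-nearest-neighbour sketch: compounds with agreed labels act as the training set, and each query compound receives the majority label of its k closest neighbours in descriptor space. The descriptor vectors, the value of k, the Euclidean distance and the class labels are all assumptions made for the example; the abstract does not specify them. The final lines simply recompute the 16.1% discordance quoted above.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k training points nearest to `query`.

    Euclidean distance and k=3 are illustrative choices, not the paper's.
    With two classes and odd k there are no tied votes.
    """
    nearest = sorted(zip(train_X, train_y),
                     key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical two-descriptor training set standing in for the consistent compounds
train_X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
           (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
train_y = ["inactive"] * 3 + ["active"] * 3

print(knn_predict(train_X, train_y, (0.2, 0.2)))
print(knn_predict(train_X, train_y, (5.5, 5.5)))

# Discordance between the two databases as stated in the abstract: 20 of 124
discordant, common = 20, 124
discordance_pct = round(100 * discordant / common, 1)
print(discordance_pct, common - discordant)
```

The remaining 124 − 20 = 104 compounds are the ones with matching assignments, which is the size of the training set quoted for the kNN model.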
References
Hansch C, Leo A (1979) Substituent constants for correlation analysis in chemistry and biology. John Wiley and Sons, New York
Regulation (EC) No 1907/2006 of the European Parliament and of the Council (18/12/2006) Concerning REACH; http://eur-lex.europa.eu/LexUriServ/site/en/oj/2006/l_396/l_39620061230en00010849.pdf. Accessed 30 Dec 2006
http://www.oecd.org/dataoecd/33/37/37849783.pdf. Accessed 22 Jan 2007
Kojima H, Katsura E, Takeuchi S, Niiyama K, Kobayashi K (2004) Screening for estrogen and androgen receptor activities in 200 pesticides by in vitro reporter gene assays using Chinese hamster ovary cells. Environ Health Perspect 112: 524–531. doi:10.1289/ehp.6649
Kojima H, Iida MI, Katsura E, Kanetoshi A, Hori Y, Kobayashi K (2003) Effects of a diphenyl ether-type herbicide, chlornitrofen, and its amino derivative on androgen and estrogen receptor activities. Environ Health Perspect 111: 497–502. doi:10.1289/ehp.5724
Colborn T (1995) Environmental estrogens: health implications for humans and wildlife. Environ Health Perspect 103: 135–136
Colborn T, vom Saal FS, Soto AM (1993) Developmental effects of endocrine-disrupting chemicals in wildlife and humans. Environ Health Perspect 101: 378–384
Jensen TK, Toppari J, Keiding N, Skakkebaek NE (1995) Do environmental estrogens contribute to the decline in male reproductive health? Clin Chem 41: 1896–1901
Liu HX, Papa E, Gramatica P (2006) QSAR prediction of estrogen activity for a large set of diverse chemicals under the guidance of OECD principles. Chem Res Toxicol 19: 1540–1548. doi:10.1021/tx0601509
Liu HX, Papa E, Walker JD, Gramatica P (2007) In silico screening of estrogen-like chemicals based on different nonlinear classification models. J Mol Graph Model 26: 135–144. doi:10.1016/j.jmgm.2007.01.003
Cramer RD III, Patterson DE, Bunce JD (1988) Comparative molecular-field analysis (CoMFA). 1. Effect of shape on binding of steroids to carrier proteins. J Am Chem Soc 110: 5959–5967. doi:10.1021/ja00226a005
Gantchev TG, Ali H, van Lier JE (1994) Quantitative structure–activity relationships/comparative molecular field analysis (QSAR/CoMFA) for receptor-binding properties of halogenated estradiol derivatives. J Med Chem 37: 4164–4176. doi:10.1021/jm00050a013
Waller CL, Oprea TI, Chae K, Park HK, Korach KS, Laws SC, Wiese TE, Kelce WR, Gray LE Jr (1996) Ligand based identification of environmental estrogens. Chem Res Toxicol 9: 1240–1248. doi:10.1021/tx960054f
Marini F, Roncaglioni A, Novič M (2005) Variable selection and interpretation in structure-affinity correlation modeling of estrogen receptor binders. J Chem Inf Model 45: 1507–1519. doi:10.1021/ci0501645
Asikainen A, Ruuskanen J, Tuppurainen K (2003) Spectroscopic QSAR methods and self-organizing molecular field analysis for relating molecular structure and estrogenic activity. J Chem Inf Comp Sci 43: 1974–1981. doi:10.1021/ci034110b
Kurunczi L, Seclaman E, Oprea TI, Crisan L, Simon Z (2005) MTD-PLS: a PLS variant of the minimal topologic difference method. III. Mapping interactions between estradiol derivatives and the alpha estrogenic receptor. J Chem Inf Model 45: 1275–1281. doi:10.1021/ci050077c
Mekenyan O, Kamenska V, Serafimova R, Poellinger L, Brouwer A, Walker J (2002) Development and validation of an average mammalian estrogen receptor-based QSAR model. SAR QSAR Environ Res 13: 579–595. doi:10.1080/1062936021000020044
Bradbury S, Kamenska V, Schmieder P, Ankley G, Mekenyan O (2000) A computationally based identification algorithm for estrogen receptor ligands: part 1. Predicting hERα binding affinity. Toxicol Sci 58: 253–269. doi:10.1093/toxsci/58.2.253
Asikainen A, Ruuskanen J, Tuppurainen K (2004) Consensus kNN QSAR: a versatile method for predicting the estrogenic activity of organic compounds in silico. A comparative study with five estrogen receptors and a large, diverse set of ligands. Environ Sci Technol 38: 6724–6729. doi:10.1021/es049665h
Zheng W, Tropsha A (2000) Novel variable selection quantitative structure-property relationship approach based on the k-nearest neighbor principle. J Chem Inf Comp Sci 40: 185–194. doi:10.1021/ci980033m
http://edkb.fda.gov/databasedoor.html. Accessed March 2006
Ministry of Economy Trade and Industry, Japan (METI) (2002) Current status of testing methods development for endocrine disrupters. In: 6th meeting of the task force on endocrine disrupters testing and assessment (EDTA), 24–25 June 2002, Tokyo, Japan. http://www.meti.go.jp/interface/honsho/Search/English/search?query=gEndocappendix1e&whence=0&max=20&result=normal&sort=score&idxname=meti. Accessed 10 Sept 2008
Young D, Martin T, Venkatapathy R, Harten P (2008) Are the chemical structures in your QSAR correct? QSAR Comb Sci 27: 1337–1345. doi:10.1002/qsar.200810084
Schüürmann G, Ebert RU, Chen JW, Wang B, Kühne R (2008) External validation and prediction employing the predictive squared correlation coefficient: test set activity mean vs training set activity mean. J Chem Inf Model 48: 2140–2145. doi:10.1021/ci800253u
Consonni V, Ballabio D, Todeschini R (2009) Comments on the definition of the Q2 parameter for QSAR validation. J Chem Inf Model 49: 1669–1678. doi:10.1021/ci900115y
Kuiper GGJM, Lemmen JG, Carlsson B, Corton JC, Safe SH, van der Saag PT, Burg B, Gustafsson JA (1998) Interaction of estrogenic chemicals and phytoestrogens with estrogen receptor beta. Endocrinology 139: 4252–4263. doi:10.1210/en.139.10.4252
Shi LM, Fang H, Tong W, Wu J, Perkins R, Blair RM, Branham WS, Dial SL, Moland CL, Sheehan DM (2001) QSAR models using a large diverse set of estrogens. J Chem Inf Comp Sci 41: 186–195. doi:10.1021/ci000066d
Roncaglioni A, Piclin N, Pintore M, Benfenati E (2008) Binary classification models for endocrine disrupter effects mediated through the estrogen receptor. SAR QSAR Environ Res 19: 697–733. doi:10.1080/10629360802550606
http://pubchem.ncbi.nlm.nih.gov/. Accessed 10 Mar 2009
http://chem.sis.nlm.nih.gov/chemidplus/. Accessed 10 Mar 2009
HyperChem. (2002) Release 7.03 for Windows, molecular modeling system. Hypercube, Inc., Gainesville, FL. http://www.hyper.com/. Accessed 10 Mar 2009
Todeschini R, Consonni V, Mauri A, Pavan M (2005) DRAGON, version 5.3 for Windows, software for the calculation of molecular descriptors. Talete srl, Milan, Italy. http://www.talete.mi.it/products/dragon_description.htm. Accessed 10 April 2009
Katritzky AR, Lobanov VS, Karelson M (1994) CODESSA, University of Florida, Gainesville, FL. http://www.codessa-pro.com/. Accessed 10 April 2009
Rogers D, Hopfinger AJ (1994) Application of genetic function approximation to quantitative structure-activity relationships and quantitative structure-property relationships. J Chem Inf Comp Sci 34: 854–866. doi:10.1021/ci00020a020
Todeschini R, Consonni V, Pavan M (2002) MOBY DIGS, version 1.2 for Windows, software for multilinear regression analysis and variable subset selection by genetic algorithm. Talete srl, Milan, Italy. http://www.talete.mi.it/products/moby_description.htm. Accessed 10 April 2009
Todeschini R, Consonni V, Maiocchi A (1999) The K correlation index: theory development and its application in chemometrics. Chemom Intell Lab Syst 46: 13–29. doi:10.1016/S0169-7439(98)00124-5
Eriksson L, Jaworska J, Worth AP, Cronin MTD, McDowell RM, Gramatica P (2003) Methods for reliability and uncertainty assessment and for applicability evaluations of classification- and regression-based QSARs. Environ Health Perspect 111: 1361–1375. doi:10.1289/ehp.5758
Tropsha A, Gramatica P, Gombar VK (2003) The importance of being earnest: validation is the absolute essential for successful application and interpretation of QSPR models. QSAR Comb Sci 22: 69–77. doi:10.1002/qsar.200390007
Gramatica P (2007) Principles of QSAR models validation: internal and external. QSAR Comb Sci 26: 694–701. doi:10.1002/qsar.200610151
Sharaf MA, Illman DL, Kowalski BR (1986) Chemometrics. Wiley, New York
Cite this article
Li, J., Gramatica, P. The importance of molecular structures, endpoints’ values, and predictivity parameters in QSAR research: QSAR analysis of a series of estrogen receptor binders. Mol Divers 14, 687–696 (2010). https://doi.org/10.1007/s11030-009-9212-2
DOI: https://doi.org/10.1007/s11030-009-9212-2