The importance of molecular structures, endpoints’ values, and predictivity parameters in QSAR research: QSAR analysis of a series of estrogen receptor binders

  • Full-Length Paper
Molecular Diversity

Abstract

Quantitative structure–activity relationship (QSAR) methodology aims to explore the relationship between molecular structures and experimental endpoints, producing a model for the prediction of new data; the predictive performance of the model must be checked by external validation. Clearly, the quality of the chemical structure information and of the experimental endpoints, as well as the statistical parameters used to verify external predictivity, have a strong influence on QSAR model reliability. Here, we emphasize the importance of these three aspects by analyzing our models of estrogen receptor binders from the Endocrine Disruptor Knowledge Base (EDKB) database. Endocrine disrupting chemicals, which mimic or antagonize endogenous hormones such as estrogens, are a hot topic in environmental and toxicological sciences. QSAR is of great value for predicting estrogenic activity and exploring the interactions between the estrogen receptor and its ligands. We subjected our previously published model to additional external validation on new EDKB chemicals. Having found some errors in the 3D molecular conformations used, we developed a new model using the same data set with corrected structures, the same method (ordinary least squares regression, OLS), and DRAGON descriptors. The new model, based on some different descriptors, is more predictive on the external prediction sets. Three different formulas for calculating the correlation coefficient for the external prediction set \((Q^{2}_{\rm EXT})\) were compared, and the results indicated that the new proposal of Consonni et al. gave more reasonable results, consistent with the conclusions drawn from the regression line, the Williams plot, and the root mean square error (RMSE) values.
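
The abstract does not reproduce the three external-validation formulas. As a minimal sketch, assuming they are the variants commonly labelled Q²F1, Q²F2, and Q²F3 in the QSAR literature (the last being the form usually attributed to Consonni et al.), the comparison could look like the following. The function names and the toy numbers are illustrative assumptions and are not taken from the paper.

```python
from math import sqrt


def _mean(values):
    return sum(values) / len(values)


def _press(y_obs, y_pred):
    """Sum of squared prediction errors over a set of compounds."""
    return sum((y - p) ** 2 for y, p in zip(y_obs, y_pred))


def q2_f1(y_ext, y_pred, y_train):
    """External deviations measured from the TRAINING-set mean (assumed F1 form)."""
    mean_tr = _mean(y_train)
    return 1.0 - _press(y_ext, y_pred) / sum((y - mean_tr) ** 2 for y in y_ext)


def q2_f2(y_ext, y_pred):
    """External deviations measured from the EXTERNAL-set mean (assumed F2 form)."""
    mean_ext = _mean(y_ext)
    return 1.0 - _press(y_ext, y_pred) / sum((y - mean_ext) ** 2 for y in y_ext)


def q2_f3(y_ext, y_pred, y_train):
    """Mean squared external error scaled by the training-set variance (assumed F3 form)."""
    mean_tr = _mean(y_train)
    mse_ext = _press(y_ext, y_pred) / len(y_ext)
    var_tr = sum((y - mean_tr) ** 2 for y in y_train) / len(y_train)
    return 1.0 - mse_ext / var_tr


def rmse(y_obs, y_pred):
    """Root mean square error of the predictions."""
    return sqrt(_press(y_obs, y_pred) / len(y_obs))


# Toy values, purely hypothetical (not EDKB data).
y_train = [1.0, 2.0, 3.0, 4.0]
y_ext = [2.0, 4.0]
y_pred = [2.5, 3.5]

print(q2_f1(y_ext, y_pred, y_train))
print(q2_f2(y_ext, y_pred))
print(q2_f3(y_ext, y_pred, y_train))
print(rmse(y_ext, y_pred))
```

In this sketch only the F3 variant normalizes by training-set quantities alone, so its value does not depend on how the external responses happen to be distributed around their own mean; that is one plausible reading of why the three formulas can disagree on the same prediction set.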
Finally, the importance of reliable endpoint values is highlighted by comparing the classification assignments of EDKB with those of another estrogen receptor binder database (METI): we found that 16.1% of the assignments for the common compounds were opposite (20 of 124 common compounds). To verify the real assignments of these inconsistent compounds, we predicted them, as a blind external set, with our regression models and compared the results with the two databases. Most of the predictions were consistent with METI. Furthermore, we built a kNN classification model using the 104 consistent compounds to predict the inconsistent ones, and most of these predictions were also in agreement with the METI database.
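
The database comparison and the kNN step can be illustrated with a small sketch. It assumes class labels stored as dictionaries keyed by a compound identifier, Euclidean distance, and k = 3; the abstract states none of these choices, so all names, descriptors, and values below are hypothetical. Only the counts 20, 124, and 104 come from the text.

```python
from collections import Counter
from math import dist


def discordant(labels_a, labels_b):
    """Compounds present in both sources whose class labels differ."""
    common = labels_a.keys() & labels_b.keys()
    return sorted(c for c in common if labels_a[c] != labels_b[c])


def knn_predict(query, train_x, train_y, k=3):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    order = sorted(range(len(train_x)), key=lambda i: dist(query, train_x[i]))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Arithmetic stated in the abstract: 20 opposite assignments among 124 common compounds.
percent_opposite = round(100 * 20 / 124, 1)
n_consistent = 124 - 20

# Hypothetical labels (1 = binder, 0 = non-binder); not real EDKB or METI entries.
source_a = {"cmpd_A": 1, "cmpd_B": 0, "cmpd_C": 1, "cmpd_D": 0}
source_b = {"cmpd_A": 1, "cmpd_B": 1, "cmpd_C": 1, "cmpd_E": 0}
conflicts = discordant(source_a, source_b)

# Hypothetical two-descriptor training set built from "consistent" compounds.
train_x = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1), (1.2, 0.8)]
train_y = [0, 0, 1, 1, 1]

print(percent_opposite, n_consistent, conflicts)
print(knn_predict((1.0, 0.9), train_x, train_y))
print(knn_predict((0.0, 0.1), train_x, train_y))
```

Training only on compounds where both sources agree keeps the disputed labels out of the model, so its predictions for the conflicting compounds can serve as an independent vote between the two databases.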

References

  1. Hansch C, Leo A (1979) Substituent constants for correlation analysis in chemistry and biology. John Wiley and Sons, New York

  2. Regulation (EC) No 1907/2006 of the European Parliament and of the Council (18/12/2006) Concerning REACH; http://eur-lex.europa.eu/LexUriServ/site/en/oj/2006/l_396/l_39620061230en00010849.pdf. Accessed 30 Dec 2006

  3. http://www.oecd.org/dataoecd/33/37/37849783.pdf. Accessed 22 Jan 2007

  4. Kojima H, Katsura E, Takeuchi S, Niiyama K, Kobayashi K (2004) Screening for estrogen and androgen receptor activities in 200 pesticides by in vitro reporter gene assays using Chinese hamster ovary cells. Environ Health Perspect 112: 524–531. doi:10.1289/ehp.6649

  5. Kojima H, Iida MI, Katsura E, Kanetoshi A, Hori Y, Kobayashi K (2003) Effects of a diphenyl ether-type herbicide, chlornitrofen, and its amino derivative on androgen and estrogen receptor activities. Environ Health Perspect 111: 497–502. doi:10.1289/ehp.5724

  6. Colborn T (1995) Environmental estrogens: health implications for humans and wildlife. Environ Health Perspect 103: 135–136

  7. Colborn T, vom Saal FS, Soto AM (1993) Developmental effects of endocrine-disrupting chemicals in wildlife and humans. Environ Health Perspect 101: 378–384

  8. Jensen TK, Toppari J, Keiding N, Skakkebaek NE (1995) Do environmental estrogens contribute to the decline in male reproductive health? Clin Chem 41: 1896–1901

  9. Liu HX, Papa E, Gramatica P (2006) QSAR prediction of estrogen activity for a large set of diverse chemicals under the guidance of OECD principles. Chem Res Toxicol 19: 1540–1548. doi:10.1021/tx0601509

  10. Liu HX, Papa E, Walker JD, Gramatica P (2007) In silico screening of estrogen-like chemicals based on different nonlinear classification models. J Mol Graph Model 26: 135–144. doi:10.1016/j.jmgm.2007.01.003

  11. Cramer RD III, Patterson DE, Bunce JD (1988) Comparative molecular-field analysis (CoMFA). 1. Effect of shape on binding of steroids to carrier proteins. J Am Chem Soc 110: 5959–5967. doi:10.1021/ja00226a005

  12. Gantchev TG, Ali H, van Lier JE (1994) Quantitative structure–activity relationships/comparative molecular field analysis (QSAR/CoMFA) for receptor-binding properties of halogenated estradiol derivatives. J Med Chem 37: 4164–4176. doi:10.1021/jm00050a013

  13. Waller CL, Oprea TI, Chae K, Park HK, Korach KS, Laws SC, Wiese TE, Kelce WR, Gray LEJr (1996) Ligand based identification of environmental estrogens. Chem Res Toxicol 9: 1240–1248. doi:10.1021/tx960054f

  14. Marini F, Roncaglioni A, Novič M (2005) Variable selection and interpretation in structure-affinity correlation modeling of estrogen receptor binders. J Chem Inf Model 45: 1507–1519. doi:10.1021/ci0501645

  15. Asikainen A, Ruuskanen J, Tuppurainen K (2003) Spectroscopic QSAR methods and self-organizing molecular field analysis for relating molecular structure and estrogenic activity. J Chem Inf Comp Sci 43: 1974–1981. doi:10.1021/ci034110b

  16. Kurunczi L, Seclaman E, Oprea TI, Crisan L, Simon Z (2005) MTD-PLS: a PLS variant of the minimal topologic difference method. III. Mapping interactions between estradiol derivatives and the alpha estrogenic receptor. J Chem Inf Model 45: 1275–1281. doi:10.1021/ci050077c

  17. Mekenyan O, Kamenska V, Serafimova R, Poellinger L, Brouwer A, Walker J (2002) Development and validation of an average mammalian estrogen receptor-based QSAR model. SAR QSAR Environ Res 13: 579–595. doi:10.1080/1062936021000020044

  18. Bradbury S, Kamenska V, Schmieder P, Ankley G, Mekenyan O (2000) A computationally based identification algorithm for estrogen receptor ligands: part 1. Predicting hERα binding affinity. Toxicol Sci 58: 253–269. doi:10.1093/toxsci/58.2.253

  19. Asikainen A, Ruuskanen J, Tuppurainen K (2004) Consensus kNN QSAR: a versatile method for predicting the estrogenic activity of organic compounds in silico. A comparative study with five estrogen receptors and a large, diverse set of ligands. Environ Sci Technol 38: 6724–6729. doi:10.1021/es049665h

  20. Zheng W, Tropsha A (2000) Novel variable selection quantitative structure-property relationship approach based on the k-nearest neighbor principle. J Chem Inf Comp Sci 40: 185–194. doi:10.1021/ci980033m

  21. http://edkb.fda.gov/databasedoor.html. Accessed March 2006

  22. Ministry of Economy Trade and Industry, Japan (METI) (2002) Current status of testing methods development for endocrine disrupters. In: 6th meeting of the task force on endocrine disrupters testing and assessment (EDTA), 24–25 June 2002, Tokyo, Japan. http://www.meti.go.jp/interface/honsho/Search/English/search?query=gEndocappendix1e&whence=0&max=20&result=normal&sort=score&idxname=meti. Accessed 10 Sept 2008

  23. Young D, Martin T, Venkatapathy R, Harten P (2008) Are the chemical structures in your QSAR correct? QSAR Comb Sci 27: 1337–1345. doi:10.1002/qsar.200810084

  24. Schüürmann G, Ebert RU, Chen JW, Wang B, Kühne R (2008) External validation and prediction employing the predictive squared correlation coefficient: test set activity mean vs training set activity mean. J Chem Inf Model 48: 2140–2145. doi:10.1021/ci800253u

  25. Consonni V, Ballabio D, Todeschini R (2009) Comments on the definition of the Q2 parameter for QSAR validation. J Chem Inf Model 49: 1669–1678. doi:10.1021/ci900115y

  26. Kuiper GGJM, Lemmen JG, Carlsson B, Corton JC, Safe SH, van der Saag PT, Burg B, Gustafsson JA (1998) Interaction of estrogenic chemicals and phytoestrogens with estrogen receptor beta. Endocrinology 139: 4252–4263. doi:10.1210/en.139.10.4252

  27. Shi LM, Fang H, Tong W, Wu J, Perkins R, Blair RM, Branham WS, Dial SL, Moland CL, Sheehan DM (2001) QSAR models using a large diverse set of estrogens. J Chem Inf Comp Sci 41: 186–195. doi:10.1021/ci000066d

  28. Roncaglioni A, Piclin N, Pintore M, Benfenati E (2008) Binary classification models for endocrine disrupter effects mediated through the estrogen receptor. SAR QSAR Environ Res 19: 697–733. doi:10.1080/10629360802550606

  29. http://pubchem.ncbi.nlm.nih.gov/. Accessed 10 Mar 2009

  30. http://chem.sis.nlm.nih.gov/chemidplus/. Accessed 10 Mar 2009

  31. HyperChem. (2002) Release 7.03 for Windows, molecular modeling system. Hypercube, Inc., Gainesville, FL. http://www.hyper.com/. Accessed 10 Mar 2009

  32. Todeschini R, Consonni V, Mauri A, Pavan M (2005) DRAGON, version 5.3 for Windows, software for the calculation of molecular descriptors. Talete srl, Milan, Italy. http://www.talete.mi.it/products/dragon_description.htm. Accessed 10 April 2009

  33. Katritzky AR, Lobanov VS, Karelson M (1994) CODESSA, University of Florida, Gainesville, FL. http://www.codessa-pro.com/. Accessed 10 April 2009

  34. Rogers D, Hopfinger AJ (1994) Application of genetic function approximation to quantitative structure-activity relationships and quantitative structure-property relationships. J Chem Inf Comp Sci 34: 854–866. doi:10.1021/ci00020a020

  35. Todeschini R, Consonni V, Pavan M (2002) MOBY DIGS, version 1.2 for Windows, software for multilinear regression analysis and variable subset selection by genetic algorithm. Talete srl, Milan, Italy. http://www.talete.mi.it/products/moby_description.htm. Accessed 10 April 2009

  36. Todeschini R, Consonni V, Maiocchi A (1999) The K correlation index: theory development and its application in chemometrics. Chemom Intell Lab Syst 46: 13–29. doi:10.1016/S0169-7439(98)00124-5

  37. Eriksson L, Jaworska J, Worth AP, Cronin MTD, McDowell RM, Gramatica P (2003) Methods for reliability and uncertainty assessment and for applicability evaluations of classification- and regression-based QSARs. Environ Health Perspect 111: 1361–1375. doi:10.1289/ehp.5758

  38. Tropsha A, Gramatica P, Gombar VK (2003) The importance of being earnest: validation is the absolute essential for successful application and interpretation of QSPR models. QSAR Comb Sci 22: 69–77. doi:10.1002/qsar.200390007

  39. Gramatica P (2007) Principles of QSAR models validation: internal and external. QSAR Comb Sci 26: 694–701. doi:10.1002/qsar.200610151

  40. Sharaf MA, Illman DL, Kowalski BR (1986) Chemometrics. Wiley, New York

Author information

Corresponding author

Correspondence to Paola Gramatica.

Electronic Supplementary Material

Below is the Electronic Supplementary Material.

ESM 1 (PDF 439 kb)

About this article

Cite this article

Li, J., Gramatica, P. The importance of molecular structures, endpoints’ values, and predictivity parameters in QSAR research: QSAR analysis of a series of estrogen receptor binders. Mol Divers 14, 687–696 (2010). https://doi.org/10.1007/s11030-009-9212-2

  • DOI: https://doi.org/10.1007/s11030-009-9212-2