Modeling bioconcentration factor (BCF) using mechanistically interpretable descriptors computed from the open source tool “PaDEL-Descriptor”
Predictive regression-based models for bioconcentration factor (BCF) have been developed using mechanistically interpretable descriptors computed from the open source tool PaDEL-Descriptor (http://padel.nus.edu.sg/software/padeldescriptor/). A data set of 522 diverse chemicals was used for this modeling study, and extended topochemical atom (ETA) indices developed by the present authors’ group were chosen as the descriptors. Because lipophilicity is important in modeling BCF, XLogP (a computed partition coefficient) was also tried as an additional descriptor. Genetic function approximation followed by multiple linear regression was applied to select descriptors, and partial least squares analyses were subsequently performed to establish mathematical equations for BCF prediction. The model developed from ETA indices alone identifies seven important descriptors, while the model developed from ETA indices together with XLogP identifies four. In general, BCF depends on lipophilicity and on the presence of heteroatoms, halogens, fused ring systems, hydrogen bonding groups, and similar structural features. The developed models show excellent statistical quality and predictive ability. They were also used to predict an external data set available from the literature, and good prediction quality was demonstrated (R²pred = 0.812 and 0.826). Thus, BCF can be predicted using ETA and XLogP descriptors calculated from the open source PaDEL-Descriptor software in the context of aquatic chemical toxicity management.
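The external validation figures quoted above (R²pred = 0.812 and 0.826) summarize how well each model predicts compounds it was not fitted on. The abstract does not spell out the formula, so the following is a minimal sketch assuming the definition commonly used in QSAR external validation, in which the reference mean is the mean observed response of the training set rather than of the test set. The function name `r2_pred` and the log BCF values in the usage example are hypothetical illustrations, not data or code from this study.

```python
from typing import Sequence


def r2_pred(
    y_test_obs: Sequence[float],
    y_test_pred: Sequence[float],
    y_train_obs: Sequence[float],
) -> float:
    """External predictive R^2 (assumed definition, see lead-in):

    R2_pred = 1 - sum((y_obs - y_pred)^2) / sum((y_obs - mean(y_train))^2)

    Both sums run over the external test compounds; mean(y_train) is the
    mean observed response (e.g. log BCF) of the training set.
    """
    if len(y_test_obs) != len(y_test_pred):
        raise ValueError("observed and predicted test values must have equal length")
    if len(y_train_obs) == 0:
        raise ValueError("training responses are needed for the reference mean")

    train_mean = sum(y_train_obs) / len(y_train_obs)
    # Prediction error sum of squares over the external (test) compounds.
    press = sum((obs - pred) ** 2 for obs, pred in zip(y_test_obs, y_test_pred))
    # Spread of the test responses around the training-set mean.
    ss_ref = sum((obs - train_mean) ** 2 for obs in y_test_obs)
    if ss_ref == 0:
        raise ValueError("test responses all equal the training mean; R2_pred undefined")
    return 1.0 - press / ss_ref


if __name__ == "__main__":
    # Hypothetical log BCF values, for illustration only.
    train = [1.0, 2.0, 3.0]
    test_observed = [1.5, 2.5, 3.5]
    test_predicted = [1.4, 2.7, 3.3]
    print(round(r2_pred(test_observed, test_predicted, train), 3))  # 0.967
```

Using the training-set mean, instead of the test-set mean that a generic `r2_score` would use, ties the reference point to what the model actually saw during fitting.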
Keywords: QSAR · BCF · PaDEL-Descriptor · Mathematical modeling · ETA · XLogP
The authors thank the Council of Scientific and Industrial Research (CSIR), New Delhi, for awarding a major research project (no. 01 (2546)/11/EMR-II) to KR and a senior research fellowship to SP.