Abstract
Remediation of water contaminated by organic pollutants is a major challenge that could be addressed more effectively with better knowledge of the aqueous solubility of organic compounds, because aqueous solubility controls the fate and toxicity of pollutants. Here we performed a structure–property study, based on a genetic algorithm, to predict the aqueous solubility of chlorinated hydrocarbons. A total of 1497 descriptors were calculated with the Dragon software. Genetic algorithm variable selection was then used to pick, from this large pool of calculated descriptors, an optimal subset that contributes significantly to aqueous solubility. A support vector machine was employed to model the quantitative relationship between the selected descriptors and aqueous solubility. Our results show that total size, polarizability and electronegativity influence the aqueous solubility of the compounds. We also found that the support vector machine gave better results than other methods such as principal component regression and partial least squares.
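To make the variable selection step concrete, the following is a minimal sketch of genetic algorithm descriptor selection in Python. It is not the authors' implementation: the operators (tournament selection, one-point crossover, bit-flip mutation, elitism), every parameter value, and the names `ga_select`, `toy_fitness` and `INFORMATIVE` are assumptions made for illustration. In a real study the fitness would be the predictive quality of a model, such as a support vector machine, built on the selected descriptors; here a toy scoring function stands in for it so that the example stays self-contained.

```python
import random


def ga_select(n_descriptors, fitness, pop_size=30, generations=40,
              crossover_rate=0.8, mutation_rate=0.02, seed=0):
    """Return (best_mask, best_score); mask[i] is True if descriptor i is kept.

    All default parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Each chromosome is a binary mask over the descriptor pool.
    population = [[rng.random() < 0.5 for _ in range(n_descriptors)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    best_score = fitness(best)

    def tournament():
        # Binary tournament: the fitter of two distinct random individuals wins.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        next_gen = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(next_gen) < pop_size:
            p1, p2 = tournament(), tournament()
            if n_descriptors > 1 and rng.random() < crossover_rate:
                cut = rng.randrange(1, n_descriptors)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation toggles each gene with a small probability.
            child = [(not g) if rng.random() < mutation_rate else g
                     for g in child]
            next_gen.append(child)
        population = next_gen
        candidate = max(population, key=fitness)
        candidate_score = fitness(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best, best_score


# Hypothetical stand-in for model quality: reward three "informative"
# descriptors and penalise every descriptor that is kept.
INFORMATIVE = {2, 5, 11}


def toy_fitness(mask):
    hits = sum(1 for i in INFORMATIVE if mask[i])
    return hits - 0.1 * sum(mask)


mask, score = ga_select(20, toy_fitness)
selected = [i for i, keep in enumerate(mask) if keep]
print(selected, score)
```

Replacing `toy_fitness` with a function that trains a regression model on the masked descriptor columns and returns, for example, a negated cross-validated error would turn this sketch into a wrapper-style selection scheme of the kind the abstract describes.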
Cite this article
Bahadori, B., Atabati, M. & Zarei, K. Better prediction of aqueous solubility of chlorinated hydrocarbons using support vector machine modeling. Environ Chem Lett 14, 541–548 (2016). https://doi.org/10.1007/s10311-016-0561-7
DOI: https://doi.org/10.1007/s10311-016-0561-7