Abstract
Selecting the most relevant variables is an important step in QSAR/QSPR modeling. In this work, we apply a modified particle swarm optimization (MPSO) algorithm based on multiple linear regression (MLR) to select a small subset of descriptors that contribute significantly to the Gibbs energy of formation for a diverse set of organic compounds. Nonlinear relationships between the selected molecular descriptors and the Gibbs energy of formation are then modeled with radial basis function neural network (RBF NN), adaptive neuro-fuzzy inference system (ANFIS), and support vector machine (SVM) methods. The squared correlation coefficients of the MLR, RBF NN, ANFIS, and SVM models are 0.928, 0.946, 0.945, and 0.947, respectively. These results suggest that the proposed MPSO is an efficient and powerful method for descriptor (feature) selection in QSAR/QSPR studies.
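To make the idea of swarm-based descriptor selection with an MLR fitness concrete, here is a minimal sketch. The abstract does not say how MPSO modifies the basic algorithm, so the code uses the standard binary PSO (sigmoid-mapped velocities) and should not be read as the authors' method. The function names (`solve`, `mlr_r2`, `binary_pso_select`), the size penalty, the swarm parameters, and the synthetic data are all assumptions made for illustration.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-12:
            raise ValueError("singular system")
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def mlr_r2(X, y, cols):
    """R^2 of a least-squares fit of y on the chosen columns plus an intercept."""
    rows = [[1.0] + [row[c] for c in cols] for row in X]
    p = len(cols) + 1
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    b = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    try:
        beta = solve(A, b)
    except ValueError:  # collinear descriptors: treat the subset as useless
        return 0.0
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - sum(bi * ri for bi, ri in zip(beta, r))) ** 2
                 for r, yi in zip(rows, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def binary_pso_select(X, y, n_particles=20, n_iter=40, penalty=0.01, seed=0):
    """Standard binary PSO over 0/1 descriptor masks (illustrative, not MPSO)."""
    rng = random.Random(seed)
    d = len(X[0])

    def fitness(mask):
        cols = [j for j in range(d) if mask[j]]
        # Assumed objective: reward fit quality, penalize subset size.
        return mlr_r2(X, y, cols) - penalty * len(cols)

    pos = [[rng.randint(0, 1) for _ in range(d)] for _ in range(n_particles)]
    vel = [[0.0] * d for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    w, c1, c2, vmax = 0.7, 1.5, 1.5, 4.0  # assumed swarm parameters
    for _ in range(n_iter):
        for i in range(n_particles):
            for j in range(d):
                v = (w * vel[i][j]
                     + c1 * rng.random() * (pbest[i][j] - pos[i][j])
                     + c2 * rng.random() * (gbest[j] - pos[i][j]))
                v = max(-vmax, min(vmax, v))
                vel[i][j] = v
                # The sigmoid of the velocity is the probability that the bit is 1.
                pos[i][j] = 1 if rng.random() < 1.0 / (1.0 + math.exp(-v)) else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit

# Synthetic demo: only descriptors 0 and 3 carry signal.
demo_rng = random.Random(1)
X = [[demo_rng.gauss(0, 1) for _ in range(6)] for _ in range(60)]
y = [3.0 * r[0] - 2.0 * r[3] + demo_rng.gauss(0, 0.1) for r in X]
mask, score = binary_pso_select(X, y)
print(mask, round(score, 3))
```

The size penalty is one simple way to favor small subsets. A study of this kind would more likely score candidate subsets by cross-validated or external-set statistics, which this sketch omits.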
Cite this article
Khajeh, A., Modarress, H. & Zeinoddini-Meymand, H. Modified particle swarm optimization method for variable selection in QSAR/QSPR studies. Struct Chem 24, 1401–1409 (2013). https://doi.org/10.1007/s11224-012-0165-1
DOI: https://doi.org/10.1007/s11224-012-0165-1