Modified particle swarm optimization method for variable selection in QSAR/QSPR studies

  • Original Research

Structural Chemistry

Abstract

The selection of the most relevant variables is an important step in the QSAR/QSPR modeling process. In this work, we apply a modified particle swarm optimization (MPSO) method coupled with multiple linear regression (MLR) to select a small subset of descriptors that contribute significantly to the Gibbs energy of formation of a diverse set of organic compounds. Nonlinear relationships between the selected molecular descriptors and the Gibbs energy of formation are modeled with a radial basis function neural network (RBF NN), an adaptive neuro-fuzzy inference system (ANFIS), and a support vector machine (SVM). The squared correlation coefficients (R²) obtained with MLR, RBF NN, ANFIS, and SVM are 0.928, 0.946, 0.945, and 0.947, respectively. These results suggest that the proposed MPSO is an efficient and powerful method for feature (descriptor) selection in QSAR/QSPR studies.
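The abstract describes PSO-driven descriptor selection with MLR as the model that scores each candidate subset, but it does not spell out the specific modifications that make up the authors' MPSO. The sketch below is therefore a generic binary PSO (each particle is a 0/1 mask over descriptors, and each bit is switched on with probability sigmoid(velocity)), not the authors' MPSO. The fitness function (MLR R² minus a per-descriptor penalty), the parameter values, and the helper names `mlr_r2`, `fitness`, and `binary_pso_select` are all hypothetical choices made for illustration. For ordinary least squares with an intercept, the R² returned by `mlr_r2` equals the squared correlation between observed and fitted values, which is the statistic quoted in the abstract.

```python
import numpy as np


def mlr_r2(X, y):
    """Fit MLR (ordinary least squares with an intercept) and return R^2.

    With an intercept, R^2 equals the squared correlation between the
    observed and fitted values."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)


def fitness(mask, X, y, penalty):
    """Hypothetical fitness: MLR R^2 minus a penalty per selected descriptor,
    so a smaller subset wins when it fits about as well as a larger one."""
    k = int(mask.sum())
    if k == 0:
        return -np.inf  # an empty subset cannot be fitted
    return mlr_r2(X[:, mask.astype(bool)], y) - penalty * k


def binary_pso_select(X, y, n_particles=30, n_iter=100, w=0.7, c1=1.5,
                      c2=1.5, v_max=4.0, penalty=0.01, seed=0):
    """Generic binary PSO for descriptor selection (illustrative values only)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Each particle is a 0/1 mask over descriptors; velocities are real-valued.
    pos = (rng.random((n_particles, d)) < 0.5).astype(float)
    vel = rng.uniform(-1.0, 1.0, (n_particles, d))
    pbest = pos.copy()
    pbest_fit = np.array([fitness(p, X, y, penalty) for p in pos])
    g = int(np.argmax(pbest_fit))
    gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]

    for _ in range(n_iter):
        r1 = rng.random((n_particles, d))
        r2 = rng.random((n_particles, d))
        # Standard PSO velocity update toward personal and global bests.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        vel = np.clip(vel, -v_max, v_max)
        # Binary rule: bit j is set with probability sigmoid(vel[:, j]).
        prob = 1.0 / (1.0 + np.exp(-vel))
        pos = (rng.random((n_particles, d)) < prob).astype(float)
        for i in range(n_particles):
            f = fitness(pos[i], X, y, penalty)
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i].copy(), f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i].copy(), f
    # Indices of the selected descriptors and the best fitness found.
    return np.flatnonzero(gbest), gbest_fit


if __name__ == "__main__":
    # Synthetic data: only descriptors 0, 3 and 7 influence the response.
    rng = np.random.default_rng(1)
    X = rng.standard_normal((100, 20))
    y = 3 * X[:, 0] - 2 * X[:, 3] + 2 * X[:, 7] + 0.1 * rng.standard_normal(100)
    selected, score = binary_pso_select(X, y)
    print("selected descriptors:", selected, "fitness:", round(score, 3))
```

The penalty term is one simple way to push the swarm toward the "small subset" the abstract asks for. Cross-validated error or an external test set would be more defensible choices for real QSPR data, and the nonlinear models (RBF NN, ANFIS, SVM) would then be trained only on the descriptors this step retains.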

Author information

Correspondence to Hamid Modarress.

Electronic supplementary material

Supplementary material 1 (PDF 591 kb)

About this article

Cite this article

Khajeh, A., Modarress, H. & Zeinoddini-Meymand, H. Modified particle swarm optimization method for variable selection in QSAR/QSPR studies. Struct Chem 24, 1401–1409 (2013). https://doi.org/10.1007/s11224-012-0165-1

  • DOI: https://doi.org/10.1007/s11224-012-0165-1