Abstract
Combined data splitting-feature selection (CDFS) is a new strategy in quantitative structure–property relationship (QSPR) analysis in which the training set is sampled repeatedly to find a subset of molecular descriptors that yields a stable QSPR model, one that is insensitive to the presence or absence of one or more compounds in the training set. Here, we used genetic algorithm-partial least squares (GA-PLS) as the modeling method within the CDFS methodology and applied it to a QSPR study of the gas chromatographic (GC) retention of analgesic drugs. A set of 58 analgesic drugs with known Kovats retention indices was selected, and a large number of theoretical descriptors were calculated for each molecule. Random sampling of the training set (80% of the data) was performed 20 times, and each time the remaining molecules were used as the validation set. In each run, the most appropriate QSPR model was produced by GA-PLS. The descriptors selected in each run were then analyzed for similarity and frequency distribution. The final QSPR model, which was built from the descriptors common to 50% of the runs, had squared correlation coefficients of 0.924, 0.865 and 0.903 for training, validation and cross-validation, respectively. In addition, it reproduced 85% of the variance in the retention factor of the external test set compounds.
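The resampling-and-voting loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the GA-PLS step is replaced by a much simpler stand-in selector (top-k descriptors by absolute correlation with the response), the data are synthetic, and all function names and parameters (`select_descriptors`, `cdfs`, `k`, `min_share`) are hypothetical. Only the 80% training fraction, the 20 runs, the 50% frequency threshold and the 58 compounds are taken from the text.

```python
import numpy as np

def select_descriptors(X, y, k):
    """Stand-in selector (NOT GA-PLS): indices of the k descriptors
    with the largest absolute Pearson correlation with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Constant columns get an infinite denominator, hence correlation 0.
    corr = np.abs(Xc.T @ yc) / np.where(denom == 0, np.inf, denom)
    return np.argsort(corr)[-k:]

def cdfs(X, y, n_runs=20, train_frac=0.8, k=5, min_share=0.5, seed=0):
    """Repeat random training-set sampling, select descriptors in each
    run, and keep those chosen in at least min_share of the runs."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    n_train = int(round(train_frac * n))
    counts = np.zeros(p, dtype=int)
    for _ in range(n_runs):
        train_idx = rng.permutation(n)[:n_train]  # random 80% split
        chosen = select_descriptors(X[train_idx], y[train_idx], k)
        counts[chosen] += 1                       # selection frequency
    stable = np.flatnonzero(counts >= min_share * n_runs)
    return stable, counts

# Synthetic example (assumed data): 58 "compounds", 30 "descriptors",
# with the response driven by descriptors 0 and 1 only.
rng = np.random.default_rng(42)
X = rng.normal(size=(58, 30))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=58)

stable, counts = cdfs(X, y)
print("stable descriptors:", stable)
print("selection counts:", counts)
```

In a real study the stand-in selector would be swapped for a GA-PLS routine, and the final model would be refit on the stable descriptors and checked against the validation and external test sets.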
Cite this article
Hemmateenejad, B., Javidnia, K., Miri, R. et al. Quantitative structure–retention relationship study of analgesic drugs by application of combined data splitting-feature selection strategy and genetic algorithm-partial least square. J IRAN CHEM SOC 9, 53–60 (2012). https://doi.org/10.1007/s13738-011-0005-z
DOI: https://doi.org/10.1007/s13738-011-0005-z