The Robust Models of Retention for Thin Layer Chromatography
We present an application of machine learning methods to modelling retention constants in thin layer chromatography. First, a feature selection algorithm is applied to reduce the feature space; regression models are then built with the random forest algorithm. The models obtained in this way correlate better with the experimental data than reference models built with linear regression. They are also robust: cross-validation tests show that the accuracy on unseen data is, on average, identical to the cross-validated accuracy obtained on the training set.
Keywords: random forest, feature selection, thin layer chromatography
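The robustness check mentioned in the abstract (comparing cross-validated accuracy on the training set with accuracy on unseen data) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the data are synthetic, a one-descriptor least-squares line stands in for the random forest model, R^2 is assumed as the accuracy measure, and all names (`fit_linear`, `r_squared`, `cross_validated_r2`) are hypothetical.

```python
import random
from statistics import mean


def fit_linear(xs, ys):
    """Fit y = a + b*x by least squares (stand-in for any regression model)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return lambda x: a + b * x


def r_squared(y_true, y_pred):
    """Coefficient of determination of predictions against observed values."""
    my = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - my) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def cross_validated_r2(xs, ys, k=5, seed=0):
    """Pool out-of-fold predictions over k folds and score them once."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    preds = [0.0] * len(xs)
    for fold in range(k):
        test = idx[fold::k]            # every k-th shuffled index forms one fold
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        model = fit_linear([xs[i] for i in train], [ys[i] for i in train])
        for i in test:
            preds[i] = model(xs[i])    # prediction made without seeing sample i
    return r_squared(ys, preds)


# Synthetic "descriptor vs. retention constant" data (assumption, not real TLC data).
rng = random.Random(42)
xs = [rng.uniform(0.0, 10.0) for _ in range(200)]
ys = [0.3 + 0.05 * x + rng.gauss(0.0, 0.05) for x in xs]

# Hold out 50 samples that play no part in model building.
train_x, train_y = xs[:150], ys[:150]
test_x, test_y = xs[150:], ys[150:]

cv_r2 = cross_validated_r2(train_x, train_y)
final_model = fit_linear(train_x, train_y)
holdout_r2 = r_squared(test_y, [final_model(x) for x in test_x])

print(f"cross-validated R^2 on training set: {cv_r2:.3f}")
print(f"R^2 on unseen data:                  {holdout_r2:.3f}")
```

If the two printed values are close, the cross-validated estimate is a fair predictor of performance on new compounds, which is the sense of "robust" used in the abstract. Any feature selection step would have to be repeated inside each fold for the comparison to stay unbiased.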