Abstract
Quantitative structure-retention relationship (QSRR) modeling has emerged as an efficient approach for predicting analyte retention times from molecular descriptors. However, most reported QSRR models are column-specific, so a separate model is required for each high-performance liquid chromatography (HPLC) system. This study evaluates the potential of machine learning (ML) algorithms and quantum mechanical (QM) descriptors to develop QSRR models that predict retention times across three different reversed-phase HPLC columns under varying conditions. Four ML methods, namely partial least squares (PLS) regression, ridge regression (RR), random forest (RF), and gradient boosting (GB), were compared on a dataset of 360 retention times for 15 aromatic analytes. Molecular descriptors were calculated using density functional theory (DFT). Column characteristics (such as particle size and pore size) and experimental conditions (such as temperature and gradient time) were used as additional descriptors. The GB-QSRR model showed the best predictive performance, with a Q² of 0.989 and a root mean square error of prediction (RMSEP) of 0.749 min on the test set. Feature analysis revealed that solvation energy (SE), the HOMO–LUMO energy gap (∆E HOMO–LUMO), total dipole moment (Mtot), and global hardness (η) are among the most influential predictors of retention time, indicating the significance of electrostatic interactions and hydrophobicity. Our findings underscore the efficiency of the ensemble methods (GB and RF), which employ non-linear learners, in capturing local variations in retention times across diverse experimental setups. This study emphasizes the potential of cross-column QSRR modeling and highlights the utility of ML models in optimizing chromatographic analysis.
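To make the two test-set figures of merit quoted above concrete, the following minimal Python sketch computes RMSEP and a predictive Q². It is an illustration only: the function names are hypothetical, the retention times are made-up numbers rather than the study's data, and the Q² variant shown (1 − PRESS/TSS, with the total sum of squares taken about the mean of the held-out observations) is an assumption, since several Q² definitions are in use and the abstract does not state which one was applied.

```python
import math


def rmsep(y_true, y_pred):
    """Root mean square error of prediction over a held-out set."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def q2(y_true, y_pred):
    """Predictive Q^2 = 1 - PRESS / TSS.

    Assumption: TSS is taken about the mean of the held-out observations;
    other variants use the training-set mean instead.
    """
    mean = sum(y_true) / len(y_true)
    press = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    tss = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - press / tss


# Made-up retention times in minutes, for illustration only.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]

print(f"RMSEP = {rmsep(observed, predicted):.3f} min")
print(f"Q2    = {q2(observed, predicted):.3f}")
```

Because RMSEP carries the units of the response (minutes) while Q² is dimensionless, the two are usually reported together, as in the abstract.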
Data availability
The datasets and code generated and/or analyzed during the current study are available from the corresponding author upon request.
Abbreviations
- HPLC: High-performance liquid chromatography
- RP-HPLC: Reversed-phase high-performance liquid chromatography
- QSRR: Quantitative structure-retention relationship
- NCV: Nested cross-validation
- ML: Machine learning
- GB: Gradient boosting
- RF: Random forest
- PLS: Partial least squares
- RR: Ridge regression
- ANN: Artificial neural network
- MLR: Multiple linear regression
- SVM: Support vector machine
- SVR: Support vector regression
- LASSO: Least absolute shrinkage and selection operator
- RMSE: Root mean square error
- MAE: Mean absolute error
- MAD: Median absolute error
- MSE: Mean squared error
- R²: Coefficient of determination
- r: Correlation coefficient
- DFT: Density functional theory
- SE: Solvation energy
- Mtot: Total dipole moment
- ∆E HOMO-LUMO: HOMO-LUMO energy gap
- EA: Electron affinity
- IP: Ionization potential
- η: Global hardness
- μ: Electronic chemical potential
- ω: Electrophilicity
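Several of the descriptors listed above (IP, EA, ∆E HOMO-LUMO, η, μ, ω) are related through standard conceptual-DFT expressions. The sketch below shows one common way to derive them from frontier orbital energies. It rests on assumptions that this text does not confirm for the study itself: IP and EA are approximated Koopmans-style as −E(HOMO) and −E(LUMO), hardness follows the Parr–Pearson convention η = (IP − EA)/2 (some authors omit the factor of 1/2), μ = −(IP + EA)/2, and ω = μ²/(2η). The function name and the input energies are hypothetical.

```python
def reactivity_descriptors(e_homo, e_lumo):
    """Global reactivity descriptors from frontier orbital energies (eV).

    Assumptions (not taken from the study): Koopmans-style IP = -E_HOMO and
    EA = -E_LUMO, and the Parr-Pearson convention eta = (IP - EA) / 2.
    """
    ip = -e_homo                    # ionization potential
    ea = -e_lumo                    # electron affinity
    gap = e_lumo - e_homo           # HOMO-LUMO energy gap
    eta = (ip - ea) / 2.0           # global hardness
    mu = -(ip + ea) / 2.0           # electronic chemical potential
    omega = mu ** 2 / (2.0 * eta)   # electrophilicity index
    return {"IP": ip, "EA": ea, "gap": gap, "eta": eta, "mu": mu, "omega": omega}


# Hypothetical orbital energies in eV, chosen only to keep the arithmetic simple.
descriptors = reactivity_descriptors(e_homo=-6.0, e_lumo=-2.0)
for name, value in descriptors.items():
    print(f"{name:>5} = {value:.2f} eV")
```

Under these conventions the hardness is exactly half the HOMO-LUMO gap, so the two descriptors carry the same information up to a scale factor; whether that holds for a given study depends on how IP and EA were actually obtained.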
Funding
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Science and ICT (2019R1A2C2084709 and 2021R1A4A3025742).
Author information
Contributions
Sargol Mazraedoost: conceptualization; data curation; methodology; software; formal analysis; writing—original draft preparation; writing—review and editing; visualization.
Petar Žuvela: conceptualization; methodology; supervision; writing—review and editing.
Szymon Ulenberg: experimental; writing materials and methods (instrumentation or equipment and chromatographic conditions)—review and editing.
Tomasz Bączek: experimental; writing materials and methods (instrumentation or equipment and chromatographic conditions)—review and editing.
J. Jay Liu: writing—review and editing; supervision; funding acquisition.
All authors have read and agreed to the published version of the manuscript.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
About this article
Cite this article
Mazraedoost, S., Žuvela, P., Ulenberg, S. et al. Cross-column density functional theory–based quantitative structure-retention relationship model development powered by machine learning. Anal Bioanal Chem 416, 2951–2968 (2024). https://doi.org/10.1007/s00216-024-05243-7
DOI: https://doi.org/10.1007/s00216-024-05243-7