Cross-column density functional theory–based quantitative structure-retention relationship model development powered by machine learning

  • Research Paper
  • Analytical and Bioanalytical Chemistry

Abstract

Quantitative structure-retention relationship (QSRR) modeling has emerged as an efficient alternative for predicting analyte retention times from molecular descriptors. However, most reported QSRR models are column-specific, so each high-performance liquid chromatography (HPLC) system requires its own model. This study evaluates the potential of machine learning (ML) algorithms and quantum mechanical (QM) descriptors to develop QSRR models that can predict retention times across three different reversed-phase HPLC columns under varying conditions. Four machine learning methods, namely partial least squares (PLS) regression, ridge regression (RR), random forest (RF), and gradient boosting (GB), were compared on a dataset of 360 retention times for 15 aromatic analytes. Molecular descriptors were calculated using density functional theory (DFT). Column characteristics such as particle size and pore size, and experimental conditions such as temperature and gradient time, were additionally used as descriptors. The GB-QSRR model showed the best predictive performance, with a Q² of 0.989 and a root mean square error of prediction (RMSEP) of 0.749 min on the test set. Feature analysis revealed that solvation energy (SE), HOMO–LUMO energy gap (∆E HOMO–LUMO), total dipole moment (Mtot), and global hardness (η) are among the most influential predictors of retention time, indicating the significance of electrostatic interactions and hydrophobicity. Our findings underscore the efficiency of the ensemble methods (GB and RF), which employ non-linear learners, in capturing local variations in retention times across diverse experimental setups. This study emphasizes the potential of cross-column QSRR modeling and highlights the utility of ML models in optimizing chromatographic analysis.

Data availability

The datasets and code generated and/or analyzed during the current study are available from the corresponding author upon request.

Abbreviations

HPLC: High-performance liquid chromatography
RP-HPLC: Reversed-phase high-performance liquid chromatography
QSRR: Quantitative structure-retention relationship
NCV: Nested cross-validation
ML: Machine learning
GB: Gradient boosting
RF: Random forest
PLS: Partial least squares
RR: Ridge regression
ANN: Artificial neural network
MLR: Multiple linear regression
SVM: Support vector machine
SVR: Support vector regression
LASSO: Least absolute shrinkage and selection operator
RMSE: Root mean square error
MAE: Mean absolute error
MAD: Median absolute error
MSE: Mean squared error
R²: Coefficient of determination
r: Correlation coefficient
DFT: Density functional theory
SE: Solvation energy
Mtot: Total dipole moment
∆E HOMO–LUMO: HOMO–LUMO energy gap
EA: Electron affinity
IP: Ionization potential
η: Global hardness
μ: Electronic chemical potential
ω: Electrophilicity
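
For orientation, the global reactivity descriptors listed above are commonly defined from IP and EA as follows. These are the standard conceptual-DFT expressions (Parr and Pearson); the exact convention and prefactors used in the study are not shown on this page, so the block is an assumption about the usual definitions rather than a statement of the authors' formulas. The orbital-energy forms on the right assume a Koopmans-type approximation, IP ≈ −E_HOMO and EA ≈ −E_LUMO.

```latex
\begin{align}
  \eta   &= \frac{IP - EA}{2}
          \approx \frac{E_{\mathrm{LUMO}} - E_{\mathrm{HOMO}}}{2}
          = \frac{\Delta E_{\mathrm{HOMO\text{-}LUMO}}}{2}, \\
  \mu    &= -\frac{IP + EA}{2}
          \approx \frac{E_{\mathrm{HOMO}} + E_{\mathrm{LUMO}}}{2}, \\
  \omega &= \frac{\mu^{2}}{2\eta}.
\end{align}
```

Under this approximation η and ∆E HOMO–LUMO carry the same information up to a factor of two, which is worth keeping in mind when both appear as descriptors in one model.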

Funding

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Science and ICT (2019R1A2C2084709 and 2021R1A4A3025742).

Author information

Contributions

Sargol Mazraedoost: conceptualization; data curation; methodology; software; formal analysis; writing—original draft preparation; writing—review and editing; visualization.

Petar Žuvela: conceptualization; methodology; supervision; writing—review and editing.

Szymon Ulenberg: experimental; writing materials and methods (instrumentation or equipment and chromatographic conditions)—review and editing.

Tomasz Bączek: experimental; writing materials and methods (instrumentation or equipment and chromatographic conditions)—review and editing.

J. Jay Liu: writing—review and editing; supervision; funding acquisition.

All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to J. Jay Liu.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 776 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Mazraedoost, S., Žuvela, P., Ulenberg, S. et al. Cross-column density functional theory–based quantitative structure-retention relationship model development powered by machine learning. Anal Bioanal Chem 416, 2951–2968 (2024). https://doi.org/10.1007/s00216-024-05243-7

  • DOI: https://doi.org/10.1007/s00216-024-05243-7
