Abstract
Chromatographic retention times are usually modeled one analyte at a time. This approach has limitations: no information is shared between analytes, and consequently the model predictions generalize poorly to out-of-sample analytes. In this work, a publicly available dataset was used to illustrate the benefits of pooling the individual data and analyzing them simultaneously using a Bayesian hierarchical approach. Statistical analysis was carried out using Stan coupled with R, which enables full Bayesian inference with Markov chain Monte Carlo sampling. This methodology allows (i) incorporating prior knowledge about the likely values of model parameters, (ii) considering the between-analyte variability and the correlation between the model parameters, (iii) explaining the between-analyte variability by available predictors, and (iv) sharing information across the analytes. The last of these is especially valuable when the data contain only limited information about certain model parameters. The results are obtained in the form of a posterior probability distribution, which quantifies uncertainty about the model parameters and predictions and is directly relevant for decision-making. We used the Neue model to describe the relationship between retention factor and acetonitrile content in the mobile phase for 1026 analytes. The model was parametrized in terms of the retention factor in 100% water, the retention factor in 100% acetonitrile, and a curvature coefficient, and considered log P and pKa as predictors. From this analysis, we discovered that the analytes formed two clusters with different retention depending on the degree of analyte dissociation. The final model was well calibrated to the data. It gives insight into the behavior of analytes in the chromatographic column and can be used to make predictions for a structurally diverse set of analytes if their log P and pKa values are known.
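The parametrization described above can be illustrated with a minimal sketch. The abstract does not state the equation, so the functional form below is an assumption: a simplified Neue-type curve, log k = log kw - S1(1 + S2)φ / (1 + S2 φ) with S1 = log kw - log ka, chosen because it reduces to the retention factor in water at φ = 0 and in acetonitrile at φ = 1. The function name, argument names, and parameter values are hypothetical and are not taken from the article.

```python
def neue_log_k(phi, log_kw, log_ka, s2):
    """Assumed simplified Neue-type curve (not confirmed by the abstract).

    phi    : acetonitrile fraction in the mobile phase, 0..1
    log_kw : log10 retention factor in 100% water
    log_ka : log10 retention factor in 100% acetonitrile
    s2     : curvature coefficient (s2 = 0 gives a straight line in phi)
    """
    s1 = log_kw - log_ka  # total drop in log k between water and acetonitrile
    return log_kw - s1 * (1.0 + s2) * phi / (1.0 + s2 * phi)


# Hypothetical parameter values for a single analyte, for illustration only.
log_kw, log_ka, s2 = 3.0, -1.0, 2.0

for phi in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"phi={phi:.2f}  log k={neue_log_k(phi, log_kw, log_ka, s2):.3f}")
```

By construction, the curve returns log_kw at phi = 0 and log_ka at phi = 1, so the two endpoint parameters keep a direct chromatographic meaning while s2 controls only the bend between them.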
Funding
This study was supported by (i) the project POWR.03.02.00-00-I035/16-00 co-financed by the European Union through the European Social Fund under the Operational Programme Knowledge Education Development 2014–2020 and (ii) the National Science Centre, Poland (grant 2015/18/E/ST4/00449).
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Kamedulska, A., Kubik, Ł. & Wiczling, P. Statistical analysis of isocratic chromatographic data using Bayesian modeling. Anal Bioanal Chem 414, 3471–3481 (2022). https://doi.org/10.1007/s00216-022-03968-x
DOI: https://doi.org/10.1007/s00216-022-03968-x