Classification of samples from NMR-based metabolomics using principal components analysis and partial least squares with uncertainty estimation

  • Research Paper
Analytical and Bioanalytical Chemistry

Abstract

Recent progress in metabolomics has been aided by the development of analytical techniques such as gas and liquid chromatography coupled with mass spectrometry (GC-MS and LC-MS) and nuclear magnetic resonance (NMR) spectroscopy. The vast quantities of data produced by these techniques have led to increased use of machine learning algorithms, such as principal components analysis (PCA) and partial least squares (PLS), to aid in interpreting these data. Such techniques can be applied to biomarker discovery, interlaboratory comparison, and clinical diagnosis. However, a lingering question is whether the results of these studies can be applied to broader sets of clinical data, which are usually taken from different data sources. In this work, we address this question by creating a metabolomics workflow that combines a previously published consensus analysis procedure (https://doi.org/10.1016/j.chemolab.2016.12.010) with PCA and PLS models whose uncertainty is estimated by bootstrapping. This workflow is applied to NMR data from an interlaboratory comparison study using synthetic and biologically obtained metabolite mixtures. The consensus analysis identifies trusted laboratories, whose data are used to build classification models that are more reliable than models built without this screening. With uncertainty analysis, the reliability of the classification can be rigorously quantified, both for data from the original set and for new data presented to the model.
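
The bootstrap uncertainty step can be illustrated with a minimal sketch. It assumes a two-class problem coded 0/1, a single-response PLS (PLS-DA) model fitted with the NIPALS algorithm, and a nonparametric bootstrap that resamples training spectra with replacement. The function names, the 0.5 decision threshold, and the percentile interval are illustrative assumptions, not the authors' implementation. For a new sample, the spread of the bootstrapped predictions gives an uncertainty interval, and the fraction of predictions above the threshold serves as a simple reliability estimate for the class assignment.

```python
import random
import statistics


def fit_pls1(X, y, n_components):
    """Fit a single-response PLS (PLS1) model with NIPALS.

    X is a list of spectra (lists of floats); y holds 0/1 class codes.
    """
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(m)] for row in X]
    yc = [v - y_mean for v in y]
    W, P, Q = [], [], []
    for _ in range(n_components):
        # Weight vector: covariance of each variable with the residual response
        w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(m)]
        norm = sum(v * v for v in w) ** 0.5
        if norm < 1e-12:
            break  # the response is fully explained; stop adding components
        w = [v / norm for v in w]
        t = [sum(Xc[i][j] * w[j] for j in range(m)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        p = [sum(Xc[i][j] * t[i] for i in range(n)) / tt for j in range(m)]  # X loadings
        q = sum(yc[i] * t[i] for i in range(n)) / tt  # y loading
        # Deflate X and y by the part explained by this component
        Xc = [[Xc[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        yc = [yc[i] - q * t[i] for i in range(n)]
        W.append(w)
        P.append(p)
        Q.append(q)
    return {"x_mean": x_mean, "y_mean": y_mean, "W": W, "P": P, "Q": Q}


def predict_pls1(model, x):
    """Predict the class-coded response for one new spectrum x."""
    xc = [v - mu for v, mu in zip(x, model["x_mean"])]
    y_hat = model["y_mean"]
    for w, p, q in zip(model["W"], model["P"], model["Q"]):
        t = sum(a * b for a, b in zip(xc, w))
        xc = [a - t * b for a, b in zip(xc, p)]  # deflate as in training
        y_hat += q * t
    return y_hat


def bootstrap_class_reliability(X, y, x_new, n_components=2, n_boot=200,
                                threshold=0.5, seed=0):
    """Refit PLS1 on bootstrap resamples and summarize predictions for x_new."""
    rng = random.Random(seed)
    n = len(X)
    preds = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        yb = [y[i] for i in idx]
        if len(set(yb)) < 2:
            continue  # a resample containing one class cannot train a classifier
        model = fit_pls1([X[i] for i in idx], yb, n_components)
        preds.append(predict_pls1(model, x_new))
    if not preds:
        raise ValueError("no usable bootstrap resamples")
    preds.sort()
    k = len(preds) - 1
    return {
        "median": statistics.median(preds),
        "ci95": (preds[int(0.025 * k)], preds[int(0.975 * k)]),  # percentile interval
        "p_class1": sum(p > threshold for p in preds) / len(preds),
    }
```

With well-separated classes, nearly every bootstrap model assigns a sample resembling class 1 to class 1, so p_class1 approaches 1; samples near the decision boundary give intermediate values and wide intervals, flagging classifications that should not be trusted. Pure Python keeps the sketch dependency-free; in practice a library PLS implementation (for example scikit-learn's PLSRegression) would replace fit_pls1.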

Figs. 1–12

References

  1. Nicholson JK, Wilson ID. Understanding 'Global' systems biology: metabonomics and the continuum of metabolism. Nat Rev Drug Discov. 2003;2(8):668–76.

  2. Lu X, Zhao X, Bai C, Zhao C, Lu G, Xu G. LC–MS-based metabonomics analysis. J Chromatogr B. 2008;866(1–2):64–76.

  3. Willenberg I, Ostermann AI, Schebb NH. Targeted metabolomics of the arachidonic acid cascade: current state and challenges of LC–MS analysis of oxylipins. Anal Bioanal Chem. 2015;407(10):2675–83.

  4. Karaman İ, Nørskov NP, Yde CC, Hedemann MS, Bach Knudsen KE, Kohler A. Sparse multi-block PLSR for biomarker discovery when integrating data from LC–MS and NMR metabolomics. Metabolomics. 2015;11(2):367–79.

  5. Hsu C-C, ElNaggar MS, Peng Y, Fang J, Sanchez LM, Mascuch SJ, et al. Real-time metabolomics on living microorganisms using ambient electrospray ionization flow-probe. Anal Chem. 2013;85(15):7014–8.

  6. Rath CM, Yang JY, Alexandrov T, Dorrestein PC. Data-independent microbial metabolomics with ambient ionization mass spectrometry. J Am Soc Mass Spectrom. 2013;24(8):1167–76.

  7. Weston DJ. Ambient ionization mass spectrometry: current understanding of mechanistic theory; analytical performance and application areas. Analyst. 2010;135(4):661–8.

  8. Evans AM, DeHaven CD, Barrett T, Mitchell M, Milgram E. Integrated, nontargeted ultrahigh performance liquid chromatography/electrospray ionization tandem mass spectrometry platform for the identification and relative quantification of the small-molecule complement of biological systems. Anal Chem. 2009;81(16):6656–67.

  9. Ehrhardt C, Arapitsas P, Stefanini M, Flick G, Mattivi F. Analysis of the phenolic composition of fungus-resistant grape varieties cultivated in Italy and Germany using UHPLC-MS/MS. J Mass Spectrom. 2014;49(9):860–9.

  10. Rodriguez-Aller M, Gurny R, Veuthey J-L, Guillarme D. Coupling ultra high-pressure liquid chromatography with mass spectrometry: constraints and possible applications. J Chromatogr A. 2013;1292:2–18.

  11. Wishart DS. Quantitative metabolomics using NMR. TrAC Trends Anal Chem. 2008;27(3):228–37.

  12. Viant MR, Lyeth BG, Miller MG, Berman RF. An NMR metabolomic investigation of early metabolic disturbances following traumatic brain injury in a mammalian model. NMR Biomed. 2005;18(8):507–16.

  13. Arana VA, Medina J, Alarcon R, Moreno E, Heintz L, Schäfer H, et al. Coffee’s country of origin determined by NMR: the Colombian case. Food Chem. 2015;175:500–6.

  14. Noothalapati H, Shigeto S. Exploring metabolic pathways in vivo by a combined approach of mixed stable isotope-labeled Raman microspectroscopy and multivariate curve resolution analysis. Anal Chem. 2014;86(15):7828–34.

  15. Hosokawa M, Ando M, Mukai S, Osada K, Yoshino T, Hamaguchi H-o, et al. In vivo live cell imaging for the quantitative monitoring of lipids by using Raman microspectroscopy. Anal Chem. 2014;86(16):8224–30.

  16. Gilany K, Moazeni-Pourasil RS, Jafarzadeh N, Savadi-Shiraz E. Metabolomics fingerprinting of the human seminal plasma of asthenozoospermic patients. Mol Reprod Dev. 2014;81(1):84–6.

  17. Dettmer K, Aronov PA, Hammock BD. Mass spectrometry-based metabolomics. Mass Spectrom Rev. 2007;26(1):51–78.

  18. Fonville JM, Richards SE, Barton RH, Boulange CL, Ebbels TMD, Nicholson JK, et al. The evolution of partial least squares models and related chemometric approaches in metabonomics and metabolic phenotyping. J Chemom. 2010;24(11–12):636–49.

  19. Gromski PS, Xu Y, Correa E, Ellis DI, Turner ML, Goodacre R. A comparative investigation of modern feature selection and classification approaches for the analysis of mass spectrometry data. Anal Chim Acta. 2014;829:1–8.

  20. Ouyang M, Zhang Z, Chen C, Liu X, Liang Y. Application of sparse linear discriminant analysis for metabolomics data. Anal Methods. 2014;6(22):9037–44.

  21. Wu X, Zhao L, Peng H, She Y, Feng Y. Search for potential biomarkers by UPLC/Q-TOF–MS analysis of dynamic changes of glycerophospholipid constituents of RAW264.7 cells treated with NSAID. Chromatographia. 2015;78(3):211–20.

  22. Li Y-Q, Liu Y-F, Song D-D, Zhou Y-P, Wang L, Xu S, et al. Particle swarm optimization-based protocol for partial least-squares discriminant analysis: application to 1H nuclear magnetic resonance analysis of lung cancer metabonomics. Chemom Intell Lab Syst. 2014;135:192–200.

  23. Uarrota VG, Moresco R, Coelho B, Nunes EDC, Peruch LAM, Neubert EDO, et al. Metabolomics combined with chemometric tools (PCA, HCA, PLS-DA and SVM) for screening cassava (Manihot esculenta Crantz) roots during postharvest physiological deterioration. Food Chem. 2014;161:67–78.

  24. Heinemann J, Mazurie A, Tokmina-Lukaszewska M, Beilman GJ, Bothner B. Application of support vector machines to metabolomics experiments with limited replicates. Metabolomics. 2014;10(6).

  25. Wang X, Zhang M, Ma J, Zhang Y, Hong G, Sun F, et al. Metabolic changes in Paraquat poisoned patients and support vector machine model of discrimination. Biol Pharm Bull. 2015;38(3):470–5.

  26. Tsugawa H, Tsujimoto Y, Arita M, Bamba T, Fukusaki E. GC/MS based metabolomics: development of a data mining system for metabolite identification by using soft independent modeling of class analogy (SIMCA). BMC Bioinformatics. 2011;12(1):131.

  27. Dunn WB, Broadhurst DI, Edison A, Guillou C, Viant MR, Bearden DW, et al. Quality assurance and quality control processes: summary of a metabolomics community questionnaire. Metabolomics. 2017;13(5):50.

  28. Sheen DA, Rocha WFC, Lippa KA, Bearden DW. A scoring metric for multivariate data for reproducibility analysis using chemometric methods. Chemom Intell Lab Syst. 2017;162:10–20.

  29. Almeida MR, Fidelis CHV, Barata LES, Poppi RJ. Classification of Amazonian rosewood essential oil by Raman spectroscopy and PLS-DA with reliability estimation. Talanta. 2013;117:305–11.

  30. de Almeida MR, Correa DN, Rocha WFC, Scafi FJO, Poppi RJ. Discrimination between authentic and counterfeit banknotes using Raman spectroscopy and PLS-DA with uncertainty estimation. Microchem J. 2013;109:170–7.

  31. Rocha WFC, Sheen DA. Classification of biodegradable materials using QSAR modelling with uncertainty estimation. SAR QSAR Environ Res. 2016:1–13.

  32. Gallo V, Intini N, Mastrorilli P, Latronico M, Scapicchio P, Triggiani M, et al. Performance assessment in fingerprinting and multi component quantitative NMR analyses. Anal Chem. 2015;87(13):6709–17.

  33. Bich W. Error, uncertainty and probability. In: Bava E, Kuhne M, Rossi AM, editors. Metrology and Physical Constants. Vol. 185. 2013. p. 47–73.

  34. Faber K, Kowalski BR. Prediction error in least squares regression: further critique on the deviation used in the Unscrambler. Chemom Intell Lab Syst. 1996;34(2):283–92.

  35. Faber NM, Song XH, Hopke PK. Sample-specific standard error of prediction for partial least squares regression. TrAC Trends Anal Chem. 2003;22(5):330–4.

  36. Fernández Pierna JA, Jin L, Wahl F, Faber NM, Massart DL. Estimation of partial least squares regression prediction uncertainty when the reference values carry a sizeable measurement error. Chemom Intell Lab Syst. 2003;65(2):281–91.

  37. Datta J, Ghosh JK. Bootstrap—an exploration. Stat Methodol. 2014;20:63–72.

  38. Kreiss J-P, Paparoditis E. Bootstrap methods for dependent data: a review. J Korean Stat Soc. 2011;40(4):357–78.

  39. Wehrens R, Putter H, Buydens LMC. The bootstrap: a tutorial. Chemom Intell Lab Syst. 2000;54(1):35–52.

  40. Harrington PB, Laurent C, Levinson DF, Levitt P, Markey SP. Bootstrap classification and point-based feature selection from age-staged mouse cerebellum tissues of matrix assisted laser desorption/ionization mass spectra using a fuzzy rule-building expert system. Anal Chim Acta. 2007;599(2):219–31.

  41. Kijewski T, Kareem A. On the reliability of a class of system identification techniques: insights from bootstrap theory. Struct Saf. 2002;24(2–4):261–80.

  42. Efron B, Tibshirani RJ. An introduction to the bootstrap. New York: Chapman & Hall; 1993.

  43. Hjorth JSU. Computer intensive statistical methods: validation, model selection, and bootstrap. New York: Chapman and Hall; 1993.

  44. Olivieri AC, Faber NM, Ferré J, Boqué R, Kalivas JH, Mark H. Uncertainty estimation and figures of merit for multivariate calibration. Pure Appl Chem. 2006;78(3):633–61.

  45. Faber K, Kowalski BR. Propagation of measurement errors for the validation of predictions obtained by principal component regression and partial least squares. J Chemom. 1997;11(3):181–238.

  46. Martens H, Martens M. Modified Jack-knife estimation of parameter uncertainty in bilinear modelling by partial least squares regression (PLSR). Food Qual Prefer. 2000;11(1–2):5–16.

  47. Wentzell PD. The errors of my ways: maximum likelihood PCA seventeen years after Bruce. In: 40 years of chemometrics: from Bruce Kowalski to the future. ACS Symp Ser 1199. American Chemical Society; 2015. p. 31–64.

  48. Karakach TK, Wentzell PD, Walter JA. Characterization of the measurement error structure in 1D 1H NMR data for metabolomics studies. Anal Chim Acta. 2009;636(2):163–74.

  49. Duewer DL, Kowalski BR, Fasching JL. Improving the reliability of factor analysis of chemical data by utilizing the measured analytical uncertainty. Anal Chem. 1976;48(13):2002–10.

  50. Babamoradi H, van den Berg F, Rinnan Å. Bootstrap based confidence limits in principal component analysis—a case study. Chemom Intell Lab Syst. 2013;120:97–105.

  51. Babamoradi H, van den Berg F, Rinnan Å. Comparison of bootstrap and asymptotic confidence limits for control charts in batch MSPC strategies. Chemom Intell Lab Syst. 2013;127:102–11.

  52. Preisner O, Lopes JA, Menezes JC. Uncertainty assessment in FT-IR spectroscopy based bacteria classification models. Chemom Intell Lab Syst. 2008;94(1):33–42.

  53. Conlin AK, Martin EB, Morris AJ. Confidence limits for contribution plots. J Chemom. 2000;14(5–6):725–36.

  54. Pérez NF, Ferré J, Boqué R. Calculation of the reliability of classification in discriminant partial least-squares binary classification. Chemom Intell Lab Syst. 2009;95(2):122–8.

  55. Pérez NF, Ferré J, Boqué R. Multi-class classification with probabilistic discriminant partial least squares (p-DPLS). Anal Chim Acta. 2010;664(1):27–33.

  56. Botella C, Ferré J, Boqué R. Classification from microarray data using probabilistic discriminant partial least squares with reject option. Talanta. 2009;80(1):321–8.

  57. Appel IJ, Gronwald W, Spang R. Estimating classification probabilities in high-dimensional diagnostic studies. Bioinformatics. 2011;27(18):2563–70.

  58. Wold S, Sjöström M, Eriksson L. PLS-regression: a basic tool of chemometrics. Chemom Intell Lab Syst. 2001;58(2):109–30.

  59. Lin J. Divergence measures based on the Shannon entropy. IEEE Trans Inf Theory. 1991;37(1):145–51.

  60. Harrington PDB. Multiple versus single set validation of multivariate models to avoid mistakes. Crit Rev Anal Chem. 2018;48(1):33–46.

  61. Thompson M, Ellison SLR. Dark uncertainty. Accred Qual Assur. 2011;16(10):483–7.

  62. Wan C, Harrington PdB. Screening GC-MS data for carbamate pesticides with temperature-constrained–cascade correlation neural networks. Anal Chim Acta. 2000;408(1):1–12.

  63. Cardoso Galhardo CE, Rocha WFC. Exploratory analysis of biodiesel/diesel blends by Kohonen neural networks and infrared spectroscopy. Anal Methods. 2015;7(8):3512–20.

  64. van der Voet H. Pseudo-degrees of freedom for complex predictive models: the example of partial least squares. J Chemom. 1999;13(3–4):195–208.

  65. Davison AC, Hinkley DV. Bootstrap methods and their application. Cambridge: Cambridge University Press; 1997.

  66. Viant MR, Bearden DW, Bundy JG, Burton IW, Collette TW, Ekman DR, et al. International NMR-based environmental metabolomics Intercomparison exercise. Environ Sci Technol. 2009;43(1):219–25.

  67. Engel MA. Multiple objective resource allocation in product and process development. Cambridge: Massachusetts Institute of Technology; 1999.

  68. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: machine learning in python. J Mach Learn Res. 2011;12:2825–30.

  69. Massart DL, Vandeginste BGM. Handbook of chemometrics and qualimetrics. Elsevier; 1998.

Acknowledgements

This work was partially supported by the Conselho Nacional de Desenvolvimento Científico e Tecnológico (National Council for Scientific and Technological Development) of Brazil [grant number REF.203264/2014-26]. The authors also thank Dr. Katrice Lippa, Dr. David Duewer, and Dr. Pamela Chu at NIST for productive discussions.

Author information

Corresponding author

Correspondence to David A. Sheen.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Disclaimer

Certain commercial equipment, instruments, or materials are identified in this paper in order to specify the experimental procedure adequately. Such identification is not intended to imply recommendation or endorsement by the National Institute of Standards and Technology or the National Institute of Metrology, Quality and Technology, nor is it intended to imply that the materials or equipment identified are necessarily the best available for the purpose.

Electronic supplementary material

ESM 1

(PDF 732 kb)

ESM 2

(PY 58 kb)

ESM 3

(PY 85 kb)

ESM 4

(PY 40 kb)

ESM 5

(TEST 2559 kb)

ESM 6

(TEST 2004 kb)

About this article

Cite this article

Rocha, W.F.d.C., Sheen, D.A. & Bearden, D.W. Classification of samples from NMR-based metabolomics using principal components analysis and partial least squares with uncertainty estimation. Anal Bioanal Chem 410, 6305–6319 (2018). https://doi.org/10.1007/s00216-018-1240-2

  • DOI: https://doi.org/10.1007/s00216-018-1240-2