Two data pre-processing workflows to facilitate the discovery of biomarkers by 2D NMR metabolomics

  • Original Article
  • Metabolomics

Abstract

Introduction

The pre-processing of analytical data in metabolomics must be considered as a whole to allow the construction of a global and unique object for any further simultaneous data analysis or multivariate statistical modelling. Best practices for data pre-processing are well defined for 1D 1H-NMR metabolomics experiments, but not yet for 2D experiments (COSY is the example used in this paper).

Objective

By considering the added value of a second dimension, the objective is to propose two workflows dedicated to 2D NMR data handling and preparation (the Global Peak List and Vectorization approaches) and to compare them, both with each other and with 1D standards. This comparison identifies which methodology yields the most metabolomic content and shows the advantages of the selected workflow in distinguishing among treatment groups and identifying relevant biomarkers. This paper therefore addresses the need for novel 2D pre-processing workflows, the evaluation of their quality, and the evaluation of their performance in the subsequent determination of accurate (2D) biomarkers.

Methods

To select the more informative data source, MIC (Metabolomic Informative Content) indexes are used, based on clustering and inertia measures of quality. Then, to highlight biomarkers or critical spectral zones, the PLS-DA model is used, along with more advanced sparse algorithms (sPLS and L-sOPLS).
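As an illustration of what an inertia-based quality measure can look like, the sketch below computes the share of total inertia that is explained by a known grouping of the spectra (between-group inertia divided by total inertia). It is not the authors' definition of the MIC indexes, which is not given here; the function name `inertia_ratio` and the toy data layout (one list of intensities per spectrum) are assumptions made for this example only.

```python
def inertia_ratio(rows, labels):
    """Between-group inertia divided by total inertia, a value in [0, 1].

    rows   : list of equal-length numeric lists (one per spectrum); hypothetical layout
    labels : list of group labels, one per row
    A value close to 1 means the groups are well separated in this data set.
    """
    n_features = len(rows[0])

    def mean(points):
        # Component-wise mean of a list of points.
        return [sum(p[j] for p in points) / len(points) for j in range(n_features)]

    def sq_dist(a, b):
        # Squared Euclidean distance between two points.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    grand_mean = mean(rows)
    total = sum(sq_dist(r, grand_mean) for r in rows)
    if total == 0:
        raise ValueError("all rows are identical, so the ratio is undefined")

    between = 0.0
    for group in set(labels):
        members = [r for r, lab in zip(rows, labels) if lab == group]
        # Each group centroid is weighted by the group size.
        between += len(members) * sq_dist(mean(members), grand_mean)
    return between / total
```

Under these assumptions, the same function can be applied to each candidate data set (for example a 1D table and a 2D table built from the same samples), and the data set with the higher ratio separates the known groups better.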

Results

Results are discussed according to two different experimental designs (one which is unsupervised and based on human urine samples, and the other which is controlled and based on spiked serum media). MIC indexes are shown, leading to the choice of the more relevant workflow to use thereafter. Finally, biomarkers are provided for each case and the predictive power of each candidate model is assessed with cross-validated measures of RMSEP.
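A cross-validated RMSEP (root mean squared error of prediction) can be sketched as follows: each sample is predicted by a model fitted without it, and the squared prediction errors are averaged over all samples before taking the square root. The fold scheme, the function names and the one-variable least-squares model are assumptions for illustration; the model is only a stand-in for the PLS-DA, sPLS and L-sOPLS models named in the Methods, and this is not the authors' implementation.

```python
import math


def kfold_indices(n, k):
    """Split range(n) into k interleaved folds (deterministic, no shuffling)."""
    return [list(range(i, n, k)) for i in range(k)]


def cv_rmsep(X, y, fit, predict, k=5):
    """K-fold cross-validated RMSEP for any model given as fit/predict callables."""
    n = len(y)
    squared_error = 0.0
    for test_idx in kfold_indices(n, k):
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        # Fit on the training folds only, then predict the held-out samples.
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        for i in test_idx:
            squared_error += (predict(model, X[i]) - y[i]) ** 2
    return math.sqrt(squared_error / n)


def fit_line(X, y):
    """Stand-in model (assumption): least-squares line on the first feature."""
    n = len(y)
    mean_x = sum(x[0] for x in X) / n
    mean_y = sum(y) / n
    sxx = sum((x[0] - mean_x) ** 2 for x in X)
    sxy = sum((x[0] - mean_x) * (yi - mean_y) for x, yi in zip(X, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_line(model, x):
    slope, intercept = model
    return slope * x[0] + intercept
```

With a 0/1 coding of two treatment groups as `y`, a lower value of `cv_rmsep(X, y, fit_line, predict_line)` indicates better out-of-sample prediction; replacing `fit_line` and `predict_line` with another model's fit and predict functions leaves the cross-validation logic unchanged.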

Conclusion

In conclusion, no single solution is best in every case, but 2D experiments make it possible to clearly identify relevant cross-peak biomarkers even when the initial separability between groups is poor. The MIC measures associated with the candidate workflows (2D GPL, 2D vectorization and 1D, with specific parameters) show which data set should be used first to find biomarkers more easily. The diversity of data sources, mainly 1D versus 2D, may often lead to complementary or confirmatory results.

Data availability

The metabolomics data and metadata reported in this paper are available on request from the Institute of Statistics, Biostatistics and Actuarial Sciences, UCLouvain, Belgium.

Acknowledgements

Support from the IAP Research Network P7/06 of the Belgian State (Belgian Science Policy) is gratefully acknowledged. Support from the CORSAIRE metabolomics platform (Biogenouest network) is also acknowledged. Pascal de Tullio is Research Director of the Fonds de la Recherche Scientifique (FNRS).

Author information

Contributions

BF, BG, PG and PT conceived and designed research. EM, JL and PT collected and supplied the data. BF analyzed data and wrote the manuscript. All authors read and approved the manuscript.

Corresponding author

Correspondence to Baptiste Féraud.

Ethics declarations

Conflict of interest

All authors declare that they have no conflict of interest.

Ethical approval

This study analyzes data collected from human participants. The studies were approved by our local Ethics Committee (CHR Citadelle, Liège, Number B412201215082-1267) and all subjects gave their informed consent.

Informed consent

Informed consent was obtained from all individual participants included in the study.

Software availability statement

The raw data were processed with the Bruker Topspin 3.5 software. Peak lists were extracted using ACD/Labs 12.00 (ACD/NMR processor). The R software environment (http://www.R-project.org) was used exclusively for statistical purposes, either via existing packages (pls, spls, ropls) or via code written ad hoc (the PepsNMR package, and the MIC index and L-sOPLS functions, which are available at https://github.com/ManonMartin/MBXUCL).

Cite this article

Féraud, B., Leenders, J., Martineau, E. et al. Two data pre-processing workflows to facilitate the discovery of biomarkers by 2D NMR metabolomics. Metabolomics 15, 63 (2019). https://doi.org/10.1007/s11306-019-1524-3
