Abstract
Introduction
The pre-processing of analytical data in metabolomics must be considered as a whole to allow the construction of a global and unique object for any further simultaneous data analysis or multivariate statistical modelling. For 1D 1H-NMR metabolomics experiments, best practices for data pre-processing are well defined, but this is not yet the case for 2D experiments (COSY in this paper).
Objective
Taking into account the added value of a second dimension, the objective is to propose two workflows dedicated to 2D NMR data handling and preparation (the Global Peak List and Vectorization approaches) and to compare them, both with each other and with 1D standards. This comparison makes it possible to identify which methodology yields the most metabolomic content and to explore the advantages of the selected workflow in distinguishing among treatment groups and identifying relevant biomarkers. This paper therefore explores the need for novel 2D pre-processing workflows, the evaluation of their quality, and their performance in the subsequent determination of accurate (2D) biomarkers.
Methods
To select the most informative data source, MIC (Metabolomic Informative Content) indexes are used; they are based on clustering and inertia measures of quality. Then, to highlight biomarkers or critical spectral zones, the PLS-DA model is used, along with more advanced sparse algorithms (sPLS and L-sOPLS).
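As an illustration of what an inertia-based quality measure can look like, the sketch below computes the ratio of between-group inertia to total inertia for a set of spectra with known group labels. This is only an assumption about the general form of such an index, not the paper's definition of the MIC indexes; the function name `inertia_ratio` and the toy data are hypothetical.

```python
from collections import defaultdict


def inertia_ratio(rows, labels):
    """Return between-group inertia divided by total inertia.

    rows   : list of equal-length numeric lists (one per spectrum)
    labels : list of group labels, one per row
    Hypothetical helper, not the MIC index defined by the authors.
    """
    n = len(rows)
    p = len(rows[0])
    # Global centroid of all spectra.
    centroid = [sum(r[j] for r in rows) / n for j in range(p)]
    # Total inertia: sum of squared distances to the global centroid.
    total = sum(sum((r[j] - centroid[j]) ** 2 for j in range(p)) for r in rows)

    # Collect the spectra belonging to each group.
    groups = defaultdict(list)
    for r, lab in zip(rows, labels):
        groups[lab].append(r)

    # Between-group inertia: size-weighted squared distance of each
    # group centroid to the global centroid.
    between = 0.0
    for members in groups.values():
        m = len(members)
        c = [sum(r[j] for r in members) / m for j in range(p)]
        between += m * sum((c[j] - centroid[j]) ** 2 for j in range(p))

    return between / total if total > 0 else 0.0


# Toy example: two well-separated groups along the first variable.
spectra = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
print(inertia_ratio(spectra, ["a", "a", "b", "b"]))
```

A ratio close to 1 means that most of the variability lies between groups, and a ratio close to 0 means the groups overlap. In the toy example the ratio is 100/104, about 0.96.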
Results
Results are discussed for two different experimental designs: one unsupervised and based on human urine samples, the other controlled and based on spiked serum media. MIC indexes are reported and lead to the choice of the most relevant workflow for subsequent use. Finally, biomarkers are provided for each case and the predictive power of each candidate model is assessed with cross-validated RMSEP measures.
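A cross-validated RMSEP (root mean squared error of prediction) can be sketched as follows. To keep the example self-contained, a one-predictor least-squares line stands in for the PLS-type models used in the paper; the fold assignment and the names `fit_line` and `cv_rmsep` are assumptions made for illustration.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx if sxx > 0 else 0.0
    return slope, my - slope * mx


def cv_rmsep(x, y, k=5):
    """k-fold cross-validated RMSEP.

    Sample i is held out in fold i % k. The model is refitted on the
    remaining samples and used to predict the held-out ones, so each
    sample is predicted exactly once by a model that never saw it.
    """
    n = len(x)
    squared_error = 0.0
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        slope, intercept = fit_line(
            [x[i] for i in train_idx], [y[i] for i in train_idx]
        )
        for i in test_idx:
            squared_error += (y[i] - (slope * x[i] + intercept)) ** 2
    return math.sqrt(squared_error / n)


# Toy data: a noisy linear relationship.
x = [float(i) for i in range(10)]
y = [2.0 * xi + 1.0 + (1.0 if i % 2 else -1.0) for i, xi in enumerate(x)]
print(cv_rmsep(x, y, k=5))
```

Lower values indicate better predictive power. Because every prediction is made on held-out samples, the measure penalizes models that only fit the training data well.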
Conclusion
In conclusion, no solution is universally the best in every case, but 2D experiments make it possible to clearly find relevant cross-peak biomarkers even when the initial separability between groups is poor. The MIC measures associated with the candidate workflows (2D GPL, 2D vectorization and 1D, with specific parameters) help visualize which data set should be used as a priority to find biomarkers more easily. The diversity of data sources, mainly 1D versus 2D, may often lead to complementary or confirmatory results.
Data availability
The metabolomics data and metadata reported in this paper are available on request from the Institute of Statistics, Biostatistics and Actuarial Sciences, UCLouvain, Belgium.
Acknowledgements
Support from the IAP Research Network P7/06 of the Belgian State (Belgian Science Policy) is gratefully acknowledged. Support from the CORSAIRE metabolomics platform (Biogenouest network) is also acknowledged. Pascal de Tullio is Research Director of the Fonds de la Recherche Scientifique (FNRS).
Author information
Contributions
BF, BG, PG and PT conceived and designed research. EM, JL and PT collected and supplied the data. BF analyzed data and wrote the manuscript. All authors read and approved the manuscript.
Ethics declarations
Conflict of interest
All authors declare that they have no conflict of interest.
Ethical approval
This study analyzes collected data which involved human participants. The studies were approved by our local Ethics Committee (CHR Citadelle, Liège, Number B412201215082-1267) and all subjects gave their informed consent.
Informed consent
Informed consent was obtained from all individual participants included in the study.
Software availability statement
The raw data were processed with the Bruker Topspin 3.5 software. Peak lists were extracted using ACD/Labs 12.00 (ACD/NMR processor). The R software environment (http://www.R-project.org) was used exclusively for statistical purposes, either via existing packages (pls, spls, ropls) or via code written ad hoc (PepsNMR package; MIC indexes and L-sOPLS functions, which are available at https://github.com/ManonMartin/MBXUCL).
About this article
Cite this article
Féraud, B., Leenders, J., Martineau, E. et al. Two data pre-processing workflows to facilitate the discovery of biomarkers by 2D NMR metabolomics. Metabolomics 15, 63 (2019). https://doi.org/10.1007/s11306-019-1524-3