Statistical treatment of 2D NMR COSY spectra in metabolomics: data preparation, clustering-based evaluation of the Metabolomic Informative Content and comparison with ¹H-NMR

Féraud, Baptiste; Govaerts, Bernadette; Verleysen, Michel; de Tullio, Pascal

doi:10.1007/s11306-015-0830-7


Original Article
Published: 23 July 2015

Metabolomics, volume 11, pages 1756–1768 (2015)

Baptiste Féraud¹,², Bernadette Govaerts¹, Michel Verleysen²,³ & Pascal de Tullio⁴


Abstract

Compared with the widely used ¹H-NMR spectroscopy, two-dimensional NMR experiments provide more sophisticated spectra, which should facilitate the identification of relevant spectral zones or biomarkers in metabolomics. This paper focuses on ¹H-¹H COrrelation SpectroscopY (COSY) spectral data. Despite inherently longer acquisition times, users (biologists, healthcare professionals) commonly accept that the additional dimension probably represents a major qualitative step for metabolite identification. Moreover, it seems natural that more information leads to more predictive power. Until now, however, very few statistical studies have clearly demonstrated this assumption. A fundamental question is therefore: “Is this supplementary information relevant?” To extend the statistical properties developed for 1D spectroscopy to the challenges raised by 2D spectra, a rigorous study of the performance of COSY spectra is needed as a prerequisite. After introducing new pre-processing concepts, such as the Global Peak List or an ad hoc 2D “bucketing”, this paper presents an innovative methodology based on multivariate clustering algorithms to address this question. Numerical clustering quality indexes and graphical results are proposed, based both on the spectral presence or absence of peaks (binary position vectors) and on peak intensities, at different levels of spectral resolution. The second goal of this paper is to compare the clustering performance obtained on COSY and on ¹H-NMR spectra, in order to understand to what extent COSY spectra carry more Metabolomic Informative Content about the signal than 1D spectra. The methodology is applied to two real experimental designs involving different groups of spectra (which define the signal): a 4-mixture cell culture media design containing various supervised metabolites, and a design based on complex human serum. The results show that COSY spectra appear to be statistically powerful and, in addition, provide better clustering results than the corresponding ¹H-NMR spectra when using unlabeled information. Consequently, the additional information appears to be relevant for metabolomics applications.
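As an illustration of the kind of pipeline the abstract describes, the following is a minimal sketch, not the authors' implementation (the paper reports using R; Python is used here only for illustration). It assumes a fixed-width square grid for the 2D "bucketing", takes the Global Peak List to be the union of occupied buckets across spectra, and uses the Jaccard distance, average-linkage agglomerative clustering and the adjusted Rand index as stand-ins for the clustering algorithms and quality indexes. All function names, the bucket width and the toy peak positions are hypothetical.

```python
from itertools import combinations
from math import comb, floor


def bucket(peak, width):
    """Map a COSY peak (f1, f2) in ppm to a 2D bucket index (assumed square grid)."""
    f1, f2 = peak
    return (floor(f1 / width), floor(f2 / width))


def global_peak_list(spectra, width):
    """Sorted union of all buckets occupied in at least one spectrum."""
    return sorted({bucket(p, width) for peaks in spectra for p in peaks})


def binary_matrix(spectra, width):
    """One row per spectrum, one column per bucket of the list; 1 = peak present."""
    gpl = global_peak_list(spectra, width)
    rows = []
    for peaks in spectra:
        occupied = {bucket(p, width) for p in peaks}
        rows.append([1 if b in occupied else 0 for b in gpl])
    return rows


def jaccard_distance(u, v):
    """1 - |intersection| / |union| for two binary position vectors."""
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    return 0.0 if either == 0 else 1.0 - both / either


def average_linkage(rows, k):
    """Naive agglomerative clustering (average linkage) down to k clusters."""
    clusters = [[i] for i in range(len(rows))]

    def dist(c1, c2):
        total = sum(jaccard_distance(rows[i], rows[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        # combinations() yields a < b, so deleting b leaves index a valid.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ab: dist(clusters[ab[0]], clusters[ab[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    labels = [0] * len(rows)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def adjusted_rand_index(truth, pred):
    """Adjusted Rand index between a known grouping and a clustering."""
    table, row, col = {}, {}, {}
    for t, p in zip(truth, pred):
        table[(t, p)] = table.get((t, p), 0) + 1
        row[t] = row.get(t, 0) + 1
        col[p] = col.get(p, 0) + 1
    sum_cells = sum(comb(c, 2) for c in table.values())
    sum_rows = sum(comb(c, 2) for c in row.values())
    sum_cols = sum(comb(c, 2) for c in col.values())
    expected = sum_rows * sum_cols / comb(len(truth), 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Toy peak lists (hypothetical values): two spectra per group.
spectra = [
    [(1.33, 4.11), (3.05, 3.94)],
    [(1.32, 4.12), (3.04, 3.93), (2.02, 2.35)],
    [(0.97, 2.27), (7.24, 7.34)],
    [(0.96, 2.28), (7.26, 7.33)],
]
truth = ["A", "A", "B", "B"]

rows = binary_matrix(spectra, width=0.1)
labels = average_linkage(rows, k=2)
print(labels, adjusted_rand_index(truth, labels))
```

Changing `width` mimics working at different levels of spectral resolution: a coarser grid merges neighbouring peaks into the same bucket, a finer one separates them. The same clustering and index could be applied to a matrix of peak intensities by swapping the Jaccard distance for a distance suited to continuous values.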



Software

The raw data were first processed with Bruker Topspin 2.1. Peak lists were extracted using ACD/Labs 12.00 (ACD/NMR Processor). The R software environment (http://www.R-project.org) was used exclusively for manipulating the 1D and COSY data (pre-processing steps), generating the GPL matrices, and performing the clustering.

Acknowledgments

This work was supported by the FNRS, of which P. de Tullio is a senior research associate. Support from the IAP Research Network P7/06 of the Belgian State (Belgian Science Policy) is also gratefully acknowledged.

Author information

Authors and Affiliations

Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA), Université Catholique de Louvain (UCL), Voie du Roman Pays 20, bte L1.04.01, B-1348, Louvain-la-Neuve, Belgium
Baptiste Féraud & Bernadette Govaerts
Machine Learning Group, Université Catholique de Louvain (UCL), Voie du Roman Pays 20, bte L1.04.01, B-1348, Louvain-la-Neuve, Belgium
Baptiste Féraud & Michel Verleysen
SAMM, Université Paris I, Panthéon - Sorbonne, Paris, France
Michel Verleysen
Center for Interdisciplinary Research on Medicines (CIRM), Université de Liège (ULg), Liège, Belgium
Pascal de Tullio


Corresponding author

Correspondence to Baptiste Féraud.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Human and Animal Rights and Informed Consent

This study analyzes collected data which involved human participants who had provided informed consent.


About this article

Cite this article

Féraud, B., Govaerts, B., Verleysen, M. et al. Statistical treatment of 2D NMR COSY spectra in metabolomics: data preparation, clustering-based evaluation of the Metabolomic Informative Content and comparison with ¹H-NMR. Metabolomics 11, 1756–1768 (2015). https://doi.org/10.1007/s11306-015-0830-7


Received: 03 October 2014
Accepted: 13 July 2015
Published: 23 July 2015
Issue Date: December 2015
DOI: https://doi.org/10.1007/s11306-015-0830-7
