Statistical treatment of 2D NMR COSY spectra in metabolomics: data preparation, clustering-based evaluation of the Metabolomic Informative Content and comparison with 1H-NMR

  • Original Article
  • Metabolomics

Abstract

Compared with the widely used 1H-NMR spectroscopy, two-dimensional NMR experiments provide more sophisticated spectra, which should facilitate the identification of relevant spectral zones or biomarkers in metabolomics. This paper focuses on 1H-1H COrrelation SpectroscopY (COSY) spectral data. Despite inherently longer acquisition times, users (biologists, healthcare professionals) commonly accept that an additional dimension probably represents a major qualitative step for metabolite identification. It also seems natural that more information leads to more predictive power. Until now, however, very few statistical studies have clearly demonstrated this assumption. A fundamental question is therefore: “Is this supplementary information relevant?” To extend the statistical properties developed for 1D spectroscopy to the challenges raised by 2D spectra, a rigorous study of the performance of COSY spectra is a prerequisite. After introducing new pre-processing concepts, such as the Global Peak List and an ad hoc 2D “bucketing”, this paper presents an innovative methodology based on multivariate clustering algorithms to address this question. Numerical clustering quality indexes and graphical results are proposed, based both on the spectral presence or absence of peaks (binary position vectors) and on peak intensities, at different levels of spectral resolution. The second goal of this paper is to compare the clustering performance obtained on COSY and on 1H-NMR spectra, in order to understand to what extent COSY spectra carry more Metabolomic Informative Content about the signal than 1D spectra. The methodology is applied to two real experimental designs involving different groups of spectra (which define the signal): a 4-mixture cell culture media design containing various supervised metabolites and a complex design based on human serum. It is shown that COSY spectra appear to be statistically powerful and, in addition, provide better clustering results than the corresponding 1H-NMR spectra when using unlabeled information. Consequently, the additional information appears to be relevant for metabolomics applications.
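
One way to quantify clustering quality is to compare the clusters found by an unsupervised algorithm with the known experimental groups. The abstract does not say which indexes were used. As a minimal sketch, assuming the adjusted Rand index as one representative choice, the Python code below scores a partition against known group labels. The function name, the example labels, and the use of Python are illustrative assumptions, not taken from the paper.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two partitions of the same samples."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("both label lists must have the same length")
    n = len(labels_true)
    if n < 2:
        raise ValueError("at least two samples are required")

    # Contingency counts: samples falling in each (group, cluster) pair.
    pair_counts = Counter(zip(labels_true, labels_pred))
    row_counts = Counter(labels_true)
    col_counts = Counter(labels_pred)

    # Number of sample pairs that share a cell, a known group, or a cluster.
    sum_pairs = sum(comb(c, 2) for c in pair_counts.values())
    sum_rows = sum(comb(c, 2) for c in row_counts.values())
    sum_cols = sum(comb(c, 2) for c in col_counts.values())

    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    maximum = (sum_rows + sum_cols) / 2          # largest attainable agreement
    if maximum == expected:
        # Degenerate case, e.g. both partitions consist of a single cluster.
        return 1.0
    return (sum_pairs - expected) / (maximum - expected)


# Hypothetical example: eight spectra from four mixtures, two spectra each.
known_groups = ["M1", "M1", "M2", "M2", "M3", "M3", "M4", "M4"]
found_clusters = [0, 0, 1, 1, 2, 2, 3, 2]  # one spectrum is misassigned
print(adjusted_rand_index(known_groups, found_clusters))
```

A value of 1 means the clusters reproduce the known groups exactly, up to relabeling, while values near 0 correspond to chance-level agreement. Computing the same index on COSY-derived and on 1H-NMR-derived clusterings would give one numerical basis for a comparison of the kind described above.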

Software

The raw data were first processed with the Bruker Topspin 2.1 software. Peak lists were extracted using ACD/Labs 12.00 (ACD/NMR processor). The R software environment (http://www.R-project.org) was used exclusively for manipulating the 1D and COSY data (pre-processing steps), generating the GPL matrices and performing the clustering.
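
The page does not spell out how a GPL matrix is built from the extracted peak lists. As a rough sketch only, the Python code below assumes that a Global Peak List is the union of the 2D buckets occupied in at least one spectrum, and that each spectrum becomes a binary presence/absence row over that list. The square bucket shape, the bucket width, the Jaccard distance, all function names, and the peak coordinates are hypothetical choices made for illustration. The authors worked in R, and their actual procedure may differ.

```python
from math import floor


def bucket_of(peak, width):
    """Map a 2D peak (f1_ppm, f2_ppm) to the index of its square bucket."""
    f1, f2 = peak
    return (floor(f1 / width), floor(f2 / width))


def global_peak_list(spectra, width):
    """Sorted union of all buckets occupied in at least one spectrum."""
    buckets = set()
    for peaks in spectra.values():
        buckets.update(bucket_of(p, width) for p in peaks)
    return sorted(buckets)


def binary_matrix(spectra, width):
    """One presence/absence row per spectrum, one column per bucket."""
    gpl = global_peak_list(spectra, width)
    rows = {}
    for name, peaks in spectra.items():
        occupied = {bucket_of(p, width) for p in peaks}
        rows[name] = [1 if b in occupied else 0 for b in gpl]
    return gpl, rows


def jaccard_distance(u, v):
    """1 - |intersection| / |union| of the buckets present in u and v."""
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    return 0.0 if either == 0 else 1.0 - both / either


# Hypothetical COSY peak positions in ppm, not data from the paper.
spectra = {
    "mix_A_1": [(1.33, 4.11), (2.05, 3.76)],
    "mix_A_2": [(1.32, 4.12), (2.06, 3.77)],
    "mix_B_1": [(1.33, 4.11), (3.04, 3.93)],
}
gpl, rows = binary_matrix(spectra, width=0.1)
for name, row in rows.items():
    print(name, row)
```

Changing `width` changes the spectral resolution of the matrix: wider buckets merge nearby peaks into one column, and narrower buckets separate them. Pairwise distances such as `jaccard_distance(rows["mix_A_1"], rows["mix_B_1"])` could then be passed to a clustering algorithm.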

Acknowledgments

This work was supported by the FNRS, of which P. de Tullio is a senior research associate. Support from the IAP Research Network P7/06 of the Belgian State (Belgian Science Policy) is also gratefully acknowledged.

Author information

Corresponding author

Correspondence to Baptiste Féraud.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Human and Animal Rights and Informed Consent

This study analyzes collected data from human participants who had provided informed consent.

About this article

Cite this article

Féraud, B., Govaerts, B., Verleysen, M. et al. Statistical treatment of 2D NMR COSY spectra in metabolomics: data preparation, clustering-based evaluation of the Metabolomic Informative Content and comparison with 1H-NMR. Metabolomics 11, 1756–1768 (2015). https://doi.org/10.1007/s11306-015-0830-7

  • DOI: https://doi.org/10.1007/s11306-015-0830-7
