Abstract
Compared with the widely used 1H-NMR spectroscopy, two-dimensional NMR experiments provide richer spectra, which should facilitate the identification of relevant spectral zones or biomarkers in metabolomics. This paper focuses on 1H-1H COrrelation SpectroscopY (COSY) spectral data. Despite inherently longer acquisition times, users (biologists, healthcare professionals) commonly accept that the additional dimension probably represents a major qualitative step for metabolite identification. Moreover, it seems natural that more information leads to more predictive power. Until now, however, very few statistical studies have clearly proved this assumption. A fundamental question is therefore: “Is this supplementary information relevant?” To extend the statistical properties developed for 1D spectroscopy to the challenges raised by 2D spectra, a rigorous study of the performance of COSY spectra is needed as a prerequisite. After introducing new pre-processing concepts, such as the Global Peak List and an ad hoc 2D “bucketing”, this paper presents an innovative methodology based on multivariate clustering algorithms to address this question. Numerical clustering quality indexes and graphical results are proposed, based both on the spectral presence or absence of peaks (binary position vectors) and on peak intensities, at different levels of spectral resolution. The second goal of this paper is to compare the clustering performance obtained on COSY and on 1H-NMR spectra, in order to understand to what extent COSY spectra carry more Metabolomic Informative Content about the signal than 1D spectra. The methodology is applied to two real experimental designs involving different groups of spectra (which define the signal): a 4-mixture cell culture media design containing various supervised metabolites and a complex design based on human serum.
It is shown that COSY spectra appear to be statistically powerful and, in addition, provide better clustering results than the corresponding 1H-NMR spectra when using unlabeled information. Consequently, the additional information appears to be relevant for metabolomics applications.
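The binary position vectors and clustering quality indexes mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation (the work itself was carried out in R). Everything in it is an assumption made for illustration: the function names, the toy peak lists, the 0.1 ppm bucket width, the Jaccard dissimilarity, and the adjusted Rand index as the quality index. Peaks are assigned to square 2D buckets, each spectrum becomes a presence/absence vector over the shared bucket list, and a partition can then be compared with the known groups.

```python
from collections import Counter
from math import comb, floor


def bucket_peaks(peaks, width):
    """Map (f1, f2) peak positions in ppm to a set of 2D bucket indices."""
    return {(floor(f1 / width), floor(f2 / width)) for f1, f2 in peaks}


def binary_vector(buckets, all_buckets):
    """Presence (1) or absence (0) of each bucket of a shared, ordered list."""
    return [1 if b in buckets else 0 for b in all_buckets]


def jaccard_distance(u, v):
    """Jaccard dissimilarity between two binary vectors of equal length."""
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    return 0.0 if either == 0 else 1.0 - both / either


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same items."""
    n = len(labels_a)
    if n < 2:
        return 1.0  # a single item can only be partitioned one way
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical COSY peak lists (f1, f2 in ppm) for three toy spectra.
spectra = {
    "A1": [(1.33, 4.11), (3.02, 3.92)],
    "A2": [(1.34, 4.12), (3.03, 3.93)],
    "B1": [(2.05, 2.35), (7.25, 7.45)],
}
width = 0.1  # assumed bucket width in ppm

bucketed = {name: bucket_peaks(p, width) for name, p in spectra.items()}
all_buckets = sorted(set().union(*bucketed.values()))
vectors = {name: binary_vector(b, all_buckets) for name, b in bucketed.items()}
```

The resulting pairwise `jaccard_distance` values could feed any clustering algorithm, and `adjusted_rand_index` then scores the obtained partition against the known experimental groups. Varying `width` mimics the different levels of spectral resolution.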
Software
The raw data were first processed with the Bruker Topspin 2.1 software. Peak lists were extracted using ACD/Labs 12.00 (ACD/NMR processor). The R software environment (http://www.R-project.org) was used exclusively for manipulating the 1D and COSY data (pre-processing steps), generating the GPL matrices and performing the clustering.
Acknowledgments
This work was supported by the FNRS from which P. de Tullio is senior research associate. Support from the IAP Research Network P7/06 of the Belgian State (Belgian Science Policy) is also gratefully acknowledged.
Ethics declarations
Conflict of Interest
The authors declare that they have no conflict of interest.
Human and Animal Rights and Informed Consent
This study analyzes collected data which involved human participants who had provided informed consent.
Cite this article
Féraud, B., Govaerts, B., Verleysen, M. et al. Statistical treatment of 2D NMR COSY spectra in metabolomics: data preparation, clustering-based evaluation of the Metabolomic Informative Content and comparison with 1H-NMR. Metabolomics 11, 1756–1768 (2015). https://doi.org/10.1007/s11306-015-0830-7
DOI: https://doi.org/10.1007/s11306-015-0830-7