Abstract
The assessment of data analysis methods in 1H NMR-based metabolic profiling is hampered by a lack of knowledge of the exact sample composition. In this study, an artificial complex mixture design was implemented using 21 metabolites at concentrations typically found in human urine and with a realistic distribution of inter-metabolite correlations. The design comprised two artificially defined groups, designated normal and disease, each containing 30 samples. These artificial mixtures were profiled by 1H NMR spectroscopy and used to assess data analysis methods in the task of differentiating the two conditions. When metabolites were individually quantified, volcano plots provided an excellent method to track the effect size and significance of the change between conditions. Interestingly, the Welch t test detected a similar set of metabolites changing between classes in both quantified and spectral data, suggesting that differential analysis of 1H NMR spectra using a false discovery rate correction and taking fold changes into account is a reliable approach to detecting differential metabolites in complex mixture studies. Various multivariate regression methods based on partial least squares (PLS) were applied in discriminant analysis mode. The most reliable methods in quantified and spectral 1H NMR data were PLS and RPLS linear and logistic regression, respectively. A jackknife-based strategy for variable selection was assessed on both quantified and spectral data, and the results indicate that it may be possible to improve on the conventional Orthogonal-PLS methodology in terms of accuracy and sensitivity. A key improvement of our approach consists of objective criteria for selecting significant signals associated with a condition, which provide a confidence level on the discoveries made and can be implemented in metabolic profiling studies.
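The univariate workflow described in the abstract (Welch t test per metabolite, false discovery rate correction, fold change, volcano plot coordinates) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the FDR procedure, so Benjamini-Hochberg is assumed here, and the function names, the `alpha` and `min_abs_log2_fc` thresholds, and the input layout (a dict mapping each metabolite to its list of positive concentrations) are all hypothetical. Only the Python standard library is used.

```python
from math import exp, lgamma, log, log2, log10, sqrt
from statistics import mean, variance


def _betacf(a, b, x, max_iter=300, eps=1e-12):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((qam + m2) * (a + m2))
        odd = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b)
                + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def welch_t_test(x, y):
    """Return (t, df, two-sided p) without assuming equal variances.

    Assumes each group has at least two samples and nonzero variance.
    """
    nx, ny = len(x), len(y)
    vx, vy = variance(x) / nx, variance(y) / ny  # squared standard errors
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    p = reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))
    return t, df, p


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (assumed FDR procedure)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running = min(running, pvals[i] * m / rank)
        adjusted[i] = running
    return adjusted


def differential_analysis(normal, disease, alpha=0.05, min_abs_log2_fc=0.0):
    """Volcano plot coordinates and a significance flag per metabolite."""
    names = sorted(normal)
    pvals = [welch_t_test(disease[n], normal[n])[2] for n in names]
    qvals = benjamini_hochberg(pvals)
    rows = []
    for name, q in zip(names, qvals):
        lfc = log2(mean(disease[name]) / mean(normal[name]))
        rows.append({
            "metabolite": name,
            "log2_fc": lfc,                          # volcano x-axis
            "neg_log10_q": -log10(max(q, 1e-300)),   # volcano y-axis
            "significant": q <= alpha and abs(lfc) >= min_abs_log2_fc,
        })
    return rows
```

Plotting `log2_fc` against `neg_log10_q` for each row gives the volcano plot. The same functions could in principle be applied to spectral variables instead of quantified metabolites, although any preprocessing needed for spectra is outside the scope of this sketch.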
Acknowledgments
Alexessander Couto Alves acknowledges an Imperial College Faculty of Medicine PhD studentship.
Cite this article
Alves, A.C., Li, J.V., Garcia-Perez, I. et al. Characterization of data analysis methods for information recovery from metabolic 1H NMR spectra using artificial complex mixtures. Metabolomics 8, 1170–1180 (2012). https://doi.org/10.1007/s11306-012-0422-8
DOI: https://doi.org/10.1007/s11306-012-0422-8