Characterization of data analysis methods for information recovery from metabolic 1H NMR spectra using artificial complex mixtures
The assessment of data analysis methods in 1H NMR-based metabolic profiling is hampered by a lack of knowledge of the exact sample composition. In this study, an artificial complex mixture design comprising two artificially defined groups, designated normal and disease and each containing 30 samples, was implemented using 21 metabolites at concentrations typically found in human urine and with a realistic distribution of inter-metabolite correlations. These artificial mixtures were profiled by 1H NMR spectroscopy and used to assess data analysis methods in the task of differentiating the two conditions. When metabolites were individually quantified, volcano plots provided an excellent method to track the effect size and significance of the change between conditions. Interestingly, the Welch t test detected a similar set of metabolites changing between classes in both quantified and spectral data, suggesting that differential analysis of 1H NMR spectra using a false discovery rate correction, taking fold changes into account, is a reliable approach to detect differential metabolites in complex mixture studies. Various multivariate regression methods based on partial least squares (PLS) were applied in discriminant analysis mode. The most reliable methods in quantified and spectral 1H NMR data were PLS and RPLS linear and logistic regression, respectively. A jackknife-based strategy for variable selection was assessed on both quantified and spectral data, and the results indicate that it may be possible to improve on the conventional Orthogonal-PLS methodology in terms of accuracy and sensitivity. A key improvement of our approach is the use of objective criteria to select significant signals associated with a condition, which provide a confidence level on the discoveries made and can be implemented in metabolic profiling studies.
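The differential analysis summarized above (a Welch t test per metabolite, a false discovery rate correction, and a fold-change criterion, as visualized in a volcano plot) can be illustrated with a minimal sketch. It is not the study's code: the Benjamini-Hochberg procedure is assumed as the false discovery rate correction, the thresholds `q_max` and `min_abs_log2fc` are hypothetical, the data are simulated rather than measured, and all function names are illustrative. The p-value is obtained by numerically integrating the Student t density so that only the Python standard library is needed.

```python
import math
import random
from statistics import mean, variance


def welch_t(a, b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value via Simpson integration of the Student t density.

    `steps` must be even; accuracy is adequate for a sketch, not for extreme tails.
    """
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    upper = abs(t)
    h = upper / steps
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)


def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def differential_metabolites(normal, disease, q_max=0.05, min_abs_log2fc=0.3):
    """Indices of metabolites passing both the FDR and fold-change criteria.

    `normal` and `disease` are lists with one list of concentrations per metabolite.
    Both thresholds are hypothetical defaults, not values from the study.
    """
    log2fc, pvals = [], []
    for a, b in zip(normal, disease):
        t, df = welch_t(b, a)
        pvals.append(t_two_sided_p(t, df))
        log2fc.append(math.log2(mean(b) / mean(a)))
    qvals = benjamini_hochberg(pvals)
    return [j for j in range(len(qvals))
            if qvals[j] < q_max and abs(log2fc[j]) >= min_abs_log2fc]


if __name__ == "__main__":
    # Simulated stand-in for the design: 21 metabolites, 30 samples per group,
    # with the first 5 metabolites raised 1.5-fold in the "disease" group.
    rng = random.Random(0)
    normal = [[rng.lognormvariate(0.0, 0.3) for _ in range(30)] for _ in range(21)]
    disease = [[(1.5 if j < 5 else 1.0) * rng.lognormvariate(0.0, 0.3)
                for _ in range(30)] for j in range(21)]
    print("selected metabolite indices:", differential_metabolites(normal, disease))
```

Requiring both an adjusted p-value below `q_max` and a minimum absolute log2 fold change corresponds to selecting the upper outer regions of a volcano plot. With spectral rather than quantified data, the same functions could be applied per spectral variable instead of per metabolite.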
Keywords: Artificial mixtures, Data analysis, t test, PLS, NMR
Alexessander Couto Alves acknowledges an Imperial College Faculty of Medicine PhD studentship.