Reflections on univariate and multivariate analysis of metabolomics data
Metabolomics experiments usually result in a large quantity of data. Univariate and multivariate analysis techniques are routinely used to extract relevant information from the data, with the aim of providing biological knowledge about the problem studied. Statistical tools like the t test, analysis of variance, principal component analysis, and partial least squares discriminant analysis constitute the backbone of the statistical part of the vast majority of metabolomics papers. Nevertheless, many basic but fundamental questions are still often asked: Why do the results of univariate and multivariate analyses differ? Why apply univariate methods if you have already applied a multivariate method? Why do I see something multivariately if I do not see it univariately? In the present paper we address some aspects of univariate and multivariate analysis, with the aim of clarifying in simple terms the main differences between the two approaches. Applications of the t test, analysis of variance, principal component analysis, and partial least squares discriminant analysis are shown on both real and simulated metabolomics data examples to provide an overview of fundamental aspects of univariate and multivariate methods.
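The question of why an effect can be visible multivariately but not univariately can be illustrated with a small simulation. The sketch below is not taken from the paper: the simulated design (two strongly correlated variables driven by a shared latent factor, with a small group shift in opposite directions), the function names `simulate`, `pooled_t` and `hotelling_t2`, and all parameter values are assumptions chosen for illustration. It compares the pooled two-sample t statistic of each variable with Hotelling's T², a multivariate two-group statistic used here as a stand-in for the multivariate methods discussed in the paper.

```python
import numpy as np


def simulate(seed=0, n=50, shift=0.1, noise_sd=0.05):
    """Two groups, two variables that share a latent factor (hypothetical design).

    Group B is shifted by +shift on variable 1 and -shift on variable 2,
    which is small relative to the spread of each variable on its own.
    """
    rng = np.random.default_rng(seed)

    def group(delta):
        z = rng.normal(0.0, 1.0, size=n)             # shared latent factor
        e = rng.normal(0.0, noise_sd, size=(n, 2))   # small independent noise
        return z[:, None] + e + delta

    A = group(np.array([0.0, 0.0]))
    B = group(np.array([shift, -shift]))
    return A, B


def pooled_t(a, b):
    """Two-sample t statistic with pooled variance for one variable."""
    n1, n2 = len(a), len(b)
    sp2 = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2)
    return (a.mean() - b.mean()) / np.sqrt(sp2 * (1.0 / n1 + 1.0 / n2))


def hotelling_t2(A, B):
    """Hotelling's two-sample T^2 statistic with pooled covariance."""
    n1, n2 = len(A), len(B)
    d = A.mean(axis=0) - B.mean(axis=0)
    S = ((n1 - 1) * np.cov(A, rowvar=False)
         + (n2 - 1) * np.cov(B, rowvar=False)) / (n1 + n2 - 2)
    return (n1 * n2) / (n1 + n2) * d @ np.linalg.solve(S, d)


if __name__ == "__main__":
    A, B = simulate()
    for j in range(A.shape[1]):
        print(f"variable {j + 1}: t^2 = {pooled_t(A[:, j], B[:, j]) ** 2:.2f}")
    print(f"Hotelling T^2 = {hotelling_t2(A, B):.2f}")
```

Under these assumptions, each variable on its own is dominated by the latent factor, so the univariate t statistics are expected to be small in most random draws. The group shift lies along the direction in which the two variables differ, where the variance is small, so the multivariate statistic is expected to be much larger. Hotelling's T² is never smaller than the squared pooled t statistic of any single variable, because it equals the largest squared t statistic over all linear combinations of the variables.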
Keywords: Univariate analysis, Multivariate analysis, Hypothesis testing, Multiple test correction, Overfitting, Consistency at large
This project was financed by The Netherlands Metabolomics Centre (NMC), which is part of The Netherlands Genomics Initiative (NGI)/Netherlands Organization for Scientific Research. The authors wish to thank Claudio Luchinat and Renger Jellema for fruitful comments on the manuscript.