Abstract
An important application of metabolic profiling is the discovery of informative metabolites (biomarkers) that are predictive of the clinical outcome under investigation. There is therefore a need for statistically efficient methods for screening such metabolites from a set of candidates. The most commonly used criterion for assessing variable (metabolite) importance is probably the P value obtained by performing a t test on each metabolite alone, without considering the influence of other variables. In this work, a new strategy, called subwindow permutation analysis (SPA) coupled with partial least squares linear discriminant analysis (PLSLDA), is developed for the statistical assessment of variable importance. The main contribution of SPA is that, unlike the t test, it outputs a conditional P value that implicitly takes into account the synergetic effect of all the other variables. In this sense, the conditional P value can to some extent help locate a good combination of informative variables. When SPA is applied to two metabolic datasets (type 2 diabetes mellitus data and childhood overweight data), the performance of both unsupervised principal component analysis (PCA) and supervised PLSLDA is shown to improve greatly when the informative metabolites revealed by SPA are used. The source code for implementing SPA in both MATLAB and R (an R package for both Linux and Windows) is freely available at: http://code.google.com/p/spa2010/downloads/list.
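The idea of a conditional P value can be illustrated with a minimal sketch. It is not the authors' implementation, and several pieces are assumptions made only for illustration: a nearest-centroid classifier stands in for PLSLDA, the comparison of permuted and unpermuted prediction errors uses a one-sided Mann-Whitney U test with a normal approximation and no tie correction, and the names `centroid_error`, `mann_whitney_greater` and `spa_sketch` as well as the parameters `n_iter`, `window` and `test_frac` are hypothetical. In each iteration a random subset of variables (the subwindow) and a random train/test split are drawn, and the test error is recorded before and after permuting one variable of the subwindow.

```python
import math
import random
from statistics import mean


def centroid_error(train_X, train_y, test_X, test_y, cols):
    """Misclassification rate of a nearest-centroid rule using only columns `cols`.
    (Stand-in classifier; the paper couples SPA with PLSLDA instead.)"""
    centroids = {}
    for label in set(train_y):
        rows = [x for x, lab in zip(train_X, train_y) if lab == label]
        centroids[label] = [mean(r[c] for r in rows) for c in cols]

    def dist(x, label):
        return sum((x[c] - m) ** 2 for c, m in zip(cols, centroids[label]))

    wrong = sum(min(centroids, key=lambda lab: dist(x, lab)) != true_lab
                for x, true_lab in zip(test_X, test_y))
    return wrong / len(test_y)


def mann_whitney_greater(a, b):
    """One-sided P value that values in `a` tend to exceed values in `b`
    (normal approximation, ties counted as 0.5, no tie correction)."""
    n1, n2 = len(a), len(b)
    u = sum((x > y) + 0.5 * (x == y) for x in a for y in b)
    z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return 0.5 * math.erfc(z / math.sqrt(2))


def spa_sketch(X, y, n_iter=200, window=2, test_frac=0.3, seed=0):
    """Return one conditional P value per variable (assumed scheme, see lead-in)."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    normal = [[] for _ in range(p)]    # test errors with intact data
    permuted = [[] for _ in range(p)]  # test errors with variable j shuffled
    for _ in range(n_iter):
        cols = rng.sample(range(p), window)       # random subwindow of variables
        idx = list(range(n))
        rng.shuffle(idx)                          # random train/test split
        n_test = max(1, int(n * test_frac))
        test, train = idx[:n_test], idx[n_test:]
        train_X, train_y = [X[i] for i in train], [y[i] for i in train]
        test_X, test_y = [list(X[i]) for i in test], [y[i] for i in test]
        base = centroid_error(train_X, train_y, test_X, test_y, cols)
        for c in cols:
            values = [row[c] for row in test_X]
            rng.shuffle(values)                   # break the link between c and y
            perm_X = [row[:] for row in test_X]
            for row, v in zip(perm_X, values):
                row[c] = v
            normal[c].append(base)
            permuted[c].append(
                centroid_error(train_X, train_y, perm_X, test_y, cols))
    return [mann_whitney_greater(permuted[j], normal[j]) if normal[j]
            else float("nan") for j in range(p)]


# Usage on synthetic data: variable 0 separates the classes, 1-3 are noise.
demo_rng = random.Random(1)
demo_y = [0] * 30 + [1] * 30
demo_X = [[demo_rng.gauss(4.0 * lab, 1.0)] +
          [demo_rng.gauss(0.0, 1.0) for _ in range(3)] for lab in demo_y]
demo_p = spa_sketch(demo_X, demo_y)
print(["%.3g" % v for v in demo_p])
```

Because each variable is permuted only inside a model that also contains other variables, its P value reflects how much it contributes given those companions, which is the sense in which the abstract calls the P value conditional.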
Abbreviations
- MPA: Model population analysis
- PLSLDA: Partial least squares linear discriminant analysis
- SPA: Subwindow permutation analysis
- PCA: Principal component analysis
- COSS: COnditional Synergetic Score
Acknowledgements
This work was financially supported by the National Nature Foundation Committee of P.R. China (Grants No. 20875104, No. 10771217 and No. 20975115) and by the international cooperation project on traditional Chinese medicines of the Ministry of Science and Technology of China (Grant No. 2007DFA40680). The studies were approved by the university's review board.
Cite this article
Li, HD., Zeng, MM., Tan, BB. et al. Recipe for revealing informative metabolites based on model population analysis. Metabolomics 6, 353–361 (2010). https://doi.org/10.1007/s11306-010-0213-z
DOI: https://doi.org/10.1007/s11306-010-0213-z