Recipe for revealing informative metabolites based on model population analysis

  • Original Article
Metabolomics

Abstract

An important application of metabolic profiles is the discovery of informative metabolites (biomarkers) that are predictive of a clinical outcome under investigation. There is therefore a need for a statistically efficient method for screening such metabolites from the candidates. The most commonly used criterion for assessing variable (metabolite) importance is probably the P value obtained from a t test on each metabolite alone, which does not consider the influence of other variables. In this work, a new strategy, called subwindow permutation analysis (SPA) coupled with partial least squares linear discriminant analysis (PLSLDA), is developed for the statistical assessment of variable importance. The main contribution of SPA is that, unlike the t test, it outputs a conditional P value that implicitly takes into account the synergetic effect of all the other variables. In this sense, the conditional P value can to some extent help locate a good combination of informative variables. When SPA is applied to two metabolic datasets (type 2 diabetes mellitus data and childhood overweight data), the performance of both unsupervised principal component analysis (PCA) and supervised PLSLDA is greatly improved when the informative metabolites revealed by SPA are used. The source code for implementing SPA in both MATLAB and R (R package for both Linux and Windows) is freely available at: http://code.google.com/p/spa2010/downloads/list.
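
The idea of a conditional P value can be illustrated with a minimal Python sketch. It is not the authors' MATLAB or R implementation, and the abstract does not fix the algorithmic details, so the following are assumptions made only for illustration: variables are drawn in random subwindows, samples are split at random into training and test sets, a nearest-centroid classifier stands in for PLSLDA, and the normal and permuted test errors of each variable are compared with a one-sided rank-sum test using a normal approximation. All function and parameter names (`centroid_error`, `rank_sum_p`, `spa_sketch`, `n_draws`, `window`, `test_frac`) are hypothetical.

```python
import math
import random


def centroid_error(train, test, cols):
    """Fit class centroids on `train`; return the error rate on `test`.

    Each item is (features, label). Nearest-centroid is a simple stand-in
    for PLSLDA, which this sketch does not implement.
    """
    centroids = {}
    for label in {y for _, y in train}:
        rows = [x for x, y in train if y == label]
        centroids[label] = [sum(r[c] for r in rows) / len(rows) for c in cols]

    def predict(x):
        return min(centroids, key=lambda k: sum(
            (x[c] - m) ** 2 for c, m in zip(cols, centroids[k])))

    return sum(predict(x) != y for x, y in test) / len(test)


def rank_sum_p(normal, permuted):
    """One-sided P value that permuted errors tend to exceed normal errors.

    Mann-Whitney U statistic with a normal approximation, no tie correction.
    """
    n, m = len(normal), len(permuted)
    u = sum(1.0 if b > a else 0.5 if b == a else 0.0
            for a in normal for b in permuted)
    sd = math.sqrt(n * m * (n + m + 1) / 12.0)
    z = (u - n * m / 2.0) / sd
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def spa_sketch(X, y, n_draws=200, window=3, test_frac=0.3, seed=0):
    """Return {variable index: conditional P value} for sampled variables."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    normal = {j: [] for j in range(p)}
    permuted = {j: [] for j in range(p)}
    n_test = max(1, int(n * test_frac))
    for _ in range(n_draws):
        cols = rng.sample(range(p), window)   # random variable subwindow
        idx = rng.sample(range(n), n)         # random train/test split
        test = [(X[i], y[i]) for i in idx[:n_test]]
        train = [(X[i], y[i]) for i in idx[n_test:]]
        base = centroid_error(train, test, cols)
        for c in cols:
            values = [x[c] for x, _ in test]
            rng.shuffle(values)               # break the link between c and y
            shuffled = []
            for (x, label), v in zip(test, values):
                row = list(x)
                row[c] = v
                shuffled.append((row, label))
            normal[c].append(base)
            permuted[c].append(centroid_error(train, shuffled, cols))
    return {j: rank_sum_p(normal[j], permuted[j])
            for j in range(p) if normal[j]}
```

Here `X` is a list of samples (each a list of metabolite intensities) and `y` the class labels. A small P value for a variable suggests that permuting it degrades prediction in the presence of the other variables in its subwindows, which is the sense in which the P value is conditional rather than univariate.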

Abbreviations

MPA: Model population analysis

PLSLDA: Partial least squares linear discriminant analysis

SPA: Subwindow permutation analysis

PCA: Principal component analysis

COSS: COnditional Synergetic Score

Acknowledgements

This work is financially supported by the National Nature Foundation Committee of P.R. China (Grants No. 20875104, No. 10771217 and No. 20975115) and by the international cooperation project on traditional Chinese medicines of the Ministry of Science and Technology of China (Grant No. 2007DFA40680). The studies meet with the approval of the university’s review board.

Author information

Corresponding author

Correspondence to Yi-Zeng Liang.

About this article

Cite this article

Li, HD., Zeng, MM., Tan, BB. et al. Recipe for revealing informative metabolites based on model population analysis. Metabolomics 6, 353–361 (2010). https://doi.org/10.1007/s11306-010-0213-z

  • DOI: https://doi.org/10.1007/s11306-010-0213-z
