
Multi-block PLS discriminant analysis for the joint analysis of metabolomic and epidemiological data

  • Software/Database
Metabolomics

Abstract

Introduction

Metabolomics is a powerful phenotyping tool in nutrition and health research. It generates complex data that require dedicated processing to enrich knowledge of biological systems. In particular, to investigate relations between environmental factors, phenotypes and metabolism, discriminant statistical analyses are generally performed separately on metabolomic datasets and then complemented by associations with metadata. Another relevant strategy is to analyse thematic data blocks simultaneously with a multi-block partial least squares discriminant analysis (MBPLSDA). This approach determines the importance of variables and blocks in discriminating groups of subjects while taking the data structure into account.
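To make the idea of "importance of variables and blocks" concrete, here is a minimal pure-Python sketch of a first MBPLSDA component. It is not the ade4 `mbpls` code, and all function names are hypothetical. It assumes three things: each block is autoscaled and then divided by the square root of its number of variables, so that every block has the same total variance; the groups are dummy-coded into an indicator matrix Y; and the first global weight vector is the dominant eigenvector of X'YY'X on the concatenated, block-scaled matrix. The last assumption relies on the known equivalence between multiblock PLS and PLS on suitably scaled concatenated blocks. The "block importance" computed here, the share of squared weights carried by each block, is an illustrative indicator and not necessarily the definition used by the package.

```python
import math

# Hypothetical helper names: a sketch of the MBPLSDA idea, not ade4's mbpls code.


def block_scale(block):
    """Autoscale each variable, then divide the block by sqrt(number of variables)
    so that every block has the same total variance. Assumes no constant column."""
    n, p = len(block), len(block[0])
    scaled_cols = []
    for col in zip(*block):
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        scaled_cols.append([(v - mean) / (sd * math.sqrt(p)) for v in col])
    return [list(row) for row in zip(*scaled_cols)]


def dummy_code(groups):
    """Indicator matrix Y: one 0/1 column per group level."""
    levels = sorted(set(groups))
    return [[1.0 if g == lev else 0.0 for lev in levels] for g in groups], levels


def first_mbpls_component(blocks, groups, n_iter=500, tol=1e-12):
    """First global weight vector, block importances and subject scores."""
    scaled = [block_scale(b) for b in blocks]
    sizes = [len(b[0]) for b in scaled]
    n = len(groups)
    # Concatenate the block-scaled blocks side by side (subjects in rows).
    X = [[v for b in scaled for v in b[i]] for i in range(n)]
    Y, _ = dummy_code(groups)
    p, q = len(X[0]), len(Y[0])
    # C = X'Y; X is column-centred, so centring Y would not change C.
    C = [[sum(X[i][j] * Y[i][k] for i in range(n)) for k in range(q)] for j in range(p)]
    # Start the power iteration from the column of C with the largest norm.
    w = max(([C[j][k] for j in range(p)] for k in range(q)),
            key=lambda v: sum(x * x for x in v))
    norm = math.sqrt(sum(x * x for x in w))
    if norm == 0.0:
        raise ValueError("X'Y is zero: no variable is related to the groups")
    w = [x / norm for x in w]
    for _ in range(n_iter):
        # One power-iteration step on C C' = X'Y Y'X.
        ctw = [sum(C[j][k] * w[j] for j in range(p)) for k in range(q)]
        new = [sum(C[j][k] * ctw[k] for k in range(q)) for j in range(p)]
        norm = math.sqrt(sum(x * x for x in new))
        new = [x / norm for x in new]
        done = max(abs(a - b) for a, b in zip(new, w)) < tol
        w = new
        if done:
            break
    # Block importance: share of the squared global weights carried by each block.
    importance, start = [], 0
    for size in sizes:
        importance.append(sum(x * x for x in w[start:start + size]))
        start += size
    # Global component scores t = X w.
    scores = [sum(x * wj for x, wj in zip(row, w)) for row in X]
    return w, importance, scores
```

Because the weight vector has unit norm, the block importances sum to 1. A block whose variables carry no group information receives an importance close to 0. The division by the square root of the block size keeps a large metabolomic block from dominating smaller epidemiological blocks simply because it has more variables.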

Objective

The present objective was to develop a complete, open-source, standalone tool that performs all steps of MBPLSDA for the joint analysis of metabolomic and epidemiological data.

Methods

This tool was based on the mbpls function of the ade4 R package and extended with additional functionalities, some of them dedicated to discriminant analysis. The provided indicators help to determine the optimal number of components, to check the validity of the MBPLSDA model, and to evaluate the variability of its parameters and predictions.
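The indicators above typically rest on resampling. Below is a minimal sketch of two generic building blocks, under the assumption that stratified k-fold cross-validation is used to choose the number of components and check validity, and that a percentile bootstrap is used to assess variability. Both are standard techniques and not necessarily the exact schemes implemented in packMBPLSDA. The function names are hypothetical.

```python
import math
import random


def stratified_folds(groups, k, seed=0):
    """Assign each subject to one of k folds, keeping group proportions similar."""
    rng = random.Random(seed)
    folds = [None] * len(groups)
    counter = 0  # continues across groups so fold sizes stay balanced overall
    for level in sorted(set(groups)):
        idx = [i for i, g in enumerate(groups) if g == level]
        rng.shuffle(idx)
        for i in idx:
            folds[i] = counter % k
            counter += 1
    return folds


def percentile_bootstrap(data, statistic, n_boot=1000, alpha=0.05, seed=0):
    """Resample subjects with replacement, recompute the statistic on each
    resample and return the (alpha/2, 1 - alpha/2) percentile interval."""
    rng = random.Random(seed)
    n = len(data)
    stats = sorted(
        statistic([data[rng.randrange(n)] for _ in range(n)]) for _ in range(n_boot)
    )
    lo = stats[math.floor(alpha / 2 * (n_boot - 1))]
    hi = stats[math.ceil((1 - alpha / 2) * (n_boot - 1))]
    return lo, hi
```

In a cross-validation loop, one would fit the model on all folds except fold f, predict the subjects of fold f, and record the classification error. Repeating this for an increasing number of components shows where the error stops improving. For the bootstrap, `statistic` would refit the model on each resample and return, for example, a block importance.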

Results

To illustrate the potential of this tool, MBPLSDA was applied to a real case study involving metabolomic, nutritional and clinical data from a human cohort. Having these functionalities available in a single R package made it possible to optimize parameters for an efficient joint analysis of metabolomic and epidemiological data and to obtain new insights into multidimensional phenotypes.

Conclusion

In particular, we highlighted the impact of filtering the metabolomic variables beforehand and the relevance of an MBPLSDA approach compared with a standard PLS discriminant analysis.
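The abstract does not state which filtering criteria were applied to the metabolomic variables. The sketch below therefore uses two hypothetical, commonly used criteria: drop variables with too large a fraction of missing values (`None`) and drop variables whose observed values are constant. The function name and the `max_missing` threshold are illustrative assumptions.

```python
def filter_variables(matrix, names, max_missing=0.2):
    """Keep columns whose fraction of missing values (None) is at most
    max_missing and whose observed values are not all identical.
    Hypothetical criteria, for illustration only."""
    n = len(matrix)
    kept = []
    for j in range(len(names)):
        observed = [row[j] for row in matrix if row[j] is not None]
        missing_fraction = 1 - len(observed) / n
        if missing_fraction > max_missing:
            continue  # too sparsely measured
        if len(set(observed)) < 2:
            continue  # constant variable: carries no discriminant information
        kept.append(j)
    return [[row[j] for j in kept] for row in matrix], [names[j] for j in kept]
```

Filtering changes the number of variables in the metabolomic block. In a standard PLS-DA on concatenated autoscaled data, this also changes the block's total variance and hence its weight in the model. Block scaling in MBPLSDA removes that dependence on block size.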


Data availability

The tool is developed in R and is available as the packMBPLSDA package on CRAN.


Acknowledgements

The authors would like to thank Charlotte Joly for the LC–MS analyses and Stéphanie Monnerie for providing the data. All metabolomics analyses were performed within the metaboHUB French infrastructure (ANR-INBS-0010). The NuAge Study was supported by a research grant from the Canadian Institutes of Health Research (CIHR; MOP-62842). The NuAge Database and Biobank are supported by the Fonds de Recherche du Québec (FRQ; 2020-VICO-279753), the Quebec Network for Research on Aging funded by the Fonds de Recherche du Québec - Santé (FRQS) and by the Merck-Frosst Chair funded by La Fondation de l'Université de Sherbrooke.

Author information


Contributions

MP, MBB, BC and EPG conceived the study. MBB and SB designed the tool. EP, BC and PG provided the data. MBB analyzed the data. MBB, MP, BC and EPG wrote the manuscript. All authors read and approved the manuscript.

Corresponding author

Correspondence to Marion Brandolini-Bunlon.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Human and animal rights

All procedures performed in the study involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

Informed consent

Informed consent was obtained from all individual participants included in the study. The NuAge Study has been approved by the Research Ethics Board (REB) of both the Geriatric University Institutes of Montreal and Sherbrooke. The management framework of the NuAge Database and Biobank has been approved by the REB of the CIUSSS-de-l'Estrie-CHUS.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 1413 kb)


About this article


Cite this article

Brandolini-Bunlon, M., Pétéra, M., Gaudreau, P. et al. Multi-block PLS discriminant analysis for the joint analysis of metabolomic and epidemiological data. Metabolomics 15, 134 (2019). https://doi.org/10.1007/s11306-019-1598-y


  • DOI: https://doi.org/10.1007/s11306-019-1598-y
