Abstract
Introduction
Metabolomics is a powerful phenotyping tool in nutrition and health research, generating complex data that require dedicated processing to enrich knowledge of biological systems. In particular, to investigate relations between environmental factors, phenotypes and metabolism, discriminant statistical analyses are generally performed separately on each metabolomic dataset and then complemented by associations with metadata. Another relevant strategy is to analyse thematic data blocks simultaneously with multi-block partial least squares discriminant analysis (MBPLSDA), which determines the importance of variables and blocks in discriminating groups of subjects while taking the data structure into account.
Objective
The objective was to develop a fully open-source, standalone tool covering all steps of MBPLSDA for the joint analysis of metabolomic and epidemiological data.
Methods
This tool is based on the mbpls function of the ade4 R package, extended with additional functionalities, including some dedicated to discriminant analysis. The indicators it provides help to determine the optimal number of components, to check the validity of the MBPLSDA model, and to evaluate the variability of its parameters and predictions.
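The core idea behind a multi-block discriminant component can be illustrated with a minimal sketch. It is written in plain Python for readability and is not the ade4 or packMBPLSDA implementation: the function names (`standardize`, `equalize_inertia`, `centered_dummy`, `first_component`), the toy data, and the block-share indicator are all assumptions made for illustration. The sketch standardizes each block, rescales it so that every block carries the same total inertia (one common way to balance blocks of different sizes), codes the groups as a centered dummy matrix, and takes the first weight vector as the dominant left singular vector of X'Y, as in a standard PLS2 first component.

```python
import math

def standardize(block):
    """Center each column and scale it to unit variance."""
    n = len(block)
    cols = []
    for col in zip(*block):
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        cols.append([(v - mean) / sd if sd > 0 else 0.0 for v in col])
    return [list(row) for row in zip(*cols)]

def equalize_inertia(block):
    """Divide a block by the square root of its total inertia,
    so that every block carries the same total variance."""
    n = len(block)
    inertia = sum(v * v for row in block for v in row) / n
    scale = math.sqrt(inertia) or 1.0
    return [[v / scale for v in row] for row in block]

def centered_dummy(labels):
    """Code class labels as a centered indicator (dummy) matrix."""
    classes = sorted(set(labels))
    n = len(labels)
    freq = [labels.count(c) / n for c in classes]
    return [[(1.0 if lab == c else 0.0) - f for c, f in zip(classes, freq)]
            for lab in labels]

def first_component(blocks, labels, n_iter=100):
    """Hypothetical helper: first discriminant component of the merged blocks."""
    scaled = [equalize_inertia(standardize(b)) for b in blocks]
    # Merge the blocks side by side into one explanatory table X.
    X = [sum((b[i] for b in scaled), []) for i in range(len(labels))]
    Y = centered_dummy(labels)
    n, p, q = len(X), len(X[0]), len(Y[0])
    # Cross-product matrix C = X'Y (p x q).
    C = [[sum(X[i][j] * Y[i][k] for i in range(n)) for k in range(q)]
         for j in range(p)]
    # Power iteration on C C' gives the dominant left singular vector of C.
    w = [1.0 / math.sqrt(p)] * p
    for _ in range(n_iter):
        c = [sum(C[j][k] * w[j] for j in range(p)) for k in range(q)]
        v = [sum(C[j][k] * c[k] for k in range(q)) for j in range(p)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0:
            break
        w = [x / norm for x in v]
    scores = [sum(X[i][j] * w[j] for j in range(p)) for i in range(n)]
    # Share of the squared weights carried by each block (sums to 1).
    # This is a simplified proxy, not the package's block importance index.
    shares, start = [], 0
    for b in blocks:
        width = len(b[0])
        shares.append(sum(x * x for x in w[start:start + width]))
        start += width
    return w, scores, shares

# Invented toy data: 6 subjects, 2 groups, 2 thematic blocks.
metabolomic = [[1.0, 5.0, 2.0], [1.2, 4.0, 2.5], [0.9, 6.0, 1.5],
               [3.0, 5.5, 2.2], [3.2, 4.5, 1.8], [2.9, 5.0, 2.4]]
clinical = [[60, 24.0], [65, 27.5], [70, 22.0],
            [62, 26.0], [68, 23.5], [66, 25.0]]
groups = ["A", "A", "A", "B", "B", "B"]

weights, scores, shares = first_component([metabolomic, clinical], groups)
print("block shares:", [round(s, 3) for s in shares])
print("subject scores:", [round(t, 3) for t in scores])
```

In this toy example the groups differ mainly on the first metabolomic variable, so the metabolomic block carries most of the squared weights. A real analysis would extract several components with deflation and validate the model by cross-validation, permutation and bootstrap, which this sketch omits.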
Results
To illustrate the potential of this tool, MBPLSDA was applied to a real case study involving metabolomic, nutritional and clinical data from a human cohort. Having the different functionalities available in a single R package made it possible to optimize parameters for an efficient joint analysis of metabolomic and epidemiological data, and to obtain new insights into multidimensional phenotypes.
Conclusion
This work highlighted the impact of filtering the metabolomic variables beforehand, and the relevance of an MBPLSDA approach compared with a standard PLS discriminant analysis.
Data availability
The tool is developed in R and is available on CRAN as the packMBPLSDA package.
Acknowledgements
The authors would like to thank Charlotte Joly for the LC–MS analyses and Stéphanie Monnerie for providing the data. All metabolomics analyses were performed within the metaboHUB French infrastructure (ANR-INBS-0010). The NuAge Study was supported by a research grant from the Canadian Institutes of Health Research (CIHR; MOP-62842). The NuAge Database and Biobank are supported by the Fonds de Recherche du Québec (FRQ; 2020-VICO-279753), the Quebec Network for Research on Aging funded by the Fonds de Recherche du Québec - Santé (FRQS) and by the Merck-Frosst Chair funded by La Fondation de l'Université de Sherbrooke.
Author information
Contributions
MP, MBB, BC and EPG conceived the study. MBB and SB designed the tool. EP, BC and PG provided the data. MBB analyzed the data. MBB, MP, BC and EPG wrote the manuscript. All authors read and approved the manuscript.
Ethics declarations
Conflict of interest
All authors declare they have no conflict of interest.
Human and animal rights
All procedures performed in the study involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.
Informed consent
Informed consent was obtained from all individual participants included in the study. The NuAge Study has been approved by the Research Ethics Board (REB) of both the Geriatric University Institutes of Montreal and Sherbrooke. The management framework of the NuAge Database and Biobank has been approved by the REB of the CIUSSS-de-l'Estrie-CHUS.
About this article
Cite this article
Brandolini-Bunlon, M., Pétéra, M., Gaudreau, P. et al. Multi-block PLS discriminant analysis for the joint analysis of metabolomic and epidemiological data. Metabolomics 15, 134 (2019). https://doi.org/10.1007/s11306-019-1598-y
DOI: https://doi.org/10.1007/s11306-019-1598-y