Multivariate Analysis of Multiple Datasets: a Practical Guide for Chemical Ecology
Chemical ecology has strong links with metabolomics, the large-scale study of all metabolites detectable in a biological sample. Consequently, chemical ecologists are often challenged by the statistical analysis of such large datasets. This holds especially true when the purpose is to integrate multiple datasets to obtain a holistic view and a better understanding of the biological system under study. The present article provides a comprehensive resource for analyzing such complex datasets using multivariate methods. It proceeds from the necessary pre-treatment of data, including data transformations and distance calculations, to the application of both gold-standard and novel multivariate methods for the integration of different omics data. We illustrate the analysis process, along with detailed interpretation of the results, for six issues representative of the different types of biological questions encountered by chemical ecologists. We provide the necessary knowledge and tools, with reproducible R code and chemical-ecological datasets, to practice and teach multivariate methods.
Keywords: Discriminant analyses; Distance-based analyses; Integrative analyses; Metabolomics; Multi-block methods; Ordination methods
We are very grateful to Bernard Banaigs, Lucie Conchou, Laurent Dormont, Stéphane Greff, Maria Cristina Lorenzi, Thierry Pérez, Bertrand Schatz, Oriol Sacristán-Soriano and Olivier Thomas, who kindly provided their data to illustrate the examples; to Stéphane Dray and Denis Poinsot for their insightful comments on the manuscript; and to Zoe Welham for proofreading the manuscript.
Compliance with Ethical Standards
Conflict of Interest
The authors declare that they have no conflict of interest.