Abstract
Compositional data are commonly used in chemical ecology to describe the biological role of chemical compounds in communication, defense or other behavioral modifications. Statistical analyses of compositional data, however, are challenging because of several constraints (e.g., the constant sum constraint). We use an ontogenetic series of defensive gland secretions from larvae, three nymphal stages and adults of the oribatid model species Archegozetes longisetosus as a typical chemo-ecological data set to prepare a practical guide for compositional data analyses in chemical ecology. We compare various common and less common statistical and ordination methods for depicting small quantitative and/or qualitative differences in compositional datasets: principal component analysis (PCA), non-metric multidimensional scaling (NMDS), multivariate statistical tests (Anderson’s permutational multivariate analysis of variance = PERMANOVA; permutational analysis of multivariate dispersions = PERMDISP), linear discriminant analysis (LDA), the data mining algorithm Random Forests, bipartite network analysis and dynamic range boxes (dynRB). We summarize which methods are suitable for different research questions and how data need to be structured and pre-processed. Network analyses and dynamic range boxes are promising tools for analyzing compositional data beyond the “classical” methods and provide additional information.
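To make the constant sum constraint concrete, the following minimal Python sketch closes raw amounts to proportions and then applies a centered log-ratio (clr) transform. The clr transform is a commonly used remedy that the abstract itself does not name, so it is shown here only as an illustrative assumption and not as the authors' procedure. The function names `close` and `clr` and the peak areas are hypothetical.

```python
import math

def close(amounts, total=1.0):
    """Rescale raw amounts so that the parts sum to `total` (constant sum)."""
    s = sum(amounts)
    if s <= 0:
        raise ValueError("amounts must have a positive sum")
    return [total * a / s for a in amounts]

def clr(composition):
    """Centered log-ratio transform; defined only for strictly positive parts."""
    if any(p <= 0 for p in composition):
        raise ValueError("clr is undefined for zero or negative parts")
    logs = [math.log(p) for p in composition]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

# Hypothetical peak areas of three compounds in two samples (illustration only).
sample_a = [120.0, 60.0, 20.0]
sample_b = [240.0, 120.0, 40.0]  # twice the total amount, same proportions

comp_a = close(sample_a)
comp_b = close(sample_b)
clr_a = clr(comp_a)
```

After closure both samples carry the same composition, so any information about absolute amounts is lost, and the parts can no longer vary independently of each other. The clr values sum to zero and do not depend on the total, which is why such transforms are often applied before methods like PCA. Zeros must be handled beforehand (for example by imputation), because the logarithm is undefined for them.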
Acknowledgements
Adrian Brückner is supported by a PhD scholarship from the German National Academic Foundation (Studienstiftung des deutschen Volkes). We thank Klaus Birkhofer (Lund University), Robert R. Junker (University of Salzburg) and Nico Blüthgen (TU Darmstadt) for discussing PERMANOVA/PERMDISP, dynamic range boxes and the network approach with us. We further thank Lukas Kauling for experimental assistance and carrying out a preliminary experiment. This study was partly funded by the German Research Foundation (DFG, HE 4593/5-1).
Additional information
Handling Editor: Marko Rohlfs.
An erratum to this article is available at http://dx.doi.org/10.1007/s00049-016-0228-7.
Cite this article
Brückner, A., Heethoff, M. A chemo-ecologists’ practical guide to compositional data analysis. Chemoecology 27, 33–46 (2017). https://doi.org/10.1007/s00049-016-0227-8
DOI: https://doi.org/10.1007/s00049-016-0227-8