This chapter outlines the most common multidimensional analyses used in ecology, ethnobiology, and conservation. After learning how to create a scientific research workflow based on a problem-based platform (Chap. 7, in this book), readers will learn how to visualize complex datasets (e.g., species or ethnospecies lists, several sociopolitical or environmental variables, and so on) and test multivariate hypotheses. We provide a fully reproducible example of every analysis discussed in this chapter as Online Material. We further encourage students and researchers to move from a “description-based multidimensional analysis” (such as unconstrained ordination, e.g., PCA) to an explicit hypothesis-testing framework that can greatly improve learning and research programs.
- Data analysis
- Hypothesis testing
- Scientific research workflow
Borcard D, Gillet F, Legendre P (2018) Numerical ecology with R, 2nd edn. Springer, New York
Gower JC, Legendre P (1986) Metric and euclidean properties of dissimilarity coefficients. J Classif 3:5–48
Dolédec S, Chessel D, ter Braak CJF, Champely S (1996) Matching species traits to environmental variables: a new three-table ordination method. Environ Ecol Stat 3:143–166
Legendre P, Legendre L (2012) Numerical ecology, 3rd English edn. Elsevier, Amsterdam
Chase JM, Leibold MA (2003) Ecological niches: linking classical and contemporary approaches. University of Chicago Press, Chicago
Oksanen J, Blanchet FG, Friendly M, Kindt R, Legendre P, McGlinn D, Minchin PR, O'Hara RB, Simpson GL, Solymos P, Stevens MH, Szoecs E, Wagner E (2018) Vegan: community ecology package. R package version 2.5-1. https://CRAN.R-project.org/package=vegan
Dray S, Blanchet G, Borcard D, Clappe S, Guenard G, Jombart T, Larocque G, Legendre P, Madi N, Wagner HH (2017) Adespatial: multivariate multiscale spatial analysis. R package version 0.0-9. https://CRAN.R-project.org/package=adespatial
Dray S, Dufour AB (2007) The ade4 package: implementing the duality diagram for ecologists. J Stat Softw 22:1–20
Legendre P, Gallagher ED (2001) Ecologically meaningful transformations for ordination of species data. Oecologia 129:271–280
Legendre P, Borcard D (2018) Box-Cox-chord transformations for community composition data prior to beta diversity analysis. Ecography 41:1–5
Pearson K (1901) On lines and planes of closest fit to systems of points in space. Philos Mag 2:559–572
Rodrigues A, Bones F, Schneiders A, Oliveira L, Vibrans A, Gasper A (2018) Plant trait dataset for tree-like growth forms species of the subtropical Atlantic Rain Forest in Brazil. Data 3:16
Revell LJ (2012) Phytools: an R package for phylogenetic comparative biology (and other things). Methods Ecol Evol 3:217–223
Jombart T, Dray S (2010) Adephylo: exploratory analyses for the phylogenetic comparative method. Bioinformatics 26:1907–1909
Gower JC (1966) Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika 53:325–338
Pavoine S, Dufour AB, Chessel D (2004) From dissimilarities among species to dissimilarities among communities: a double principal coordinate analysis. J Theor Biol 228:523–537
Quinn GP, Keough MJ (2002) Experimental design and data analysis for biologists. Cambridge University Press, Cambridge
Zuur AF, Ieno EN, Elphick CS (2010) A protocol for data exploration to avoid common statistical problems: data exploration. Methods Ecol Evol 1:3–14
Hadi AS, Ling RF (1998) Some cautionary notes on the use of principal components regression. Am Stat 52:15–19
Boaratti AZ, Silva FR (2015) Relationships between environmental gradients and geographic variation in the intraspecific body size of three species of frogs (Anura). Austral Ecol 40:869–9765
Hijmans RJ, Cameron SE, Parra JL, Jones PG, Jarvis A (2005) Very high resolution interpolated climate surfaces for global land areas. Int J Climatol 25:1965–1978
Anderson MJ (2001) A new method for non-parametric multivariate analysis of variance. Austral Ecol 26:32–46
McArdle BH, Anderson MJ (2001) Fitting multivariate models to community data: a comment on distance-based redundancy analysis. Ecology 82:290–297
Anderson MJ, Ellingsen KE, McArdle BH (2006) Multivariate dispersion as a measure of beta diversity. Ecol Lett 9:683–693
Legendre P, Galzin R, Harmelin-Vivien ML (1997) Relating behavior to habitat: solutions to the fourth-corner problem. Ecology 78:547–562
Garnier E, Cortez J, Billès G, Navas ML, Roumet C, Debussche M, Laurent G, Blanchard A, Aubry D, Bellmann A, Neill C, Toussaint JP (2004) Plant functional markers capture ecosystem properties during secondary succession. Ecology 85:2630–2637
Lavorel S, Grigulis K, McIntyre S, Williams NSG, Garden D, Dorrough J, Berman S, Quétier F, Thébault A, Bonis A (2007) Assessing functional diversity in the field – methodology matters! Funct Ecol 22:134–147
Peres-Neto PR, Dray S, ter Braak CJF (2017) Linking trait variation to the environment: critical issues with community-weighted mean correlation resolved by the fourth-corner approach. Ecography 40:806–816
ter Braak CJF, Peres-Neto P, Dray S (2017) A critical issue in model-based inference for studying trait-based community assembly and a solution. PeerJ 5:e2885
Rigal F, Cardoso P, Lobo JM, Triantis KA, Whittaker RJ, Amorim IR, Borges PAV (2018) Functional traits of indigenous and exotic ground-dwelling arthropods show contrasting responses to land-use change in an oceanic island, Terceira, Azores. Divers Distrib 24:36–47
Kleyer M, Dray S, Bello F, Lepš J, Pakeman RJ, Strauss B, Thuiller W, Lavorel S (2012) Assessing species and community functional responses to environmental gradients: which multivariate methods? J Veg Sci 23:805–821
Solé-Senan XO, Juárez-Escario A, Conesa JA, Recasens J (2018) Plant species, functional assemblages and partitioning of diversity in a Mediterranean agricultural mosaic landscape. Agric Ecosyst Environ 256:163–172
Laliberté E, Legendre P, Shipley B (2014) FD: measuring functional diversity from multiple traits, and other tools for functional ecology. R package version 1.0-12
ter Braak CJF (1987) Ordination. In: Jongman RHG, ter Braak CJF, van Tongeren OFR (eds) Data analysis in community and landscape ecology. Pudoc, Wageningen, The Netherlands, pp 91–173. Reissued in 1995 by Cambridge University Press, Cambridge
Dray S, Pélissier R, Couteron P, Fortin M, Legendre P, Peres-Neto PR, Bellier E, Bivand R, Blanchet FG, De Cáceres M, Dufour A, Heegaard E, Jombart T, Munoz F, Oksanen J, Thioulouse J, Wagner HH (2012) Community ecology in the age of multivariate multiscale spatial analysis. Ecol Monogr 82:257–275
Vasconcelos T, Santos T, Haddad C, Rossa-Feres DC (2010) Climatic variables and altitude as predictors of anuran species richness and number of reproductive modes in Brazil. J Trop Ecol 26:423–432
Hecnar SJ, M’Closkey RT (1996) Regional dynamics and the status of amphibians. Ecology 77:2091–2097
De Cáceres M, Legendre P (2009) Associations between species and groups of sites: indices and statistical inference. Ecology 90:3566–3574
Dufrêne M, Legendre P (1997) Species assemblages and indicator species: the need for a flexible asymmetrical approach. Ecol Monogr 67:345–366
McGeoch MA, Van Rensburg BJ, Botes A (2002) The verification and application of bioindicators: a case study of dung beetles in savanna ecosystem. J Appl Ecol 39:661–672
Tejeda-Cruz C, Mehltreter K, Sosa VJ (2008) Indicadores ecológicos multi-taxonómicos. In: Manson RH, Hernández-Ortiz V, Gallina S, Mehltreter K (eds) Agroecosistemas cafetaleros de Veracruz: Biodiversidad, Manejo y Conservación. INECOL – INE-SEMARNAT, México, pp 271–278
Roberts DW (2016) labdsv: ordination and multivariate analysis for ecology. R package version 1.8-0. https://CRAN.R-project.org/package=labdsv
Mayo DG (2018) Statistical inference as severe testing: how to get beyond the statistics wars. Cambridge Univ. Press, Cambridge
Taper ML, Lele S (2004) The nature of scientific evidence: statistical, philosophical, and empirical considerations. University of Chicago Press, Chicago
Salsburg D (2002) The lady tasting tea: how statistics revolutionized science in the twentieth century. Holt, New York, NY
1 Appendix for the Chapter Multidimensional Analysis for Testing Ecological, Ethnobiological, and Conservation Hypothesis
Thiago Gonçalves-Souza, Michel V. Garey, Fernando R. da Silva, Ulysses P. Albuquerque, and Diogo B. Provete.
0.1 Loading the required packages
1 Unconstrained ordination
1.0.1 Principal Component Analysis (PCA)
1.0.2 Principal Coordinate Analysis
1.0.3 Principal Components Regression
2.0.1 EXAMPLE 1
2.0.2 EXAMPLE 2
3 Community-Weighted Mean (CWM)
4 Constrained ordination
5.0.1 Indicator value index
0.1 Loading the Required Packages
1 Unconstrained Ordination
Here, we will first use example datasets from the R package datasets, which comes with every R installation.
Let’s look at the data. This dataset contains measurements of the sepals and petals of 50 flowers from each of three species of the genus Iris. You can read more about it by running ?iris in R.
Next, we will run a PCA with a few packages that offer different outputs.
Finally, the plot:
The package factoextra has many useful plotting functions. Here, we will plot the equilibrium circle to check the relative contribution of each variable to the formation of the axes:
All variables contributed roughly equally to the formation of the ordination axes.
And finally, the same analysis with the
The advantages of this package include handling missing data and easily plotting the centroids of a categorical variable in the ordination diagram. Let’s see how it runs:
As you can see, it automatically returns the biplot with the equilibrium circle and the individual factor map, with the centroid corresponding to the mean of each species on each dimension. With the equilibrium circle, you can judge how much each variable contributed to the formation of the axes in the reduced space, based on the length of its vector (arrow). In this example, all four variables contributed roughly equally (see p. 447 of Legendre & Legendre 2012).
We can also easily see the correlation of each variable with the first two axes. This is useful when you want to use the axes in subsequent analyses and need to interpret how the axes summarize the information:
We will import a dataset published by Rodrigues et al. (2018) (Plant Trait Dataset for Tree-Like Growth Forms Species of the Subtropical Atlantic Rain Forest in Brazil. Data 3(2), 16; https://doi.org/10.3390/data3020016). The dataset presents leaf and branch traits, maximum potential height, seed mass, and dispersal syndrome of 117 tree species. For the sake of simplicity, we will use just a few traits and categorize half of the species as used by local people and the other half as not used.
Before jumping in and analyzing the data, let’s first take a closer look at them. Let’s quickly check the relationships among variables with a correlation matrix:
Another key aspect of any dataset is the number of missing values, usually coded as NA.
As you can see, a few values are missing. So let’s choose the variables that are important for testing our hypothesis and have as few missing values as possible.
Remember that FactoMineR::PCA can handle missing values by replacing them with the column mean. Here, however, for the sake of simplicity we will just remove them:
Let’s visualize our new, reduced data set:
Now we will create a vector that categorizes half of the species as preferred and the other half as nonpreferred, in order to test our initial hypothesis. Of course, this is just to demonstrate how the method works; it does not mean that these species are actually used for any medicinal purpose.
Because the variables were measured on different scales, we have to standardize them before running a PCA. Conveniently, the function dudi.pca does this internally by default (arguments center = TRUE and scale = TRUE).
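To make the standardize-then-PCA step concrete, here is a minimal Python sketch (not dudi.pca itself). It assumes the data are stored as one list per variable, z-scores each variable, builds the correlation matrix, and extracts its eigenvalues with cyclic Jacobi rotations. All function names and the toy data are hypothetical.

```python
import math


def standardize(columns):
    """Z-score each variable (one list per variable): mean 0, unit sample SD."""
    out = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        out.append([(v - mean) / sd for v in col])
    return out


def correlation_matrix(z):
    """Covariance matrix of z-scored variables, which equals their correlation matrix."""
    p, n = len(z), len(z[0])
    return [[sum(z[a][k] * z[b][k] for k in range(n)) / (n - 1) for b in range(p)]
            for a in range(p)]


def jacobi_eigenvalues(m, tol=1e-12, max_sweeps=100):
    """Eigenvalues (largest first) of a symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in m]
    p = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(p) for j in range(p) if i != j)
        if off < tol:
            break
        for i in range(p):
            for j in range(i + 1, p):
                if abs(a[i][j]) < 1e-15:
                    continue
                # Rotation angle chosen so that the rotated a[i][j] becomes zero
                theta = (a[j][j] - a[i][i]) / (2 * a[i][j])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta ** 2 + 1))
                c = 1 / math.sqrt(t ** 2 + 1)
                s = t * c
                for k in range(p):  # A <- A P (columns i and j)
                    aki, akj = a[k][i], a[k][j]
                    a[k][i] = c * aki - s * akj
                    a[k][j] = s * aki + c * akj
                for k in range(p):  # A <- P^T A (rows i and j)
                    aik, ajk = a[i][k], a[j][k]
                    a[i][k] = c * aik - s * ajk
                    a[j][k] = s * aik + c * ajk
    return sorted((a[i][i] for i in range(p)), reverse=True)


if __name__ == "__main__":
    # Hypothetical toy data: three traits (one list each) measured on five species
    traits = [[1.0, 2.0, 3.0, 4.0, 5.0],
              [2.1, 3.9, 6.2, 7.8, 10.1],
              [5.0, 3.0, 4.0, 1.0, 2.0]]
    eigenvalues = jacobi_eigenvalues(correlation_matrix(standardize(traits)))
    print(eigenvalues)  # should sum to about 3, the number of standardized variables
```

Whether the standard deviation uses n or n − 1 does not change the correlation matrix, so the eigenvalue proportions are the same either way.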
Now, the fun part, let’s see the results.
First, a basic graph showing the results, ignoring the groups:
Let’s see how variables contributed to the formation of the axes:
Plant height (Hpot95_m) and BarkProp were not very useful for the formation of the axes. Here, the angle of each arrow indicates the correlation of that variable with a given axis. For example, SLA is closely related to axis 1, meaning that this variable contributed greatly to the formation of that axis but little to the second axis. Interestingly, Hpot95_m contributes equally to the formation of both axes. This means that data points (species) on the right of the ordination plot (positive side) have a large SLA and are tall. Similarly, those on the left are short trees with small SLA, but have thick leaves, large seeds, and high BarkProp.
Now, what about the preferred vs. nonPreferred species?
We can see that our hypothesis (see main text) was rejected, since the morphology of the preferred species did not differ from that of the nonpreferred ones. Of course, again, this example only illustrates how the method works; the grouping is not based on real data collected to test this hypothesis.
A cleaner graph without the row labels allows us to better see that the groups overlap completely:
Here, we will use the plant species trait data available in the package cluster to illustrate how a PCoA works. This dataset contains 31 morphological and reproductive variables for 136 plant species along an urbanization gradient. Let’s begin by loading the data:
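As a rough picture of what a classical PCoA (Gower 1966) does internally, the Python sketch below squares the dissimilarities, double-centres them into Gower's matrix G = −½·C·D²·C, and takes the eigenvalues of G. It is a minimal sketch, not the ade4 or vegan implementation; the function names and the toy distances are hypothetical, and it repeats a small Jacobi eigenvalue routine so that it runs on its own.

```python
import math


def gower_centered(d):
    """Gower's double-centred matrix G = -1/2 * C D^2 C from a symmetric distance matrix."""
    n = len(d)
    a = [[-0.5 * d[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_means = [sum(row) / n for row in a]
    grand = sum(row_means) / n
    # a is symmetric, so its column means equal its row means
    return [[a[i][j] - row_means[i] - row_means[j] + grand for j in range(n)]
            for i in range(n)]


def jacobi_eigenvalues(m, tol=1e-12, max_sweeps=100):
    """Eigenvalues (largest first) of a symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in m]
    p = len(a)
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(p) for j in range(p) if i != j) < tol:
            break
        for i in range(p):
            for j in range(i + 1, p):
                if abs(a[i][j]) < 1e-15:
                    continue
                theta = (a[j][j] - a[i][i]) / (2 * a[i][j])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta ** 2 + 1))
                c = 1 / math.sqrt(t ** 2 + 1)
                s = t * c
                for k in range(p):  # rotate columns i and j
                    aki, akj = a[k][i], a[k][j]
                    a[k][i], a[k][j] = c * aki - s * akj, s * aki + c * akj
                for k in range(p):  # rotate rows i and j
                    aik, ajk = a[i][k], a[j][k]
                    a[i][k], a[j][k] = c * aik - s * ajk, s * aik + c * ajk
    return sorted((a[i][i] for i in range(p)), reverse=True)


def pcoa_eigenvalues(d):
    """PCoA eigenvalues; negative ones signal a non-Euclidean dissimilarity."""
    return jacobi_eigenvalues(gower_centered(d))


if __name__ == "__main__":
    # Hypothetical dissimilarities among four species
    d = [[0.0, 0.3, 0.6, 0.8],
         [0.3, 0.0, 0.4, 0.7],
         [0.6, 0.4, 0.0, 0.5],
         [0.8, 0.7, 0.5, 0.0]]
    print(pcoa_eigenvalues(d))
```

Negative eigenvalues can appear when the dissimilarity coefficient is not Euclidean (Gower & Legendre 1986), which is common with mixed-variable coefficients.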
Notice that many traits are coded as semiquantitative variables (ordered factors in R) or binary variables, and two are continuous. Here, we will use the function dist.ktab of the package ade4 to calculate the Gower dissimilarity coefficient. First, we have to tell the function the type of each variable:
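The idea behind a Gower-type coefficient for mixed variables can be sketched in a few lines of Python: each variable contributes a partial dissimilarity between 0 and 1, and the result is their average over the variables observed in both objects. This is a simplified sketch, not what dist.ktab computes: here ordinal codes are treated as range-scaled numbers and binary variables as symmetric nominal ones, whereas ade4 offers dedicated treatments for those types. Names and data are hypothetical.

```python
def gower_dissimilarity(x, y, kinds, ranges):
    """Simplified Gower dissimilarity between two objects with mixed variables.

    kinds[k] is "quant" (|x - y| / range; also used here for ordinal codes) or
    "nominal" (0 if equal, 1 otherwise; also used here for binary variables).
    ranges[k] is the range of variable k over all objects (ignored for nominal).
    Variables missing (None) in either object are skipped.
    """
    total, used = 0.0, 0
    for xv, yv, kind, r in zip(x, y, kinds, ranges):
        if xv is None or yv is None:
            continue
        if kind == "quant":
            partial = abs(xv - yv) / r if r > 0 else 0.0
        else:
            partial = 0.0 if xv == yv else 1.0
        total += partial
        used += 1
    return total / used if used else float("nan")


# Hypothetical species: (seed mass, growth form, woody 0/1)
sp1 = [1.0, "tree", 1]
sp2 = [3.0, "tree", 0]
print(gower_dissimilarity(sp1, sp2, ["quant", "nominal", "nominal"], [4.0, None, None]))
```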
Notice that, according to the broken-stick criterion, the first 12 eigenvectors should be retained for further use, e.g., in a Principal Components Regression.
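The broken-stick criterion compares each axis's observed share of the variance with the share expected if the total variance were split at random into p pieces, b_k = (1/p) Σ_{i=k}^{p} 1/i. A minimal Python sketch follows; it assumes eigenvalues sorted from largest to smallest and, as a labelled simplification, ignores negative PCoA eigenvalues. Function names are hypothetical.

```python
def broken_stick(p):
    """Expected proportion of variance for axes 1..p under the broken-stick model."""
    return [sum(1.0 / i for i in range(k, p + 1)) / p for k in range(1, p + 1)]


def n_axes_to_retain(eigenvalues):
    """Number of leading axes whose observed proportion exceeds the broken-stick value.

    Assumes eigenvalues are sorted in decreasing order; negative eigenvalues
    (possible in PCoA) are dropped here as a simplification.
    """
    positive = [v for v in eigenvalues if v > 0]
    total = sum(positive)
    n = 0
    for observed, expected in zip(positive, broken_stick(len(positive))):
        if observed / total > expected:
            n += 1
        else:
            break
    return n


print(n_axes_to_retain([3.0, 0.5, 0.3, 0.2]))
```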
Let’s look at the biplot:
The relationship between environmental gradients and patterns of geographic variation in intraspecific body size of Boana faber
Hypothesis: environmental gradients drive intraspecific body size of Boana faber
Dependent variable: body size of Boana faber
Independent variable: environmental variables (e.g., temperature and precipitation)
Let’s begin by importing the files:
The first step is to run a Principal Component Analysis (PCA) and then store the eigenvectors:
We can see that the first axis explains 47% of the variation and the second axis 28%; together they explain 75%.
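The proportions reported by PCA functions are fractions between 0 and 1 (for example, 0.47), so they must be multiplied by 100 to read as percentages. A short Python sketch of the computation, using hypothetical eigenvalues for four standardized variables:

```python
def explained_variance(eigenvalues):
    """Proportion and cumulative proportion of variance for each axis."""
    total = sum(eigenvalues)
    proportions = [v / total for v in eigenvalues]
    cumulative, running = [], 0.0
    for p in proportions:
        running += p
        cumulative.append(running)
    return proportions, cumulative


# Hypothetical eigenvalues of a correlation matrix with four variables (they sum to 4)
props, cum = explained_variance([2.0, 1.0, 0.6, 0.4])
for axis, (p, c) in enumerate(zip(props, cum), start=1):
    print(f"Axis {axis}: {p:.0%} (cumulative {c:.0%})")
```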
Plotting the results in a biplot
We’ll select the first two axes of PCA to further use
Now, we’ll build several Principal Components Regression (PCR) models with different predictor variables for later use:
Using the Akaike Information Criterion corrected for small sample sizes (AICc) to select the best model:
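For least-squares models such as these PCRs, AICc can be computed from the residual sum of squares (RSS): AIC = −2 log L + 2k, and AICc = AIC + 2k(k + 1)/(n − k − 1). The Python sketch below assumes Gaussian errors and counts k as the regression coefficients (including the intercept) plus one for the residual variance, which is how R counts parameters for lm fits. Model names and RSS values in the usage lines are hypothetical.

```python
import math


def aicc_gaussian(rss, n, k):
    """AICc of a least-squares model with Gaussian errors.

    rss: residual sum of squares; n: sample size;
    k: coefficients including the intercept, plus 1 for the residual variance.
    """
    log_lik = -0.5 * n * (math.log(2 * math.pi * rss / n) + 1)  # maximized log-likelihood
    aic = -2 * log_lik + 2 * k
    return aic + (2 * k * (k + 1)) / (n - k - 1)


def delta_and_weights(aicc_values):
    """Delta AICc and Akaike weights for a dict {model name: AICc}."""
    best = min(aicc_values.values())
    deltas = {m: v - best for m, v in aicc_values.items()}
    raw = {m: math.exp(-0.5 * d) for m, d in deltas.items()}
    total = sum(raw.values())
    return deltas, {m: r / total for m, r in raw.items()}


# Hypothetical RSS values for two PCR models fitted to 30 observations
scores = {"PC1": aicc_gaussian(12.0, 30, 3), "PC1 + PC2": aicc_gaussian(11.8, 30, 4)}
print(delta_and_weights(scores))
```

The model with delta AICc = 0 has the lowest AICc; Akaike weights express the relative support for each model within the candidate set.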
The model containing only the first PCA axis is the most parsimonious, with the lowest AICc value.
Now, let’s visualize the results:
The relationship between the number of fish species known by fishermen and their personal experiences
Hypothesis: personal experiences influence the number of fish species known by local fishermen
Dependent variable: number of fish species
Independent variable: personal experience (e.g., age, time as a fisherman, number of fishing techniques used, whether his father was also a fisherman)
Principal Component Analysis (PCA)
We can see that the first axis explains 57% of the variation and the second axis 21%; together they explain 78%.
Visualizing the results:
Retaining the first two axes of PCA:
Building Principal Components Regression (PCR) models with different predictor variables:
Using the Akaike Information Criterion corrected for small sample sizes (AICc) to select the best model:
We can see that the model containing only the first PC is the most parsimonious, with the lowest AICc value.
Now, let’s visualize the results:
Illustrative dataset that examines the hypothesis that land use alters arthropod species composition in the Cerrado biome.
Hypothesis: land use alters species composition
Dependent variable: arthropod species
Independent variable: areas (pastures, sugarcane fields, and Cerrado protected areas)
Importing the data with arthropod species composition:
To run a PERMANOVA, we first have to calculate a dissimilarity measure. Here, we’ll use the percentage difference (aka Bray-Curtis coefficient). We need to exclude the first column of the data, which contains the treatments; that’s why we use data[,-1].
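The percentage difference (Bray-Curtis) between two sites is the sum of absolute abundance differences divided by the total abundance of both sites. A minimal Python sketch (vegan's vegdist computes this with method "bray"); the function names and the small abundance table are hypothetical.

```python
def bray_curtis(x, y):
    """Percentage difference (Bray-Curtis) between two abundance vectors."""
    numerator = sum(abs(a - b) for a, b in zip(x, y))
    denominator = sum(a + b for a, b in zip(x, y))
    return numerator / denominator if denominator > 0 else float("nan")  # undefined for two empty sites


def dissimilarity_matrix(sites, measure):
    """Full symmetric matrix of pairwise dissimilarities among sites (rows)."""
    return [[measure(a, b) for b in sites] for a in sites]


# Hypothetical abundances of three arthropod species at three sites
sites = [[1, 0, 3], [0, 2, 1], [4, 4, 0]]
for row in dissimilarity_matrix(sites, bray_curtis):
    print([round(v, 3) for v in row])
```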
PERMANOVA analysis uses the dissimilarity matrix as response variable and the column containing the treatments as predictor:
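Under the hood, PERMANOVA (Anderson 2001) partitions the squared dissimilarities into among- and within-group sums of squares, forms a pseudo-F ratio, and obtains P by recomputing F after randomly shuffling the group labels. The Python sketch below implements that logic for a one-way design; it is not the vegan code, and the function names, seed, and toy data are hypothetical.

```python
import random


def pseudo_f(d, groups):
    """One-way PERMANOVA pseudo-F and R2 from a distance matrix d."""
    n = len(groups)
    labels = sorted(set(groups))
    a = len(labels)
    ss_total = sum(d[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    ss_within = 0.0
    for g in labels:
        idx = [i for i in range(n) if groups[i] == g]
        pairs = sum(d[i][j] ** 2 for k, i in enumerate(idx) for j in idx[k + 1:])
        ss_within += pairs / len(idx)
    ss_among = ss_total - ss_within
    f = (ss_among / (a - 1)) / (ss_within / (n - a))
    return f, ss_among / ss_total


def permanova(d, groups, n_perm=999, seed=1):
    """Pseudo-F, R2, and permutation P-value (observed F counted as one permutation)."""
    rng = random.Random(seed)
    f_obs, r2 = pseudo_f(d, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pseudo_f(d, shuffled)[0] >= f_obs:
            hits += 1
    return f_obs, r2, (hits + 1) / (n_perm + 1)


# Hypothetical one-dimensional "sites" in two treatments
points = [0, 1, 2, 10, 11, 12]
d = [[abs(p - q) for q in points] for p in points]
print(permanova(d, ["A", "A", "A", "B", "B", "B"]))
```

With only six sites there are just 20 distinct ways to split them into two groups of three, so the smallest attainable P is about 0.1, however strong the separation.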
The results show that arthropod species composition differs among treatments (R² = 0.64, P = 0.001). Now, we’ll perform a post hoc test to find which treatments differ from each other. For this, we will use the function pairwise.adonis from the package pairwiseAdonis, available on GitHub.
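Pairwise post hoc comparisons run one PERMANOVA per pair of treatments, so their P-values are usually adjusted for multiple testing. The Python sketch below shows the Holm step-down adjustment; which correction to use is our assumption here, and the pair names and P-values are hypothetical.

```python
def holm_adjust(pvalues):
    """Holm step-down adjusted P-values, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        value = min(1.0, (m - rank) * pvalues[i])
        running_max = max(running_max, value)  # keeps adjusted values monotone
        adjusted[i] = running_max
    return adjusted


# Hypothetical raw P-values from three pairwise PERMANOVAs
pairs = ["pasture vs sugarcane", "pasture vs Cerrado", "sugarcane vs Cerrado"]
for pair, p_adj in zip(pairs, holm_adjust([0.001, 0.004, 0.003])):
    print(pair, round(p_adj, 4))
```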
The results show that all areas harbor different species compositions. Now, let’s visualize the results using non-metric multidimensional scaling (nMDS), which is the non-metric counterpart of PCoA. This method is not covered in the chapter.
Knowledge about plant species with medicinal uses by people living along an urban-rural gradient.
Hypothesis: Urbanization decreases people’s knowledge about plants with medicinal uses
Dependent variable: plant species
Independent variable: people living in different areas (urban, periurban, and rural)
Loading the data with species composition of medicinal plants along the urbanization gradient:
This time, we will use the Jaccard index of dissimilarity.
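For presence/absence data, the Jaccard dissimilarity is 1 − a/(a + b + c), where a counts species shared by both samples and b and c count species found in only one of them. A Python sketch that converts abundances to presence/absence first (in vegan, as far as we know, this corresponds to vegdist with method "jaccard" and binary = TRUE); names and data are hypothetical.

```python
def jaccard_dissimilarity(x, y):
    """Binary Jaccard dissimilarity; any value > 0 counts as presence."""
    shared = only_x = only_y = 0
    for xi, yi in zip(x, y):
        present_x, present_y = xi > 0, yi > 0
        if present_x and present_y:
            shared += 1
        elif present_x:
            only_x += 1
        elif present_y:
            only_y += 1
    denominator = shared + only_x + only_y
    return 1 - shared / denominator if denominator else float("nan")


# Hypothetical medicinal plants cited by two interviewees (1 = cited)
print(jaccard_dissimilarity([1, 1, 0, 1], [1, 0, 1, 1]))
```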
Running the PERMANOVA
The results show that knowledge about plant species differs across the gradient (R² = 0.47, P = 0.001). Now, we’ll run a post hoc test to find which treatments differ from each other:
The results show that knowledge about plant species composition by people living in rural areas is different from people living in periurban and urban areas, but it is not different between people living in periurban and urban areas.
Performing Non-Metric Multidimensional Scaling (nMDS) to visualize the results:
The figure shows the differences in knowledge about the composition of plant species with medicinal uses.
3 Community-Weighted Mean (CWM)
Importing the datasets:
Preparing the data:
Running the CWM
Common errors when using the function functcomp: (1) passing a data.frame, so convert it with as.matrix; (2) species names that differ (in spelling or order) between T and Y. Use the following code to double-check that they match; all resulting values should be TRUE:
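The community-weighted mean of a numeric trait at a site is CWM = Σ_j p_j t_j, where p_j is the relative abundance of species j and t_j its trait value, which is why species must appear in the same order in both tables. A minimal Python sketch of both the name check and the CWM; it is not FD::functcomp, and all names and values are hypothetical.

```python
def names_match(community_species, trait_species):
    """Element-wise check of species names (spelling and order); all should be True."""
    if len(community_species) != len(trait_species):
        return [False]
    return [c == t for c, t in zip(community_species, trait_species)]


def community_weighted_mean(abundance, trait_values):
    """CWM of one numeric trait per site.

    abundance: one list per site, species in the same order as trait_values.
    """
    cwm = []
    for site in abundance:
        total = sum(site)
        if total == 0:
            cwm.append(float("nan"))  # empty site: CWM undefined
            continue
        cwm.append(sum(a * t for a, t in zip(site, trait_values)) / total)
    return cwm


# Hypothetical data: two sites, two species, specific leaf area per species
species_in_Y = ["Ocotea sp.", "Nectandra sp."]
species_in_T = ["Ocotea sp.", "Nectandra sp."]
print(all(names_match(species_in_Y, species_in_T)))
print(community_weighted_mean([[2, 2], [1, 3]], [10.0, 20.0]))
```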
Combining PERMANOVA with CWM to test hypothesis
Details about the permutation procedure can be found here
Here, we’ll restrict the permutations to within strata to avoid pseudoreplication. First, let’s create the strata argument to pass to the adonis function:
Standardize the environmental variables before the PERMANOVA if they were measured on different scales. You can do that directly with dplyr::mutate.
Lastly, run the PERMANOVA with the adonis function:
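Restricting permutations to strata means that labels are shuffled only among observations belonging to the same stratum (e.g., plots within the same site), so that the null distribution respects the sampling design. A minimal Python sketch of one such restricted permutation; names and data are hypothetical.

```python
import random


def permute_within_strata(labels, strata, rng):
    """Shuffle labels only among observations that share the same stratum."""
    out = list(labels)
    for stratum in sorted(set(strata)):  # sorted so results are reproducible with a seed
        idx = [i for i, s in enumerate(strata) if s == stratum]
        values = [out[i] for i in idx]
        rng.shuffle(values)
        for i, v in zip(idx, values):
            out[i] = v
    return out


# Hypothetical design: land-use labels for six plots nested in two sites
rng = random.Random(42)
land_use = ["pasture", "cerrado", "pasture", "cerrado", "sugarcane", "sugarcane"]
site = ["s1", "s1", "s1", "s2", "s2", "s2"]
print(permute_within_strata(land_use, site, rng))
```

Each call produces one permutation; a permutation test repeats it many times and recomputes the pseudo-F each time.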
Check the homogeneity of multivariate dispersions:
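Multivariate dispersion (Anderson et al. 2006) is measured as the distance of each observation to its group centroid; PERMANOVA results are easier to interpret when these distances are similar among groups. The Python sketch below assumes Euclidean coordinates (for example, PCoA scores) and uses group centroids; vegan's betadisper works from the dissimilarity matrix and can use centroids or spatial medians, then tests the distances with ANOVA or permutations. Names and data are hypothetical.

```python
import math


def distances_to_centroid(points, groups):
    """Euclidean distance of each point to the centroid of its group."""
    out = {}
    for g in sorted(set(groups)):
        members = [p for p, gg in zip(points, groups) if gg == g]
        dim = len(members[0])
        centroid = [sum(p[k] for p in members) / len(members) for k in range(dim)]
        out[g] = [math.dist(p, centroid) for p in members]
    return out


# Hypothetical PCoA scores (axis 1, axis 2) for four sites in two treatments
scores = [(0.0, 0.0), (2.0, 0.0), (0.0, 0.0), (0.0, 4.0)]
dispersion = distances_to_centroid(scores, ["A", "A", "B", "B"])
print({g: sum(d) / len(d) for g, d in dispersion.items()})  # mean dispersion per group
```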
Visualize the results with a PCoA:
You can check which variables are correlated with the first two axes by using:
4 Constrained Ordination
We will use two examples to illustrate how an RDA works.
Illustrative dataset that examines species-environment relationship.
Hypothesis: Anuran species composition changes in response to environmental gradients.
Dependent variable: anuran species
Independent variable: environmental descriptors of habitat
First, let’s import all the data:
Then, let’s process the data. We apply the Hellinger transformation because we are dealing with species abundances that potentially contain many zeros; after this transformation, Euclidean-based methods such as RDA give ecologically meaningful results (Legendre & Gallagher 2001).
The environmental variables were measured on different scales, so we have to standardize them before conducting any analysis to make them comparable:
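The Hellinger transformation divides each abundance by its site total and takes the square root, y'_ij = √(y_ij / y_i+), so every transformed site vector has unit length. A minimal Python sketch (in R, vegan's decostand offers this transformation); names and data are hypothetical.

```python
import math


def hellinger(table):
    """Hellinger-transform a site-by-species abundance table (one list per site)."""
    out = []
    for site in table:
        total = sum(site)
        if total == 0:
            out.append([0.0] * len(site))  # empty site left as zeros
        else:
            out.append([math.sqrt(v / total) for v in site])
    return out


# Hypothetical anuran abundances at two ponds
print(hellinger([[1, 3, 0], [10, 0, 10]]))
```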
Finally, let’s run an RDA
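Conceptually, RDA regresses every (transformed) species column on the explanatory variables and then runs a PCA on the fitted values. The Python sketch below covers only the simplest case, a single explanatory variable, where there is one constrained axis and the constrained proportion of variance equals the fitted sum of squares over the total. The adjusted value uses Ezekiel's formula, which we understand to be the correction vegan's RsquareAdj applies to RDA. Names and data are hypothetical.

```python
def rda_single_predictor(y_columns, x):
    """Constrained proportion of variance (R2) and adjusted R2 for a one-variable RDA.

    y_columns: response variables (e.g., Hellinger-transformed species), one list each.
    x: the single explanatory variable, one value per site.
    """
    n = len(x)
    mx = sum(x) / n
    sxx = sum((v - mx) ** 2 for v in x)
    total_ss = fitted_ss = 0.0
    for y in y_columns:
        my = sum(y) / n
        slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
        total_ss += sum((yi - my) ** 2 for yi in y)
        fitted_ss += slope ** 2 * sxx  # sum of squares of fitted values around the mean
    r2 = fitted_ss / total_ss
    m = 1  # number of explanatory variables
    r2_adj = 1 - (1 - r2) * (n - 1) / (n - m - 1)
    return r2, r2_adj


# Hypothetical: two species responses along a standardized altitude gradient
print(rda_single_predictor([[2, 4, 6, 8], [1, 0, 1, 0]], [1, 2, 3, 4]))
```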
Calculating the P-value for the RDA model:
- P-value of the global model:
- P-value of the RDA axes:
- P-value of the environmental variables:
Finally, let’s plot the results. This figure shows the variation in species composition constrained by environmental gradients
Illustrative dataset that tests whether cultivated medicinal plants change in relation to socioeconomic characteristics of households
Hypothesis: Plant species with medicinal uses change in response to socioeconomic characteristics
Dependent variable: plant species
Independent variable: socioeconomic characteristics of the household
Let’s begin by importing the data:
Now, running the RDA
Calculating the P-value for the RDA model:
- P-value of the global model
- P-value of the RDA axes
- P-value of the environmental variables
Illustrative dataset that evaluates which Ephemeroptera species are bioindicators of environmental quality.
Hypothesis: Habitat contamination will affect different species resulting in the exclusion of some species in contaminated environments.
Dependent variable: Ephemeroptera species
Independent variable: preserved and anthropized streams
Let’s begin by importing the data:
We’ll divide the data into two parts, with a factor for preserved and modified areas:
Finally, let’s calculate the IndVal:
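Following Dufrêne & Legendre (1997), the IndVal of a species for a group is A × B × 100, where A (specificity) is the species' mean abundance in that group divided by the sum of its mean abundances over all groups, and B (fidelity) is the proportion of the group's sites where the species occurs. A minimal Python sketch for one species, without the permutation test that packages add; names and data are hypothetical.

```python
def indval(abundances, groups):
    """IndVal (0-100) of ONE species for each group of sites.

    abundances: abundance of the species at each site; groups: group label per site.
    """
    labels = sorted(set(groups))
    mean_abundance, fidelity = {}, {}
    for g in labels:
        values = [a for a, gg in zip(abundances, groups) if gg == g]
        mean_abundance[g] = sum(values) / len(values)
        fidelity[g] = sum(1 for v in values if v > 0) / len(values)
    total = sum(mean_abundance.values())
    return {g: 100 * (mean_abundance[g] / total if total > 0 else 0.0) * fidelity[g]
            for g in labels}


# Hypothetical mayfly counts in four streams
print(indval([2, 0, 1, 1], ["preserved", "preserved", "modified", "modified"]))
```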
Illustrative dataset that examines the phytotherapy used by indigenous and non-indigenous populations
Hypothesis: analyze whether the medicinal plants used to treat some illnesses are the same among indigenous populations and immigrants in southern Brazil
Dependent variable: phytotherapy species
Independent variable: populations (indigenous and European immigrants)
In this example, we will use another package to calculate IndVal, which generates a different output:
Creating the grouping variable to discriminate indigenous land from immigrant land:
Calculating IndVal index:
© 2019 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Gonçalves-Souza, T., Garey, M.V., da Silva, F.R., Albuquerque, U.P., Provete, D.B. (2019). Multidimensional Analyses for Testing Ecological, Ethnobiological, and Conservation Hypotheses. In: Albuquerque, U., de Lucena, R., Cruz da Cunha, L., Alves, R. (eds) Methods and Techniques in Ethnobiology and Ethnoecology . Springer Protocols Handbooks. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-8919-5_8
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-8918-8
Online ISBN: 978-1-4939-8919-5
eBook Packages: Springer Protocols