Abstract
This chapter outlines the most common multidimensional analyses used in ecology, ethnobiology, and conservation. After mastering how to create a scientific research workflow based on a problem-based platform (Chap. 7, in this book), readers will learn how to visualize complex datasets (e.g., species or ethnospecies lists, several social-political or environmental variables, and so on) and test multivariate hypotheses. We provide a fully reproducible example of every analysis discussed in this chapter as Online Material. We further encourage students and researchers to move from a “description-based multidimensional analysis” (such as using unconstrained ordination, like PCA) to an explicit hypothesis-testing framework that can greatly improve learning and research programs.
Key words
 Data analysis
 Hypothesis testing
 Scientific research workflow
 Ordination
References
Borcard D, Gillet F, Legendre P (2018) Numerical ecology with R, 2nd edn. Springer, New York
Gower JC, Legendre P (1986) Metric and euclidean properties of dissimilarity coefficients. J Classif 3:5–48
Dolédec S, Chessel D, ter Braak CJF, Champely S (1996) Matching species traits to environmental variables: a new three-table ordination method. Environ Ecol Stat 3:143–166
Legendre P, Legendre L (2012) Numerical ecology, 3rd English edn. Elsevier, Amsterdam
Chase JM, Leibold MA (2003) Ecological niches: linking classical and contemporary approaches. University of Chicago Press, Chicago
Oksanen J, Blanchet FG, Friendly M, Kindt R, Legendre P, McGlinn D, Minchin PR, O'Hara RB, Simpson GL, Solymos P, Stevens MH, Szoecs E, Wagner H (2018) Vegan: community ecology package. R package version 2.5-1. https://CRAN.R-project.org/package=vegan
Dray S, Blanchet G, Borcard D, Clappe S, Guenard G, Jombart T, Larocque G, Legendre P, Madi N, Wagner HH (2017) Adespatial: multivariate multiscale spatial analysis. R package version 0.0-9. https://CRAN.R-project.org/package=adespatial
Dray S, Dufour AB (2007) The ade4 package: implementing the duality diagram for ecologists. J Stat Softw 22:1–20
Legendre P, Gallagher ED (2001) Ecologically meaningful transformations for ordination of species data. Oecologia 129:271–280
Legendre P, Borcard D (2018) Box–Cox-chord transformations for community composition data prior to beta diversity analysis. Ecography 41:1–5
Pearson K (1901) On lines and planes of closest fit to systems of points in space. Philos Mag 2:559–572
Rodrigues A, Bones F, Schneiders A, Oliveira L, Vibrans A, Gasper A (2018) Plant trait dataset for tree-like growth forms species of the subtropical Atlantic Rain Forest in Brazil. Data 3:16
Revell LJ (2012) Phytools: an R package for phylogenetic comparative biology (and other things). Methods Ecol Evol 3:217–223
Jombart T, Dray S (2010) Adephylo: exploratory analyses for the phylogenetic comparative method. Bioinformatics 26:1907–1909
Gower JC (1966) Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika 53:325–338
Pavoine S, Dufour AB, Chessel D (2004) From dissimilarities among species to dissimilarities among communities: a double principal coordinate analysis. J Theor Biol 228:523–537
Quinn GP, Keough MJ (2002) Experimental design and data analysis for biologists. Cambridge University Press, Cambridge
Zuur AF, Ieno EN, Elphick CS (2010) A protocol for data exploration to avoid common statistical problems: data exploration. Methods Ecol Evol 1:3–14
Hadi AS, Ling RF (1998) Some cautionary notes on the use of principal components regression. Am Stat 52:15–19
Boaratti AZ, Silva FR (2015) Relationships between environmental gradients and geographic variation in the intraspecific body size of three species of frogs (Anura). Austral Ecol 40:869–876
Hijmans RJ, Cameron SE, Parra JL, Jones PG, Jarvis A (2005) Very high resolution interpolated climate surfaces for global land areas. Int J Climatol 25:1965–1978
Anderson MJ (2001) A new method for nonparametric multivariate analysis of variance. Austral Ecol 26:32–46
McArdle BH, Anderson MJ (2001) Fitting multivariate models to community data: a comment on distance-based redundancy analysis. Ecology 82:290–297
Anderson MJ, Ellingsen KE, McArdle BH (2006) Multivariate dispersion as a measure of beta diversity. Ecol Lett 9:683–693
Legendre P, Galzin R, Harmelin-Vivien ML (1997) Relating behavior to habitat: solutions to the fourth-corner problem. Ecology 78:547–562
Garnier E, Cortez J, Billès G, Navas ML, Roumet C, Debussche M, Laurent G, Blanchard A, Aubry D, Bellmann A, Neill C, Toussaint JP (2004) Plant functional markers capture ecosystem properties during secondary succession. Ecology 85:2630–2637
Lavorel S, Grigulis K, McIntyre S, Williams NSG, Garden D, Dorrough J, Berman S, Quétier F, Thébault A, Bonis A (2007) Assessing functional diversity in the field – methodology matters! Funct Ecol 22:134–147
Peres-Neto PR, Dray S, ter Braak CJF (2017) Linking trait variation to the environment: critical issues with community-weighted mean correlation resolved by the fourth-corner approach. Ecography 40:806–816
ter Braak CJF, Peres-Neto P, Dray S (2017) A critical issue in model-based inference for studying trait-based community assembly and a solution. PeerJ 5:e2885
Rigal F, Cardoso P, Lobo JM, Triantis KA, Whittaker RJ, Amorim IR, Borges PAV (2018) Functional traits of indigenous and exotic ground-dwelling arthropods show contrasting responses to land-use change in an oceanic island, Terceira, Azores. Divers Distrib 24:36–47
Kleyer M, Dray S, Bello F, Lepš J, Pakeman RJ, Strauss B, Thuiller W, Lavorel S (2012) Assessing species and community functional responses to environmental gradients: which multivariate methods? J Veg Sci 23:805–821
Solé-Senan XO, Juárez-Escario A, Conesa JA, Recasens J (2018) Plant species, functional assemblages and partitioning of diversity in a Mediterranean agricultural mosaic landscape. Agric Ecosyst Environ 256:163–172
Laliberté E, Legendre P, Shipley B (2014) FD: measuring functional diversity from multiple traits, and other tools for functional ecology. R package version 1.0-12
ter Braak CJF (1987) Ordination. In: Jongman RHG, ter Braak CJF, van Tongeren OFR (eds) Data analysis in community and landscape ecology. Pudoc, Wageningen, The Netherlands, pp 91–173. Reissued in 1995 by Cambridge University Press, Cambridge
Dray S, Pélissier R, Couteron P, Fortin M, Legendre P, Peres-Neto PR, Bellier E, Bivand R, Blanchet FG, De Cáceres M, Dufour A, Heegaard E, Jombart T, Munoz F, Oksanen J, Thioulouse J, Wagner HH (2012) Community ecology in the age of multivariate multiscale spatial analysis. Ecol Monogr 82:257–275
Vasconcelos T, Santos T, Haddad C, Rossa-Feres DC (2010) Climatic variables and altitude as predictors of anuran species richness and number of reproductive modes in Brazil. J Trop Ecol 26:423–432
Hecnar SJ, M’Closkey RT (1996) Regional dynamics and the status of amphibians. Ecology 77:2091–2097
De Cáceres M, Legendre P (2009) Associations between species and groups of sites: indices and statistical inference. Ecology 90:3566–3574
Dufrêne M, Legendre P (1997) Species assemblages and indicator species: the need for a flexible asymmetrical approach. Ecol Monogr 67:345–366
McGeoch MA, Van Rensburg BJ, Botes A (2002) The verification and application of bioindicators: a case study of dung beetles in savanna ecosystem. J Appl Ecol 39:661–672
Tejeda-Cruz C, Mehltreter K, Sosa VJ (2008) Indicadores ecológicos multitaxonómicos. In: Manson RH, Hernández-Ortiz V, Gallina S, Mehltreter K (eds) Agroecosistemas cafetaleros de Veracruz: Biodiversidad, Manejo y Conservación. INECOL – INE-SEMARNAT, México, pp 271–278
Roberts DW (2016) labdsv: ordination and multivariate analysis for ecology. R package version 1.8-0. https://CRAN.R-project.org/package=labdsv
Mayo DG (2018) Statistical inference as severe testing: how to get beyond the statistics wars. Cambridge Univ. Press, Cambridge
Taper ML, Lele S (2004) The nature of scientific evidence: statistical, philosophical, and empirical considerations. University of Chicago Press, Chicago
Salsburg D (2002) The lady tasting tea: how statistics revolutionized science in the twentieth century. Holt, New York, NY
1 Appendix for the Chapter Multidimensional Analysis for Testing Ecological, Ethnobiological, and Conservation Hypothesis
Thiago Gonçalves-Souza, Michel V. Garey, Fernando R. da Silva, Ulysses P. Albuquerque, and Diogo B. Provete.
21/05/2018

0.1 Loading the required packages

1 Unconstrained ordination

1.0.1 Principal Component Analysis (PCA)

1.0.2 Principal Coordinate Analysis

1.0.3 Principal Components Regression


2 PERMANOVA

2.0.1 EXAMPLE 1

2.0.2 EXAMPLE 2


3 Community-Weighted Mean (CWM)

4 Constrained ordination

4.0.1 RDA


5 Clustering

5.0.1 Indicator value index

0.1 Loading the Required Packages
1 Unconstrained Ordination
Here, we will first use example datasets available in the R package datasets, which comes with every R installation.
Let’s look at the data. This dataset contains measurements of the sepals and petals of 50 flowers for each of three species of the genus Iris. You can learn more by running ?iris.
Next, we will run a PCA with a few packages that offer different outputs.
Finally, the plot:
The package factoextra has many nice plot capabilities. Here, we will plot the equilibrium circle to check the relative contribution of each variable to the formation of the axes:
The variables contributed roughly equally to the formation of the ordination axes.
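As a minimal sketch of the PCA described above, the following uses only base R (stats::prcomp) rather than the add-on packages mentioned in the text; the object names (X, pca, var_expl) are illustrative assumptions. It computes the proportion of variance explained by each axis and the variable loadings on the first two axes.

```r
# PCA of the four iris measurements with base R (stats::prcomp).
data(iris)
X <- iris[, 1:4]                      # keep the numeric columns only

# Centre and scale so every variable has mean 0 and unit variance
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Proportion of variance explained by each principal component
var_expl <- pca$sdev^2 / sum(pca$sdev^2)
round(var_expl, 3)

# Loadings (eigenvectors): weight of each variable on the first two axes
round(pca$rotation[, 1:2], 3)

# Base-R biplot of observations (flowers) and variables (arrows)
biplot(pca, cex = 0.6)
```

Variables with longer arrows in the biplot weigh more on the displayed axes, which is the same information the equilibrium circle is meant to convey.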
And finally, the same analysis with the
The advantages of this package include: dealing with missing data and ability to easily plot the centroid of categorical variable in the ordination diagram. Let’s see how it runs:
As you can see, it automatically returns the biplot with the equilibrium circle and the individual factor map plotted with the centroid corresponding to the mean of each species in each dimension. With the equilibrium circle you can judge how much each variable contributed to the formation of the axes in the reduced space, based on the length of its vector (arrow). In this example, all four variables contributed more or less equally (see p. 447 of Legendre & Legendre 2012).
We can also easily see the correlation of each variable with the first two axes. This is useful when you want to use the axes in subsequent analyses and need to interpret how the axes summarize the information:
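One way to obtain these correlations, sketched here with base R only (an assumption; the text relies on a dedicated package for this step), is to correlate the original variables with the site scores on the first two axes.

```r
# Correlation between the original variables and the first two PCA axes.
data(iris)
X <- iris[, 1:4]
pca <- prcomp(X, center = TRUE, scale. = TRUE)

scores   <- pca$x[, 1:2]        # coordinates of each flower on PC1 and PC2
axis_cor <- cor(X, scores)      # variables in rows, axes in columns
round(axis_cor, 2)
```

For a PCA on standardized variables, these correlations equal the loadings multiplied by the standard deviation of the corresponding axis.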
We will import a dataset published by Rodrigues et al. (2018), Plant Trait Dataset for Tree-Like Growth Forms Species of the Subtropical Atlantic Rain Forest in Brazil, Data 3(2), 16; https://doi.org/10.3390/data3020016. The dataset presents leaf and branch traits, maximum potential height, seed mass, and dispersal syndrome of 117 tree species. For the sake of simplicity, we will use just a few traits and categorize half of the species as being used by local people and the other half as not used.
Before jumping in and analyzing the data right away, let’s first take a closer look at it to better understand it. Let’s quickly check the relationship among variables with a correlation matrix:
Another key aspect of any dataset is the amount of missing values, usually coded as NAs.
As you can see, a few values are missing. So let’s choose variables that have as few missing values as possible and are important for testing our hypothesis.
Remember that FactoMineR::PCA can handle missing values by replacing them with the mean value of the column. But here, for the sake of simplicity we will just remove them:
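A minimal sketch of both options, on a small hypothetical trait table (the column names and values are invented for illustration and are not taken from the Rodrigues et al. dataset): counting NAs per column, removing incomplete rows, and, as an alternative, replacing NAs with the column mean.

```r
# Hypothetical trait table with a few missing values
traits <- data.frame(
  SLA       = c(12.1, NA, 15.3, 9.8, 11.0),
  seed_mass = c(0.5, 1.2, NA, 0.8, 2.1),
  height    = c(20, 15, 30, 12, 25)
)

# Number of missing values per column
colSums(is.na(traits))

# Option 1: keep only the rows (species) with no missing values
traits_complete <- na.omit(traits)
nrow(traits_complete)

# Option 2: replace each NA with the mean of its column
traits_imputed <- as.data.frame(lapply(traits, function(v) {
  v[is.na(v)] <- mean(v, na.rm = TRUE)
  v
}))
traits_imputed
```

Removing rows is simpler but discards species; mean imputation keeps every species at the cost of shrinking the variance of the imputed columns.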
Let’s visualize our new, reduced data set:
Now we will create a vector to categorize half of the species as preferred and the other half as non-preferred, in order to test our initial hypothesis. Of course, this is just to demonstrate how the method works and doesn’t mean that those species are really used for any medicinal purpose.
Since the values were measured on different scales, we have to standardize them before running a PCA. Luckily, this is done internally by default in the function dudi.pca, through the arguments center = TRUE and scale = TRUE.
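To illustrate what standardization does, here is a short sketch with base R, using prcomp instead of dudi.pca and the built-in USArrests data instead of the trait table (both are substitutions made for the sake of a self-contained example). Standardizing by hand and letting the PCA function do it internally give the same axes.

```r
# Variables in USArrests are on very different scales (counts vs. percentages)
data(USArrests)
sapply(USArrests, sd)

# Manual standardization: subtract the mean, divide by the standard deviation
Z <- scale(USArrests, center = TRUE, scale = TRUE)
round(colMeans(Z), 10)      # all approximately 0
apply(Z, 2, sd)             # all equal to 1

# PCA on pre-standardized data vs. PCA standardizing internally
pca_manual   <- prcomp(Z)
pca_internal <- prcomp(USArrests, center = TRUE, scale. = TRUE)
all.equal(pca_manual$sdev, pca_internal$sdev)
```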
Now, the fun part, let’s see the results.
First, a basic graph showing the results, disregarding the groups:
Let’s see how variables contributed to the formation of the axes:
Plant height (Hpot95_m) and BarkProp were not very useful for the formation of the axes. Here, the angle of the arrows informs us about the correlation of each variable with a given axis. For example, SLA is closely related to axis 1, meaning that this variable contributed strongly to the formation of the first axis, but not much to the second. Interestingly, Hpot95_m contributes equally to the formation of both axes. This means that data points (species) arranged to the right of the ordination plot (positive side) have a large SLA and are tall. Similarly, those to the left are small trees with small SLA, but with thick leaves, large seeds, and high BarkProp.
Now, what about the preferred vs. non-preferred species?
We can see that our hypothesis (see main text) was shamefully rejected, since the morphology of the preferred plants was not different from that of the non-preferred plants. Of course, again, this example was just to illustrate how the method works and is not based on any real dataset aimed at testing this hypothesis.
A cleaner graph without the row labels allows us to better see that the groups totally overlap:
Here, we will use the plant species trait data available in the package cluster to illustrate how a PCoA works. This dataset contains 31 morphological and reproductive variables for 136 plant species along an urbanization gradient. Let’s begin by loading the data:
Notice that there are many traits coded as semi-quantitative variables (ordered factors in R language), binary variables, and two continuous variables. Here, we will use the function dist.ktab of the package ade4 to calculate the Gower dissimilarity coefficient. First, we have to tell the function the type of each variable:
Notice that according to the Broken stick criterion, the first 12 eigenvectors should be selected for further use, e.g., in a Principal Components Regression.
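The PCoA workflow (Gower dissimilarity, eigen-decomposition, broken-stick selection) can be sketched as follows. This is an illustration under stated assumptions: it uses cluster::daisy and stats::cmdscale instead of ade4::dist.ktab, and a small simulated mixed-type trait table (traits, with invented column names) rather than the dataset discussed above.

```r
library(cluster)   # daisy() computes Gower dissimilarity for mixed data

set.seed(42)
# Hypothetical mixed-type trait table: continuous, binary, and ordered traits
traits <- data.frame(
  height    = rnorm(20, mean = 10, sd = 3),
  seed_mass = rlnorm(20),
  woody     = factor(sample(c("yes", "no"), 20, replace = TRUE)),
  shade     = factor(sample(1:3, 20, replace = TRUE), ordered = TRUE)
)

# Gower dissimilarity handles the different variable types together
d <- daisy(traits, metric = "gower")

# PCoA (classical multidimensional scaling) keeping two axes for plotting
pcoa <- cmdscale(d, k = 2, eig = TRUE)

# Relative eigenvalues, using positive eigenvalues only
eig     <- pcoa$eig[pcoa$eig > 1e-8]
rel_eig <- eig / sum(eig)

# Broken-stick expectation for axis k: (1/n) * sum_{i=k}^{n} 1/i
n <- length(eig)
broken_stick <- sapply(seq_len(n), function(k) sum(1 / (k:n)) / n)

# Retain the leading axes whose relative eigenvalue exceeds the expectation
first_fail <- which(rel_eig <= broken_stick)[1]
n_keep <- if (is.na(first_fail)) n else first_fail - 1
n_keep
```

The retained axes (columns of the PCoA scores) are the ones that would be carried forward as predictors in a subsequent regression.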
Let’s look at the biplot:
The relationship between environmental gradients and patterns of geographic variation in intraspecific body size of Boana faber

Hypothesis: environmental gradients drive intraspecific body size of Boana faber

Dependent variable: body size of Boana faber

Independent variable: environmental variables (e.g., temperature and precipitation)
Let’s begin by importing the files
The first step is to implement the Principal Component Analysis (PCA), and then store the eigenvectors
We can see that the first axis explains 47% of the variation and the second axis 28%; together they explain 75%.
Plotting the results in a biplot
We’ll select the first two axes of PCA to further use
Now, we’ll build several models of Principal Component Regressions (PCR) with different predictor variables for later use:
Using the Akaike Information Criterion corrected for small sample sizes (AICc) to select the best model
The model containing only the first axis of the PCA is the most parsimonious, with the lowest AICc value.
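The steps of this example (PCA on the predictors, regression on the retained axes, model selection by AICc) can be sketched with base R as below. The data are simulated and all names (env, body_size, aicc) are hypothetical; AICc is computed by hand as AIC + 2k(k + 1)/(n - k - 1), where k is the number of estimated parameters.

```r
set.seed(1)
n <- 40

# Simulated, partly collinear environmental predictors (hypothetical)
env <- data.frame(
  temp   = rnorm(n, 22, 3),
  precip = rnorm(n, 1500, 300),
  seas   = rnorm(n, 50, 10)
)
env$seas <- env$seas + 2 * env$temp

# Simulated response: body size decreases with temperature
body_size <- 80 - 1.5 * scale(env$temp)[, 1] + rnorm(n, sd = 2)

# Step 1: PCA on standardized predictors; proportion of variance per axis
pca <- prcomp(env, center = TRUE, scale. = TRUE)
summary(pca)$importance[2, ]

# Step 2: keep the scores of the first two axes as new predictors
dat <- data.frame(body_size = body_size, pca$x[, 1:2])

# Step 3: candidate Principal Component Regressions
models <- list(
  null    = lm(body_size ~ 1, data = dat),
  pc1     = lm(body_size ~ PC1, data = dat),
  pc2     = lm(body_size ~ PC2, data = dat),
  pc1_pc2 = lm(body_size ~ PC1 + PC2, data = dat)
)

# Step 4: AICc, the small-sample correction of AIC
aicc <- function(m) {
  k <- attr(logLik(m), "df")   # parameters, including the residual variance
  n <- nobs(m)
  AIC(m) + 2 * k * (k + 1) / (n - k - 1)
}

tab <- data.frame(AICc = sapply(models, aicc))
tab$delta <- tab$AICc - min(tab$AICc)
tab[order(tab$AICc), ]
```

The model with delta = 0 has the lowest AICc; models within about two units of it are usually regarded as similarly supported.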
Now, let’s visualize the results:
The relationship between the number of fish species known by fishermen and their personal experiences
Hypothesis: personal experiences influence the number of fish species known by local fishermen

Dependent variable: number of fish species

Independent variable: personal experience (e.g., age, time as fisherman, number of fishing techniques used, whether his father was also a fisherman)
Loading files:
Principal Component Analysis (PCA)
We can see that the first axis explains 57% of the variation and the second axis 21%; together they explain 78%.
Visualizing the results:
Retaining the first two axes of PCA:
Building models of Principal Component Regressions (PCR) with different predictor variables
Using the Akaike Information Criterion corrected for small samples (AICc) to select the best model
We can see that the model containing only the first PC is the most parsimonious one, with the lowest AICc value.
Now, let’s visualize the results:
2 PERMANOVA
This illustrative dataset examines the hypothesis that land use alters arthropod species composition in the Cerrado biome.

Hypothesis: land use alters the species composition

Dependent variable: arthropod species

Independent variable: areas (pastures, sugarcanes, and Cerrado protected areas)
Importing the data with arthropod species composition:
To run a PERMANOVA, we first have to calculate a dissimilarity measure. Here, we’ll use the percentage difference (a.k.a. the Bray-Curtis coefficient). We need to exclude the first column of the data, which contains the treatments; that’s why we use data[,-1].
PERMANOVA analysis uses the dissimilarity matrix as response variable and the column containing the treatments as predictor:
The results show that arthropod species composition differs among treatments, with R^{2} = 0.64 and P = 0.001. Now, we’ll perform a post hoc test to find which treatments differ from each other. For this, we will use the function pairwise_adonis from the package pairwiseAdonis, available on GitHub.
The results show that all areas harbor different species compositions. Now, let’s visualize the results using Non-Metric Multidimensional Scaling (nMDS), which is the non-metric version of PCoA. This method is not covered in the chapter.
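To make the mechanics of the PERMANOVA in this example concrete, here is a base-R sketch that computes the Bray-Curtis dissimilarity, the pseudo-F statistic and R^{2} from sums of squared dissimilarities, and a permutation P-value. It is a teaching illustration on simulated counts (comm, treatment, and both helper functions are hypothetical names), not the vegan-based code the appendix relies on.

```r
set.seed(123)

# Simulated abundances: 18 sites x 10 species, 6 sites per land use
treatment <- factor(rep(c("pasture", "sugarcane", "cerrado"), each = 6))
means <- rbind(
  pasture   = c(rep(8, 4), rep(1, 6)),
  sugarcane = c(rep(1, 3), rep(8, 4), rep(1, 3)),
  cerrado   = c(rep(1, 6), rep(8, 4))
)
comm <- t(sapply(as.character(treatment), function(g) rpois(10, means[g, ])))

# Bray-Curtis (percentage difference): sum|x_i - x_j| / sum(x_i + x_j)
bray <- function(x) {
  x <- as.matrix(x)
  num <- as.matrix(dist(x, method = "manhattan"))
  den <- outer(rowSums(x), rowSums(x), "+")
  as.dist(num / den)
}

# Pseudo-F and R2 computed from squared dissimilarities
pseudo_F <- function(d, g) {
  d2 <- as.matrix(d)^2
  g  <- factor(g)
  N  <- nrow(d2)
  a  <- nlevels(g)
  ss_total  <- sum(d2[upper.tri(d2)]) / N
  ss_within <- sum(sapply(levels(g), function(l) {
    idx <- which(g == l)
    sub <- d2[idx, idx, drop = FALSE]
    sum(sub[upper.tri(sub)]) / length(idx)
  }))
  ss_among <- ss_total - ss_within
  c(F  = (ss_among / (a - 1)) / (ss_within / (N - a)),
    R2 = ss_among / ss_total)
}

d   <- bray(comm)
obs <- pseudo_F(d, treatment)

# Permutation test: shuffle treatment labels and recompute pseudo-F
n_perm  <- 999
perm_F  <- replicate(n_perm, pseudo_F(d, sample(treatment))["F"])
p_value <- (sum(perm_F >= obs["F"]) + 1) / (n_perm + 1)

round(obs, 3)
p_value
```

With 999 permutations the smallest attainable P-value is 0.001, which is why that value appears so often in reported PERMANOVA results.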
Knowledge about plant species with medicinal uses among people living along an urban-rural gradient.

Hypothesis: Urbanization decreases people’s knowledge about plants with medicinal uses

Dependent variable: plant species

Independent variable: people living in different areas (urban, peri-urban, and rural)
Loading the data with the species composition of medicinal plants along the urbanization gradient:
This time, we will use the Jaccard index of dissimilarity.
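For presence/absence data, the Jaccard dissimilarity between two informants is 1 minus the number of shared species divided by the number of species cited by at least one of them. A minimal base-R sketch on a hypothetical citation matrix (pa) is shown below; base R's dist(method = "binary") computes this quantity, and it should match what vegan::vegdist(x, method = "jaccard", binary = TRUE) returns.

```r
# Hypothetical presence/absence matrix: 4 informants x 5 plant species
pa <- matrix(c(1, 1, 0, 0, 1,
               1, 1, 1, 0, 0,
               0, 0, 1, 1, 0,
               0, 0, 1, 1, 1),
             nrow = 4, byrow = TRUE,
             dimnames = list(paste0("person", 1:4), paste0("sp", 1:5)))

# Jaccard dissimilarity: 1 - (shared species / species cited by either)
jac <- dist(pa, method = "binary")
round(as.matrix(jac), 2)
```

Here person1 and person2 share two of the four species cited by either of them, so their dissimilarity is 0.5, while person1 and person3 share none and are maximally dissimilar (1).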
Running the PERMANOVA
The results show that knowledge about plant species differs across the gradient, with R^{2} = 0.47 and P = 0.001. Now, we’ll run a post hoc test to find which treatments differ from each other:
The results show that the composition of plant species known by people living in rural areas differs from that known by people living in peri-urban and urban areas, but does not differ between people living in peri-urban and urban areas.
Performing Non-Metric Multidimensional Scaling (nMDS) to visualize the results:
The figure shows the difference in knowledge about the composition of plant species with medicinal uses.
3 Community-Weighted Mean (CWM)
Importing the datasets:
Preparing the data:
Running the CWM
Common errors when using the function functcomp: (1) passing a data.frame, so convert the inputs with as.matrix; (2) species names that differ (in spelling or order) between T and Y, so use the following code to double-check that they match. The resulting values should all be TRUE:
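What a community-weighted mean computes, and the name check mentioned above, can be sketched by hand in base R. This is an illustration on tiny hypothetical matrices, not a call to functcomp; the trait matrix is named traits here instead of T, because T is also shorthand for TRUE in R.

```r
# Hypothetical data: Y = sites x species abundances, traits = species x traits
Y <- matrix(c(10, 0, 5,
               2, 8, 0),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("site1", "site2"), c("sp_a", "sp_b", "sp_c")))

traits <- matrix(c(1.0, 20,
                   3.0, 10,
                   2.0, 40),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("sp_a", "sp_b", "sp_c"),
                                 c("leaf_area", "height")))

# Species must appear with the same names and in the same order in both tables
colnames(Y) == rownames(traits)          # should be all TRUE
stopifnot(all(colnames(Y) == rownames(traits)))

# CWM: trait values averaged across species, weighted by relative abundance
W   <- Y / rowSums(Y)                    # relative abundances per site
cwm <- W %*% traits                      # sites x traits
round(cwm, 2)
```

In site1, for example, sp_a and sp_c make up 2/3 and 1/3 of the individuals, so the CWM of height is (2/3)(20) + (1/3)(40), about 26.67.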
Combining PERMANOVA with CWM to test hypotheses
Details about the permutation procedure can be found here
Here, we’ll run the permutations by strata to avoid pseudoreplication. First, let’s create the strata argument to run in the adonis function
Standardize the environmental variables before the PERMANOVA if they are on different scales. You can do that directly with dplyr::mutate
Lastly, run the PERMANOVA with the adonis function
Check the homogeneity of multivariate dispersions:
Visualize the results with a PCoA:
You can check which variables are correlated with the first two axes by using:
4 Constrained Ordination
We will use two examples to illustrate how an RDA works.
Illustrative dataset that examines the species-environment relationship.

Hypothesis: Anuran species composition changes in response to environmental gradients.

Dependent variable: anuran species

Independent variable: environmental descriptors of habitat
First, let’s import all the data:
Then, let’s begin processing the data. We have to apply the Hellinger transformation to ensure the matrix has Euclidean properties, because we’re dealing with species abundances with potentially many zeros.
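The Hellinger transformation divides each abundance by its site total and takes the square root. A by-hand sketch in base R on a small hypothetical abundance matrix follows; vegan::decostand(Y, "hellinger") performs the same transformation.

```r
# Hypothetical abundance matrix (3 sites x 4 species) with many zeros
Y <- matrix(c( 4, 0, 0, 12,
               0, 9, 1,  0,
              16, 0, 0,  0),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("site", 1:3), paste0("sp", 1:4)))

# Hellinger transformation: square root of the relative abundances
hel <- sqrt(Y / rowSums(Y))
round(hel, 3)

# Each transformed row has unit length
rowSums(hel^2)

# Euclidean distance on the transformed data is the Hellinger distance
round(dist(hel), 3)
```

Because the transformed rows have unit length, the resulting distances are bounded (at most sqrt(2)) and are not dominated by a few very abundant species.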
The environmental variables were measured on different scales. So, we have to standardize them before conducting any analysis, to make them comparable:
Finally, let’s run an RDA
Calculating the P-value for the RDA model:
P-value of the global model:
P-value of the RDA axes:
P-value of the environmental variables:
Finally, let’s plot the results. This figure shows the variation in species composition constrained by environmental gradients
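Conceptually, an RDA is a multivariate regression of the (transformed) species matrix on the predictors, followed by a PCA of the fitted values. The sketch below shows this by hand in base R on simulated data, together with a permutation test of the global model. All object names are hypothetical, and this is not the appendix's own code; in vegan the corresponding steps are typically rda() followed by anova() with by = "axis" or by = "terms".

```r
set.seed(7)
n <- 30

# Simulated environmental descriptors and species counts (hypothetical)
env <- data.frame(canopy = runif(n), depth = runif(n), area = runif(n))
lambda <- cbind(exp(1 + 2 * env$canopy),   # increases with canopy cover
                exp(2 - 2 * env$canopy),   # decreases with canopy cover
                exp(1 + env$depth),        # increases with depth
                rep(3, n))                 # unrelated to the environment
comm <- matrix(rpois(n * 4, lambda), nrow = n)

# Hellinger-transform and centre the species data; standardize predictors
Y <- scale(sqrt(comm / rowSums(comm)), center = TRUE, scale = FALSE)
X <- scale(env)

# R2 of the RDA: share of the total variation captured by the fitted values
rda_r2 <- function(Y, X) {
  fit <- fitted(lm(Y ~ X))
  sum(fit^2) / sum(Y^2)
}
f_stat <- function(r2, n, m) (r2 / m) / ((1 - r2) / (n - m - 1))

m      <- ncol(X)
r2_obs <- rda_r2(Y, X)
f_obs  <- f_stat(r2_obs, n, m)

# Global permutation test: shuffle the rows of Y relative to X
n_perm <- 999
f_perm <- replicate(n_perm,
                    f_stat(rda_r2(Y[sample(n), , drop = FALSE], X), n, m))
p_global <- (sum(f_perm >= f_obs) + 1) / (n_perm + 1)

# Canonical axes: PCA of the fitted values
axes <- prcomp(fitted(lm(Y ~ X)), center = FALSE)
constrained_share <- axes$sdev^2 / sum(axes$sdev^2)

round(c(R2 = r2_obs, F = f_obs, P = p_global), 3)
round(constrained_share, 3)
```

The site scores in axes$x are the constrained ordination coordinates that a triplot would display alongside species and environmental arrows.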
Illustrative dataset that tests whether cultivated medicinal plants change in relation to socioeconomic characteristics of households

Hypothesis: The composition of plant species with medicinal uses changes in response to socioeconomic characteristics

Dependent variable: plant species

Independent variable: socioeconomic characteristics of the household
Let’s begin by importing the data:
Data Processing:
Now, running the RDA
Calculating the P-value for the RDA model:
P-value of the global model:
P-value of the RDA axes:
P-value of the environmental variables:
5 Clustering
Illustrative dataset that evaluates which Ephemeroptera species are bioindicators of environmental quality.

Hypothesis: Habitat contamination will affect different species resulting in the exclusion of some species in contaminated environments.

Dependent variable: Ephemeroptera species

Independent variable: preserved and anthropized streams
Let’s begin by importing the data:
We’ll divide the data into two parts, with a factor for preserved and modified areas:
Finally, let’s calculate the IndVal:
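The indicator value of Dufrêne & Legendre (1997) multiplies specificity (A: how concentrated a species' mean abundance is in one group) by fidelity (B: the proportion of sites in that group where the species occurs). A by-hand sketch in base R on hypothetical stream data is given below; the function and object names are invented, and dedicated packages additionally provide permutation tests of significance.

```r
# Hypothetical abundances: 6 streams x 3 species
comm <- matrix(c(5, 0, 2,
                 7, 0, 1,
                 6, 1, 3,
                 0, 4, 2,
                 0, 6, 1,
                 1, 5, 3),
               nrow = 6, byrow = TRUE,
               dimnames = list(paste0("stream", 1:6), c("sp1", "sp2", "sp3")))
group <- factor(rep(c("preserved", "modified"), each = 3))

indval <- function(comm, group) {
  # Mean abundance of each species in each group (groups x species)
  mean_ab <- apply(comm, 2, function(x) tapply(x, group, mean))
  # A (specificity): share of a species' summed group means found in a group
  A <- sweep(mean_ab, 2, colSums(mean_ab), "/")
  # B (fidelity): proportion of sites in a group where the species occurs
  B <- apply(comm > 0, 2, function(x) tapply(x, group, mean))
  A * B
}

iv <- indval(comm, group)
round(iv, 3)

# Group in which each species reaches its maximum indicator value
rownames(iv)[apply(iv, 2, which.max)]
```

In this toy example sp1 is a strong indicator of preserved streams and sp2 of modified streams, whereas sp3 occurs equally in both groups and indicates neither (IndVal = 0.5 in each).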
Illustrative dataset that examines the phytotherapy used by indigenous and nonindigenous populations
Hypothesis: medicinal plants used to treat some illnesses are the same among indigenous populations and immigrants in southern Brazil

Dependent variable: phytotherapy species

Independent variable: populations (indigenous and European immigrants)
In this example we will use another package to calculate IndVal, which generates a different output
Creating the grouping variable to discriminate indigenous land from immigrant land:
Calculating IndVal index:
Copyright information
© 2019 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Gonçalves-Souza, T., Garey, M.V., da Silva, F.R., Albuquerque, U.P., Provete, D.B. (2019). Multidimensional Analyses for Testing Ecological, Ethnobiological, and Conservation Hypotheses. In: Albuquerque, U., de Lucena, R., Cruz da Cunha, L., Alves, R. (eds) Methods and Techniques in Ethnobiology and Ethnoecology. Springer Protocols Handbooks. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-8919-5_8
DOI: https://doi.org/10.1007/978-1-4939-8919-5_8
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-8918-8
Online ISBN: 978-1-4939-8919-5
eBook Packages: Springer Protocols