
Multidimensional Analyses for Testing Ecological, Ethnobiological, and Conservation Hypotheses

Part of the Springer Protocols Handbooks book series (SPH)


This chapter outlines the most common multidimensional analyses used in ecology, ethnobiology, and conservation. After mastering how to create a scientific research workflow based on a problem-based platform (Chap. 7, in this book), readers will learn how to visualize complex datasets (e.g., species or ethnospecies lists, several social-political or environmental variables, and so on) and test multivariate hypotheses. We provide a fully reproducible example of every analysis discussed in this chapter as Online Material. We further encourage students and researchers to move from a “description-based multidimensional analysis” (such as unconstrained ordination, like PCA) to an explicit hypothesis-testing framework that can greatly improve learning and research programs.

Key words

  • Data analysis
  • Hypothesis testing
  • Scientific research workflow
  • Ordination

DOI: 10.1007/978-1-4939-8919-5_8

Chapter length: 24 pages


References

  1. Borcard D, Gillet F, Legendre P (2018) Numerical ecology with R, 2nd edn. Springer, New York

  2. Gower JC, Legendre P (1986) Metric and euclidean properties of dissimilarity coefficients. J Classif 3:5–48

  3. Dolédec S, Chessel D, ter Braak CJF, Champely S (1996) Matching species traits to environmental variables: a new three-table ordination method. Environ Ecol Stat 3:143–166

  4. Legendre P, Legendre L (2012) Numerical ecology, 3rd English edn. Elsevier, Amsterdam

  5. Chase JM, Leibold MA (2003) Ecological niches: linking classical and contemporary approaches. University of Chicago Press, Chicago

  6. Oksanen J, Blanchet FG, Friendly M, Kindt R, Legendre P, McGlinn D, Minchin PR, O'Hara RB, Simpson GL, Solymos P, Stevens MH, Szoecs E, Wagner H (2018) Vegan: community ecology package. R package version 2.5-1

  7. Dray S, Blanchet G, Borcard D, Clappe S, Guenard G, Jombart T, Larocque G, Legendre P, Madi N, Wagner HH (2017) Adespatial: multivariate multiscale spatial analysis. R package version 0.0-9

  8. Dray S, Dufour AB (2007) The ade4 package: implementing the duality diagram for ecologists. J Stat Softw 22:1–20

  9. Legendre P, Gallagher ED (2001) Ecologically meaningful transformations for ordination of species data. Oecologia 129:271–280

  10. Legendre P, Borcard D (2018) Box-Cox-chord transformations for community composition data prior to beta diversity analysis. Ecography 41:1–5

  11. Pearson K (1901) On lines and planes of closest fit to systems of points in space. Philos Mag 2:559–572

  12. Rodrigues A, Bones F, Schneiders A, Oliveira L, Vibrans A, Gasper A (2018) Plant trait dataset for tree-like growth forms species of the subtropical Atlantic Rain Forest in Brazil. Data 3:16

  13. Revell LJ (2012) Phytools: an R package for phylogenetic comparative biology (and other things). Methods Ecol Evol 3:217–223

  14. Jombart T, Dray S (2010) Adephylo: exploratory analyses for the phylogenetic comparative method. Bioinformatics 26:1907–1909

  15. Gower JC (1966) Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika 53:325–338

  16. Pavoine S, Dufour AB, Chessel D (2004) From dissimilarities among species to dissimilarities among communities: a double principal coordinate analysis. J Theor Biol 228:523–537

  17. Quinn GP, Keough MJ (2002) Experimental design and data analysis for biologists. Cambridge University Press, Cambridge

  18. Zuur AF, Ieno EN, Elphick CS (2010) A protocol for data exploration to avoid common statistical problems: data exploration. Methods Ecol Evol 1:3–14

  19. Hadi AS, Ling RF (1998) Some cautionary notes on the use of principal components regression. Am Stat 52:15–19

  20. Boaratti AZ, Silva FR (2015) Relationships between environmental gradients and geographic variation in the intraspecific body size of three species of frogs (Anura). Austral Ecol 40:869–9765

  21. Hijmans RJ, Cameron SE, Parra JL, Jones PG, Jarvis A (2005) Very high resolution interpolated climate surfaces for global land areas. Int J Climatol 25:1965–1978

  22. Anderson MJ (2001) A new method for non-parametric multivariate analysis of variance. Austral Ecol 26:32–46

  23. McArdle BH, Anderson MJ (2001) Fitting multivariate models to community data: a comment on distance-based redundancy analysis. Ecology 82:290–297

  24. Anderson MJ, Ellingsen KE, McArdle BH (2006) Multivariate dispersion as a measure of beta diversity. Ecol Lett 9:683–693

  25. Legendre P, Galzin R, Harmelin-Vivien ML (1997) Relating behavior to habitat: solutions to the fourth-corner problem. Ecology 78:547–562

  26. Garnier E, Cortez J, Billès G, Navas ML, Roumet C, Debussche M, Laurent G, Blanchard A, Aubry D, Bellmann A, Neill C, Toussaint JP (2004) Plant functional markers capture ecosystem properties during secondary succession. Ecology 85:2630–2637

  27. Lavorel S, Grigulis K, McIntyre S, Williams NSG, Garden D, Dorrough J, Berman S, Quétier F, Thébault A, Bonis A (2007) Assessing functional diversity in the field – methodology matters! Funct Ecol 22:134–147

  28. Peres-Neto PR, Dray S, ter Braak CJF (2017) Linking trait variation to the environment: critical issues with community-weighted mean correlation resolved by the fourth-corner approach. Ecography 40:806–816

  29. ter Braak CJF, Peres-Neto P, Dray S (2017) A critical issue in model-based inference for studying trait-based community assembly and a solution. PeerJ 5:e2885

  30. Rigal F, Cardoso P, Lobo JM, Triantis KA, Whittaker RJ, Amorim IR, Borges PAV (2018) Functional traits of indigenous and exotic ground-dwelling arthropods show contrasting responses to land-use change in an oceanic island, Terceira, Azores. Divers Distrib 24:36–47

  31. Kleyer M, Dray S, Bello F, Lepš J, Pakeman RJ, Strauss B, Thuiller W, Lavorel S (2012) Assessing species and community functional responses to environmental gradients: which multivariate methods? J Veg Sci 23:805–821

  32. Solé-Senan XO, Juárez-Escario A, Conesa JA, Recasens J (2018) Plant species, functional assemblages and partitioning of diversity in a Mediterranean agricultural mosaic landscape. Agric Ecosyst Environ 256:163–172

  33. Laliberté E, Legendre P, Shipley B (2014) FD: measuring functional diversity from multiple traits, and other tools for functional ecology. R package version 1.0-12

  34. ter Braak CJF (1987) Ordination. In: Jongman RHG, ter Braak CJF, van Tongeren OFR (eds) Data analysis in community and landscape ecology. Pudoc, Wageningen, pp 91–173. Reissued in 1995 by Cambridge University Press, Cambridge

  35. Dray S, Pélissier R, Couteron P, Fortin M, Legendre P, Peres-Neto PR, Bellier E, Bivand R, Blanchet FG, De Cáceres M, Dufour A, Heegaard E, Jombart T, Munoz F, Oksanen J, Thioulouse J, Wagner HH (2012) Community ecology in the age of multivariate multiscale spatial analysis. Ecol Monogr 82:257–275

  36. Vasconcelos T, Santos T, Haddad C, Rossa-Feres DC (2010) Climatic variables and altitude as predictors of anuran species richness and number of reproductive modes in Brazil. J Trop Ecol 26:423–432

  37. Hecnar SJ, M’Closkey RT (1996) Regional dynamics and the status of amphibians. Ecology 77:2091–2097

  38. De Cáceres M, Legendre P (2009) Associations between species and groups of sites: indices and statistical inference. Ecology 90:3566–3574

  39. Dufrêne M, Legendre P (1997) Species assemblages and indicator species: the need for a flexible asymmetrical approach. Ecol Monogr 67:345–366

  40. McGeoch MA, Van Rensburg BJ, Botes A (2002) The verification and application of bioindicators: a case study of dung beetles in savanna ecosystem. J Appl Ecol 39:661–672

  41. Tejeda-Cruz C, Mehltreter K, Sosa VJ (2008) Indicadores ecológicos multi-taxonómicos. In: Manson RH, Hernández-Ortiz V, Gallina S, Mehltreter K (eds) Agroecosistemas cafetaleros de Veracruz: Biodiversidad, Manejo y Conservación. INECOL – INE-SEMARNAT, México, pp 271–278

  42. Roberts DW (2016) labdsv: ordination and multivariate analysis for ecology. R package version 1.8-0

  43. Mayo DG (2018) Statistical inference as severe testing: how to get beyond the statistics wars. Cambridge University Press, Cambridge

  44. Taper ML, Lele S (2004) The nature of scientific evidence: statistical, philosophical, and empirical considerations. University of Chicago Press, Chicago

  45. Salsburg D (2002) The lady tasting tea: how statistics revolutionized science in the twentieth century. Holt, New York, NY

Corresponding author

Correspondence to Thiago Gonçalves-Souza.

1 Appendix for the Chapter Multidimensional Analysis for Testing Ecological, Ethnobiological, and Conservation Hypothesis

Thiago Gonçalves-Souza, Michel V. Garey, Fernando R. da Silva, Ulysses P. Albuquerque, and Diogo B. Provete.


  • 0.1 Loading the required packages

  • 1 Unconstrained ordination

    • 1.0.1 Principal Component Analysis (PCA)

    • 1.0.2 Principal Coordinate Analysis

    • 1.0.3 Principal Components Regression

  • 2 Permanova

    • 2.0.1 EXAMPLE 1

    • 2.0.2 EXAMPLE 2

  • 3 Community-Weighted Mean (CWM)

  • 4 Constrained ordination

    • 4.0.1 RDA

  • 5 Clustering

    • 5.0.1 Indicator value index

0.1 Loading the Required Packages

1 Unconstrained Ordination

Here, we will first use example datasets from the R package datasets, which ships with every R installation.

Let’s look at the data. This dataset contains the measurements of sepals and petals of 50 flowers for each of the three species of the genus Iris. You can learn more by running ?iris.

Next, we will run a PCA with a few packages that offer different outputs.
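As a minimal sketch of this step, base R's prcomp can run a PCA on the four measurements in iris. This is only one of several possible functions and is not necessarily among the packages used in this appendix; the object names iris_pca and var_explained are illustrative.

```r
# Minimal PCA sketch with base R (stats::prcomp); object names are illustrative.
data(iris)  # built-in dataset: 150 flowers, 3 species

# Use the four numeric measurements, centered and scaled to unit variance
iris_pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

# Proportion of the total variance carried by each principal component
var_explained <- iris_pca$sdev^2 / sum(iris_pca$sdev^2)
round(var_explained, 3)

# Scores of the flowers on the first two axes (what an ordination plot displays)
head(iris_pca$x[, 1:2])
```

The scores in iris_pca$x are what gets plotted in the ordination diagram, and var_explained tells how much information each axis retains.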

Finally, the plot:

The package factoextra has many nice plot capabilities. Here, we will plot the equilibrium circle to check the relative contribution of each variable to the formation of the axes:

The variables contributed roughly equally to the formation of the ordination axes.

And finally, the same analysis with the FactoMineR package:

The advantages of this package include handling missing data and easily plotting the centroids of a categorical variable in the ordination diagram. Let’s see how it runs:

As you can see, it automatically returns the biplot with the equilibrium circle and the individual factor map, plotted with the centroid corresponding to the mean of each species in each dimension. With the equilibrium circle you can judge whether each individual variable contributed more or less, depending on the length of its vector (arrow), to the formation of the axes in the reduced space. In this example, all four variables contributed more or less equally (see p. 447 of Legendre & Legendre 2012).

We can also easily see the correlation of each variable with the first two axes. This is useful when you want to use the axes in subsequent analyses and need to interpret how each axis summarizes the information:
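One way to obtain such correlations, sketched here with base R rather than with the package used above, is to correlate the original variables with the site scores on the first two axes. The names iris_pca and axis_cor are illustrative.

```r
# Correlation between each original variable and the first two PCA axes.
iris_pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

# Rows: original variables; columns: PC1 and PC2
axis_cor <- cor(iris[, 1:4], iris_pca$x[, 1:2])
round(axis_cor, 2)
```

Variables with a large absolute correlation with an axis are the ones that axis mostly summarizes.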

We will import the dataset published by Rodrigues et al. (2018), Plant Trait Dataset for Tree-Like Growth Forms Species of the Subtropical Atlantic Rain Forest in Brazil, Data 3(2):16. The dataset presents traits of leaf, branch, maximum potential height, seed mass, and dispersion syndrome of 117 tree species. For the sake of simplicity, we will use just a few traits and categorize half of the species as being used by local people and the other half as not used.

Before jumping in and analyzing the data right away, let’s first take a closer look at it to better understand it. Let’s quickly check the relationship among variables with a correlation matrix:

Another key aspect of any dataset is the amount of missing values, usually coded as NAs.

As you can see, a few values are missing. So let’s choose variables that have as few missing values as possible and that are important to test our hypothesis.

Remember that FactoMineR::PCA can handle missing values by replacing them with the mean value of the column. But here, for the sake of simplicity we will just remove them:
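The counting and removal of missing values can be sketched in base R as follows. The trait table below is hypothetical (its names and values are made up for illustration) and does not reproduce the Rodrigues et al. data.

```r
# Hypothetical trait table; names and values are invented for illustration.
traits <- data.frame(
  SLA      = c(12.1, NA, 9.8, 15.3, 11.0),
  SeedMass = c(0.4, 1.2, NA, 0.9, 2.1),
  Height   = c(18, 25, 12, NA, 30)
)
rownames(traits) <- paste0("sp", 1:5)

# Number of missing values (NA) in each trait
na_per_trait <- colSums(is.na(traits))
na_per_trait

# Keep only the species (rows) that have no missing value in any trait
traits_complete <- na.omit(traits)
traits_complete
```

Dropping rows is the simplest option but discards species; imputing by the column mean, as mentioned above, keeps them at the cost of shrinking the variance.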

Let’s visualize our new, reduced data set:

Now we will create a vector to categorize half of the species as preferred and the other half as nonpreferred, in order to test our initial hypothesis. Of course, this is just to demonstrate how the method works and doesn’t mean that those species are really used for any medicinal purpose.

Since the values were measured on different scales, we have to standardize them before running a PCA. Luckily, the function dudi.pca does this internally by default, through the arguments center = TRUE and scale = TRUE.

Now, the fun part, let’s see the results.

First, a basic graph showing the results, ignoring the groups:

Let’s see how variables contributed to the formation of the axes:

Plant height (Hpot95_m) and BarkProp were not very useful for the formation of the axes. Here, the angle of each arrow indicates the correlation of that variable with a given axis. For example, SLA is closely related to axis 1, meaning that this variable contributed strongly to the formation of the first axis but little to the second. Interestingly, Hpot95_m contributes equally to the formation of both axes. This means that data points (species) on the right (positive) side of the ordination plot have a large SLA and are tall. Similarly, those on the left are small trees with small SLA, but with thick leaves, large seeds, and high BarkProp.

Now, what about the preferred vs. nonPreferred species?

We can see that our hypothesis (see main text) was shamefully rejected, since the morphology of the preferred plants did not differ from that of the nonPreferred plants. Of course, again, this example only illustrates how the method works and is not based on any real dataset designed to test this hypothesis.

A cleaner graph without the row labels allows us to better see that the groups totally overlap:

Here, we will use the plant species trait data available in the package cluster to illustrate how a PCoA works. This dataset contains 31 morphological and reproductive variables for 136 plant species along an urbanization gradient. Let’s begin by loading the data:

Notice that many traits are coded as semiquantitative variables (ordered factors in R), some as binary variables, and two as continuous variables. Here, we will use the function dist.ktab of the package ade4 to calculate the Gower dissimilarity coefficient. First, we have to tell the function the nature of each variable:
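As a hedged alternative sketch, the same idea (Gower dissimilarity on mixed-type traits followed by a PCoA) can be illustrated with cluster::daisy and base R's cmdscale instead of ade4::dist.ktab. The small trait table below is hypothetical; with daisy, the nature of each variable is inferred from its R class (numeric, ordered factor, factor).

```r
library(cluster)  # provides daisy(); a recommended package shipped with most R installations

# Hypothetical mixed-type trait table: continuous, semiquantitative, and binary traits
traits <- data.frame(
  height  = c(0.3, 1.5, 12.0, 0.8, 20.0, 2.2),
  durflow = factor(c(1, 2, 3, 2, 4, 1), levels = 1:4, ordered = TRUE),
  woody   = factor(c("no", "no", "yes", "no", "yes", "yes"))
)
rownames(traits) <- paste0("sp", 1:6)

# Gower dissimilarity handles variables of different types
gower_d <- daisy(traits, metric = "gower")

# PCoA (classical multidimensional scaling) on the dissimilarity matrix
pcoa <- cmdscale(gower_d, k = 2, eig = TRUE)
pcoa$points        # species scores on the first two principal coordinates
round(pcoa$eig, 3) # eigenvalues; negative values signal non-Euclidean dissimilarities
```

The species scores in pcoa$points play the same role as PCA scores and can be plotted or used in later models.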

Notice that according to the Broken stick criterion, the first 12 eigenvectors should be selected for further use, e.g., in a Principal Components Regression.
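The broken-stick criterion compares the share of variance of each axis with the share expected if the total variance were split at random. A minimal sketch, using a PCA of iris only for illustration (not the PCoA above), and with the helper name broken_stick being our own:

```r
# Broken-stick expectation for p axes: b_k = (1/p) * sum_{i = k}^{p} (1 / i)
broken_stick <- function(p) {
  vapply(seq_len(p), function(k) sum(1 / (k:p)) / p, numeric(1))
}

# Observed proportions of variance from a PCA of the iris measurements
iris_pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)
observed <- iris_pca$sdev^2 / sum(iris_pca$sdev^2)
expected <- broken_stick(length(observed))

# Retain the leading axes whose observed share exceeds the broken-stick expectation
n_keep <- sum(cumprod(observed > expected))
rbind(observed = round(observed, 3), expected = round(expected, 3))
n_keep
```

For a PCoA, the same comparison would be applied to the positive eigenvalues divided by their sum.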

Let’s look at the biplot:

The relationship between environmental gradients and patterns of geographic variation in intraspecific body size of Boana faber

  • Hypothesis: environmental gradients drive intraspecific body size of Boana faber

  • Dependent variable: body size of Boana faber

  • Independent variable: environmental variables (e.g., temperature and precipitation)

Let’s begin by importing the files

The first step is to implement the Principal Component Analysis (PCA), and then store the eigenvectors

We can see that the first axis explains 47% of the variation and the second axis explains 28%; together they explain 75%.

Plotting the results in a biplot

We’ll select the first two axes of PCA to further use

Now, we’ll build several models of Principal Component Regressions (PCR) with different predictor variables for later use:

Using the Akaike Information Criterion corrected for small sample sizes (AICc) to select the best model

The model containing only the first axis of PCA is the most parsimonious, with the lowest AICc value.
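The whole Principal Component Regression workflow (PCA of the predictors, candidate linear models on the retained axes, AICc ranking) can be sketched in base R. The data below are simulated and the variable names are hypothetical; they do not reproduce the Boana faber dataset, and the helper aicc is our own small function.

```r
set.seed(42)
n <- 40

# Simulated, correlated environmental predictors (hypothetical names)
env <- data.frame(temp = rnorm(n), precip = rnorm(n), season = rnorm(n))
env$precip <- env$precip + 0.7 * env$temp
body_size  <- 50 + 2 * env$temp + rnorm(n)  # simulated response

# Step 1: PCA of the standardized predictors, keep the first two axes
pca <- prcomp(env, center = TRUE, scale. = TRUE)
dat <- data.frame(body_size = body_size, pca$x[, 1:2])

# Step 2: candidate models using the PCA axes as predictors
models <- list(
  null    = lm(body_size ~ 1, data = dat),
  pc1     = lm(body_size ~ PC1, data = dat),
  pc2     = lm(body_size ~ PC2, data = dat),
  pc1_pc2 = lm(body_size ~ PC1 + PC2, data = dat)
)

# Step 3: AICc = AIC + 2k(k + 1) / (n - k - 1), with k estimated parameters
aicc <- function(fit) {
  k <- attr(logLik(fit), "df")  # coefficients plus residual variance
  n_obs <- nobs(fit)
  AIC(fit) + 2 * k * (k + 1) / (n_obs - k - 1)
}
aicc_table <- sort(sapply(models, aicc))
aicc_table  # lowest value first
```

Because PCA axes are uncorrelated, the models avoid the collinearity that the raw predictors would introduce.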

Now, let’s visualize the results:

The relationship between the number of fish species known by fishermen and their personal experiences

  • Hypothesis: personal experiences influence the number of fish species known by local fishermen

  • Dependent variable: number of fish species

  • Independent variable: personal experience (e.g., age, time as fisherman, number of fishing techniques used, whether his father was also a fisherman)

Loading files:

Principal Component Analysis (PCA)

We can see that the first axis explains 57% of the variation and the second axis explains 21%; together they explain 78%.

Visualizing the results:

Retaining the first two axes of PCA:

Building models of Principal Component Regressions (PCR) with different predictor variables

Using the Akaike Information Criterion corrected for small samples (AICc) to select the best model

We can see that the model containing only the first PC is the most parsimonious one, with the lowest AICc value.

Now, let’s visualize the results:

2 Permanova

This is an illustrative dataset that examines the hypothesis that land use alters arthropod species composition in the Cerrado biome.

  • Hypothesis: land use alters species composition

  • Dependent variable: arthropod species

  • Independent variable: areas (pastures, sugarcanes, and Cerrado protected areas)

Importing the data with arthropod species composition:

To run a PERMANOVA, we first have to calculate a dissimilarity measure. Here, we’ll use the percentage difference (also known as the Bray-Curtis coefficient). We need to exclude the first column of the data, which holds the treatments; that’s why we use data[,-1].
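To make the coefficient concrete, here is a small base R sketch that computes the percentage difference by hand. The abundance table is hypothetical and the function bray_curtis is our own illustration; in practice a package function would typically be used.

```r
# Hypothetical abundance table: first column is the treatment, the rest are species
comm <- data.frame(
  treatment = c("pasture", "pasture", "cerrado", "cerrado"),
  sp1 = c(10, 8, 0, 1),
  sp2 = c(0, 2, 5, 7),
  sp3 = c(3, 0, 6, 6)
)
abund <- as.matrix(comm[, -1])  # drop the treatment column
rownames(abund) <- paste0("site", 1:4)

# Percentage difference (Bray-Curtis): sum|x_i - x_j| / sum(x_i + x_j)
bray_curtis <- function(x) {
  n <- nrow(x)
  d <- matrix(0, n, n, dimnames = list(rownames(x), rownames(x)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- sum(abs(x[i, ] - x[j, ])) / sum(x[i, ] + x[j, ])
    }
  }
  as.dist(d)
}

bc <- bray_curtis(abund)
round(bc, 3)
```

Values range from 0 (identical composition) to 1 (no shared species).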

PERMANOVA analysis uses the dissimilarity matrix as response variable and the column containing the treatments as predictor:
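What the test does can be sketched in base R for the one-way case: partition the squared dissimilarities into within-group and among-group sums of squares, compute a pseudo-F, and compare it with values obtained after shuffling the group labels. This is our own simplified illustration on simulated data, using Euclidean distances for brevity; it is not the function used in this appendix, and permanova_one_way is a hypothetical helper name.

```r
permanova_one_way <- function(d, groups, n_perm = 999) {
  d2 <- as.matrix(d)^2
  groups <- factor(groups)
  n <- nrow(d2)
  a <- nlevels(groups)
  ss_total <- sum(d2[lower.tri(d2)]) / n

  pseudo_f <- function(g) {
    # Within-group sum of squares from squared dissimilarities
    ss_within <- sum(vapply(levels(g), function(lv) {
      idx <- which(g == lv)
      sub <- d2[idx, idx, drop = FALSE]
      sum(sub[lower.tri(sub)]) / length(idx)
    }, numeric(1)))
    ss_among <- ss_total - ss_within
    c(F = (ss_among / (a - 1)) / (ss_within / (n - a)),
      R2 = ss_among / ss_total)
  }

  obs <- pseudo_f(groups)
  # Null distribution: recompute pseudo-F after permuting the group labels
  perm_f <- replicate(n_perm, pseudo_f(sample(groups))["F"])
  p <- (sum(perm_f >= obs["F"]) + 1) / (n_perm + 1)
  list(F = unname(obs["F"]), R2 = unname(obs["R2"]), p = p)
}

# Simulated example: three land-use types, six sites each
set.seed(1)
groups <- rep(c("pasture", "sugarcane", "cerrado"), each = 6)
coords <- matrix(rnorm(36), ncol = 2) + rep(c(0, 3, 6), each = 6)
result <- permanova_one_way(dist(coords), groups)
result
```

R2 is the proportion of the total variation attributed to the grouping, and p is the permutation P-value.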

The results show that arthropod species composition differs among treatments (R2 = 0.64; P = 0.001). Now, we’ll perform a post hoc test to find which treatments differ from each other. For this, we will use the function pairwise.adonis from the package pairwiseAdonis, available on GitHub.

The results show that all areas harbor different species compositions. Now, let’s visualize the results using Non-Metric Multidimensional Scaling (nMDS), which is the non-metric version of PCoA. This method is not covered in the chapter.

Knowledge about plant species with medicinal uses by people living along an urban-rural gradient.

  • Hypothesis: Urbanization decreases people’s knowledge about plants with medicinal uses

  • Dependent variable: plant species

  • Independent variable: people living in different areas (urban, periurban, and rural)

Loading the data with species composition of medicinal plants along urbanization gradient:

This time, we will use the Jaccard index of dissimilarity.
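For presence/absence data, the Jaccard dissimilarity can be obtained in base R with dist(method = "binary"). The table below is hypothetical and only illustrates the calculation.

```r
# Hypothetical presence (1) / absence (0) of five medicinal plants cited in three areas
pa <- rbind(
  urban     = c(1, 0, 0, 1, 0),
  periurban = c(1, 1, 0, 1, 0),
  rural     = c(0, 1, 1, 1, 1)
)
colnames(pa) <- paste0("plant", 1:5)

# Jaccard dissimilarity: 1 - (shared species / species present in at least one row)
jaccard <- dist(pa, method = "binary")
round(jaccard, 3)
```

Here urban and periurban share two of the three plants cited in either area, so their dissimilarity is 1/3.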

Running the PERMANOVA

The results show that knowledge about plant species differs across the gradient (R2 = 0.47; P = 0.001). Now, we’ll run a post hoc test to find which groups differ from each other:

The results show that knowledge about plant species composition by people living in rural areas is different from people living in periurban and urban areas, but it is not different between people living in periurban and urban areas.

Performing Non-Metric Multidimensional Scaling (nMDS) to visualize the results:

The figure shows the difference in knowledge about the composition of plant species with medicinal uses.

3 Community-Weighted Mean (CWM)

Importing the datasets:

Preparing the data:

Running the CWM

Common errors when using the function functcomp: (1) passing a data.frame, so convert the input with as.matrix; (2) species names that differ in spelling or order between T and Y. Use the following code to check that the names match; all resulting values should be TRUE:
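The name check and the calculation itself can be sketched by hand in base R for continuous traits: the CWM of a site is the mean of each trait weighted by the relative abundances of the species. The matrices below are hypothetical; trait_mat stands for T (species by traits) and Y is the sites by species abundance matrix.

```r
# Hypothetical sites x species abundance matrix (Y)
Y <- matrix(c(10, 0, 5,
               2, 8, 0),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("site1", "site2"), c("spA", "spB", "spC")))

# Hypothetical species x traits matrix (T in the text; renamed because T means TRUE in R)
trait_mat <- matrix(c(1.0, 20,
                      3.0, 10,
                      2.0, 30),
                    nrow = 3, byrow = TRUE,
                    dimnames = list(c("spA", "spB", "spC"), c("SLA", "height")))

# Species must have the same names, in the same order, in both matrices
names_match <- rownames(trait_mat) == colnames(Y)
names_match  # should be all TRUE

# Community-weighted mean: relative abundances times trait values
rel_abund <- Y / rowSums(Y)
cwm <- rel_abund %*% trait_mat
cwm
```

For site2, for example, the CWM of SLA is 0.2 × 1.0 + 0.8 × 3.0 = 2.6.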

Combining PERMANOVA with CWM to test hypothesis

Details about the permutation procedure can be found here

Here, we’ll run the permutations by strata to avoid pseudoreplication. First, let’s create the strata argument to run in the adonis function

Standardize the environmental variables before the PERMANOVA if they are on different scales. You can do that directly with dplyr::mutate

Lastly, run the PERMANOVA with the adonis function

Check the homogeneity of multivariate dispersions:

Visualize the results with a PCoA:

You can check which variables are correlated with the first two axes by using:

4 Constrained Ordination

We will use two examples to illustrate how an RDA works.

Illustrative dataset that examines species-environment relationship.

  • Hypothesis: anuran species composition changes in response to environmental gradients

  • Dependent variable: anuran species

  • Independent variable: environmental descriptors of habitat

First, let’s import all the data:

Then, let’s begin processing the data. We have to apply the Hellinger transformation to ensure the matrix has Euclidean properties, because we’re dealing with species abundances with potentially many zeros.
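The Hellinger transformation is simply the square root of the relative abundances within each site. A minimal base R sketch on a hypothetical abundance matrix:

```r
# Hypothetical sites x species abundance matrix
Y <- matrix(c(4, 0, 12,
              0, 9,  0,
              1, 1,  2),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("site", 1:3), paste0("sp", 1:3)))

# Hellinger transformation: square root of each abundance divided by its site total
Y_hel <- sqrt(Y / rowSums(Y))
round(Y_hel, 3)

# After the transformation every site vector has unit length
rowSums(Y_hel^2)
```

The Euclidean distance computed on Y_hel is the Hellinger distance, which is why the transformed matrix can be used in PCA or RDA.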

The environmental variables were measured on different scales. So, we have to standardize them before conducting any analysis, to make them comparable:

Finally, let’s run an RDA
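Conceptually, an RDA is a multivariate linear regression of the response matrix on the predictors followed by a PCA of the fitted values. The base R sketch below illustrates that definition and a permutation test of the global model on simulated data; it is our own illustration with hypothetical variable names, not the code used in this appendix.

```r
set.seed(7)
n <- 30

# Simulated environmental gradients and species abundances (hypothetical names)
env <- data.frame(canopy = rnorm(n), depth = rnorm(n))
abund <- cbind(
  sp1 = rpois(n, exp(1 + 0.8 * env$canopy)),
  sp2 = rpois(n, exp(1 - 0.8 * env$canopy)),
  sp3 = rpois(n, exp(1 + 0.5 * env$depth)),
  sp4 = rpois(n, 3) + 1  # +1 guarantees that no site is empty
)

Y  <- sqrt(abund / rowSums(abund))          # Hellinger transformation
Yc <- scale(Y, center = TRUE, scale = FALSE) # centered responses
X  <- scale(as.matrix(env))                  # standardized predictors

# Fitted values of the multivariate regression of Yc on X
fit_values <- function(Yc, X) X %*% solve(crossprod(X), crossprod(X, Yc))
rda_r2 <- function(Yc, X) sum(fit_values(Yc, X)^2) / sum(Yc^2)

# Canonical (constrained) axes: PCA of the fitted values
fitted_Y <- fit_values(Yc, X)
constrained <- prcomp(fitted_Y, center = FALSE)
r2_obs <- rda_r2(Yc, X)  # proportion of variation explained by the predictors

# Global test: permute the rows of the response matrix and recompute R2
r2_perm <- replicate(999, rda_r2(Yc[sample(n), , drop = FALSE], X))
p_global <- (sum(r2_perm >= r2_obs) + 1) / (999 + 1)
c(R2 = r2_obs, p = p_global)
```

With two predictors there are at most two canonical axes, which is why only the first two standard deviations in constrained$sdev are non-zero.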

Calculating the P-value for the RDA model:

  • P-value of the global model:

  • P-value of the RDA axes:

  • P-value of the environmental variables:

Finally, let’s plot the results. This figure shows the variation in species composition constrained by environmental gradients

Illustrative dataset that tests whether cultivated medicinal plants change in relation to socioeconomic characteristics of households

  • Hypothesis: the composition of plant species with medicinal uses changes in response to socioeconomic characteristics

  • Dependent variable: plant species

  • Independent variable: socioeconomic characteristics of the household

Let’s begin by importing the data:

Data Processing:

Now, running the RDA

Calculating P value for the RDA model:

  • P-value of the global model:

  • P-value of the RDA axes:

  • P-value of the environmental variables:

5 Clustering

Illustrative dataset that evaluates which Ephemeroptera species are bioindicators of environmental quality.

  • Hypothesis: Habitat contamination will affect different species resulting in the exclusion of some species in contaminated environments.

  • Dependent variable: Ephemeroptera species

  • Independent variable: preserved and anthropized streams

Let’s begin by importing the data:

We’ll divide the data into two parts, with a factor for preserved and modified areas:

Finally, let’s calculate the IndVal:
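The index itself, as defined by Dufrêne and Legendre (1997), multiplies specificity (how concentrated a species' mean abundance is in a group) by fidelity (the proportion of sites in that group where the species occurs). A base R sketch on hypothetical data follows; the function indval_sketch is our own illustration, leaves out the permutation test that dedicated packages provide, and assumes every species occurs at least once.

```r
# IndVal (%) = 100 * specificity (A) * fidelity (B), for each group and species
indval_sketch <- function(abund, groups) {
  groups <- factor(groups)
  # Mean abundance of each species in each group (groups x species)
  mean_ab <- apply(abund, 2, function(x) tapply(x, groups, mean))
  # A: share of the summed group means found in each group
  A <- sweep(mean_ab, 2, colSums(mean_ab), "/")
  # B: proportion of sites in each group where the species is present
  B <- apply(abund > 0, 2, function(x) tapply(x, groups, mean))
  100 * A * B
}

# Hypothetical abundances in three preserved and three anthropized streams
abund <- cbind(
  spA = c(5, 7, 6, 0, 0, 0),
  spB = c(2, 0, 3, 2, 3, 0),
  spC = c(0, 0, 0, 4, 0, 6)
)
groups <- rep(c("preserved", "anthropized"), each = 3)

iv <- indval_sketch(abund, groups)
round(iv, 1)
```

In this toy example spA is a perfect indicator of preserved streams (IndVal = 100), while spB is equally common in both groups and indicates neither.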

Illustrative dataset that examines the phytotherapy used by indigenous and non-indigenous populations

  • Hypothesis: the medicinal plants used to treat some illnesses are the same among indigenous populations and immigrants in southern Brazil

  • Dependent variable: phytotherapy species

  • Independent variable: populations (indigenous and European immigrants)

In this example we will use another package to calculate IndVal, which generates a different output

Creating the grouping variable to discriminate indigenous land from immigrant land:

Calculating IndVal index:

Copyright information

© 2019 Springer Science+Business Media, LLC, part of Springer Nature

Cite this protocol

Gonçalves-Souza, T., Garey, M.V., da Silva, F.R., Albuquerque, U.P., Provete, D.B. (2019). Multidimensional Analyses for Testing Ecological, Ethnobiological, and Conservation Hypotheses. In: Albuquerque, U., de Lucena, R., Cruz da Cunha, L., Alves, R. (eds) Methods and Techniques in Ethnobiology and Ethnoecology . Springer Protocols Handbooks. Humana Press, New York, NY.

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-8918-8

  • Online ISBN: 978-1-4939-8919-5

  • eBook Packages: Springer Protocols