Using the right tool for the job: the difference between unsupervised and supervised analyses of multivariate ecological data

  • Highlighted Student Research
Oecologia

Abstract

Ecologists often collect data with the aim of determining which of many variables are associated with a particular cause or consequence. Unsupervised analyses (e.g., principal components analysis, PCA) summarize variation in the data without regard to the response. Supervised analyses (e.g., partial least squares, PLS) evaluate the variables to find the combination that best explains a causal relationship. These approaches are not interchangeable, especially when the variables most responsible for a causal relationship are not the greatest source of overall variation in the data, a situation that ecologists are likely to encounter. To illustrate the differences between unsupervised and supervised techniques, we analyze a published dataset using both PCA and PLS and compare the questions and answers associated with each method. We also use simulated datasets representing situations that further illustrate differences between unsupervised and supervised analyses. For simulated data with many correlated variables that were unrelated to the response, PLS was better than PCA at identifying which variables were associated with the response. There are many applications for both unsupervised and supervised approaches in ecology. However, PCA is currently overused, at least in part because supervised approaches, such as PLS, are less familiar.
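
The contrast described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' archived R code: it is written in plain Python, and the sample size, the number of correlated variables, the noise levels, and all function names are assumptions chosen for illustration. Eight variables share a latent factor that is unrelated to the response, while a ninth, independent variable drives the response. The first PCA axis is found by power iteration on the correlation matrix, and the first PLS weight vector is taken as the normalized covariance between each standardized variable and the response, which is the first component of single-response PLS.

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def unit(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def standardize(column):
    """Center a column and scale it to unit sample variance."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / (n - 1))
    return [(v - mean) / sd for v in column]


def simulate(n=200, n_block=8, seed=1):
    """Hypothetical data: a correlated block unrelated to y, plus one driver of y."""
    rng = random.Random(seed)
    latent = [rng.gauss(0, 1) for _ in range(n)]
    # Variables 0..n_block-1 share the latent factor, so they dominate overall variation.
    columns = [[z + rng.gauss(0, 0.5) for z in latent] for _ in range(n_block)]
    # The last variable is independent of the block and is the only driver of y.
    driver = [rng.gauss(0, 1) for _ in range(n)]
    columns.append(driver)
    y = [d + rng.gauss(0, 0.5) for d in driver]
    return [standardize(c) for c in columns], standardize(y)


def first_pca_loadings(columns, iterations=200):
    """Leading eigenvector of the correlation matrix, found by power iteration."""
    n = len(columns[0])
    corr = [[dot(a, b) / (n - 1) for b in columns] for a in columns]
    v = unit([1.0] * len(columns))
    for _ in range(iterations):
        v = unit([dot(row, v) for row in corr])
    return v


def first_pls_weights(columns, y):
    """First PLS weight vector for one response: normalized X'y."""
    return unit([dot(c, y) for c in columns])


def top_variable(vector):
    """Index of the variable with the largest absolute loading or weight."""
    return max(range(len(vector)), key=lambda j: abs(vector[j]))


columns, y = simulate()
pca = first_pca_loadings(columns)
pls = first_pls_weights(columns, y)
print("Top variable on first PCA axis:", top_variable(pca))
print("Top variable on first PLS axis:", top_variable(pls))
```

With these settings, the first principal component should load mainly on the correlated block, while the first PLS weight vector should be dominated by the single variable that drives the response.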

Data accessibility

Reproducible R code for generating and analyzing data is archived in Zenodo (https://doi.org/10.5281/zenodo.3568392). Complete data from Muir et al. (2017b) are available on Data Dryad.

Acknowledgements

We thank Christopher Muir and Colin M. Orians for comments on a draft of this manuscript.

Author information

Contributions

ERS and EEC conceived and designed the study. ERS analyzed data and led the writing of the manuscript. Both authors contributed significantly to drafts and approve the final version for publication.

Corresponding author

Correspondence to Eric R. Scott.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest.

Additional information

Communicated by Casey P. terHorst.

Supervised multivariate analyses are underutilized in ecology. These analyses give different results than unsupervised approaches (e.g., PCA), which find the main axes of variation without respect to a response. Here, we show that unsupervised and supervised approaches are not interchangeable and require different interpretation. In particular, unsupervised approaches are likely to miss significant relationships with variables that are not part of a main axis of variation, a situation that may be common in ecological datasets.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 96 KB)

Supplementary file2 (XLSX 47 KB)

About this article

Cite this article

Scott, E.R., Crone, E.E. Using the right tool for the job: the difference between unsupervised and supervised analyses of multivariate ecological data. Oecologia 196, 13–25 (2021). https://doi.org/10.1007/s00442-020-04848-w

  • DOI: https://doi.org/10.1007/s00442-020-04848-w
