Cluster Analysis

  • Daniel Borcard
  • François Gillet
  • Pierre Legendre
Part of the Use R! book series (USE R)


In most cases, data exploration and the computation of association matrices are preliminary steps towards deeper analyses. In this chapter you will go further by experimenting with one of the large groups of analytical methods used in ecology: clustering. In practice, you will learn how to choose among various clustering methods and compute them, and how to apply these techniques to the Doubs River data to identify groups of sites and fish species. You will also explore two methods of constrained clustering, a powerful modelling approach in which the clustering process is constrained by an external data set.



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Daniel Borcard (1)
  • François Gillet (2)
  • Pierre Legendre (1)

  1. Département de sciences biologiques, Université de Montréal, Montréal, Canada
  2. UMR Chrono-environnement, Université Bourgogne Franche-Comté, Besançon, France
