Vegetation classification by two new iterative reallocation optimization algorithms

Abstract

This paper presents two new non-hierarchical iterative reallocation optimization algorithms for vegetation classification. The OPTimal PARTitioning algorithm (OPTPART) optimizes the ratio of within-cluster similarity to among-cluster similarity; the OPTimal SILhouette algorithm (OPTSIL) optimizes the difference between the similarity of each sample to the cluster to which it is assigned and its similarity to the most similar other cluster. The algorithms were tested on three vegetation datasets (Mt. Field Massif, Tasmania, Australia; Podyjí/Thayatal National Park, Austria/Czech Republic; and Shoshone National Forest, Wyoming, USA) using three dissimilarity/distance matrices (Bray-Curtis, chord distance, and Hellinger distance) and compared to five other commonly used or recently introduced vegetation classification algorithms (flexible-β, TWINSPAN, PAM, ISOPAM, and K-means) using eight goodness-of-clustering evaluators. Five of the eight evaluators were species-based and operated on the distribution of individual taxa among clusters; three were community-based and operated on the compositional similarity of clusters. OPTPART was initialized both from random partitions and from the results of a flexible-β classification; OPTSIL was initialized from partitions resulting from OPTPART, flexible-β, and K-means classifications. Algorithms were ranked from best to worst on each clustering evaluator for each dissimilarity/distance matrix for each dataset, and the rankings were summarized by median ranks. OPTPART, SIL/OPT (OPTSIL from an OPTPART initial partition), and SIL/FLEX (OPTSIL from a flexible-β initial partition) ranked 1–3 respectively for results pooled across all three datasets and dissimilarity/distance matrices. The same three algorithms consistently occupied ranks 1–3 on the individual datasets, although their order varied slightly by dataset.
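The two criteria named in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the function names `partition_ratio` and `mean_silhouette` are hypothetical, similarity is assumed to be one minus a dissimilarity scaled to [0, 1], the within/among ratio uses simple unweighted means over sample pairs (the exact weighting used by OPTPART is not given in this excerpt), and the silhouette follows the standard definition of Rousseeuw (1987). At least two clusters are assumed. An iterative reallocation algorithm would move samples between clusters and keep the moves that improve one of these values.

```python
import numpy as np

def partition_ratio(dis, labels):
    """Mean within-cluster similarity divided by mean among-cluster similarity.

    Assumes dissimilarities lie in [0, 1], so similarity = 1 - dissimilarity,
    and that the mean among-cluster similarity is greater than zero.
    """
    dis = np.asarray(dis, dtype=float)
    labels = np.asarray(labels)
    sim = 1.0 - dis
    rows, cols = np.triu_indices(len(labels), k=1)  # each sample pair once
    same = labels[rows] == labels[cols]
    within = sim[rows, cols][same].mean()
    among = sim[rows, cols][~same].mean()
    return within / among

def mean_silhouette(dis, labels):
    """Mean silhouette width of a partition (Rousseeuw 1987 definition)."""
    dis = np.asarray(dis, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    widths = np.zeros(len(labels))
    for i in range(len(labels)):
        own = labels == labels[i]
        own[i] = False                      # exclude the sample itself
        if not own.any():
            continue                        # singleton cluster: width 0
        a = dis[i, own].mean()              # mean dissimilarity to own cluster
        b = min(dis[i, labels == c].mean()  # nearest other cluster
                for c in clusters if c != labels[i])
        denom = max(a, b)
        widths[i] = (b - a) / denom if denom > 0 else 0.0
    return widths.mean()

# Toy dissimilarity matrix: samples 0-1 and samples 2-3 form two tight groups.
dis = np.array([[0.0, 0.1, 0.8, 0.9],
                [0.1, 0.0, 0.7, 0.8],
                [0.8, 0.7, 0.0, 0.2],
                [0.9, 0.8, 0.2, 0.0]])
good = [0, 0, 1, 1]   # partition matching the two groups
bad = [0, 1, 0, 1]    # partition that splits both groups

print(partition_ratio(dis, good), partition_ratio(dis, bad))
print(mean_silhouette(dis, good), mean_silhouette(dis, bad))
```

On this toy matrix both criteria score the partition that matches the two groups higher than the one that splits them, which is the property a reallocation step would exploit.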

Acknowledgments

I would like to thank Enrico Feoli; the first draft of the OPTPART algorithm was written while I was a Fellow at the Centro di Ecologia Teorica ed Applicata (CETA, Center for Theoretical and Applied Ecology) under sponsorship from Professor Feoli. The final version of the algorithm was written while I was a Fellow at the National Center for Ecological Analysis and Synthesis (NCEAS) in Santa Barbara, California. I would like to thank Milan Chytrý for the use of the Dyje Valley data and Peter Minchin for the use of the Mt. Field data. I thank Lubomír Tichý and Laco Mucina for repeated discussions about TWINSPAN, and János Podani for discussions about reallocation algorithms in vegetation ecology, among other subjects. I thank two anonymous reviewers for multiple recommendations and insights.

Author information

Correspondence to David W. Roberts.

Additional information

Communicated by P. R. Minchin and J. Oksanen.

About this article

Cite this article

Roberts, D.W. Vegetation classification by two new iterative reallocation optimization algorithms. Plant Ecol 216, 741–758 (2015). https://doi.org/10.1007/s11258-014-0403-2

Keywords

  • Optimal reallocation clustering
  • Vegetation classification
  • TWINSPAN
  • OPTPART
  • OPTSIL
  • Flexible-β
  • K-means
  • PAM
  • ISOPAM