Plant Ecology, Volume 216, Issue 5, pp 741–758

Vegetation classification by two new iterative reallocation optimization algorithms

  • David W. Roberts
Article

Abstract

This paper presents two new non-hierarchical iterative reallocation optimization algorithms for vegetation classification. The OPTimal PARTitioning algorithm (OPTPART) optimizes the ratio of within-cluster similarity to among-cluster similarity; the OPTimal SILhouette algorithm (OPTSIL) optimizes the difference between the similarity of each sample to the cluster to which it is assigned and its similarity to the most similar other cluster. The algorithms were tested on three vegetation datasets (Mt. Field Massif, Tasmania, Australia; Podyjí/Thayatal National Park, Austria/Czech Republic; and Shoshone National Forest, Wyoming, USA) using three dissimilarity/distance matrices (Bray-Curtis, chord distance, and Hellinger distance) and compared to five other commonly used or recently introduced vegetation classification algorithms (flexible-β, TWINSPAN, PAM, ISOPAM, and K-means) using eight goodness-of-clustering evaluators. Five of the eight evaluators were species-based and operated on the distribution of individual taxa among clusters; three were community-based and operated on the compositional similarity of clusters. OPTPART was initialized both from random partitions and from the results of a flexible-β classification; OPTSIL was initialized from partitions resulting from OPTPART, flexible-β, and K-means classifications. Algorithms were ranked from best to worst on each clustering evaluator for each dissimilarity/distance matrix for each dataset, and the rankings were summarized by median ranks. OPTPART, SIL/OPT (OPTSIL from an OPTPART initial partition), and SIL/FLEX (OPTSIL from a flexible-β initial partition) ranked 1–3 respectively for results pooled across all three datasets and dissimilarity/distance matrices. These three algorithms also consistently held ranks 1–3 within the individual datasets, although their order varied slightly by dataset.
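The two objective functions named in the abstract can be illustrated with a small sketch. It is not the author's implementation: the abstract does not give the exact formulas, so the code assumes that similarity is 1 minus Bray-Curtis dissimilarity, that the OPTPART-style criterion is the mean within-cluster pairwise similarity divided by the mean among-cluster pairwise similarity, and that the OPTSIL-style criterion is the standard mean silhouette width of Rousseeuw (1987). The function names and the toy abundance data are hypothetical.

```python
from itertools import combinations


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(x + y for x, y in zip(a, b))
    return num / den if den else 0.0


def dissimilarity_matrix(samples):
    """Full symmetric matrix of pairwise Bray-Curtis dissimilarities."""
    n = len(samples)
    return [[bray_curtis(samples[i], samples[j]) for j in range(n)]
            for i in range(n)]


def partition_ratio(dis, labels):
    """Assumed OPTPART-style criterion: mean within-cluster similarity
    divided by mean among-cluster similarity (similarity = 1 - dis)."""
    within, among = [], []
    for i, j in combinations(range(len(labels)), 2):
        (within if labels[i] == labels[j] else among).append(1.0 - dis[i][j])
    if not within or not among:
        raise ValueError("need both within- and among-cluster pairs")
    mean_among = sum(among) / len(among)
    if mean_among == 0:
        return float("inf")  # clusters share no similarity at all
    return (sum(within) / len(within)) / mean_among


def mean_silhouette(dis, labels):
    """Assumed OPTSIL-style criterion: mean silhouette width."""
    clusters = sorted(set(labels))
    widths = []
    for i, own in enumerate(labels):
        # Mean dissimilarity from sample i to each cluster, excluding i itself.
        mean_to = {}
        for c in clusters:
            members = [j for j, lab in enumerate(labels) if lab == c and j != i]
            if members:
                mean_to[c] = sum(dis[i][j] for j in members) / len(members)
        if own not in mean_to or len(mean_to) < 2:
            widths.append(0.0)  # convention for singletons / single cluster
            continue
        a = mean_to[own]
        b = min(d for c, d in mean_to.items() if c != own)
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(widths) / len(widths)


# Hypothetical plots-by-species abundance table with two obvious groups.
samples = [
    [10, 8, 0, 1],
    [9, 7, 1, 0],
    [8, 9, 0, 0],
    [0, 1, 9, 10],
    [1, 0, 8, 9],
    [0, 0, 10, 8],
]
dis = dissimilarity_matrix(samples)
good = [0, 0, 0, 1, 1, 1]  # matches the two groups
bad = [0, 1, 0, 1, 0, 1]   # mixes the two groups

for name, labels in (("good", good), ("bad", bad)):
    print(name, partition_ratio(dis, labels), mean_silhouette(dis, labels))
```

An iterative reallocation algorithm would repeatedly move single samples between clusters and keep any move that raises the chosen criterion, stopping when no move improves it; the sketch only evaluates the criteria for two fixed partitions.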

Keywords

Optimal reallocation clustering; Vegetation classification; TWINSPAN; OPTPART; OPTSIL; Flexible-β; K-means; PAM; ISOPAM

Notes

Acknowledgments

I would like to thank Enrico Feoli; the first draft of the OPTPART algorithm was written while I was a Fellow at the Centro di Ecologia Teorica ed Applicata (CETA, Center for Theoretical and Applied Ecology) under sponsorship from Professor Feoli. The final version of the algorithm was written while I was a Fellow at the National Center for Ecological Analysis and Synthesis (NCEAS) in Santa Barbara, California. I would like to thank Milan Chytrý for the use of the Dyje Valley data and Peter Minchin for the use of the Mt. Field data. I thank Lubomír Tichý and Laco Mucina for repeated discussions about TWINSPAN, and János Podani for discussions about reallocation algorithms in vegetation ecology, among other subjects. I thank two anonymous reviewers for multiple recommendations and insights.

Supplementary material

11258_2014_403_MOESM1_ESM.pdf (27 kb)
Online Resource 1 (PDF 28 kb)
11258_2014_403_MOESM2_ESM.pdf (27 kb)
Online Resource 2 (PDF 28 kb)
11258_2014_403_MOESM3_ESM.pdf (27 kb)
Online Resource 3 (PDF 28 kb)

Copyright information

© Springer Science+Business Media Dordrecht 2015

Authors and Affiliations

  1. Ecology Department, Montana State University, Bozeman, USA
