In this paper, I re-examine the subtropical rainforest succession previously studied by Williams, Lance, Webb, Tracey and Dale (1969) (WLWTD) using a clustering procedure based on the Minimal Message Length principle of induction. This principle permits the optimal number of clusters to be estimated automatically. Optimality is defined here as a trade-off between quality of fit and complexity of model, both measured in message length units.
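The trade-off described above can be written as a two-part message length. The following is a minimal sketch in standard Minimum Message Length notation; the symbols $H_K$ (a clustering hypothesis with $K$ clusters), $D$ (the data) and $\hat{K}$ are assumptions introduced here for illustration and do not appear in the original text. With base-2 logarithms both terms are measured in bits.

```latex
\[
  I(H_K, D) \;=\; \underbrace{-\log_2 P(H_K)}_{\text{complexity of model}}
            \;+\; \underbrace{-\log_2 P(D \mid H_K)}_{\text{quality of fit}},
  \qquad
  \hat{K} \;=\; \arg\min_{K} \, I(H_K, D)
\]
```

Because both terms share one unit, adding a cluster is justified only when the saving in the second term exceeds the cost added to the first.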
Because of the common unit of measurement, we can assess the numerical effectiveness of the procedures adopted in the previous study and compare the results obtained from density data with those from presence/absence data and from numeric values treated independently of presence/absence effects. The results also bear on the “principle of explicability”, which posits that users seek interpretable results even if these are less efficient in purely numerical terms.
The optimal density result identified 8 clusters, although these were further clustered into 3 higher level groupings. The pattern of 2 temporal stages followed by spatial segregation is clear, with extra detail concerning aberrant stands and temporal dependency in the third spatial stage also apparent. Of the analyses examined, this one was the most effective at recovering structure in the data.
Imposing the WLWTD analysis on density data was markedly suboptimal and even the number of clusters recognised (7) was strictly incorrect. However, by subjective interpretation WLWTD selected a number of clusters which was very close to the optimal density solution. For this reason insight gained into the processes operating was not overly compromised. The optimal density result cleans up a few corners and adds more detail but the main outlines are sufficiently clear in the subjectively assessed presence data.
The results from optimal presence/absence analysis were understandable and effective, though considerably less detailed than those obtained using the density data or those from WLWTD’s original analyses. Indeed, the 3 clusters established using the presence data reflect the higher level of structure which is recognisable in the density result. The analysis of numeric data with 0 values set to missing showed little of interest.
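The three data treatments compared here can be illustrated with a small sketch. The table values and the function names `to_presence` and `zeros_to_missing` are hypothetical, chosen only to show how one density table might yield the presence/absence form and the numeric form with 0 values set to missing; they are not taken from the original study.

```python
# Hypothetical stand-by-species density table (rows: stands, columns: species).
# The numbers are invented for illustration only.
density = [
    [12, 0, 3],
    [0, 5, 0],
    [7, 1, 0],
]


def to_presence(table):
    """Presence/absence form: 1 where a species occurs in a stand, else 0."""
    return [[1 if value > 0 else 0 for value in row] for row in table]


def zeros_to_missing(table):
    """Numeric form: keep positive densities, mark zeros as missing (None)."""
    return [[value if value > 0 else None for value in row] for row in table]


presence = to_presence(density)
numeric_only = zeros_to_missing(density)
```

The presence form discards abundance, while the zeros-to-missing form discards occurrence, so each isolates one of the two kinds of information that the density data carry together.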
Invocation of Kodratoff’s principle of explicability, which argues for interpretability to dominate efficiency, was unnecessary since the efficient analyses were directly interpretable. The introduction of domain knowledge during the subjective interpretation in the original analysis was apparently sufficient to counter any losses due to the inefficiency of the clustering method. Given more effective clustering methods and the density data, such subjective intervention becomes unnecessary.
MML: Minimal Message Length
Numeric data with 0 values set to missing values
WLWTD: Williams, Lance, Webb, Tracey and Dale (1969)
Anand, M. 2000. Fundamentals of vegetation change: complexity rules. Acta Biotheoretica 48: 1–14.
Austin, M. P. 1970. An applied ecological example of mixed data classification. In: R. S. Anderssen and M. R. Osborne (eds.), Data Representation, Univ. Queensland Press, Brisbane. pp. 113–117.
Barsalou, L. W. 1995. Deriving categories to achieve goals. In: A. Ram and D. B. Leake (eds.), Goal Directed Learning. MIT Press, Cambridge, MA. pp. 121–176.
Boerlijst, M. C. and P. Hogeweg. 1991. Spiral wave structure in pre-biotic evolution: hypercycles stable against parasites. Physica D 48: 17–28.
Boulton, D. M. and C. S. Wallace. 1970. A program for numerical classification. Comput. J. 13: 63–69.
Boulton, D. M. and C. S. Wallace. 1973. An information measure for hierarchic classification. Comput. J. 16: 254–261.
Brokaw, N. and R. T. Busing. 2000. Niche versus chance in tree diversity in forest gaps. TREE 15: 183–188.
Bunge, M. 1969. Metaphysics, epistemology and methodology of levels. In: L. L. Whyte, A. G. Wilson and D. Wilson (eds.), Hierarchic Structures, American Elsevier, New York. pp. 17–28.
Critchley, C. N. R. 2000. Ecological assessment of plant communities by reference to species traits and habitat preferences. Biodiversity and Conservation 9: 87–100.
Dale, M. B. 1976. Hierarchy and level: prolegomena to a cladistic classification. Tech. Memo. 1, CSIRO Division of Tropical Crops and Pastures, St. Lucia, Brisbane.
Dale, M. B. 1999. The dynamics of diversity: mixed strategy systems. Coenoses 13: 105–113.
Dale, M. B. and D. J. Anderson. 1973. Inosculate analysis of vegetation data. Austral. J. Bot. 21: 253–276.
Dale, M. B. and M. M. Barson. 1989. On the use of grammars in vegetation science. Vegetatio 81: 79–94.
Dale, M. B. and P. Hogeweg. 1998. The dynamics of diversity: a cellular automaton approach. Coenoses 13: 3–15.
Dale, M. B. and D. Walker. 1970. Information analysis of pollen diagrams. Pollen et Spores 2: 21–37.
Diday, E. 1988. The symbolic approach in clustering and related methods of data analysis: the basic choices. In: H. H. Bock (ed.), Classification and Related Methods of Data Analysis, North Holland, Amsterdam. pp. 673–683.
Edgoose, T. and L. Allison. 1999. MML Markov classification of sequential data. Statistics and Computing 9: 269–278.
Edwards, R. T. and D. Dowe. 1998. Single factor analysis in MML mixture modelling. Lecture Notes in Art. Intell. 1394, Springer. pp. 96–109.
Gatsuk, L. E., O. V. Smirnova, L. I. Vorontzova, L. B. Zaugolnova and L. Zhukova. 1980. Age states of plants of various growth forms: a review. J. Ecol. 68: 675–696.
Hilderman, R. J. and H. J. Hamilton. 1999. Heuristics for ranking the interestingness of discovered knowledge. Proc. 3rd Pacific-Asia Conf. Knowledge Discovery PKDD’99, Beijing, Springer Verlag Berlin. pp. 204–209.
Huisman, J., H. Olff, and L. F. M. Fresco. 1993. A hierarchical set of models for species response analysis. J. Vegetation Science 4: 37–46.
Kodratoff, Y. 1986. Leçons d’apprentissage symbolique. Cepadues ed., Toulouse.
Kullback, S. and R. A. Leibler. 1951. On information and sufficiency. Ann. Math. Statist. 22: 79–86.
Legendre, P. and E. Gallagher. 2000. Ecologically meaningful transformations for ordination biplots of species data. Ecology (submitted).
Mackay, D. M. 1969. Recognition and action. In: S. Watanabe (ed.), Methodologies of Pattern Recognition. Academic Press, London. pp. 409–416.
Pazzani, M. J. and D. Kibler. 1992. The utility of knowledge in inductive learning. Machine Learning 9: 57–94.
Quinlan, R. and R. L. Rivest. 1989. Inferring decision trees using the Minimum Description Length Principle. Information and Computation 80: 227–248.
Rissanen, J. 1995. Stochastic complexity in learning. In: P. Vitányi (ed.), Computational Learning Theory, Lecture Notes in Computer Science 904. Springer Verlag, Berlin. pp. 196–201.
Stevens, W. L. 1937. Significance of grouping. Ann. Eug. Lond. 8: 57–69.
Wallace, C. S. 1995. Multiple factor analysis by MML estimation. Tech. Rep. 95/218, Dept. Computer Science, Monash University, Australia.
Wallace, C. S. 1998. Intrinsic classification of spatially-correlated data. Comput. J. 41: 602–611.
Wallace, C. S. and D. M. Boulton. 1968. An information measure for classification. Comput. J. 11: 185–195.
Wallace, C. S. and D. L. Dowe. 2000. MML clustering of multi-state, Poisson, von Mises circular and Gaussian distributions. Statistics and Computing 10: 73–83.
Watanabe, S. 1969. Knowing and Guessing. J. Wiley, New York.
Wildi, O. and M. Schütz. 2000. Reconstruction of a long-term recovery process from pasture to forest. Community Ecology 1: 25–32.
Williams, W. T. and M. B. Dale. 1962. Partitioned correlation matrices for heterogenous quantitative data. Nature 196: 502.
Williams, W. T., J. M. Lambert, and G. N. Lance. 1966. Multivariate methods in plant ecology. V. Similarity analysis and information analysis. J. Ecol. 54: 427–446.
Williams, W. T., G. N. Lance, L. J. Webb, J. G. Tracey and M. B. Dale. 1969. Studies in the numerical analysis of complex rain-forest communities. III. The analysis of successional change. J. Ecol. 57: 513–535.
Dale, M.B. Mt Glorious revisited: secondary succession in subtropical rainforest. COMMUNITY ECOLOGY 1, 181–193 (2000). https://doi.org/10.1556/ComEc.1.2000.2.8
- Minimum Message Length