Abstract
In previous studies, a minimum message length (MML) fuzzy clustering method was applied to vegetation data and shown to give sensible estimates of the number of clusters as well as consistent estimates of cluster parameters. The MML approach provides a principled way of choosing between models and between classes of models. Its message comprises two components: one coding the model and its associated (meta)parameter values, the other coding the data given the model. The program uses uncorrelated Gaussian distributions to model the distribution of attributes within clusters. This assumption may not always be acceptable, and in this paper a more general model, the t-distribution, is examined. The t-distribution provides a class of thick-tailed models, while including the Gaussian as a subclass. This should be appropriate in hierarchical clustering where, even if the final clusters had internal Gaussian distributions, the upper levels would not. In addition, it may provide a better model of the within-cluster distribution of the attributes even in the final clusters. Although forcing the use of t-distributions was not profitable, allowing a choice between Gaussian and t-distributions for each attribute in each class improved the results. This was despite only one attribute actually selecting the t-distribution over the Gaussian.
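The two-part comparison described in the abstract can be illustrated with a minimal sketch for a single attribute in a single class. Everything here is an assumption made for illustration rather than the program's actual coding scheme: the model part is approximated by a BIC-style cost of half the log of the sample size per parameter (not the MML parameter coding), the t-distribution is fitted by a standard EM iteration over a hypothetical grid of degrees of freedom, and the names `gaussian_nll`, `t_nll` and `choose_model` are invented. Lengths are in nats.

```python
import math
from statistics import NormalDist


def gaussian_nll(xs):
    """Data part: negative log-likelihood under a maximum-likelihood Gaussian."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    return 0.5 * n * (math.log(2 * math.pi * var) + 1.0)


def t_nll(xs, nu, iters=200):
    """Data part: negative log-likelihood under a location-scale t with fixed nu.

    Location and squared scale are fitted with the usual EM reweighting.
    """
    n = len(xs)
    mu = sum(xs) / n
    s2 = sum((x - mu) ** 2 for x in xs) / n
    for _ in range(iters):
        # Observations far from the centre receive small weights.
        w = [(nu + 1.0) / (nu + (x - mu) ** 2 / s2) for x in xs]
        mu = sum(wi * x for wi, x in zip(w, xs)) / sum(w)
        s2 = sum(wi * (x - mu) ** 2 for wi, x in zip(w, xs)) / n
    const = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi * s2))
    return -sum(
        const - 0.5 * (nu + 1) * math.log1p((x - mu) ** 2 / (nu * s2))
        for x in xs
    )


def choose_model(xs, nu_grid=(1, 2, 4, 8, 16, 32)):
    """Return the model with the shorter approximate two-part message."""
    per_param = 0.5 * math.log(len(xs))  # assumed BIC-style model cost
    lengths = {
        # Gaussian: mean and variance (2 parameters).
        "gaussian": 2 * per_param + gaussian_nll(xs),
        # t: location, scale and degrees of freedom (3 parameters).
        "t": 3 * per_param + min(t_nll(xs, nu) for nu in nu_grid),
    }
    return min(lengths, key=lengths.get), lengths


# Deterministic illustrative samples: quantiles of a Gaussian and of a Cauchy.
n = 200
probs = [(i + 0.5) / n for i in range(n)]
gauss_like = [NormalDist().inv_cdf(p) for p in probs]
heavy_tailed = [math.tan(math.pi * (p - 0.5)) for p in probs]

for name, sample in (("gauss_like", gauss_like), ("heavy_tailed", heavy_tailed)):
    choice, lengths = choose_model(sample)
    print(name, choice, {k: round(v, 1) for k, v in lengths.items()})
```

In this sketch the t-distribution is selected only when its shorter data part outweighs the cost of its extra parameter, which mirrors the per-attribute, per-class choice described above.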
References
Agusta, Y. and D. L. Dowe. 2002. MML clustering of continuous-valued data using Gaussian and t distributions. In: B. McKay and J. Slaney (eds.), Lecture Notes on Artificial Intelligence 2557. Springer, Berlin. pp. 143–154.
Agusta, Y. and D. L. Dowe. 2003a. Unsupervised learning of gamma mixture models using Minimum Message Length, (to appear). In: Proc. 3rd IASTED International Conference on Artificial Intelligence and Applications (AIA 2003), ACTA Press, Calgary. pp. 457–462.
Agusta, Y. and D. L. Dowe. 2003b. Unsupervised learning of correlated multivariate Gaussian mixture models using MML. Lecture Notes in Artificial Intelligence (LNAI) 2903, Springer, Berlin. pp. 477–489.
Bouguila, N. and D. Ziou. 2006. Unsupervised selection of a finite Dirichlet mixture model: An MML-based approach. IEEE Transactions on Knowledge and Data Engineering 18: 993–1009.
Dale, M. B. 2000. Mt Glorious revisited: secondary succession in subtropical rainforest. Community Ecology 1: 181–193.
Dale, M. B. 2001. Minimal message length clustering, environmental heterogeneity and the variable Poisson model. Community Ecology 2: 171–180.
Dale, M. B. 2002. Models, measures and messages: an essay on the role for induction. Community Ecology 3: 191–204.
Dale, M. B. 2005. On gradients and response curves. Community Ecology 6: 155–166.
Dale, M. B., L. Allison and P. E. R. Dale. 2007. Segmentation and clustering as complementary sources of information. Acta Oecologica 30: 1–10.
Dale, M. B., L. Salmina and L. Mucina. 2001. Minimum message length clustering: an explication and some applications to vegetation data. Community Ecology 2: 231–247.
Dale, P. E. R. and M. B. Dale. 2002. Optimal classification to describe environmental change: pictures from the exposition. Community Ecology 3: 19–30.
Gitay, H. and A. D. Q. Agnew. 1989. Plant community structure, connectance, niche limitation and species guilds within a dune slack grassland. Vegetatio 83: 241–248.
Needham, S. L. and D. L. Dowe. 2001. Message length as an effective Ockham’s razor in decision tree induction. In: Proc. 8th International Workshop on Artificial Intelligence and Statistics (AI+STATS 2001), Key West, Florida, U.S.A. pp. 253–260.
Wallace, C. S. and D. M. Boulton. 1968. An information measure for classification. Computer Journal 11: 185–194.
Wallace, C. S. and M. B. Dale. 2005. Hierarchical clusters of vegetation types. Community Ecology 6: 57–74.
Rights and permissions
This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Dale, M.B. Changes in the model of within-cluster distribution of attributes and their effects on cluster analysis of vegetation data. COMMUNITY ECOLOGY 8, 9–13 (2007). https://doi.org/10.1556/ComEc.8.2007.1.2
DOI: https://doi.org/10.1556/ComEc.8.2007.1.2