Minimum message length clustering: an explication and some applications to vegetation data

Dale, M. B.; Salmina, L.; Mucina, L.

doi:10.1556/ComEc.2.2001.2.11

Open access
Published: 30 December 2001

Volume 2, pages 231–247 (2001)

Community Ecology

M. B. Dale¹,
L. Salmina² &
L. Mucina³

Abstract

In this paper, we examine the application of a particular approach to induction, the minimum message length (MML) principle, and illustrate some of the problems that can be addressed through its use. The MML principle seeks to identify an optimal model within some specified parameterised class of models. For this paper we have chosen to concentrate on a single model class, that of mixture separation or fuzzy clustering. The first section presents, in outline, an MML methodology for fuzzy clustering. We then present some applications, including the nature of the within-cluster model, an examination of the univocality of results for different groups of species, and the effectiveness of presence data compared to purely quantitative data. Finally, we examine some possibilities for extending MML methodology to include within-class correlation of species, dependence between observed samples, and the comparison of different classes of models.
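
As a rough orientation, and not as a formula quoted from the article, the two-part form in which MML inference is usually stated, together with the partial class memberships of a mixture model, can be sketched as follows. The symbols $H$ (a candidate model with its parameters), $D$ (the data), $K$ (the number of classes), $\pi_k$ (class proportions), $\theta_k$ (class parameters) and $f$ (the within-class density) are notation assumed here for illustration.

```latex
% Two-part message: first state the model H, then the data D encoded using H.
I(H, D) = -\log_2 P(H) - \log_2 P(D \mid H) \quad \text{(bits)}

% Within the chosen model class, prefer the model giving the shortest message.
\hat{H} = \arg\min_{H} I(H, D)

% In a K-class mixture, sample x_i is assigned to class k only with a probability.
r_{ik} = \frac{\pi_k \, f(x_i \mid \theta_k)}{\sum_{j=1}^{K} \pi_j \, f(x_i \mid \theta_j)}
```

Under this reading, the partial memberships $r_{ik}$ are what make the clustering fuzzy, and an additional class is worth stating only if it shortens the second part of the message by more than it lengthens the first.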

Abbreviations

MML: Minimum Message Length
MDL: Minimum Description Length

Acknowledgments

Our thanks to Sanyi Bartha who provided some extremely useful and important comments on an earlier draft.

Author information

Authors and Affiliations

Australian School of Environmental Studies, Griffith University, Nathan, Qld, 4111, Australia
M. B. Dale
Department of Botany and Ecology, University of Latvia, 4 Kronvalda Blvd, LV1586, Riga, Latvia
L. Salmina
School of Life Sciences, University of the North, Qwa-Qwa Campus, Private Bag X13, 9866, Phuthaditjhaba, South Africa
L. Mucina

Corresponding author

Correspondence to M. B. Dale.

Rights and permissions

This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Dale, M.B., Salmina, L. & Mucina, L. Minimum message length clustering: an explication and some applications to vegetation data. Community Ecology 2, 231–247 (2001). https://doi.org/10.1556/ComEc.2.2001.2.11

Published: 30 December 2001
Issue Date: June 2001
DOI: https://doi.org/10.1556/ComEc.2.2001.2.11
