Statistics and Computing, Volume 10, Issue 1, pp 73–83

MML clustering of multi-state, Poisson, von Mises circular and Gaussian distributions


  • Chris S. Wallace
    • Computer Science and Software Engineering, Monash University
  • David L. Dowe
    • Computer Science and Software Engineering, Monash University

DOI: 10.1023/A:1008992619036

Cite this article as:
Wallace, C.S. & Dowe, D.L. Statistics and Computing (2000) 10: 73. doi:10.1023/A:1008992619036


Minimum Message Length (MML) is an invariant Bayesian point estimation technique which is also statistically consistent and efficient. We provide a brief overview of MML inductive inference (Wallace C.S. and Boulton D.M. 1968. Computer Journal, 11: 185–194; Wallace C.S. and Freeman P.R. 1987. J. Royal Statistical Society (Series B), 49: 240–252; Wallace C.S. and Dowe D.L. 1999. Computer Journal), and how it has both an information-theoretic and a Bayesian interpretation. We then outline how MML is used for statistical parameter estimation, and how the MML mixture modelling program, Snob (Wallace C.S. and Boulton D.M. 1968. Computer Journal, 11: 185–194; Wallace C.S. 1986. In: Proceedings of the Ninth Australian Computer Science Conference (ACSC-9), Vol. 8, Monash University, Australia, pp. 357–366; Wallace C.S. and Dowe D.L. 1994b. In: Zhang C. et al. (Eds.), Proc. 7th Australian Joint Conf. on Artif. Intelligence. World Scientific, Singapore, pp. 37–44), uses the message lengths from various parameter estimates to combine parameter estimation with selection of the number of components and estimation of the relative abundances of the components. The message length is (to within a constant) the negative logarithm of the posterior probability (not a posterior density) of the theory, so the MML theory, which minimises the message length, can also be regarded as the theory with the highest posterior probability. Snob currently assumes that variables are uncorrelated within each component, and permits multi-variate data from Gaussian, discrete multi-category (or multi-state or multinomial), Poisson and von Mises circular distributions, as well as missing data. Additionally, Snob can do fully-parameterised mixture modelling, estimating the latent class assignments in addition to estimating the number of components, the relative abundances of the components and the component parameters. We also report on extensions of Snob for data that have sequential or spatial correlations between observations, or correlations between attributes.
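
The relation between message length and posterior probability stated in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the coin-toss data, the three candidate theories, their prior probabilities, and the helper names `message_length_bits` and `likelihood`. It covers only a finite set of theories and ignores the parameter-precision coding that MML needs for continuous parameters, so it is not how Snob computes its message lengths.

```python
import math

def message_length_bits(prior, like):
    """Two-part message length in bits: first state the theory, then the data given the theory."""
    return -math.log2(prior) - math.log2(like)

# Hypothetical data: one particular sequence of 10 tosses containing 7 heads.
heads, tosses = 7, 10

# Hypothetical candidate theories: name -> (prior probability, assumed P(heads)).
theories = {
    "fair": (0.5, 0.5),
    "biased_0.7": (0.25, 0.7),
    "biased_0.9": (0.25, 0.9),
}

def likelihood(p):
    """Probability of the observed sequence if each toss lands heads with probability p."""
    return p ** heads * (1 - p) ** (tosses - heads)

# Message length of each theory together with the data.
lengths = {name: message_length_bits(prior, likelihood(p))
           for name, (prior, p) in theories.items()}

# Posterior probability of each theory by Bayes' rule.
evidence = sum(prior * likelihood(p) for prior, p in theories.values())
posteriors = {name: prior * likelihood(p) / evidence
              for name, (prior, p) in theories.items()}

best_by_length = min(lengths, key=lengths.get)
best_by_posterior = max(posteriors, key=posteriors.get)

for name in theories:
    print(f"{name}: length = {lengths[name]:.3f} bits, posterior = {posteriors[name]:.3f}")
print("shortest message:", best_by_length, "| highest posterior:", best_by_posterior)
```

For every theory the message length differs from the negative log posterior by the same amount, `-log2(evidence)`, which is why the theory with the shortest message is also the one with the highest posterior probability.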

clustering, mixture modelling, minimum message length, MML, Snob, induction, coding, information theory, statistical inference, machine learning, classification, intrinsic classification, unsupervised learning, numerical taxonomy

Copyright information

© Kluwer Academic Publishers 2000