Minimum Message Length Grouping of Ordered Data

  • Leigh J. Fitzgibbon
  • Lloyd Allison
  • David L. Dowe
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1968)


Explicit segmentation is the partitioning of data into homogeneous regions by specifying cut-points. W. D. Fisher (1958) gave an early example of explicit segmentation based on the minimisation of squared error. Fisher called this the grouping problem and gave a polynomial-time Dynamic Programming Algorithm (DPA) for it. Oliver, Baxter and colleagues (1996, 1997, 1998) have applied the information-theoretic Minimum Message Length (MML) principle to explicit segmentation. They have derived formulas for specifying cut-points imprecisely and have empirically shown their criterion to be superior to other segmentation methods (AIC, MDL and BIC). We use a simple MML criterion and Fisher's DPA to perform numerical Bayesian (summing and) integration (using message lengths) over the cut-point location parameters. This gives an estimate of the number of segments, which we then use to estimate the cut-point positions and segment parameters by minimising the MML criterion. The resulting estimates are shown to have lower Kullback-Leibler distances on generated data.
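For concreteness, Fisher's squared-error grouping problem described above can be solved by dynamic programming over cut-point positions. The following Python sketch is a hypothetical illustration of that DPA (the paper itself substitutes an MML message-length cost for the squared-error cost, which this sketch does not implement):

```python
def sse(prefix, prefix2, i, j):
    # Sum of squared deviations of x[i:j] from its mean, via prefix sums.
    n = j - i
    s = prefix[j] - prefix[i]
    s2 = prefix2[j] - prefix2[i]
    return s2 - s * s / n

def fisher_grouping(x, k):
    """Partition ordered data x into k contiguous groups minimising the
    total within-group squared error (Fisher's DPA, O(k * n^2) time)."""
    n = len(x)
    prefix = [0.0] * (n + 1)   # prefix sums of x
    prefix2 = [0.0] * (n + 1)  # prefix sums of x^2
    for i, v in enumerate(x):
        prefix[i + 1] = prefix[i] + v
        prefix2[i + 1] = prefix2[i] + v * v
    INF = float("inf")
    # cost[g][j]: best cost of splitting x[:j] into g groups.
    cost = [[INF] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for g in range(1, k + 1):
        for j in range(g, n + 1):
            for i in range(g - 1, j):
                c = cost[g - 1][i] + sse(prefix, prefix2, i, j)
                if c < cost[g][j]:
                    cost[g][j] = c
                    back[g][j] = i  # last group is x[i:j]
    # Trace back the optimal cut-points (indices where a new group starts).
    cuts, j = [], n
    for g in range(k, 0, -1):
        j = back[g][j]
        cuts.append(j)
    cuts.reverse()
    return cuts[1:], cost[k][n]  # drop the leading 0; cuts are segment starts
```

For example, `fisher_grouping([1, 1, 1, 10, 10, 10], 2)` places the single cut at index 3 with zero residual error. Estimating the number of groups k is exactly where the MML criterion of the paper enters, since squared error alone always decreases as k grows.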


Keywords: Message Length · Ordered Data · Segment Parameter · Minimum Message Length · Computer Science
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




References

  1. H. Akaike. Information theory and an extension of the maximum likelihood principle. In B. N. Petrov and F. Csaki, editors, Proceedings of the 2nd International Symposium on Information Theory, pages 267–281. Akademiai Kiado, Budapest, 1973.
  2. R. A. Baxter and J. J. Oliver. MDL and MML: Similarities and differences. Technical Report TR 207, Dept. of Computer Science, Monash University, Clayton, Victoria 3168, Australia, 1994.
  3. R. A. Baxter and J. J. Oliver. The kindest cut: minimum message length segmentation. In S. Arikawa and A. K. Sharma, editors, Proc. 7th Int. Workshop on Algorithmic Learning Theory, volume 1160 of LNCS, pages 83–90. Springer-Verlag, Berlin, 1996.
  4. J. H. Conway and N. J. A. Sloane. Sphere Packings, Lattices and Groups. Springer-Verlag, London, 1988.
  5. D. L. Dowe, R. A. Baxter, J. J. Oliver, and C. S. Wallace. Point estimation using the Kullback-Leibler loss function and MML. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD98), volume 1394 of LNAI, pages 87–95, 1998.
  6. D. L. Dowe, J. J. Oliver, and C. S. Wallace. MML estimation of the parameters of the spherical Fisher distribution. In S. Arikawa and A. K. Sharma, editors, Proc. 7th Int. Workshop on Algorithmic Learning Theory, volume 1160 of LNCS, pages 213–227. Springer-Verlag, Berlin, 1996.
  7. T. Edgoose and L. Allison. MML Markov classification of sequential data. Statistics and Computing, 9:269–278, 1999.
  8. W. D. Fisher. On grouping for maximum homogeneity. Journal of the American Statistical Association, 53:789–798, 1958.
  9. R. E. Kass and A. E. Raftery. Bayes factors. Journal of the American Statistical Association, 90(430):773–795, 1995.
  10. J. J. Oliver, R. A. Baxter, and C. S. Wallace. Minimum message length segmentation. In X. Wu, R. Kotagiri, and K. Korb, editors, Research and Development in Knowledge Discovery and Data Mining (PAKDD-98), pages 83–90. Springer, 1998.
  11. J. J. Oliver and C. S. Forbes. Bayesian approaches to segmenting a simple time series. Technical Report 97/336, Dept. of Computer Science, Monash University, Australia 3168, December 1997.
  12. J. J. Rissanen. Modeling by shortest data description. Automatica, 14:465–471, 1978.
  13. J. J. Rissanen. A universal prior for integers and estimation by minimum description length. Annals of Statistics, 11(2):416–431, 1983.
  14. J. J. Rissanen. Hypothesis selection and testing by the MDL principle. Computer Journal, 42(4):260–269, 1999.
  15. G. Schwarz. Estimating the dimension of a model. The Annals of Statistics, 6:461–464, 1978.
  16. S. Sclove. Time-series segmentation: A model and a method. Information Sciences, 29:7–25, 1983.
  17. M. Viswanathan, C. S. Wallace, D. L. Dowe, and K. Korb. Finding cutpoints in noisy binary sequences - a revised empirical evaluation. In 12th Australian Joint Conference on Artificial Intelligence, 1999. A sequel has been submitted to the Machine Learning Journal.
  18. C. S. Wallace and D. M. Boulton. An information measure for classification. Computer Journal, 11(2):185–194, August 1968.
  19. C. S. Wallace and D. L. Dowe. Minimum message length and Kolmogorov complexity. Computer Journal, 42(4):270–283, 1999.
  20. C. S. Wallace and D. L. Dowe. Rejoinder. Computer Journal, 42(4):345–357, 1999.
  21. C. S. Wallace and D. L. Dowe. MML clustering of multi-state, Poisson, von Mises circular and Gaussian distributions. Statistics and Computing, 10:73–83, 2000.
  22. C. S. Wallace and P. R. Freeman. Estimation and inference by compact encoding (with discussion). Journal of the Royal Statistical Society, Series B, 49:240–265, 1987.

Copyright information

© Springer-Verlag Berlin Heidelberg 2000

Authors and Affiliations

  • Leigh J. Fitzgibbon (1)
  • Lloyd Allison (1)
  • David L. Dowe (1)

  1. School of Computer Science and Software Engineering, Monash University, Clayton, Australia