Towards a Simple Clustering Criterion Based on Minimum Length Encoding

  • Marcus-Christopher Ludl
  • Gerhard Widmer
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2430)

Abstract

We propose a simple and intuitive clustering evaluation criterion based on the minimum description length principle which yields a particularly simple way of describing and encoding a set of examples. The basic idea is to view a clustering as a restriction of the attribute domains, given an example’s cluster membership. As a special operational case we develop the so-called rectangular uniform message length measure that can be used to evaluate clusterings described as sets of hyper-rectangles. We theoretically prove that this measure punishes cluster boundaries in regions of uniform instance distribution (i.e., unintuitive clusterings), and we experimentally compare a simple clustering algorithm using this measure with the well-known algorithms KMeans and AutoClass.
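The abstract does not define the rectangular uniform message length measure, so the following is only a toy sketch of the intuition it describes, not the authors' formula. It assumes a two-part code: each cluster's bounding hyper-rectangle is stated first, then every example is encoded by its cluster label plus a position drawn uniformly from that rectangle at a fixed precision. The function name `toy_message_length`, the `precision` parameter, and the cost terms are all hypothetical choices made for illustration.

```python
import math


def toy_message_length(points, labels, precision=0.01):
    """Toy two-part code length in bits (an illustrative assumption,
    not the measure defined in the paper).

    points: list of equal-length tuples of floats
    labels: list of cluster ids, one per point
    """
    n = len(points)
    dims = len(points[0])
    clusters = sorted(set(labels))
    k = len(clusters)

    # Width of the full attribute domain per dimension, in precision cells.
    domain_cells = []
    for d in range(dims):
        lo = min(p[d] for p in points)
        hi = max(p[d] for p in points)
        domain_cells.append(max((hi - lo) / precision, 1.0))

    # Part 1: state each rectangle (two boundaries per dimension).
    total = k * sum(2 * math.log2(c) for c in domain_cells)

    # Part 2a: state each example's cluster membership.
    if k > 1:
        total += n * math.log2(k)

    # Part 2b: state each example's position inside its cluster's
    # rectangle, i.e. within the restricted attribute domains.
    for c in clusters:
        members = [p for p, l in zip(points, labels) if l == c]
        for d in range(dims):
            lo = min(p[d] for p in members)
            hi = max(p[d] for p in members)
            cells = max((hi - lo) / precision, 1.0)
            total += len(members) * math.log2(cells)
    return total


# Two well-separated groups: splitting them should shorten the message.
separated = [(i / 20,) for i in range(21)] + [(9 + i / 20,) for i in range(21)]
one_box = toy_message_length(separated, [0] * 42)
two_boxes = toy_message_length(separated, [0] * 21 + [1] * 21)

# Evenly spread points: a boundary in the middle should lengthen it.
uniform = [(i / 4,) for i in range(41)]
unsplit = toy_message_length(uniform, [0] * 41)
split = toy_message_length(uniform, [0 if p[0] < 5 else 1 for p in uniform])
```

Under these assumptions, `two_boxes` comes out smaller than `one_box`, while `split` comes out larger than `unsplit`, which mirrors the property claimed in the abstract: a boundary placed inside a uniformly populated region costs bits without saving any.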

Keywords

Synthetic Dataset; Minimum Description Length; Message Length; Candidate Cluster; Instance Space

Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Marcus-Christopher Ludl (1)
  • Gerhard Widmer (1, 2)
  1. Austrian Research Institute for Artificial Intelligence, Vienna
  2. Department of Medical Cybernetics and Artificial Intelligence, University of Vienna, Austria