Abstract
We propose a simple and intuitive clustering evaluation criterion based on the minimum description length principle which yields a particularly simple way of describing and encoding a set of examples. The basic idea is to view a clustering as a restriction of the attribute domains, given an example’s cluster membership. As a special operational case we develop the so-called rectangular uniform message length measure that can be used to evaluate clusterings described as sets of hyper-rectangles. We theoretically prove that this measure punishes cluster boundaries in regions of uniform instance distribution (i.e., unintuitive clusterings), and we experimentally compare a simple clustering algorithm using this measure with the well-known algorithms KMeans and AutoClass.
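The core idea in the abstract can be illustrated with a toy computation. The sketch below is not the paper's exact rectangular uniform message length measure; it is a hedged illustration, assuming a simple two-part code: each example pays log2(k) bits to name its cluster, plus, per attribute, the bits needed to specify its value uniformly within the cluster's bounding-box range at a fixed resolution (the function name `mdl_rect_cost` and the `resolution` parameter are illustrative, not from the paper). Tighter hyper-rectangles restrict the attribute domains more and thus yield shorter descriptions.

```python
import math

def mdl_rect_cost(clusters, resolution=0.01):
    """Toy MDL-style cost of a clustering given as lists of points.

    clusters: list of clusters, each a list of equal-length tuples of floats.
    Each point costs log2(k) bits for cluster membership, plus, per
    attribute, log2(width / resolution) bits to encode its value uniformly
    within the cluster's bounding box (an illustrative assumption).
    """
    k = len(clusters)
    total = 0.0
    for pts in clusters:
        if not pts:
            continue
        dims = len(pts[0])
        # Axis-aligned bounding-box widths; pad by one resolution step
        # so degenerate (zero-width) dimensions still cost >= 0 bits.
        widths = [
            max(p[d] for p in pts) - min(p[d] for p in pts) + resolution
            for d in range(dims)
        ]
        per_point = math.log2(k) + sum(
            math.log2(w / resolution) for w in widths
        )
        total += len(pts) * per_point
    return total

# Two tight, well-separated groups: splitting them into two clusters
# shrinks the bounding boxes enough to outweigh the log2(k) membership cost.
a = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
b = [(5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]
print(mdl_rect_cost([a, b]) < mdl_rect_cost([a + b]))
```

Under this toy measure, the two-cluster description of the separated groups is cheaper than the single-cluster one, matching the intuition the abstract describes: the criterion rewards restricting attribute domains where the data is concentrated, while the membership term discourages gratuitous splits of uniform regions.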
References
R. A. Baxter and J. Oliver. MDL and MML: Similarities and differences. Technical report, Dept. of Computer Science, Monash University, Clayton, 1994. (TR 207).
P. S. Bradley and U. M. Fayyad. Refining initial points for k-means clustering. In Proceedings of the 15th Int. Conference on Machine Learning, 91–99, 1998.
P. Cheeseman, J. Kelly, M. Self, J. Stutz, W. Taylor, and D. Freeman. AutoClass: A Bayesian classification system. In Proceedings of the 5th International Workshop on Machine Learning, 54–64, 1988.
D. Keim and A. Hinneburg. Clustering techniques for large data sets: From the past to the future. In Tutorial Notes for ACM SIGKDD 1999 International Conference on Knowledge Discovery and Data Mining, San Diego, CA, 1999.
R. Koschke and T. Eisenbarth. A framework for experimental evaluation of clustering techniques. In Proceedings of the International Workshop on Program Comprehension (IWPC2000), Limerick, Ireland, 2000. IEEE.
M.-C. Ludl and G. Widmer. Relative unsupervised discretization for association rule mining. In Proceedings of the 4th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD2000), Lyon, 2000.
A. Y. Ng, M. I. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. In Advances in Neural Information Processing Systems 14 (NIPS 2001), 2002.
J. J. Oliver, R. A. Baxter, and C. S. Wallace. Unsupervised learning using MML. In Proceedings of the 13th International Conference on Machine Learning, 364–372, San Francisco, CA, 1996. Morgan Kaufmann.
D. Pelleg and A. Moore. Accelerating exact k-means algorithms with geometric reasoning. In Proceedings of the 5th International Conference on Knowledge Discovery and Data Mining (KDD99), 277–281, 1999.
B. D. Ripley. Pattern Recognition and Neural Networks. Cambridge University Press, Cambridge, 1996.
S. Z. Selim and M. A. Ismail. K-means-type algorithms: A generalized convergence theorem and characterization of local optimality. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6(1):81–87, 1984.
M. Steinbach, G. Karypis, and V. Kumar. A comparison of document clustering techniques. In KDD Workshop on Data Mining, 2000.
C. S. Wallace and D. L. Dowe. Intrinsic classification by MML—the SNOB program. In Proceedings of the 7th Australian Joint Conference on Artificial Intelligence, 37–44, Singapore, 1994. World Scientific.
Y. Weiss. Segmentation using eigenvectors: A unifying view. In Proceedings of the IEEE International Conference on Computer Vision (ICCV99), 975–982, 1999.
© 2002 Springer-Verlag Berlin Heidelberg
Cite this paper
Ludl, M.-C., Widmer, G. (2002). Towards a Simple Clustering Criterion Based on Minimum Length Encoding. In: Elomaa, T., Mannila, H., Toivonen, H. (eds) Machine Learning: ECML 2002. Lecture Notes in Computer Science, vol 2430. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36755-1_22
Print ISBN: 978-3-540-44036-9
Online ISBN: 978-3-540-36755-0