The Minimum Code Length for Clustering Using the Gray Code

  • Mahito Sugiyama
  • Akihiro Yamamoto
Conference paper

DOI: 10.1007/978-3-642-23808-6_24

Part of the Lecture Notes in Computer Science book series (LNCS, volume 6913)
Cite this paper as:
Sugiyama M., Yamamoto A. (2011) The Minimum Code Length for Clustering Using the Gray Code. In: Gunopulos D., Hofmann T., Malerba D., Vazirgiannis M. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2011. Lecture Notes in Computer Science, vol 6913. Springer, Berlin, Heidelberg

Abstract

We propose new approaches to exploit compression algorithms for clustering numerical data. Our first contribution is a measure that scores the quality of a given clustering result in light of a fixed encoding scheme. We call this measure the Minimum Code Length (MCL). Our second contribution is a general strategy for translating any encoding method into a clustering algorithm, which we call COOL (COding-Oriented cLustering). COOL has a low computational cost since it scales linearly with the data set size. The clustering results of COOL are also shown to minimize MCL. To further illustrate this approach, we consider the Gray code as the encoding scheme and present G-COOL. G-COOL can find clusters of arbitrary shapes and remove noise. Moreover, it is robust to changes in the input parameters; it requires only two lower bounds, one for the number of clusters and one for the size of each cluster, whereas most algorithms for finding arbitrarily shaped clusters work well only if all parameters are tuned appropriately. G-COOL is theoretically shown to achieve internal cohesion and external isolation and is experimentally shown to work well for both synthetic and real data sets.
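
As a rough illustration of the kind of encoding the abstract refers to, the following sketch computes the standard binary-reflected Gray code for non-negative integers and uses it to label the cells of a discretized unit interval. It is only a sketch under stated assumptions: the names `gray_encode`, `gray_decode`, and `discretize` are hypothetical, the integer Gray code shown here may differ from the encoding of real numbers used in the paper, and neither MCL, COOL, nor G-COOL is implemented.

```python
def gray_encode(n: int) -> int:
    """Return the standard binary-reflected Gray code of a non-negative integer."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return n ^ (n >> 1)


def gray_decode(g: int) -> int:
    """Invert gray_encode by XOR-ing together all right shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def discretize(x: float, level: int) -> str:
    """Hypothetical helper (not from the paper): split [0, 1) into 2**level
    equal cells and return the Gray code word of the cell containing x."""
    if level < 1:
        raise ValueError("level must be at least 1")
    if not 0.0 <= x < 1.0:
        raise ValueError("x must lie in [0, 1)")
    cell = int(x * (1 << level))  # index of the cell containing x
    return format(gray_encode(cell), f"0{level}b")


if __name__ == "__main__":
    for i in range(8):
        print(i, format(gray_encode(i), "03b"))
```

The property that makes this code attractive for discretization is that the code words of neighbouring cells differ in exactly one bit, so values that are close in the interval receive similar code words.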

Keywords

Clustering, Compression, Discretization, Gray code

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Mahito Sugiyama (1, 2)
  • Akihiro Yamamoto (1)
  1. Graduate School of Informatics, Kyoto University, Kyoto, Japan
  2. Research Fellow of the Japan Society for the Promotion of Science, Japan
