ITCH: Information-Theoretic Cluster Hierarchies

  • Christian Böhm
  • Frank Fiedler
  • Annahita Oswald
  • Claudia Plant
  • Bianca Wackersreuther
  • Peter Wackersreuther
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6321)


Hierarchical clustering methods are widely used in various scientific domains such as molecular biology, medicine, and economics. Despite the maturity of the research field of hierarchical clustering, we have identified four goals that are not yet fully satisfied by previous methods: first, to guide the hierarchical clustering algorithm so that it identifies only meaningful and valid clusters; second, to represent each cluster in the hierarchy by an intuitive description, e.g. a probability density function; third, to handle outliers consistently; and finally, to avoid difficult parameter settings. With ITCH, we propose a novel clustering method built on a hierarchical variant of the information-theoretic principle of Minimum Description Length (MDL), referred to as hMDL. Interpreting the hierarchical cluster structure as a statistical model of the data set, we can use it for effective data compression by Huffman coding. The achievable compression rate thus induces a natural objective function for clustering, one that automatically satisfies all four goals mentioned above.
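The core idea, that the bits needed to encode the data under a cluster model form a natural objective function, can be illustrated with a minimal sketch. This is not the ITCH algorithm itself: it uses a flat hard-assignment 1-D k-means instead of a hierarchy of Gaussians, and a simple two-part MDL score (cluster-id code + Gaussian point code + parameter cost). All function names and the exact cost terms are illustrative assumptions.

```python
import numpy as np

def description_length(x, labels, k):
    """Total code length in bits: cluster ids + Gaussian-coded points
    + model parameters. A simplified stand-in for an hMDL-style score."""
    n = len(x)
    bits = (3 * k / 2) * np.log2(n)              # mean, variance, weight per cluster
    for j in range(k):
        pts = x[labels == j]
        nj = len(pts)
        if nj == 0:
            continue
        bits += nj * np.log2(n / nj)             # id code (optimal prefix code length)
        var = max(pts.var(), 1e-6)               # guard against degenerate clusters
        # -log2 of the Gaussian density (unit measurement precision assumed)
        nll = 0.5 * np.log2(2 * np.pi * var) \
            + (pts - pts.mean()) ** 2 / (2 * var * np.log(2))
        bits += nll.sum()
    return bits

def kmeans_1d(x, k, iters=20):
    """Tiny 1-D Lloyd's algorithm with quantile initialization."""
    means = np.quantile(x, (np.arange(k) + 0.5) / k)
    for _ in range(iters):
        labels = np.abs(x[:, None] - means[None, :]).argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                means[j] = x[labels == j].mean()
    return labels

# Two well-separated Gaussian blobs: the MDL score should bottom out at k = 2,
# because extra clusters cost more in ids and parameters than they save in NLL.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 100), rng.normal(10, 1, 100)])

dls = {k: description_length(x, kmeans_1d(x, k), k) for k in range(1, 5)}
best_k = min(dls, key=dls.get)
print(best_k)
```

The parameter term `(3k/2) log2 n` is the usual MDL cost of k sets of (mean, variance, weight) estimated from n points; it is what stops the score from always preferring more clusters, which is how an MDL objective makes the number of clusters parameter-free.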


Keywords: Probability Density Function · Hierarchical Cluster · Gaussian Mixture Model · Single Link · Minimum Description Length



Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Christian Böhm (1)
  • Frank Fiedler (1)
  • Annahita Oswald (1)
  • Claudia Plant (2)
  • Bianca Wackersreuther (1)
  • Peter Wackersreuther (1)

  1. University of Munich, Munich, Germany
  2. Florida State University, Tallahassee, USA
