Pattern Analysis and Applications, Volume 18, Issue 3, pp 619–637

Hierarchical clustering based on the information bottleneck method using a control process

  • Ester Bonmati
  • Anton Bardera
  • Imma Boada
  • Miquel Feixas
  • Mateu Sbert
Short Paper


Clustering techniques aim to organize data into groups whose members are similar. A key element of these techniques is the definition of a similarity measure. The information bottleneck method provides a full solution to the clustering problem with no need to define a similarity measure, since a variable \(X\) is clustered depending on a control variable \(Y\) by maximizing the mutual information between them. In this paper, we propose a hierarchical clustering algorithm based on the information bottleneck method in which, instead of using a control variable, the possible values of a Markov process are clustered by maximally preserving the mutual information between two consecutive states of the process. These two states can be seen as the input and the output of an information channel that is used as a control process, in the same way that the variable \(Y\) is used as a control variable in the original information bottleneck algorithm. We present both agglomerative and divisive versions of our hierarchical clustering approach, together with two applications. The first, which quantizes an image by grouping the intensity bins of its histogram, is tested on synthetic, photographic and medical images and compared with hand-labelled images, hierarchical clustering using Euclidean distance, and non-negative matrix factorization methods. The second, which clusters brain regions according to their connectivity, is tested on medical data. In both applications, the results show that the method yields clusters with high mutual information.
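The agglomerative idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the merging criterion or the channel construction, so the code assumes a greedy strategy that, at each step, merges the pair of states whose fusion keeps the mutual information between consecutive states \(I(X_t; X_{t+1})\) as high as possible. The function names (`mutual_information`, `merge_states`, `agglomerative_ib`) and the 4-state joint distribution are hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations


def mutual_information(joint):
    """Mutual information (in bits) of a joint probability matrix (list of lists)."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0.0:
                mi += p * math.log2(p / (px[i] * py[j]))
    return mi


def merge_states(joint, a, b):
    """Merge state b into state a (a < b) on both the input and the output side."""
    keep = [k for k in range(len(joint)) if k != b]
    rows = [list(r) for r in joint]
    # Input side: add row b to row a, then drop row b.
    rows[a] = [x + y for x, y in zip(rows[a], rows[b])]
    rows = [rows[k] for k in keep]
    # Output side: add column b to column a, then drop column b.
    for r in rows:
        r[a] += r[b]
    return [[r[k] for k in keep] for r in rows]


def agglomerative_ib(joint, n_clusters):
    """Greedily merge states until n_clusters remain (assumed criterion: keep MI maximal)."""
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    clusters = [[i] for i in range(len(joint))]
    current = [list(r) for r in joint]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            candidate = merge_states(current, a, b)
            mi = mutual_information(candidate)
            if best is None or mi > best[0]:
                best = (mi, a, b, candidate)
        _, a, b, current = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters, mutual_information(current)


# Hypothetical joint distribution p(x_t, x_{t+1}) of a 4-state process in which
# states 0 and 1 mostly follow each other, and likewise states 2 and 3.
joint = [
    [0.20, 0.04, 0.005, 0.005],
    [0.04, 0.20, 0.005, 0.005],
    [0.005, 0.005, 0.20, 0.04],
    [0.005, 0.005, 0.04, 0.20],
]

clusters, preserved = agglomerative_ib(joint, 2)
print("clusters:", clusters)
print("mutual information before:", round(mutual_information(joint), 4))
print("mutual information after: ", round(preserved, 4))
```

Because merging states is a deterministic coarsening of both variables, the mutual information can only stay the same or decrease at each step, so the sequence of merges defines a hierarchy ordered by information loss. The exhaustive pair search is kept for readability and would need a cheaper update rule for large state spaces.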


Information bottleneck, Mutual information, Clustering, Image segmentation



This work was supported by the Spanish Government (Grant No. TIN2013-47276-C6-1-R) and by the Catalan Government (Grant No. 2014-SGR-1232).



Copyright information

© Springer-Verlag London 2015

Authors and Affiliations

  • Ester Bonmati (1)
  • Anton Bardera (1)
  • Imma Boada (1)
  • Miquel Feixas (1)
  • Mateu Sbert (1)
  1. Institute of Informatics and Applications, University of Girona, Girona, Spain
