Finding the Optimal Cardinality Value for Information Bottleneck Method

  • Gang Li
  • Dong Liu
  • Yiqing Tu
  • Yangdong Ye
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4093)


The Information Bottleneck method can be used as a dimensionality reduction approach by grouping “similar” features together [1]. In applications, a natural question is how many feature groups are appropriate. This dependence on prior knowledge restricts the applicability of many Information Bottleneck algorithms. In this paper we alleviate the dependence by formulating the determination of the cardinality parameter as a model selection problem and solving it with the minimum message length (MML) principle. We design an efficient encoding scheme that describes both the information bottleneck solution and the original data, and then apply the MML principle to automatically determine the optimal cardinality value. Empirical results in a document clustering scenario indicate that the proposed method reliably identifies the optimal parameter value for the Information Bottleneck method.
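To make the recipe concrete, the sketch below builds agglomerative Information Bottleneck solutions [13] for a range of cardinality values and scores each with a simple two-part message length: the bits needed to state the model (the cardinality, the cluster assignment of each feature, and the cluster-conditional distributions) plus the bits needed to encode the data given the model. The merge criterion is the standard agglomerative IB rule, but the encoding is a generic MML-style stand-in rather than the paper's own scheme, and the function names and toy data are purely illustrative.

```python
import numpy as np

def _kl(p, q):
    """KL divergence in bits; zero-probability terms of p contribute nothing."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

def agglomerative_ib(counts, k):
    """Greedy agglomerative IB (after Slonim & Tishby [13]): start with one
    cluster per row of the co-occurrence matrix and repeatedly merge the
    pair of clusters that loses the least mutual information I(T; Y)."""
    joint = counts / counts.sum()                    # p(x, y)
    weights = [row.sum() for row in joint]           # cluster priors p(t)
    conds = [row / row.sum() for row in joint]       # conditionals p(y | t)
    members = [[i] for i in range(counts.shape[0])]

    def merge_cost(i, j):                            # (p_i + p_j) * weighted JS divergence
        w = weights[i] + weights[j]
        pi, pj = weights[i] / w, weights[j] / w
        mix = pi * conds[i] + pj * conds[j]
        return w * (pi * _kl(conds[i], mix) + pj * _kl(conds[j], mix))

    while len(members) > k:
        _, i, j = min((merge_cost(i, j), i, j)
                      for i in range(len(members))
                      for j in range(i + 1, len(members)))
        w = weights[i] + weights[j]
        conds[i] = (weights[i] * conds[i] + weights[j] * conds[j]) / w
        weights[i] = w
        members[i] += members[j]
        del weights[j], conds[j], members[j]

    labels = np.empty(counts.shape[0], dtype=int)
    for t, rows in enumerate(members):
        labels[rows] = t
    return labels

def message_length(counts, labels, k):
    """Two-part MML-style score in bits: state k, each row's cluster, and the
    k cluster-conditional distributions, then encode the observed
    co-occurrences under those distributions (roughly N * H(Y | T))."""
    N, (n_x, n_y) = counts.sum(), counts.shape
    per_cluster = np.zeros((k, n_y))
    for x, t in enumerate(labels):
        per_cluster[t] += counts[x]
    data_bits = 0.0
    for row in per_cluster:
        nz = row > 0
        data_bits -= np.sum(row[nz] * np.log2(row[nz] / row.sum()))
    model_bits = np.log2(k) + n_x * np.log2(k) + 0.5 * k * (n_y - 1) * np.log2(N)
    return model_bits + data_bits

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # toy word-document counts with 3 planted word groups over a noise floor
    counts = np.ones((24, 12))
    for g in range(3):
        counts[g * 8:(g + 1) * 8, g * 4:(g + 1) * 4] += rng.poisson(20, (8, 4))
    scores = {k: message_length(counts, agglomerative_ib(counts, k), k)
              for k in range(1, 7)}
    print(min(scores, key=scores.get))  # should bottom out at the planted value, 3
```

On this toy data the message length should reach its minimum at the planted number of word groups: below it the data part pays for merged, blurred conditionals; above it the model part grows faster than the data part shrinks. The exhaustive pairwise merge search is cubic in the number of features, which is acceptable for a sketch but would be cached or approximated in a serious implementation.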


Keywords: Encoding Scheme · Neural Information Processing System · Distortion Function · Document Clustering · Data Mining Task




References

  1. Tishby, N., Pereira, F., Bialek, W.: The information bottleneck method. In: Proc. 37th Allerton Conference on Communication and Computation (1999)
  2. Gordon, S., Greenspan, H., Goldberger, J.: Applying the information bottleneck principle to unsupervised clustering of discrete and continuous image representations. In: Proceedings of the Ninth IEEE International Conference on Computer Vision (ICCV), vol. 2 (2003)
  3. Goldberger, J., Greenspan, H., Gordon, S.: Unsupervised image clustering using the information bottleneck method. In: Van Gool, L. (ed.) DAGM 2002. LNCS, vol. 2449. Springer, Heidelberg (2002)
  4. Slonim, N., Tishby, N.: Document clustering using word clusters via the information bottleneck method. In: Proc. of the 23rd Ann. Int. ACM SIGIR Conf. on Research and Development in Information Retrieval, pp. 208–215 (2000)
  5. Verbeek, J.J.: An information theoretic approach to finding word groups for text classification. Master's thesis, The Institute for Logic, Language and Computation, University of Amsterdam (2000)
  6. Niu, Z.Y., Ji, D.H., Tan, C.L.: Document clustering based on cluster validation. In: CIKM 2004: Proceedings of the Thirteenth ACM International Conference on Information and Knowledge Management, pp. 501–506. ACM Press, New York (2004)
  7. Schneidman, E., Slonim, N., de Ruyter van Steveninck, R.R., Tishby, N., Bialek, W.: Analyzing neural codes using the information bottleneck method (unpublished manuscript, 2001)
  8. Slonim, N., Tishby, N.: The power of word clusters for text classification. School of Computer Science and Engineering and The Interdisciplinary Center for Neural Computation, The Hebrew University, Jerusalem (2001)
  9. Tishby, N., Slonim, N.: Data clustering by Markovian relaxation and the information bottleneck method. In: Advances in Neural Information Processing Systems (NIPS) 13 (2000)
  10. Slonim, N., Friedman, N., Tishby, N.: Unsupervised document classification using sequential information maximization. In: Proc. of the 25th Ann. Int. ACM SIGIR Conf. on Research and Development in Information Retrieval (2002)
  11. Cover, T.M., Thomas, J.A.: Elements of Information Theory. Wiley, New York (1991)
  12. Slonim, N.: The Information Bottleneck: Theory and Applications. PhD thesis, The Hebrew University of Jerusalem (2002)
  13. Slonim, N., Tishby, N.: Agglomerative information bottleneck. In: Advances in Neural Information Processing Systems (NIPS) 12, pp. 617–623 (1999)
  14. Wallace, C.S., Freeman, P.R.: Estimation and inference by compact coding. Journal of the Royal Statistical Society, Series B 49, 223–265 (1987)
  15. Wallace, C.S., Boulton, D.M.: An information measure for classification. Computer Journal 11, 185–194 (1968)
  16. Rissanen, J.: A universal prior for integers and estimation by minimum description length. Annals of Statistics 11, 416–431 (1983)
  17. Lang, K.: NewsWeeder: Learning to filter netnews. In: Proc. of the 12th International Conf. on Machine Learning, pp. 331–339 (1995)

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Gang Li (1)
  • Dong Liu (2)
  • Yiqing Tu (1)
  • Yangdong Ye (2)

  1. School of Information Technology, Deakin University, Australia
  2. School of Information Engineering, Zhengzhou University, Zhengzhou, China
