Comparing Clusterings by the Variation of Information

  • Marina Meilă
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2777)

Abstract

This paper proposes an information-theoretic criterion for comparing two partitions, or clusterings, of the same data set. The criterion, called variation of information (VI), measures the amount of information lost and gained in changing from clustering \({\cal C}\) to clustering \({\cal C}'\). The criterion makes no assumptions about how the clusterings were generated and applies to both soft and hard clusterings. The basic properties of VI are presented and discussed from the point of view of comparing clusterings. In particular, VI is non-negative (it is zero only when the two clusterings are identical), symmetric, and obeys the triangle inequality. Thus, surprisingly enough, it is a true metric on the space of clusterings.
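The abstract describes VI only in words. The following is a minimal sketch for hard clusterings. It assumes each clustering is given as an equal-length list of cluster labels over the same points, and it uses the standard identity \(VI({\cal C},{\cal C}') = H({\cal C}|{\cal C}') + H({\cal C}'|{\cal C}) = H({\cal C}) + H({\cal C}') - 2I({\cal C};{\cal C}')\) with natural logarithms. Here the two conditional entropies correspond to the information lost and the information gained. The function names are hypothetical illustrations, not code from the paper, and soft clusterings are not handled.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (nats) of the empirical cluster-size distribution."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Empirical mutual information I(C; C') of two hard labelings of the same points."""
    if len(a) != len(b):
        raise ValueError("both clusterings must label the same points")
    n = len(a)
    size_a, size_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))  # n_ij: points in cluster i of C and cluster j of C'
    # p(i,j) * log(p(i,j) / (p(i) p(j))) with p(i,j) = n_ij / n
    return sum(
        (nij / n) * log(nij * n / (size_a[i] * size_b[j]))
        for (i, j), nij in joint.items()
    )


def variation_of_information(a, b):
    """VI(C, C') = H(C) + H(C') - 2 I(C; C'), clamped at 0 to absorb float round-off."""
    vi = entropy(a) + entropy(b) - 2 * mutual_information(a, b)
    return max(0.0, vi)


if __name__ == "__main__":
    two = [0, 0, 1, 1]
    print(variation_of_information(two, [1, 1, 0, 0]))    # relabeling only: 0.0
    print(variation_of_information(two, [0, 0, 0, 0]))    # merge into one cluster: log 2
    print(variation_of_information([0] * 4, [0, 1, 2, 3]))  # one cluster vs singletons: log 4
```

Because only the co-occurrence counts \(n_{ij}\) enter the formula, renaming clusters leaves VI at 0, so the sketch compares partitions rather than label values. Labels can be any hashable values, not just integers.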

Keywords

Clustering · Comparing partitions · Measures of agreement · Information theory · Mutual information



Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Marina Meilă, University of Washington, Seattle, USA
