
Comparing Clusterings by the Variation of Information

  • Conference paper
Learning Theory and Kernel Machines

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2777)

Abstract

This paper proposes an information-theoretic criterion for comparing two partitions, or clusterings, of the same data set. The criterion, called variation of information (VI), measures the amount of information lost and gained in changing from clustering \(\mathcal{C}\) to clustering \(\mathcal{C}'\). The criterion makes no assumptions about how the clusterings were generated and applies to both soft and hard clusterings. The basic properties of VI are presented and discussed from the point of view of comparing clusterings. In particular, VI is nonnegative (zero only when the two clusterings are identical), symmetric, and obeys the triangle inequality. Thus, perhaps surprisingly, it is a true metric on the space of clusterings.
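For hard clusterings, the "information lost and gained" has a compact closed form. Write \(P(k) = n_k/n\) for the fraction of the \(n\) points in cluster \(k\) of \(\mathcal{C}\), \(P(k,k')\) for the fraction lying in cluster \(k\) of \(\mathcal{C}\) and cluster \(k'\) of \(\mathcal{C}'\), \(H\) for entropy and \(I\) for mutual information. Then \(VI(\mathcal{C},\mathcal{C}') = H(\mathcal{C}) + H(\mathcal{C}') - 2I(\mathcal{C},\mathcal{C}') = H(\mathcal{C}\mid\mathcal{C}') + H(\mathcal{C}'\mid\mathcal{C})\), where the two conditional entropies are the information lost and the information gained. The Python below is a minimal sketch of that formula, assuming hard clusterings given as per-point label sequences and natural logarithms; the function name `variation_of_information` is hypothetical, and soft clusterings are not handled.

```python
from collections import Counter
from math import log


def variation_of_information(labels_a, labels_b):
    """Sketch: VI between two HARD clusterings of the same points, in nats.

    labels_a[i] and labels_b[i] are the cluster labels of point i under the
    two clusterings. Soft clusterings are not supported by this sketch.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same points")
    n = len(labels_a)
    if n == 0:
        raise ValueError("need at least one point")

    count_a = Counter(labels_a)                  # n_k for each cluster of C
    count_b = Counter(labels_b)                  # n_k' for each cluster of C'
    count_ab = Counter(zip(labels_a, labels_b))  # nonzero confusion cells n_{kk'}

    def entropy(counts):
        # H = -sum_k P(k) log P(k), with P(k) = n_k / n
        return -sum((c / n) * log(c / n) for c in counts.values())

    # I = sum_{k,k'} P(k,k') log(P(k,k') / (P(k) P(k'))); empty cells contribute 0
    mutual_info = sum(
        (c / n) * log((c * n) / (count_a[a] * count_b[b]))
        for (a, b), c in count_ab.items()
    )
    return entropy(count_a) + entropy(count_b) - 2.0 * mutual_info
```

For example, `variation_of_information([0, 0, 1, 1], [5, 5, 7, 7])` is 0 up to floating-point rounding, because relabeling clusters does not change the partition, while comparing a single all-inclusive cluster with \(n\) singleton clusters gives \(\log n\). Changing the logarithm base only rescales VI.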

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Meilă, M. (2003). Comparing Clusterings by the Variation of Information. In: Schölkopf, B., Warmuth, M.K. (eds) Learning Theory and Kernel Machines. Lecture Notes in Computer Science, vol 2777. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45167-9_14

  • DOI: https://doi.org/10.1007/978-3-540-45167-9_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40720-1

  • Online ISBN: 978-3-540-45167-9

  • eBook Packages: Springer Book Archive
