Comparing Clusterings by the Variation of Information

  • Marina Meilă
Conference paper

DOI: 10.1007/978-3-540-45167-9_14

Part of the Lecture Notes in Computer Science book series (LNCS, volume 2777)
Cite this paper as:
Meilă M. (2003) Comparing Clusterings by the Variation of Information. In: Schölkopf B., Warmuth M.K. (eds) Learning Theory and Kernel Machines. Lecture Notes in Computer Science, vol 2777. Springer, Berlin, Heidelberg

Abstract

This paper proposes an information theoretic criterion for comparing two partitions, or clusterings, of the same data set. The criterion, called variation of information (VI), measures the amount of information lost and gained in changing from clustering \({\cal C}\) to clustering \({\cal C}'\). The criterion makes no assumptions about how the clusterings were generated and applies to both soft and hard clusterings. The basic properties of VI are presented and discussed from the point of view of comparing clusterings. In particular, the VI is positive, symmetric and obeys the triangle inequality. Thus, surprisingly enough, it is a true metric on the space of clusterings.
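The abstract does not give the formula itself. As a minimal sketch, assuming the standard definition VI(C, C') = H(C | C') + H(C' | C) (equivalently H(C) + H(C') − 2 I(C; C')), the quantity can be computed for two hard clusterings as shown here. The function name `variation_of_information`, the label-sequence input format, and the use of natural logarithms are illustrative assumptions, not taken from the paper; soft clusterings are not handled.

```python
from collections import Counter
from math import log


def variation_of_information(labels_a, labels_b):
    """Variation of information (in nats) between two hard clusterings.

    Each argument is a sequence of cluster labels, one per data point,
    in the same point order. This is an illustrative helper, not code
    from the paper.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same data points")
    n = len(labels_a)
    if n == 0:
        raise ValueError("clusterings must be non-empty")

    size_a = Counter(labels_a)                   # cluster sizes in C
    size_b = Counter(labels_b)                   # cluster sizes in C'
    size_ab = Counter(zip(labels_a, labels_b))   # sizes of the intersections

    vi = 0.0
    for (a, b), n_ab in size_ab.items():
        p_ab = n_ab / n
        # H(C|C') + H(C'|C) = -sum p_ab * [log(p_ab / p_b) + log(p_ab / p_a)]
        vi -= p_ab * (log(n_ab / size_b[b]) + log(n_ab / size_a[a]))
    return vi
```

Under these assumptions, two clusterings that differ only in their label names have a VI of zero, and the value is unchanged when the two arguments are swapped, which matches the symmetry stated in the abstract.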

Keywords

Clustering; Comparing partitions; Measures of agreement; Information theory; Mutual information

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Marina Meilă (1)
  1. University of Washington, Seattle, USA
