Measuring Constraint-Set Utility for Partitional Clustering Algorithms

  • Ian Davidson
  • Kiri L. Wagstaff
  • Sugato Basu
Conference paper

DOI: 10.1007/11871637_15

Part of the Lecture Notes in Computer Science book series (LNCS, volume 4213)
Cite this paper as:
Davidson I., Wagstaff K.L., Basu S. (2006) Measuring Constraint-Set Utility for Partitional Clustering Algorithms. In: Fürnkranz J., Scheffer T., Spiliopoulou M. (eds) Knowledge Discovery in Databases: PKDD 2006. Lecture Notes in Computer Science, vol 4213. Springer, Berlin, Heidelberg

Abstract

Clustering with constraints is an active area of machine learning and data mining research. Previous empirical work has convincingly shown that adding constraints to clustering improves performance with respect to the true data labels. However, in most of these experiments, results are averaged over different randomly chosen constraint sets, thereby masking interesting properties of individual sets. We demonstrate that constraint sets vary significantly in how useful they are for constrained clustering; some constraint sets can actually decrease algorithm performance. We create two quantitative measures, informativeness and coherence, that can be used to identify useful constraint sets. We show that these measures can also help explain differences in performance for four particular constrained clustering algorithms.

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Ian Davidson (1)
  • Kiri L. Wagstaff (2)
  • Sugato Basu (3)
  1. State University of New York, Albany, USA
  2. Jet Propulsion Laboratory, Pasadena, USA
  3. SRI International, Menlo Park, USA
