A Cluster-Level Semi-supervision Model for Interactive Clustering

  • Avinava Dubey
  • Indrajit Bhattacharya
  • Shantanu Godbole
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6321)

Abstract

Semi-supervised clustering models that incorporate user-provided constraints to yield meaningful clusters have recently become a popular area of research. In this paper, we propose a cluster-level semi-supervision model for interactive clustering. Prototype-based clustering algorithms typically alternate between updating cluster descriptions and assigning data items to clusters. In our model, the user provides semi-supervision directly for these two steps. Assignment feedback re-assigns data items among existing clusters, while cluster description feedback helps to position existing cluster centers more meaningfully. We argue that, compared with the pair-wise instance-level supervision model, providing such supervision is more natural for exploratory data mining, where the user discovers and interprets clusters as the algorithm progresses, particularly for high-dimensional data such as document collections. We show how such feedback can be interpreted as constraints and incorporated within the k-means clustering framework. Using experimental results on multiple real-world datasets, we show that this framework improves clustering performance significantly beyond traditional k-means. Interestingly, when given the same number of feedback items from the user, the proposed framework significantly outperforms the pair-wise supervision model.
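
The two feedback types described above can be illustrated with a small sketch. This is not the formulation from the paper, whose details the abstract does not give. It is a plain k-means loop with two assumed mechanisms: assignment feedback is treated as a hard pin of an item to a cluster, and cluster description feedback is treated as a user-supplied target vector that pulls a center with a chosen weight. The names `interactive_kmeans`, `pinned`, `targets`, and `weight` are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def interactive_kmeans(data, centers, pinned=None, targets=None,
                       weight=1.0, iters=20):
    """k-means with two illustrative (assumed) kinds of user feedback.

    pinned  : {item index: cluster index}, assumed form of assignment feedback
    targets : {cluster index: vector}, assumed form of description feedback
    weight  : pull of a target, counted like `weight` extra data points
    """
    pinned = pinned or {}
    targets = targets or {}
    centers = [list(c) for c in centers]
    labels = None
    for _ in range(iters):
        # Assignment step: pinned items keep the user's cluster,
        # all other items go to the nearest center.
        new_labels = []
        for i, x in enumerate(data):
            if i in pinned:
                new_labels.append(pinned[i])
            else:
                new_labels.append(min(range(len(centers)),
                                      key=lambda k: sq_dist(x, centers[k])))
        # Update step: the mean of the members, blended with the
        # user's target vector when one was given for this cluster.
        for k in range(len(centers)):
            members = [data[i] for i, l in enumerate(new_labels) if l == k]
            n = len(members)
            dim = len(centers[k])
            sums = [sum(col) for col in zip(*members)] if members else [0.0] * dim
            if k in targets:
                centers[k] = [(s + weight * t) / (n + weight)
                              for s, t in zip(sums, targets[k])]
            elif n > 0:
                centers[k] = [s / n for s in sums]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return labels, centers


data = [[0, 0], [0, 1], [10, 10], [10, 11], [4, 4]]
init = [[0, 0], [10, 10]]

plain_labels, _ = interactive_kmeans(data, init)
pinned_labels, _ = interactive_kmeans(data, init, pinned={4: 1})
_, pulled_centers = interactive_kmeans(data, init, targets={0: [2, 2]}, weight=2.0)
```

In this toy data, the last item sits nearer the first cluster, so pinning it to the second cluster overrides the nearest-center rule, and the target vector shifts the first center away from the plain mean of its members. How the paper actually turns feedback into constraints may differ from both mechanisms.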

Keywords

Adjust Rand Index, Current Cluster, Interactive Cluster, Cluster Description, Soccer League
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Avinava Dubey (1)
  • Indrajit Bhattacharya (2)
  • Shantanu Godbole (1)
  1. IBM Research, India
  2. Indian Institute of Science
