COBRAS: Interactive Clustering with Pairwise Queries

  • Toon Van Craenendonck (corresponding author)
  • Sebastijan Dumančić
  • Elia Van Wolputte
  • Hendrik Blockeel
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11191)

Abstract

Constraint-based clustering algorithms exploit background knowledge to construct clusterings that are aligned with the interests of a particular user. This background knowledge is often obtained by allowing the clustering system to pose pairwise queries to the user: should these two elements be in the same cluster or not? Answering yes results in a must-link constraint, no in a cannot-link. Ideally, the user should be able to answer a couple of these queries, inspect the resulting clustering, and repeat these two steps until a satisfactory result is obtained. Such an interactive clustering process requires the clustering system to satisfy three requirements: (1) it should be able to present a reasonable (intermediate) clustering to the user at any time, (2) it should produce good clusterings given few queries, i.e. it should be query-efficient, and (3) it should be time-efficient. We present COBRAS, an approach to clustering with pairwise constraints that satisfies these requirements. COBRAS constructs clusterings of super-instances, which are local regions in the data in which all instances are assumed to belong to the same cluster. By dynamically refining these super-instances during clustering, COBRAS is able to produce clusterings at increasingly fine-grained levels of granularity. It quickly produces good high-level clusterings, and is able to refine them to find more detailed structure as more queries are answered. In our experiments we demonstrate that COBRAS is the only method able to produce good solutions at all stages of the clustering process at fast runtimes, and hence the most suitable method for interactive clustering.
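The query mechanism described above (a yes answer gives a must-link constraint, a no answer gives a cannot-link constraint) can be illustrated with a minimal Python sketch. This is not the COBRAS implementation: the names `query_to_constraint` and `simulated_user` are hypothetical, and the user is simulated from ground-truth labels, which is an assumption made purely for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# A constraint is (i, j, kind), where kind is "must-link" or "cannot-link".
Constraint = Tuple[int, int, str]


def query_to_constraint(i: int, j: int, same_cluster: bool) -> Constraint:
    """Turn the answer to one pairwise query into a constraint."""
    return (i, j, "must-link" if same_cluster else "cannot-link")


def simulated_user(labels: Sequence[str]) -> Callable[[int, int], bool]:
    """Hypothetical stand-in for a user: answers queries from known labels."""
    def answer(i: int, j: int) -> bool:
        return labels[i] == labels[j]
    return answer


# Toy data: instances 0 and 1 share a cluster, instance 2 does not.
labels = ["a", "a", "b"]
answer = simulated_user(labels)

queries: List[Tuple[int, int]] = [(0, 1), (0, 2)]
constraints = [query_to_constraint(i, j, answer(i, j)) for i, j in queries]
print(constraints)  # [(0, 1, 'must-link'), (0, 2, 'cannot-link')]
```

In an interactive setting the simulated user would be replaced by a person answering each query, and the clustering would be updated after every few answers.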

Keywords

Semi-supervised clustering, Pairwise constraints, Active clustering

Notes

Acknowledgements

TVC is supported by the Agency for Innovation by Science and Technology in Flanders (IWT). This research is supported by the Research Fund KU Leuven (GOA/13/010), the Research Foundation - Flanders (G079416N), and the European Research Council (Horizon 2020, grant agreement 694980, “SYNTH”).

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Toon Van Craenendonck (1), corresponding author
  • Sebastijan Dumančić (1)
  • Elia Van Wolputte (1)
  • Hendrik Blockeel (1)
  1. KU Leuven, Department of Computer Science, Leuven, Belgium
