Active Learning Strategies for Semi-Supervised DBSCAN

  • Jundong Li
  • Jörg Sander
  • Ricardo Campello
  • Arthur Zimek
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8436)


The semi-supervised, density-based clustering algorithm SSDBSCAN extracts clusters of a given dataset from different density levels by using a small set of labeled objects. A critical assumption of SSDBSCAN is, however, that at least one labeled object is provided for each natural cluster in the dataset. This assumption may be unrealistic when only very few labeled objects can be provided, for instance because determining the class label of an object is costly. In this paper, we introduce a novel active learning strategy that selects the “most representative” objects, whose class labels are then determined and used as input for SSDBSCAN. By incorporating a Laplacian Graph Regularizer into a Local Linear Reconstruction method, our proposed algorithm selects objects that represent the whole data space well. Experiments on synthetic and real datasets show that, with the proposed active learning strategy, SSDBSCAN extracts more meaningful clusters even when only very few labeled objects are provided.
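The abstract names two building blocks, a Local Linear Reconstruction method and a Laplacian Graph Regularizer, without giving the objective that combines them. The following is a minimal sketch of those two ingredients only, not the selection algorithm proposed in the paper. It assumes reconstruction weights in the style of locally linear embedding and an unnormalized Laplacian of a symmetrized k-nearest-neighbor graph. The function names `knn_indices`, `lle_weights`, and `knn_laplacian`, and the parameters `k` and `reg`, are hypothetical choices made for illustration.

```python
import numpy as np

def knn_indices(X, k):
    """Indices of the k nearest neighbors of each row of X (self excluded)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    return np.argsort(d2, axis=1)[:, :k]

def lle_weights(X, k=5, reg=1e-3):
    """Row i holds weights that reconstruct x_i from its k neighbors.

    Assumption: weights are obtained as in locally linear embedding,
    by solving a regularized local Gram system and normalizing to sum 1.
    """
    n = X.shape[0]
    nbrs = knn_indices(X, k)
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[nbrs[i]] - X[i]                      # neighbors centered on x_i
        G = Z @ Z.T                                # local Gram matrix (k x k)
        G += reg * (np.trace(G) + 1e-12) * np.eye(k)  # keep the system well conditioned
        w = np.linalg.solve(G, np.ones(k))
        W[i, nbrs[i]] = w / w.sum()                # enforce sum-to-one constraint
    return W

def knn_laplacian(X, k=5):
    """Unnormalized Laplacian L = D - A of a symmetrized kNN graph (assumed graph type)."""
    n = X.shape[0]
    nbrs = knn_indices(X, k)
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    A = np.maximum(A, A.T)                         # make the adjacency symmetric
    return np.diag(A.sum(axis=1)) - A
```

For a data matrix `X` with one object per row, `lle_weights(X)` describes how well each object is explained by its neighbors, and `knn_laplacian(X)` encodes the neighborhood structure that a graph regularizer would penalize against. How the paper combines the two into a criterion for choosing which objects to label is not stated in the abstract and is deliberately left out here.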


Keywords: Active learning, Semi-supervised clustering, Density-based clustering





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Jundong Li (1)
  • Jörg Sander (1)
  • Ricardo Campello (2)
  • Arthur Zimek (3)
  1. Department of Computing Science, University of Alberta, Edmonton, Canada
  2. Department of Computer Sciences, University of São Paulo, São Carlos, Brazil
  3. Institute for Informatics, Ludwig-Maximilians-Universität, Munich, Germany
