Abstract
Constrained clustering has been well studied for algorithms such as K-means and hierarchical clustering. However, satisfying many constraints in these algorithmic settings has been shown to be intractable. One alternative for encoding many constraints is spectral clustering, which remains a developing area. In this paper, we propose a flexible framework for constrained spectral clustering. In contrast to some previous efforts that implicitly encode Must-Link (ML) and Cannot-Link (CL) constraints by modifying the graph Laplacian or constraining the underlying eigenspace, we present a more natural and principled formulation, which explicitly encodes the constraints as part of a constrained optimization problem. Our method offers several practical advantages: it can encode the degree of belief in ML and CL constraints; it guarantees a lower bound, set by a user-specified threshold, on how well the given constraints are satisfied; and it can be solved deterministically in polynomial time through generalized eigendecomposition. Furthermore, by inheriting the objective function from spectral clustering and encoding the constraints explicitly, much of the existing analysis of unconstrained spectral clustering techniques remains valid for our formulation. We validate the effectiveness of our approach with empirical results on both artificial and real datasets. We also demonstrate an innovative use of encoding a large number of constraints: transfer learning via constraints.
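As a rough illustration of the idea summarized above, the following Python sketch shows one plausible two-way instance: ML and CL constraints are stored in a signed matrix, the threshold enters through a shifted constraint matrix, and the solution is read off a generalized eigendecomposition. The abstract does not give the exact objective or normalization, so the specific eigenproblem used here, the function name `constrained_bipartition`, the parameter `beta`, and the toy matrices `A` and `Q` are all assumptions made for illustration and should not be read as the authors' implementation.

```python
import numpy as np
from scipy.linalg import eig


def constrained_bipartition(A, Q, beta):
    """Two-way constrained spectral cut (illustrative sketch only).

    A    : symmetric non-negative affinity matrix of a connected graph
    Q    : symmetric constraint matrix; Q[i, j] > 0 encodes Must-Link,
           Q[i, j] < 0 encodes Cannot-Link, magnitude = degree of belief
    beta : assumed user-specified lower bound on constraint satisfaction
    """
    A = np.asarray(A, dtype=float)
    Q = np.asarray(Q, dtype=float)
    n = A.shape[0]
    deg = A.sum(axis=1)
    vol = deg.sum()
    s = 1.0 / np.sqrt(deg)  # diagonal of D^{-1/2}

    # Normalized graph Laplacian and normalized constraint matrix.
    L_bar = np.eye(n) - s[:, None] * A * s[None, :]
    Q_bar = s[:, None] * Q * s[None, :]

    # Assumed generalized eigenproblem: L_bar v = lam * (Q_bar - beta/vol * I) v.
    B = Q_bar - (beta / vol) * np.eye(n)
    lam, V = eig(L_bar, B)

    # Keep eigenvectors whose eigenvalue is finite, real, and strictly
    # positive; for these, v^T Q_bar v >= beta once ||v||^2 = vol.
    feasible = np.isfinite(lam) & (np.abs(lam.imag) < 1e-9) & (lam.real > 1e-9)
    if not feasible.any():
        raise ValueError("no feasible eigenvector; try a smaller beta")

    # Rescale candidates to squared norm vol and pick the smallest cut cost.
    cand = V[:, feasible].real
    cand = cand / np.linalg.norm(cand, axis=0) * np.sqrt(vol)
    cost = np.einsum("ij,ij->j", cand, L_bar @ cand)
    v = cand[:, np.argmin(cost)]
    satisfaction = float(v @ Q_bar @ v)

    # Map back to the unnormalized indicator and threshold at zero.
    u = s * v
    return (u > 0).astype(int), satisfaction


# Toy 4-cycle 0-1-2-3-0: heavy edges (1,2) and (3,0), light edges (0,1) and (2,3).
A = np.array([[0, 1, 0, 2],
              [1, 0, 2, 0],
              [0, 2, 0, 1],
              [2, 0, 1, 0]], dtype=float)

# Hypothetical constraints: ML(0,1), ML(2,3), CL(1,2), CL(0,3), full belief.
Q = np.array([[0, 1, 0, -1],
              [1, 0, -1, 0],
              [0, -1, 0, 1],
              [-1, 0, 1, 0]], dtype=float)

labels, satisfaction = constrained_bipartition(A, Q, beta=1.0)
print(labels, satisfaction)
```

In this toy graph the heavier edges alone would favor grouping nodes {1, 2} and {0, 3}, whereas the constraints ask for {0, 1} and {2, 3}; the sketch returns the constrained grouping, and the reported satisfaction value stays above the chosen threshold. Which of the two groups receives label 1 depends on the sign of the eigenvector returned by the solver.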
Additional information
Responsible editor: Charu Aggarwal.
About this article
Cite this article
Wang, X., Qian, B. & Davidson, I. On constrained spectral clustering and its applications. Data Min Knowl Disc 28, 1–30 (2014). https://doi.org/10.1007/s10618-012-0291-9
DOI: https://doi.org/10.1007/s10618-012-0291-9