Abstract
In practical applications, one way to obtain an accurate clustering result that matches the user's intent is to exploit additional pairwise constraints, each indicating whether two samples belong to the same cluster or not. In this paper, we put forward a discriminative learning approach that incorporates pairwise constraints into the recently proposed two-class maximum margin clustering framework. In particular, we propose a set of pairwise loss functions that robustly detect and penalize violations of the pairwise constraints. Consequently, the proposed method directly finds the partitioning hyperplane, which separates the data into two groups while satisfying the given pairwise constraints as far as possible. It therefore makes fewer assumptions about the distance metric or similarity matrix of the data, which may be complicated in practice, than popular existing constrained clustering algorithms. Finally, an iterative updating algorithm is proposed for the resulting optimization problem. Experiments on a number of real-world data sets demonstrate that the proposed pairwise constrained two-class clustering algorithm outperforms several representative pairwise constrained clustering counterparts in the literature.
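To make the notion of a hyperplane "satisfying the pairwise constraints" concrete, the following minimal sketch counts violated constraints for a given partition. It assumes a linear decision function \(f(\mathbf x ) = \mathbf w ^{\prime }\mathbf x + b\) and cluster labels given by its sign; the function name, the toy data, and the plain violation count are illustrative assumptions and are not the pairwise loss functions proposed in the paper.

```python
def constraint_violations(w, b, X, must_link, cannot_link):
    """Count pairwise constraints violated by the partition sign(w'x + b).

    Hypothetical helper for illustration only: a must-link pair is violated
    when its two samples fall on opposite sides of the hyperplane, and a
    cannot-link pair is violated when they fall on the same side.
    """
    scores = [sum(wk * xk for wk, xk in zip(w, x)) + b for x in X]
    labels = [1 if s >= 0 else -1 for s in scores]
    ml = sum(1 for i, j in must_link if labels[i] != labels[j])
    cl = sum(1 for i, j in cannot_link if labels[i] == labels[j])
    return labels, ml, cl


# Toy data (assumed): two samples on each side of the vertical axis.
X = [[-2.0, 0.0], [-1.0, 0.5], [1.0, -0.5], [2.0, 0.0]]
must_link = [(0, 1), (2, 3)]
cannot_link = [(0, 3), (1, 2)]

# A hyperplane that respects every constraint.
good = constraint_violations([1.0, 0.0], 0.0, X, must_link, cannot_link)

# A hyperplane that splits one must-link pair and merges one cannot-link pair.
bad = constraint_violations([0.0, 1.0], 0.0, X, must_link, cannot_link)
```

A discriminative method of the kind described above would search over the hyperplane parameters rather than evaluate a fixed pair, trading off the margin against a penalty on such violations.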
Notes
It can be shown that the concave-convex procedure (CCCP) remains valid when any subgradient of the concave function is used [50]. For a concave function \(f\), a subgradient at \(\mathbf x \) is any vector \(\mathbf g \) that satisfies the inequality \(f(\mathbf y ) \le f(\mathbf x ) + \mathbf g ^{\prime }(\mathbf y - \mathbf x )\) for all \(\mathbf y \) [51].
Since DCA+K-means runs out of memory on the leukemia data set, whose dimensionality is high, we do not include it in the comparison on this data set.
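The subgradient remark in the first note can be illustrated with a minimal sketch of a CCCP iteration on a one-dimensional toy objective. The objective \(x^2 - |x|\), its split into a convex part `u` and a concave part `c`, and the choice of subgradient at the kink are all assumptions made for illustration; this is not the optimization problem solved in the paper.

```python
def u(x):
    # Convex part of the toy objective (assumed for illustration).
    return x * x


def c(x):
    # Concave part of the toy objective; not differentiable at x = 0.
    return -abs(x)


def subgrad_c(x):
    # Returns a g with c(y) <= c(x) + g * (y - x) for all y.
    # At x == 0 any value in [-1, 1] qualifies; -1 is an arbitrary choice.
    return -1.0 if x >= 0 else 1.0


def cccp(x0, iters=20):
    """Minimize u(x) + c(x) by repeatedly linearizing the concave part."""
    x = x0
    history = [u(x) + c(x)]
    for _ in range(iters):
        g = subgrad_c(x)
        # Convex subproblem: minimize u(x) + g * x, i.e. solve 2x + g = 0.
        x = -g / 2.0
        history.append(u(x) + c(x))
    return x, history


x_star, history = cccp(2.0)
```

Starting from 2.0, the iterate moves to 0.5 and stays there, and the objective values in `history` never increase, which is the descent behavior that CCCP is expected to retain when a subgradient replaces the gradient.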
References
Peng W, Li T (2011) Temporal relation co-clustering on directional social network and author-topic evolution. Knowl Inf Syst 26:467–486
Tang M, Zhou Y, Li J, Wang W et al (2011) Exploring the wild birds migration data for the disease spread study of H5N1: a clustering and association approach. Knowl Inf Syst 27:227–251
Baralis E, Bruno G, Fiori A (2011) Measuring gene similarity by means of the classification distance. Knowl Inf Syst 29:81–101
Zhao W, He Q, Ma H, Shi Z (2011) Effective semi-supervised document clustering via active learning with instance-level constraints. Knowl Inf Syst 30:569–587
Kalogeratos A, Likas A (2011) Text document clustering using global term context vectors. Knowl Inf Syst. doi:10.1007/s10115-011-0412-6
Li Z, Liu J (2009) Constrained clustering by spectral kernel learning. Proceedings of the 12th IEEE international conference on computer vision, pp 421–427
Basu S, Davidson I, Wagstaff K (2008) Constrained clustering: advances in algorithms, applications and theory. CRC Press, Boca Raton
Wagstaff K, Cardie C, Schroedl S (2001) Constrained k-means clustering with background knowledge. Proceedings of the 18th international conference on machine learning, pp 577–584
Kulis B, Basu S, Dhillon I, Mooney R (2005) Semi-supervised graph clustering: a kernel approach. Proceedings of the 22nd international conference on machine learning, pp 457–464
Yan R, Zhang J, Yang J, Hauptmann A (2006) A discriminative learning framework with pairwise constraints for video object classification. IEEE Trans Pattern Anal Mach Intell 28(4):578–593
Domeniconi C, Peng J, Yan B (2011) Composite kernels for semi-supervised clustering. Knowl Inf Syst 28:99–116
Wang F, Li P, König AC, Wan M (2011) Improving clustering by learning a bi-stochastic data similarity matrix. Knowl Inf Syst. doi:10.1007/s10115-011-0433-1
Xing EP, Ng AY, Jordan MI, Russell S (2003) Distance metric learning with application to clustering with side-information. Adv Neural Inf Process Syst 15:521–528
Bar-Hillel A, Hertz T, Shental N, Weinshall D (2003) Learning distance functions using equivalence relations. Proceedings of the 20th international conference on machine learning, pp 11–18
Hoi SCH, Liu W, Lyu MR, Ma WY (2006) Learning distance metrics with contextual constraints for image retrieval. Proceedings of the 9th international conference on computer vision and pattern recognition, pp 2072–2078
Kamvar SD, Klein D, Manning C (2003) Spectral learning. Proceedings of the 18th international joint conference on artificial intelligence, pp 561–566
Davis JV, Kulis B, Jain P, Sra S, Dhillon IS (2007) Information-theoretic metric learning. Proceedings of the 24th international conference on machine learning, pp 209–216
Li ZG, Liu J, Tang X (2008) Pairwise constraint propagation by semidefinite programming for semi-supervised classification. Proceedings of the 25th international conference on machine learning, pp 576–583
Hoi SCH, Jin R, Lyu MR (2007) Learning nonparametric kernel matrices from pairwise constraints. Proceedings of the 24th international conference on machine learning, pp 361–368
Lu Z, Carreira-Perpinan MA (2008) Constrained spectral clustering through affinity propagation. Proceedings of the 11th IEEE international conference on computer vision and pattern recognition, pp 1–8
Bilenko M, Basu S, Mooney RJ (2004) Integrating constraints and metric learning in semi-supervised clustering. Proceedings of the 21st international conference on machine learning, pp 81–88
Wu L, Jin R, Hoi SCH, Zhu J, Yu N (2009) Learning Bregman distance functions and its application for semi-supervised clustering. Adv Neural Inf Process Syst 22:2089–2097
Xu L, Neufeld J, Larson B, Schuurmans D (2005) Maximum margin clustering. Adv Neural Inf Process Syst 17:1537–1544
Collobert R, Sinz F, Weston J, Bottou L (2006) Large scale transductive SVMs. J Mach Learn Res 7:1687–1712
Hu Y, Wang J, Yu N, Hua XS (2008) Maximum margin clustering with pairwise constraints. Proceedings of the 8th IEEE international conference on data mining, pp 253–262
Zeng H, Cheung YM (2012) Semi-supervised maximum margin clustering with pairwise constraints. IEEE Trans Knowl Data Eng 24(5):926–939
Chen Y, Rege M, Dong M, Hua J (2007) Incorporating user provided constraints into document clustering. Proceedings of the 7th IEEE international conference on data mining, pp 103–112
Wang F, Li T, Zhang CS (2008) Semi-supervised clustering via matrix factorization. Proceedings of the 8th SIAM international conference on data mining, pp 1–12
Li T, Ding C, Jordan MI (2007) Solving consensus and semi-supervised clustering problems using nonnegative matrix factorization. Proceedings of the 7th IEEE international conference on data mining, pp 577–582
Chen Y, Rege M, Dong M, Hua J (2008) Non-negative matrix factorization for semi-supervised data clustering. Knowl Inf Syst 17:355–379
Hoi SCH, Liu W, Chang SF (2008) Semi-supervised distance metric learning for collaborative image retrieval. Proceedings of the 11th IEEE international conference on computer vision and pattern recognition, pp 1–7
Zhang DQ, Zhou ZH, Chen SC (2007) Semi-supervised dimensionality reduction. Proceedings of the 7th SIAM international conference on data mining, pp 629–634
Nguyen N, Caruana R (2008) Improving classification with pairwise constraints: a margin-based approach. Proceedings of the 19th European conference on machine learning and knowledge discovery in databases, pp 113–124
Goldberg A, Zhu X, Wright S (2007) Dissimilarity in graph-based semi-supervised classification. Proceedings of the 12th international conference on artificial intelligence and statistics, pp 155–162
Tong W, Jin R (2007) Semi-supervised learning by mixed label propagation. Proceedings of the 22nd national conference on artificial intelligence, pp 651–656
Zhang C, Cai Q, Song Y (2010) Boosting with pairwise constraints. Neurocomputing 73(4–6):908–919
Xu L, Schuurmans D (2005) Unsupervised and semi-supervised multi-class support vector machines. Proceedings of the 20th national conference on artificial intelligence, pp 904–910
Zhang K, Tsang IW, Kwok JT (2009) Maximum margin clustering made practical. IEEE Trans Neural Netw 20(4):583–596
Valizadegan H, Jin R (2007) Generalized maximum margin clustering and unsupervised kernel learning. Adv Neural Inf Process Syst 19:1417–1424
Zhang K, Tsang IW, Kwok JT (2007) Maximum margin clustering made practical. Proceedings of the 24th international conference on machine learning, pp 1119–1126
Zhao B, Wang F, Zhang C (2008) Efficient multiclass maximum margin clustering. Proceedings of the 25th international conference on machine learning, pp 1248–1255
Li YF, Tsang IW, Kwok JT, Zhou ZH (2009) Tighter and convex maximum margin clustering. Proceedings of the 12th international conference on artificial intelligence and statistics, pp 344–351
Wang F, Zhao B, Zhang C (2010) Linear time maximum margin clustering. IEEE Trans Neural Netw 21(2):319–332
Gu Q, Zhou J (2009) Subspace maximum margin clustering. Proceedings of the 18th ACM conference on information and knowledge management, pp 1337–1346
Zhao B, Kwok J, Wang F, Zhang C (2009) Unsupervised maximum margin feature selection with manifold regularization. Proceedings of the 12th IEEE conference on computer vision and pattern recognition, pp 888–895
Zhao B, Kwok JT, Zhang C (2009) Multiple kernel clustering. Proceedings of the 9th SIAM international conference on data mining, pp 638–649
Shen RL, Olshen AB, Ladanyi M (2010) Integrative clustering of multiple genomic data types using a joint latent variable model with application to breast and lung cancer subtype analysis. Bioinformatics 26:292–293
Yuille AL, Rangarajan A (2003) The concave-convex procedure. Neural Comput 15(4):915–936
Smola AJ, Vishwanathan SVN, Hofmann T (2005) Kernel methods for missing variables. Proceedings of the 20th international workshop on artificial intelligence and statistics, pp 325–332
Collobert R, Sinz F, Weston J et al (2006) Large scale transductive SVMs. J Mach Learn Res 7:1687–1712
Bonnans JF, Gilbert JC, Lemaréchal C et al (2003) Numerical optimization. Springer, Berlin, Germany
Rudin W (1978) Principles of mathematical analysis, 3rd edn. McGraw-Hill, New York
Joachims T (2006) Training linear SVMs in linear time. Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining, pp 217–226
Li Y, Bontcheva K, Cunningham H (2009) Adapting SVM for data sparseness and imbalance: a case study in information extraction. Nat Lang Eng 15:241–271
He H, Garcia EA (2009) Learning from imbalanced data. IEEE Trans Knowl Data Eng 21(9):1263–1284
Shalev-Shwartz S, Singer Y, Srebro N (2007) Pegasos: primal estimated sub-gradient solver for SVM. Proceedings of the 24th international conference on machine learning, pp 807–814
Núñez Castro H, González Abril L, Angulo Bahón C (2011) A post-processing strategy for SVM learning from unbalanced data. Proceedings of the 15th European symposium on artificial neural networks, pp 195–200
Strehl A, Ghosh J (2003) Cluster ensembles-a knowledge reuse framework for combining multiple partitions. J Mach Learn Res 3:583–617
Duda RO, Hart PE, Stork DG (2001) Pattern classification. Wiley, New York
Chapelle O, Schölkopf B, Zien A (2006) Semi-supervised learning. MIT Press, Cambridge, MA
Alon U, Barkai N, Notterman DA, Gish K, Ybarra S, Mack D, Levine AJ (1999) Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc Natl Acad Sci 96:6745–6750
Golub TR, Slonim DK, Tamayo P et al (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286:531–537
Dudoit S, Fridlyand J, Speed TP (2002) Comparison of discrimination methods for the classification of tumors using gene expression data. J Am Stat Assoc 97:77–87
Sindhwani V, Niyogi P, Belkin M (2005) Beyond the point cloud: from transductive to semi-supervised learning. Proceedings of the 22nd international conference on machine learning, pp 824–831
Acknowledgments
The authors would like to thank the anonymous reviewers for their valuable comments and helpful suggestions. This work was supported by the National Natural Science Foundation of China (No. 61105048, 60972165, 61104206, 51175080), the Doctoral Fund of Ministry of Education of China (No. 20100092120012, 20110092120034), the Natural Science Foundation of Jiangsu Province (No. BK2010240, BK2010423), the Technology Foundation for Selected Overseas Chinese Scholar, Ministry of Human Resources and Social Security of China (No. 6722000008), and the Open Fund of Jiangsu Province Key Laboratory for Remote Measuring and Control (No. YCCK201005).
Cite this article
Zeng, H., Song, A. & Cheung, Y.M. Improving clustering with pairwise constraints: a discriminative approach. Knowl Inf Syst 36, 489–515 (2013). https://doi.org/10.1007/s10115-012-0592-8
DOI: https://doi.org/10.1007/s10115-012-0592-8