Abstract
Recent advances in clustering consider incorporating background knowledge into the partitioning algorithm, for instance in the form of pairwise constraints between objects. Indeed, prior information, when available, often makes it possible to better retrieve meaningful clusters in data. Here, this approach is investigated in the framework of belief functions, which allows us to handle the imprecision and the uncertainty of the clustering process. In this context, the EVCLUS algorithm was proposed for partitioning objects described by a dissimilarity matrix. It is extended here so as to take pairwise constraints into account, by adding to its objective function a penalty term that expresses these constraints in the belief function framework. Various synthetic and real datasets are considered to demonstrate the interest of the proposed method, called CEVCLUS, and two applications are presented. The performance of CEVCLUS is also compared with that of other constrained clustering algorithms.
Notes
A Matlab implementation of the CEVCLUS algorithm is available at https://www.hds.utc.fr/~tdenoeux.
Available at http://archive.ics.uci.edu/ml.
Available at http://people.csail.mit.edu/jrennie/20Newsgroups.
Available at http://algoval.essex.ac.uk/data/sequence/chicken.
References
Antoine V, Quost B, Masson M-H, Denœux T (2012) CECM: constrained evidential C-means algorithm. Comput Stat Data Anal 56:894–914
Basu S, Bilenko M, Banerjee A, Mooney R (2006) Probabilistic semi-supervised clustering with constraints. In: Chapelle O, Schölkopf B, Zien A (eds) Semi-supervised learning. MIT Press, Cambridge, pp 71–98
Bezdek JC (1981) Pattern recognition with fuzzy objective function algorithms. Kluwer, Norwell
Bunke H, Bühler U (1993) Applications of approximate string matching to 2D shape recognition. Pattern Recogn 26(12):1797–1812
Davidson I, Ravi S (2005) Agglomerative hierarchical clustering with constraints: theoretical and empirical results. In: Proceedings of the 9th European conference on principles and practice of knowledge discovery in databases (PKDD), vol 3721. Springer, Porto, pp 59–70
Davidson I, Wagstaff K, Basu S (2006) Measuring constraint-set utility for partitional clustering algorithms. In: Proceedings of the 10th European conference on principles and practice of knowledge discovery in databases (PKDD). Berlin, Germany, vol 4213, pp 115–126
Dempster A (1967) Upper and lower probabilities induced by a multivalued mapping. Ann Math Stat 38:325–339
Denœux T, Masson M-H (2004) EVCLUS: evidential clustering of proximity data. IEEE Trans Syst Man Cybern B 34(1):95–109
Everitt BS, Landau S, Leese M (2009) Hierarchical clustering. Cluster analysis, 4th edn. Wiley, New York, pp 55–89
Frigui H, Hwang C (2007) Adaptive concept learning through clustering and aggregation of relational data. In: Proceedings of the 7th SIAM international conference on data mining, Minneapolis, USA, pp 90–101
Gondek D, Hofmann T (2007) Non-redundant data clustering. Knowl Inf Syst 12(1):1–24
Grira N, Crucianu M, Boujemaa N (2008) Active semi-supervised fuzzy clustering. Pattern Recogn 41(5):1834–1844
Hamasuna Y, Endo Y (2012) On semi-supervised fuzzy c-means clustering for data with clusterwise tolerance by opposite criteria. Soft Comput Fusion Found Methodol Appl 1–11
Hathaway R, Davenport J, Bezdek J (1989) Relational duals of the c-means clustering algorithms. Pattern Recogn 22(2):205–212
Kannan S, Sathya A, Ramathilagam S (2011) Effective fuzzy clustering techniques for segmentation of breast MRI. Soft Comput Fusion Found Methodol Appl 15:483–491
Klein D, Kamvar S, Manning C (2002) From instance-level constraints to space-level constraints: making the most of prior knowledge in data clustering. In: Proceedings of the 19th international conference on machine learning. Sydney, Australia, pp 307–314
Klir G, Wierman M (1999) Uncertainty-based information: elements of generalized information theory. Springer, New York
Kulis B, Basu S, Dhillon I, Mooney R (2005) Semi-supervised graph clustering: a kernel approach. In: 22nd international conference on machine learning (ICML). Bonn, Germany, pp 457–464
Law M, Topchy A, Jain A (2004) Clustering with soft and group constraints. In: Structural, syntactic, and statistical pattern recognition, Joint IAPR international workshops, SSPR 2004 and SPR 2004, vol 3138. Springer, Lisbon, pp 662–670
Lazzerini B, Marcelloni F (2007) A hierarchical fuzzy clustering-based system to create user profiles. Soft Comput Fusion Found Methodol Appl 11:157–168
Li YL, Shen Y (2010) An automatic fuzzy c-means algorithm for image segmentation. Soft Comput Fusion Found Methodol Appl 14:123–128
Liu Y, Jin R, Jain A (2007) BoostCluster: boosting clustering by pairwise constraints. In: Proceedings of the 13th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, San Jose, pp 450–459
Masson M-H, Denœux T (2004) Clustering interval-valued data using belief functions. Pattern Recogn Lett 25(2):163–171
Masson M-H, Denœux T (2008) ECM: an evidential version of the fuzzy c-means algorithm. Pattern Recogn 41(4):1384–1397
Masson M-H, Denœux T (2009) RECM: relational evidential c-means algorithm. Pattern Recogn Lett 30(11):1015–1026
Pal N, Bezdek J, Hemasinha R (1992) Uncertainty measures for evidential reasoning I: a review. Int J Approx Reason 7(3–4):165–183
Pedrycz W, Loia V, Senatore S (2004) P-FCM: a proximity-based fuzzy clustering. Fuzzy Sets Syst 148(1):21–41
Pedrycz W (2007) Collaborative and knowledge-based fuzzy clustering. Int J Innov Comput Inf Control 1(3):1–12
Pekalska E, Duin R (2005) The dissimilarity representation for pattern recognition, vol 64, foundations and applications. World Scientific, Singapore
Sammon JW (1969) A nonlinear mapping for data structure analysis. IEEE Trans Comput 18:401–409
Shafer G (1976) A mathematical theory of evidence. Princeton University Press, Princeton
Smets P (1990) The combination of evidence in the transferable belief model. IEEE Trans Pattern Anal Mach Intell 12(5):447–458
Smets P, Kennes R (1994) The transferable belief model. Artif Intell 66:191–234
Wagstaff K (2007) Value, cost, and sharing: open issues in constrained clustering. Knowl Discov Induct Databases 4747:1–10
Wagstaff K, Cardie C, Rogers S, Schroedl S (2001) Constrained k-means clustering with background knowledge. In: Proceedings of the 18th international conference on machine learning. Williamstown, MA, USA, pp 577–584
Xing E, Ng A, Jordan M, Russell S (2002) Distance metric learning with application to clustering with side-information. In: Proceedings of the 15th annual conference on advances in neural information processing systems (NIPS). Vancouver, British Columbia, Canada, pp 521–528
Li Z, Liu J, Tang X (2009) Constrained clustering via spectral regularization. In: IEEE conference on computer vision and pattern recognition (CVPR). Miami, FL, USA, pp 421–428
Zhong S, Ghosh J (2003) A unified framework for model-based clustering. J Mach Learn Res 4:1001–1037
Acknowledgments
This work was carried out in the framework of the Labex MS2T, which was funded by the French Government through the program “Investments for the Future” managed by the National Agency for Research (Reference ANR-11-IDEX-0004-02).
Additional information
Communicated by W. Pedrycz.
Most of this work was developed while the author was at Heudiasyc.
Appendix A: Optimization algorithm
The minimization of \(J_{CEVCLUS}\) can be performed using any unconstrained nonlinear programming algorithm. In the experiments reported in Sect. 4, we used the same gradient-based optimization as in Denœux and Masson (2004). This method is briefly sketched below.
Let \(\varvec{w}\) be the vector of parameters and \(J(\varvec{w})\) the objective function to be minimized. The algorithm is a variant of gradient descent in which each parameter \(w_j\) has its own step size \(\eta _j\); the step sizes are adapted during the optimization process, depending on the evolution of the objective function and on the sign of the derivatives at successive iterations. Let \(t\) be the iteration counter, and assume first that the objective function has decreased between iterations \(t-1\) and \(t\). Each step size \(\eta _j\) is then updated as follows: it is multiplied by a coefficient \(\beta >1\) if the partial derivative \(\partial J/\partial w_j\) has kept the same sign during the two iterations, and multiplied by a coefficient \(\gamma <1\) if the sign has changed, which indicates that we have “jumped over” a minimum. The parameters are then updated by a gradient step using these new step sizes.
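In symbols, a plausible reconstruction of these update rules (the displayed equations of the original appendix are not reproduced here, so their exact form may differ slightly) is
\[
\eta_j(t+1) =
\begin{cases}
\beta\,\eta_j(t) & \text{if } \dfrac{\partial J}{\partial w_j}(t)\,\dfrac{\partial J}{\partial w_j}(t-1) > 0,\\
\gamma\,\eta_j(t) & \text{otherwise},
\end{cases}
\qquad
w_j(t+1) = w_j(t) - \eta_j(t+1)\,\frac{\partial J}{\partial w_j}(t).
\]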
If, on the contrary, the objective function has increased between iterations \(t-1\) and \(t\), all step sizes are decreased simultaneously by a factor \(\delta <1\), and the parameters are updated again starting from where they were at the previous iteration.
As in Denœux and Masson (2004), we set the parameters \(\beta \), \(\gamma \) and \(\delta \) to \(1.2\), \(0.8\) and \(0.5\) in our experiments.
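To make the adaptation scheme concrete, the following Python sketch implements a generic adaptive-step gradient descent of this kind. The function names (J, grad_J), the initial step size eta0 and the stopping criterion are illustrative assumptions; the actual procedure of Denœux and Masson (2004) may differ in such details.

import numpy as np

def adaptive_gradient_descent(J, grad_J, w0, eta0=0.01,
                              beta=1.2, gamma=0.8, delta=0.5,
                              max_iter=1000, tol=1e-8):
    """Gradient descent with one adaptive step size per parameter.

    J      : callable returning the objective value for a parameter vector
    grad_J : callable returning the gradient vector
    w0     : initial parameter vector
    """
    w = np.asarray(w0, dtype=float)
    eta = np.full_like(w, eta0)          # one step size per parameter
    g_prev = grad_J(w)
    J_prev = J(w)
    w_prev = w.copy()
    w = w - eta * g_prev                 # first (plain) gradient step

    for _ in range(max_iter):
        J_curr = J(w)
        if J_curr < J_prev:              # objective decreased: accept and adapt
            g = grad_J(w)
            same_sign = g * g_prev > 0   # derivative kept its sign?
            eta = np.where(same_sign, beta * eta, gamma * eta)
            w_prev, g_prev, J_prev = w.copy(), g, J_curr
            w = w - eta * g
        else:                            # objective increased: shrink all step sizes
            eta *= delta
            w = w_prev - eta * g_prev    # redo the step from the previous point
        if np.max(np.abs(eta * g_prev)) < tol:
            break
    return w

# Example: minimizing a simple quadratic converges to the zero vector.
w_opt = adaptive_gradient_descent(lambda w: np.sum(w ** 2),
                                  lambda w: 2 * w,
                                  np.array([3.0, -2.0]))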
Cite this article
Antoine, V., Quost, B., Masson, MH. et al. CEVCLUS: evidential clustering with instance-level constraints for relational data. Soft Comput 18, 1321–1335 (2014). https://doi.org/10.1007/s00500-013-1146-z