Abstract
Objective-based clustering is an important class of cluster analysis techniques; however, these methods are prone to local minima because their objective functions are non-convex, which degrades the final clustering performance. Recently, a convex clustering method (CC) has attracted attention because it enjoys global optimality and independence from initialization. However, one of its downsides is a lack of robustness to data contaminated with outliers, which causes the clustering results to deviate. To improve its robustness, this paper proposes an outlier-aware robust convex clustering algorithm, called RCC. Specifically, RCC extends CC by modeling the contaminated data as the sum of clean data and sparse outliers, and then adding a Lasso-type regularization term to the CC objective to reflect the sparsity of the outliers. In this way, RCC resists outliers to a great extent while retaining the advantages of CC, including the convexity of the objective. Further, we develop a block coordinate descent approach with a convergence guarantee and find that RCC usually converges in just a few iterations. Finally, the effectiveness and robustness of RCC are empirically corroborated by numerical experiments on both synthetic and real datasets.
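As a minimal sketch of how a Lasso-type penalty isolates sparse outliers, consider one block of a block coordinate descent scheme. The exact objective of Eq. (4) is not reproduced in this section, so the following assumes an elementwise \( \ell_1 \) penalty: with the centroid matrix held fixed, the outlier block minimizes \( \tfrac{1}{2}\|\varvec{R} - \varvec{O}\|_F^2 + \lambda \|\varvec{O}\|_1 \), where \( \varvec{R} \) is the residual between the data and the current centroids. The names `soft_threshold`, `R`, and `lam` are illustrative and not taken from the paper.

```python
import numpy as np

def soft_threshold(R, lam):
    """Closed-form minimizer of 0.5*||R - O||_F^2 + lam*||O||_1 over O.

    Assumption: the outlier penalty is an elementwise l1 norm; a group
    (column-wise) penalty would need a different shrinkage operator.
    """
    return np.sign(R) * np.maximum(np.abs(R) - lam, 0.0)

# Toy residual (data minus current centroids) with one large entry.
R = np.array([[0.1, -0.2],
              [3.0, 0.05]])

# Small residuals are set to zero; the large one is kept, shrunk by lam.
O = soft_threshold(R, lam=0.5)
print(O)
```

Entries whose magnitude is below the threshold are treated as ordinary noise, while larger entries are absorbed into the outlier matrix, which is why the estimated outlier matrix is sparse.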
References
Ascari G, Fagiolo G, Roventini A (2012) Fat-tail distributions and business-cycle models. Macroecon Dyn 19(2):465–476
Bache K, Lichman M (2013) UCI machine learning repository. University of California, School of Information and Computer Science, Irvine, CA. http://archive.ics.uci.edu/ml
Berkhin P (2006) A survey of clustering data mining techniques. Group Multidimens Data 43(1):25–71
Boyd S, Vandenberghe L (2004) Convex optimization. Cambridge University Press, Cambridge
Candès EJ, Li X, Ma Y, Wright J (2011) Robust principal component analysis? J ACM 58(3):11
Chen GK, Chi EC, Ranola JMO, Lange K (2015) Convex clustering: an attractive alternative to hierarchical clustering. PLoS Comput Biol 11(5):e1004228
Chi EC, Lange K (2015) Splitting methods for convex clustering. J Comput Gr Stat 24(4):994–1013
Chi EC, Allen GI, Baraniuk RG (2016) Convex biclustering. Biometrics 73(1):10–19
Dave RN, Krishnapuram R (2002) Robust clustering methods: a unified view. IEEE Trans Fuzzy Syst 5(2):270–293
Davis J, Goadrich M (2006) The relationship between precision-recall and ROC curves. In: Proceedings of the 23rd international conference on machine learning, ACM, New York, pp 233–240
Dietterich TG (2017) Steps toward robust artificial intelligence. AI Mag 38(3):3–24
Donoho DL (1995) De-noising by soft-thresholding. IEEE Trans Inf Theory 41(3):613–627
Du L, Shen YD (2013) Towards robust co-clustering. In: International joint conferences on artificial intelligence (IJCAI), pp 1317–1322
Fan J, Li R (2001) Variable selection via non-concave penalized likelihood and its oracle properties. Publ Am Stat Assoc 96(456):1348–1360
Forero PA, Kekatos V, Giannakis GB (2011) Outlier-aware robust clustering. In: 2011 IEEE international conference on acoustics, speech and signal processing (ICASSP), IEEE, pp 2244–2247
Forero PA, Kekatos V, Giannakis GB (2012) Robust clustering using outlier-sparsity regularization. IEEE Trans Signal Process 60(8):4163–4177
García-Escudero LA, Gordaliza A, Matrán C, Mayo-Iscar A (2010) A review of robust clustering methods. Adv Data Anal Classif 4(2–3):89–109
Giannakis GB, Mateos G, Farahmand S, Kekatos V, Zhu H (2011) USPACOR: universal sparsity-controlling outlier rejection. In: IEEE international conference on acoustics, speech and signal processing (ICASSP), IEEE, pp 1952–1955
Hall LO (2012) Objective function-based clustering. Wiley Interdiscip Rev Data Min Knowl Discov 2(4):326–339
Hallac D, Leskovec J, Boyd S (2015) Network lasso: clustering and optimization in large graphs. In: Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, New York, pp 387–396
Hampel FR, Ronchetti EM, Rousseeuw PJ, Stahel WA (1986) Robust statistics: the approach based on influence functions. Wiley, New York
Hocking TD, Joulin A, Bach F, Vert JP (2011) Clusterpath: an algorithm for clustering using convex fusion penalties. In: 28th international conference on machine learning, p 1
Huber PJ (1981) Robust statistics. Wiley, New York
Hubert L, Arabie P (1985) Comparing partitions. J Classif 2(1):193–218
Jain AK, Dubes RC (1988) Algorithms for clustering data. Prentice Hall, Upper Saddle River
Krijthe JH (2016) RSSL: semi-supervised learning in R. In: International workshop on reproducible research in pattern recognition, Springer, Cham, pp 104–115
Lindsten F, Ohlsson H, Ljung L (2011) Just relax and come clustering!: a convexification of k-means clustering. Linköping University Electronic Press, Linköping
Lu C, Yan S, Lin Z (2016) Convex sparse spectral clustering: single-view to multi-view. IEEE Trans Image Process 25(6):2833–2843
Madeira SC, Oliveira AL (2004) Biclustering algorithms for biological data analysis: a survey. IEEE ACM Trans Comput Biol Bioinf 1(1):24–45
Mateos G, Giannakis GB (2012) Robust PCA as bilinear decomposition with outlier-sparsity regularization. IEEE Trans Signal Process 60(10):5176–5190
Meng D, Zhao Q, Xu Z (2012) Improve robustness of sparse PCA by L1-norm maximization. Pattern Recognit 45(1):487–497
Nagorski J, Allen GI (2016) Genomic region detection via spatial convex clustering. arXiv preprint arXiv:1611.04696
Nie F, Wang H, Cai X et al (2012) Robust matrix completion via joint schatten p-norm and lp-norm minimization. In: 2012 IEEE 12th international conference on data mining (ICDM), IEEE, pp 566–574
Oliveira JVD, Pedrycz W et al (2007) Advances in fuzzy clustering and its applications. Wiley, New York
Parikh N, Boyd S (2014) Proximal algorithms. Found Trends Optim 1(3):127–239
Poddar S, Jacob M (2018) Clustering of data with missing entries. arXiv preprint arXiv:1801.01455
Tachikawa T, Yatabe K, Ikeda Y, et al (2016) Sound source localization based on sparse estimation and convex clustering. In: Proceedings of meetings on acoustics 172ASA, ASA, vol 29, no 1, p 055004
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B (Methodol) 58:267–288
Tibshirani R (2011) Regression shrinkage and selection via the lasso: a retrospective. J R Stat Soc Ser B (Stat Methodol) 73(3):273–282
Tošić I, Frossard P (2011) Dictionary learning. IEEE Signal Process Mag 28(2):27–38
Tseng P (2001) Convergence of a block coordinate descent method for nondifferentiable minimization. J Optim Theory Appl 109(3):475–494
Wang S, Liu D, Zhang Z (2013) Nonconvex relaxation approaches to robust matrix recovery. In: International joint conferences on artificial intelligence (IJCAI), pp 1764–1770
Wang B, Zhang Y, Sun W et al (2016) Sparse convex clustering. J Comput Gr Stat. https://doi.org/10.1080/10618600.2017.1377081
Wang Q, Gong P, Chang S et al (2017) Robust convex clustering analysis. In: IEEE international conference on data mining
Weylandt M, Nagorski J, Allen GI (2019) Dynamic visualization and fast computation for convex clustering via algorithmic regularization. J Comput Gr Stat. https://doi.org/10.1080/10618600.2019.1629943
Yuan Y, Sun D, Toh KC (2018) An efficient semismooth Newton based algorithm for convex clustering. arXiv preprint arXiv:1802.07091
Zhang CH (2010) Nearly unbiased variable selection under minimax concave penalty. Ann Stat 38(2):894–942
Zhang H, Zha ZJ, Yan S, Wang M, Chua TS (2012) Robust non-negative graph embedding: towards noisy data, unreliable graphs, and noisy labels. In: IEEE conference on computer vision and pattern recognition (CVPR), IEEE, pp 2464–2471
Zhao Y, Zhu E, Liu X et al (2019) Simultaneous clustering and optimization for evolving datasets. IEEE Trans Knowl Data Eng. https://doi.org/10.1109/TKDE.2019.2923239
Zhu C, Xu H, Leng C et al (2014) Convex optimization procedure for clustering: theoretical revisit. In: Advances in neural information processing systems (NIPS), pp 1619–1627
Acknowledgements
This work is supported by the National Natural Science Foundation of China (NSFC) under the Grant Nos. 61732006 and 61672281, as well as the Key Program of NSFC under Grant No. 61472186.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Communicated by A. Di Nola.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix A
Here we prove that the objective function of RCC in Eq. (4) is strictly convex.
First, we present the definition of strict convexity as follows.
Definition 1
A function \( f : {\mathbb{R}}^{n} \to {\mathbb{R}} \) is strictly convex, if \( {\text{dom }}\, f \) is a convex set and if for all \( \varvec{x},\varvec{ y} \in {\text{dom}} f \), \( f(\theta \varvec{x} + (1 - \theta )\varvec{y}) < \theta f(\varvec{x}) + \left( {1 - \theta } \right)f(\varvec{y}) \) whenever \( \varvec{x} \ne \varvec{y} \) and \( 0 < \theta < 1 \) (Boyd and Vandenberghe 2004).
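The inequality in Definition 1 can be checked numerically. The sketch below is illustrative only and not part of the original proof; the helper `convexity_gap` and the test points are assumptions chosen for the example. It contrasts the squared Euclidean norm, which is strictly convex, with the Euclidean norm itself, which is convex but not strictly convex because the inequality holds with equality along a ray.

```python
import numpy as np

def convexity_gap(f, x, y, theta):
    """Return theta*f(x) + (1-theta)*f(y) - f(theta*x + (1-theta)*y).

    Definition 1 requires this gap to be strictly positive for x != y
    and 0 < theta < 1; plain convexity only requires it to be >= 0.
    """
    return theta * f(x) + (1 - theta) * f(y) - f(theta * x + (1 - theta) * y)

def squared_norm(v):
    return float(np.dot(v, v))

def euclidean_norm(v):
    return float(np.linalg.norm(v))

# Two distinct points on the same ray from the origin (y = 2x).
x = np.array([1.0, 2.0])
y = np.array([2.0, 4.0])

gap_sq = convexity_gap(squared_norm, x, y, 0.5)    # strictly positive
gap_l2 = convexity_gap(euclidean_norm, x, y, 0.5)  # zero up to rounding
print(gap_sq, gap_l2)
```

The vanishing gap for the norm is the reason norm-based penalty terms can only be bounded by zero, not by a strictly negative quantity, in an argument of this kind.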
Proof
Recall from Eq. (4) that \( \varvec{U} \in {\mathbb{R}}^{p \times n} \) and \( \varvec{O} \in {\mathbb{R}}^{p \times n} \).
Let \( \tilde{\varvec{U}} = [\varvec{U}, \varvec{O}] \in {\mathbb{R}}^{p \times 2n} \), \( \tilde{\mathbf{v}}_{i} = \begin{bmatrix} \mathbf{y}_{n(i)} \\ \mathbf{0}_{n} \end{bmatrix} \in {\mathbb{R}}^{2n \times 1} \), \( \hat{\mathbf{v}}_{i} = \begin{bmatrix} \mathbf{0}_{n} \\ \mathbf{y}_{n(i)} \end{bmatrix} \in {\mathbb{R}}^{2n \times 1} \), \( \tilde{\mathbf{I}} = \begin{bmatrix} \mathbf{I}_{n} \\ \mathbf{I}_{n} \end{bmatrix} \).
Subsequently, the problem in Eq. (4) can be rewritten as:
Therefore, proving the convexity of problem (4) reduces to proving that Eq. (11) is strictly convex in \( \tilde{\varvec{U}} \). Applying Definition 1 to Eq. (11) for any \( \theta \in \left( {0,1} \right) \) and assuming \( \tilde{\varvec{U}}_{1} \ne \tilde{\varvec{U}}_{2} \), we further have
where
By the absolute homogeneity and subadditivity of norms, T3 in Eq. (12) is non-positive:
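For reference, the generic norm inequality behind this step can be written out as follows. This is a sketch of the standard argument rather than a reproduction of the original display; \( \mathbf{a} \) and \( \mathbf{b} \) are placeholder vectors standing for the arguments of the norm evaluated at the two points being compared, and \( 0 < \theta < 1 \).

\[
\left\| \theta \mathbf{a} + (1 - \theta)\mathbf{b} \right\|
\le \left\| \theta \mathbf{a} \right\| + \left\| (1 - \theta)\mathbf{b} \right\|
= \theta \left\| \mathbf{a} \right\| + (1 - \theta)\left\| \mathbf{b} \right\|
\quad \Longrightarrow \quad
\left\| \theta \mathbf{a} + (1 - \theta)\mathbf{b} \right\| - \theta \left\| \mathbf{a} \right\| - (1 - \theta)\left\| \mathbf{b} \right\| \le 0 .
\]

The first step uses subadditivity (the triangle inequality) and the second uses absolute homogeneity with the non-negative scalars \( \theta \) and \( 1 - \theta \). Multiplying by a non-negative weight and summing such terms preserves the sign.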
Similarly, T2 in Eq. (12) is also non-positive. Thus, Eq. (12) can be simplified to
Finally, we obtain
That is, \( F_{\gamma } (\tilde{\varvec{U}}) \) is convex in \( \tilde{\varvec{U}} \). Equivalently, \( F_{\gamma } (\varvec{U},\varvec{O}) \) is jointly convex in \( \left\{ {\varvec{U},\varvec{O}} \right\} \).
Cite this article
Quan, Z., Chen, S. Robust convex clustering. Soft Comput 24, 731–744 (2020). https://doi.org/10.1007/s00500-019-04471-9
DOI: https://doi.org/10.1007/s00500-019-04471-9