Robust convex clustering

  • Foundations
Soft Computing

Abstract

Objective-based clustering is an important class of cluster analysis techniques; however, because the objective functions involved are non-convex, these methods are prone to local minima, which degrades the final clustering performance. Recently, the convex clustering method (CC) has attracted attention because it enjoys global optimality and independence from initialization. One of its downsides, however, is that it is not robust to data contaminated with outliers, which causes the clustering results to deviate. To improve its robustness, this paper proposes an outlier-aware robust convex clustering algorithm, called RCC. Specifically, RCC extends CC by modeling the contaminated data as the sum of the clean data and sparse outliers, and then adding a Lasso-type regularization term to the CC objective to reflect the sparsity of the outliers. In this way, RCC resists outliers to a great extent while maintaining the advantages of CC, including the convexity of the objective. Further, we develop a block coordinate descent approach with a convergence guarantee and find that RCC usually converges in just a few iterations. Finally, the effectiveness and robustness of RCC are empirically corroborated by numerical experiments on both synthetic and real datasets.

References

  • Ascari G, Fagiolo G, Roventini A (2012) Fat-tail distributions and business-cycle models. Macroecon Dyn 19(2):465–476

  • Bache K, Lichman M (2013) UCI machine learning repository. University of California, School of Information and Computer Science, Irvine, CA. http://archive.ics.uci.edu/ml

  • Berkhin P (2006) A survey of clustering data mining techniques. Group Multidimens Data 43(1):25–71

  • Boyd S, Vandenberghe L (2004) Convex optimization. Cambridge University Press, Cambridge

  • Candès EJ, Li X, Ma Y, Wright J (2011) Robust principal component analysis? J ACM 58(3):11

  • Chen GK, Chi EC, Ranola JMO, Lange K (2015) Convex clustering: an attractive alternative to hierarchical clustering. PLoS Comput Biol 11(5):e1004228

  • Chi EC, Lange K (2015) Splitting methods for convex clustering. J Comput Gr Stat 24(4):994–1013

  • Chi EC, Allen GI, Baraniuk RG (2016) Convex biclustering. Biometrics 73(1):10–19

  • Dave RN, Krishnapuram R (2002) Robust clustering methods: a unified view. IEEE Trans Fuzzy Syst 5(2):270–293

  • Davis J, Goadrich M (2006) The relationship between precision-recall and ROC curves. In: Proceedings of the 23rd international conference on machine learning, ACM, New York, pp 233–240

  • Dietterich TG (2017) Steps toward robust artificial intelligence. AI Mag 38(3):3–24

  • Donoho DL (1995) De-noising by soft-thresholding. IEEE Trans Inf Theory 41(3):613–627

  • Du L, Shen YD (2013) Towards robust co-clustering. In: International joint conferences on artificial intelligence (IJCAI), pp 1317–1322

  • Fan J, Li R (2001) Variable selection via non-concave penalized likelihood and its oracle properties. Publ Am Stat Assoc 96(456):1348–1360

  • Forero PA, Kekatos V, Giannakis GB (2011) Outlier-aware robust clustering. In: 2011 IEEE international conference on acoustics, speech and signal processing (ICASSP), IEEE, pp 2244–2247

  • Forero PA, Kekatos V, Giannakis GB (2012) Robust clustering using outlier-sparsity regularization. IEEE Trans Signal Process 60(8):4163–4177

  • García-Escudero LA, Gordaliza A, Matrán C, Mayo-Iscar A (2010) A review of robust clustering methods. Adv Data Anal Classif 4(2–3):89–109

  • Giannakis GB, Mateos G, Farahmand S, Kekatos V, Zhu H (2011) USPACOR: universal sparsity-controlling outlier rejection. In: IEEE international conference on acoustics, speech and signal processing (ICASSP), IEEE, pp 1952–1955

  • Hall LO (2012) Objective function-based clustering. Wiley Interdiscip Rev Data Min Knowl Discov 2(4):326–339

  • Hallac D, Leskovec J, Boyd S (2015) Network lasso: clustering and optimization in large graphs. In: Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, New York, pp 387–396

  • Hampel FR, Ronchetti EM, Rousseeuw PJ, Stahel WA (1986) Robust statistics: the approach based on influence functions. Wiley, New York

  • Hocking TD, Joulin A, Bach F, Vert JP (2011) Clusterpath: an algorithm for clustering using convex fusion penalties. In: 28th international conference on machine learning, p 1

  • Huber PJ (1981) Robust statistics. Wiley, New York

  • Hubert L, Arabie P (1985) Comparing partitions. J Classif 2(1):193–218

  • Jain AK, Dubes RC (1988) Algorithms for clustering data. Prentice Hall, Upper Saddle River

  • Krijthe JH (2016) RSSL: semi-supervised learning in R. In: International workshop on reproducible research in pattern recognition, Springer, Cham, pp 104–115

  • Lindsten F, Ohlsson H, Ljung L (2011) Just relax and come clustering!: a convexification of k-means clustering. Linköping University Electronic Press, Linköping

  • Lu C, Yan S, Lin Z (2016) Convex sparse spectral clustering: single-view to multi-view. IEEE Trans Image Process 25(6):2833–2843

  • Madeira SC, Oliveira AL (2004) Biclustering algorithms for biological data analysis: a survey. IEEE ACM Trans Comput Biol Bioinf 1(1):24–45

  • Mateos G, Giannakis GB (2012) Robust PCA as bilinear decomposition with outlier-sparsity regularization. IEEE Trans Signal Process 60(10):5176–5190

  • Meng D, Zhao Q, Xu Z (2012) Improve robustness of sparse PCA by L1-norm maximization. Pattern Recognit 45(1):487–497

  • Nagorski J, Allen GI (2016) Genomic region detection via spatial convex clustering. arXiv preprint arXiv:1611.04696

  • Nie F, Wang H, Cai X et al (2012) Robust matrix completion via joint schatten p-norm and lp-norm minimization. In: 2012 IEEE 12th international conference on data mining (ICDM), IEEE, pp 566–574

  • Oliveira JVD, Pedrycz W et al (2007) Advances in fuzzy clustering and its applications. Wiley, New York

  • Parikh N, Boyd S (2014) Proximal algorithms. Found Trends Optim 1(3):127–239

  • Poddar S, Jacob M (2018) Clustering of data with missing entries. arXiv preprint arXiv:1801.01455

  • Tachikawa T, Yatabe K, Ikeda Y, et al (2016) Sound source localization based on sparse estimation and convex clustering. In: Proceedings of meetings on acoustics 172ASA, ASA, vol 29, no 1, p 055004

  • Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B (Methodol) 58:267–288

  • Tibshirani R (2011) Regression shrinkage and selection via the lasso: a retrospective. J R Stat Soc Ser B (Stat Methodol) 73(3):273–282

  • Tošić I, Frossard P (2011) Dictionary learning. IEEE Signal Process Mag 28(2):27–38

  • Tseng P (2001) Convergence of a block coordinate descent method for nondifferentiable minimization. J Optim Theory Appl 109(3):475–494

  • Wang S, Liu D, Zhang Z (2013) Nonconvex relaxation approaches to robust matrix recovery. In: International joint conferences on artificial intelligence (IJCAI), pp 1764–1770

  • Wang B, Zhang Y, Sun W et al (2016) Sparse convex clustering. J Comput Gr Stat. https://doi.org/10.1080/10618600.2017.1377081

  • Wang Q, Gong P, Chang S et al (2017) Robust convex clustering analysis. In: IEEE international conference on data mining

  • Weylandt M, Nagorski J, Allen GI (2019) Dynamic visualization and fast computation for convex clustering via algorithmic regularization. J Comput Gr Stat. https://doi.org/10.1080/10618600.2019.1629943

  • Yuan Y, Sun D, Toh KC (2018) An efficient semismooth Newton based algorithm for convex clustering. arXiv preprint arXiv:1802.07091

  • Zhang CH (2010) Nearly unbiased variable selection under minimax concave penalty. Ann Stat 38(2):894–942

  • Zhang H, Zha ZJ, Yan S, Wang M, Chua TS (2012) Robust non-negative graph embedding: towards noisy data, unreliable graphs, and noisy labels. In: IEEE conference on computer vision and pattern recognition (CVPR), IEEE, pp 2464–2471

  • Zhao Y, Zhu E, Xinwang LIU et al (2019) Simultaneous clustering and optimization for evolving datasets. IEEE Trans Knowl Data Eng. https://doi.org/10.1109/TKDE.2019.2923239

  • Zhu C, Xu H, Leng C et al (2014) Convex optimization procedure for clustering: theoretical revisit. In: Advances in neural information processing systems (NIPS), pp 1619–1627

Acknowledgements

This work is supported by the National Natural Science Foundation of China (NSFC) under the Grant Nos. 61732006 and 61672281, as well as the Key Program of NSFC under Grant No. 61472186.

Author information

Corresponding author

Correspondence to Songcan Chen.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Communicated by A. Di Nola.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A

Here we prove that the objective function of RCC in Eq. (4) is strictly convex.

First, we present the definition of strict convexity as follows.

Definition 1

A function \( f : {\mathbb{R}}^{n} \to {\mathbb{R}} \) is strictly convex, if \( {\text{dom }}\, f \) is a convex set and if for all \( \varvec{x},\varvec{ y} \in {\text{dom}} f \), \( f(\theta \varvec{x} + (1 - \theta )\varvec{y}) < \theta f(\varvec{x}) + \left( {1 - \theta } \right)f(\varvec{y}) \) whenever \( \varvec{x} \ne \varvec{y} \) and \( 0 < \theta < 1 \) (Boyd and Vandenberghe 2004).

Proof

Recall from Eq. (4) that \( \varvec{U} \in {\mathbb{R}}^{p \times n} \) and \( \varvec{O} \in {\mathbb{R}}^{p \times n} \).

Let \( \tilde{\varvec{U}} = \left[ {\varvec{U},\varvec{O}} \right] \in {\mathbb{R}}^{p \times 2n} \), \( {\tilde{\mathbf{v}}}_{i} = \left[ {\begin{array}{*{20}c} {{\mathbf{y}}_{n(i)} } \\ {{\mathbf{0}}_{n} } \\ \end{array} } \right] \in {\mathbb{R}}^{2n \times 1} \), \( {\hat{\mathbf{v}}}_{i} = \left[ {\begin{array}{*{20}c} {{\mathbf{0}}_{n} } \\ {{\mathbf{y}}_{n(i)} } \\ \end{array} } \right] \in {\mathbb{R}}^{2n \times 1} \), \( {\tilde{\mathbf{I}}} = \left[ {\begin{array}{*{20}c} {{\mathbf{I}}_{n} } \\ {{\mathbf{I}}_{n} } \\ \end{array} } \right] \).

Subsequently, the problem in Eq. (4) can be rewritten as:

$$ F_{\gamma } \left( {\tilde{\varvec{U}}} \right) = \frac{1}{2}\left\| {{\mathbf{X}} - \tilde{\varvec{U}}{\tilde{\mathbf{I}}}} \right\|_{F}^{2} + \gamma_{1} \sum\nolimits_{i < j} {w_{ij} \left\| {\tilde{\varvec{U}}(\tilde{\varvec{v}}_{i} - \tilde{\varvec{v}}_{j} )} \right\|_{2} } + \gamma_{2} \sum\nolimits_{i = 1}^{n} {\left\| {\tilde{\varvec{U}}\hat{\varvec{v}}_{i} } \right\|_{1} } $$
(11)
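
The stacked rewriting in Eq. (11) can be checked numerically. The sketch below is only an illustration, not the authors' code: it assumes that \( {\mathbf{y}}_{n(i)} \) denotes the i-th standard basis vector of \( {\mathbb{R}}^{n} \) (this section does not define it), and the function names `rcc_objective` and `rcc_objective_stacked` are hypothetical. Under that assumption, \( \tilde{\varvec{U}}{\tilde{\mathbf{I}}} = \varvec{U} + \varvec{O} \), \( \tilde{\varvec{U}}\tilde{\varvec{v}}_{i} \) is the i-th column of \( \varvec{U} \), and \( \tilde{\varvec{U}}\hat{\varvec{v}}_{i} \) is the i-th column of \( \varvec{O} \), so both forms should return the same value.

```python
import numpy as np


def rcc_objective(X, U, O, W, gamma1, gamma2):
    """Objective written directly in terms of U and O (both p x n)."""
    n = X.shape[1]
    fit = 0.5 * np.linalg.norm(X - U - O, "fro") ** 2
    fusion = sum(
        W[i, j] * np.linalg.norm(U[:, i] - U[:, j])
        for i in range(n)
        for j in range(i + 1, n)
    )
    sparsity = np.abs(O).sum()  # sum over i of ||o_i||_1
    return fit + gamma1 * fusion + gamma2 * sparsity


def rcc_objective_stacked(X, U_tilde, W, gamma1, gamma2):
    """Same objective in the stacked form of Eq. (11), U_tilde = [U, O] (p x 2n)."""
    n = X.shape[1]
    eye, zero = np.eye(n), np.zeros((n, n))
    I_tilde = np.vstack([eye, eye])   # 2n x n, so U_tilde @ I_tilde = U + O
    V_tilde = np.vstack([eye, zero])  # column i is v~_i (assumed basis vector on top)
    V_hat = np.vstack([zero, eye])    # column i is v^_i (assumed basis vector below)
    fit = 0.5 * np.linalg.norm(X - U_tilde @ I_tilde, "fro") ** 2
    fusion = sum(
        W[i, j] * np.linalg.norm(U_tilde @ (V_tilde[:, i] - V_tilde[:, j]))
        for i in range(n)
        for j in range(i + 1, n)
    )
    sparsity = sum(np.abs(U_tilde @ V_hat[:, i]).sum() for i in range(n))
    return fit + gamma1 * fusion + gamma2 * sparsity
```

Calling both functions on the same random `X`, `U`, `O`, and a nonnegative weight matrix `W`, with `U_tilde = np.hstack([U, O])`, should give values that agree up to floating-point error.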

Therefore, proving the convexity of problem (4) reduces to proving that Eq. (11) is strictly convex in \( \tilde{\varvec{U}} \). Applying Definition 1 to Eq. (11) for any \( \theta \in \left( {0,1} \right) \) and any \( \tilde{\varvec{U}}_{1} \ne \tilde{\varvec{U}}_{2} \), we have

$$ F_{\gamma } \left( {\theta \tilde{\varvec{U}}_{1} \varvec{ + }\left( {1 - \theta } \right)\tilde{\varvec{U}}_{2} } \right) - \theta F_{\gamma } \left( {\tilde{\varvec{U}}_{1} } \right) - \left( {1 - \theta } \right)F_{\gamma } \left( {\tilde{\varvec{U}}_{2} } \right) = T_{1} + T_{2} + T_{3} $$
(12)

where

$$ \begin{aligned} T_{1} & = tr\left\{ {\tilde{\mathbf{I}}}^{T} \left[ \theta \tilde{\varvec{U}}_{1}^{T} \left( \theta \tilde{\varvec{U}}_{1} + \left( 1 - \theta \right)\tilde{\varvec{U}}_{2} \right) + \left( 1 - \theta \right)\tilde{\varvec{U}}_{2}^{T} \left( \theta \tilde{\varvec{U}}_{1} + \left( 1 - \theta \right)\tilde{\varvec{U}}_{2} \right) - \theta \tilde{\varvec{U}}_{1}^{T} \tilde{\varvec{U}}_{1} - \left( 1 - \theta \right)\tilde{\varvec{U}}_{2}^{T} \tilde{\varvec{U}}_{2} \right] {\tilde{\mathbf{I}}} \right\} \\ T_{2} & = \gamma_{1} \sum\limits_{i < j} w_{ij} \left\| \left( \theta \tilde{\varvec{U}}_{1} + \left( 1 - \theta \right)\tilde{\varvec{U}}_{2} \right)(\tilde{\varvec{v}}_{i} - \tilde{\varvec{v}}_{j} ) \right\|_{2} - \gamma_{1} \theta \sum\limits_{i < j} w_{ij} \left\| \tilde{\varvec{U}}_{1} (\tilde{\varvec{v}}_{i} - \tilde{\varvec{v}}_{j} ) \right\|_{2} - \gamma_{1} \left( 1 - \theta \right)\sum\limits_{i < j} w_{ij} \left\| \tilde{\varvec{U}}_{2} (\tilde{\varvec{v}}_{i} - \tilde{\varvec{v}}_{j} ) \right\|_{2} \\ T_{3} & = \gamma_{2} \sum\limits_{i = 1}^{n} \left\| \left( \theta \tilde{\varvec{U}}_{1} + \left( 1 - \theta \right)\tilde{\varvec{U}}_{2} \right)\hat{\varvec{v}}_{i} \right\|_{1} - \gamma_{2} \theta \sum\limits_{i = 1}^{n} \left\| \tilde{\varvec{U}}_{1} \hat{\varvec{v}}_{i} \right\|_{1} - \gamma_{2} \left( 1 - \theta \right)\sum\limits_{i = 1}^{n} \left\| \tilde{\varvec{U}}_{2} \hat{\varvec{v}}_{i} \right\|_{1} \\ \end{aligned} $$

By the absolute homogeneity and subadditivity of norms, \( T_{3} \) in Eq. (12) is non-positive:

$$ \begin{aligned} T_{3} & \le \gamma_{2} \sum\limits_{i = 1}^{n} \left\| \theta \tilde{\varvec{U}}_{1} \hat{\varvec{v}}_{i} \right\|_{1} + \gamma_{2} \sum\limits_{i = 1}^{n} \left\| \left( 1 - \theta \right)\tilde{\varvec{U}}_{2} \hat{\varvec{v}}_{i} \right\|_{1} - \gamma_{2} \theta \sum\limits_{i = 1}^{n} \left\| \tilde{\varvec{U}}_{1} \hat{\varvec{v}}_{i} \right\|_{1} - \gamma_{2} \left( 1 - \theta \right)\sum\limits_{i = 1}^{n} \left\| \tilde{\varvec{U}}_{2} \hat{\varvec{v}}_{i} \right\|_{1} \\ & = \gamma_{2} \theta \sum\limits_{i = 1}^{n} \left\| \tilde{\varvec{U}}_{1} \hat{\varvec{v}}_{i} \right\|_{1} + \gamma_{2} \left( 1 - \theta \right)\sum\limits_{i = 1}^{n} \left\| \tilde{\varvec{U}}_{2} \hat{\varvec{v}}_{i} \right\|_{1} - \gamma_{2} \theta \sum\limits_{i = 1}^{n} \left\| \tilde{\varvec{U}}_{1} \hat{\varvec{v}}_{i} \right\|_{1} - \gamma_{2} \left( 1 - \theta \right)\sum\limits_{i = 1}^{n} \left\| \tilde{\varvec{U}}_{2} \hat{\varvec{v}}_{i} \right\|_{1} = 0 \\ \end{aligned} $$

Similarly, T2 in Eq. (12) is also non-positive. Thus, Eq. (12) can be simplified to

$$ \begin{aligned} & F_{\gamma } \left( \theta \tilde{\varvec{U}}_{1} + \left( 1 - \theta \right)\tilde{\varvec{U}}_{2} \right) - \theta F_{\gamma } \left( \tilde{\varvec{U}}_{1} \right) - \left( 1 - \theta \right)F_{\gamma } \left( \tilde{\varvec{U}}_{2} \right) \\ & \quad \le tr\left\{ - \theta \left( 1 - \theta \right){\tilde{\mathbf{I}}}^{T} \left( \tilde{\varvec{U}}_{1} - \tilde{\varvec{U}}_{2} \right)^{T} \left( \tilde{\varvec{U}}_{1} - \tilde{\varvec{U}}_{2} \right){\tilde{\mathbf{I}}} \right\} \\ & \quad = - \theta \left( 1 - \theta \right)tr\left\{ {\tilde{\mathbf{I}}}^{T} \left( \tilde{\varvec{U}}_{1} - \tilde{\varvec{U}}_{2} \right)^{T} \left( \tilde{\varvec{U}}_{1} - \tilde{\varvec{U}}_{2} \right){\tilde{\mathbf{I}}} \right\} \\ & \quad = - \theta \left( 1 - \theta \right)\left\| \left( \tilde{\varvec{U}}_{1} - \tilde{\varvec{U}}_{2} \right){\tilde{\mathbf{I}}} \right\|_{F}^{2} < 0 \\ \end{aligned} $$
(13)

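Finally, since the last inequality is strict whenever \( ( \tilde{\varvec{U}}_{1} - \tilde{\varvec{U}}_{2} ){\tilde{\mathbf{I}}} \ne {\mathbf{0}} \), i.e., whenever \( \varvec{U}_{1} + \varvec{O}_{1} \ne \varvec{U}_{2} + \varvec{O}_{2} \), we obtain that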

$$ F_{\gamma } \left( {\theta \tilde{\varvec{U}}_{1} \varvec{ + }\left( {1 - \theta } \right)\tilde{\varvec{U}}_{2} } \right) < \theta F_{\gamma } \left( {\tilde{\varvec{U}}_{1} } \right)\varvec{ + }\left( {1 - \theta } \right)F_{\gamma } \left( {\tilde{\varvec{U}}_{2} } \right) $$

That is, \( F_{\gamma } (\tilde{\varvec{U}}) \) is convex in \( \tilde{\varvec{U}} \). Equivalently, \( F_{\gamma } (\varvec{U},\varvec{O}) \) is jointly convex in \( \left\{ {\varvec{U},\varvec{O}} \right\} \).
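
As a sanity check on this conclusion (not a substitute for the proof), the convexity inequality can be probed numerically on random inputs. The sketch below is an illustration under stated assumptions: the objective is taken in the \( \{ \varvec{U},\varvec{O} \} \) form implied by Eq. (11), and the function names, problem sizes, and default values of `gamma1` and `gamma2` are hypothetical choices, not values from the paper.

```python
import numpy as np


def rcc_objective(X, U, O, W, gamma1, gamma2):
    """0.5*||X-U-O||_F^2 + gamma1*sum_{i<j} w_ij*||u_i-u_j||_2 + gamma2*sum_i ||o_i||_1."""
    n = X.shape[1]
    fit = 0.5 * np.linalg.norm(X - U - O, "fro") ** 2
    fusion = sum(
        W[i, j] * np.linalg.norm(U[:, i] - U[:, j])
        for i in range(n)
        for j in range(i + 1, n)
    )
    return fit + gamma1 * fusion + gamma2 * np.abs(O).sum()


def convexity_gap(X, W, pair1, pair2, theta, gamma1=1.0, gamma2=0.5):
    """F(theta*P1 + (1-theta)*P2) - theta*F(P1) - (1-theta)*F(P2); convexity means <= 0."""
    (U1, O1), (U2, O2) = pair1, pair2
    U_mix = theta * U1 + (1.0 - theta) * U2
    O_mix = theta * O1 + (1.0 - theta) * O2
    f_mix = rcc_objective(X, U_mix, O_mix, W, gamma1, gamma2)
    f_1 = rcc_objective(X, U1, O1, W, gamma1, gamma2)
    f_2 = rcc_objective(X, U2, O2, W, gamma1, gamma2)
    return f_mix - theta * f_1 - (1.0 - theta) * f_2


def max_gap_over_random_trials(p=3, n=5, trials=200, seed=0):
    """Largest gap seen over random draws; it should never exceed rounding error."""
    rng = np.random.default_rng(seed)
    worst = -np.inf
    for _ in range(trials):
        X = rng.normal(size=(p, n))
        W = rng.random((n, n))  # nonnegative weights, as the fusion term requires
        pair1 = (rng.normal(size=(p, n)), rng.normal(size=(p, n)))
        pair2 = (rng.normal(size=(p, n)), rng.normal(size=(p, n)))
        theta = rng.uniform(0.01, 0.99)
        worst = max(worst, convexity_gap(X, W, pair1, pair2, theta))
    return worst
```

A non-positive value of `max_gap_over_random_trials()` is consistent with joint convexity in \( \{ \varvec{U},\varvec{O} \} \); a positive value beyond rounding error would indicate a mistake in the implementation or in the assumed form of the objective.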

Cite this article

Quan, Z., Chen, S. Robust convex clustering. Soft Comput 24, 731–744 (2020). https://doi.org/10.1007/s00500-019-04471-9

  • DOI: https://doi.org/10.1007/s00500-019-04471-9
