# A novel heuristic algorithm to solve penalized regression-based clustering model


## Abstract

Penalized regression-based clustering (PRClust) is an extension of the “sum-of-norms” clustering model. Three heuristic algorithms have previously been proposed for solving PRClust: (1) DC-CD, which combines difference-of-convex (DC) programming with a coordinate-wise descent (CD) algorithm; (2) DC-ADMM, which combines DC programming with the alternating direction method of multipliers (ADMM); and (3) ALT, which uses alternating optimization. DC-CD uses \( p \times n(n-1)/2 \) scalar slack variables to solve PRClust, where *n* is the number of data points and *p* is the number of features. In each iteration of DC-CD, these slack variables and the cluster centers are updated by solving a second-order cone program (SOCP). DC-ADMM uses \( p \times n(n-1) \) scalar slack variables, which, together with the cluster centers, are updated by a standard ADMM in each iteration. In this paper, PRClust is first reformulated into an equivalent model, and a novel heuristic algorithm is then proposed to solve the reformulated model. Our proposed algorithm needs only \( n(n-1)/2 \) scalar slack variables, far fewer than DC-CD and DC-ADMM, and updates them with a simple equation in each iteration. Updating the slack variables is therefore less time-consuming in our algorithm than in DC-CD and DC-ADMM. Our algorithm updates only the cluster centers, by solving an unconstrained convex quadratic problem, so this problem is much smaller than the SOCP that DC-CD uses to update both the cluster centers and the slack variables. ALT, meanwhile, updates the cluster centers by solving an SOCP, whereas our algorithm solves an unconstrained convex quadratic problem with the same number of variables.
Solving an unconstrained convex quadratic problem is less time-consuming than solving an SOCP with the same number of variables. Our experimental results on 12 datasets confirm that our proposed algorithm runs faster than DC-ADMM and DC-CD.
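The abstract does not spell out the reformulation, so the following is only a minimal, hypothetical sketch of how an algorithm with the stated cost profile could look: one scalar slack variable per pair \( i < j \) (\( n(n-1)/2 \) in total) refreshed by a simple equation, and a cluster-center update that is an unconstrained convex quadratic problem. It assumes the PRClust objective of Pan et al. (2013) with the truncated Lasso penalty, \( \frac{1}{2}\sum_i \lVert x_i - \mu_i \rVert_2^2 + \lambda \sum_{i<j} \min(\lVert \mu_i - \mu_j \rVert_2, \tau) \), and combines a DC-style bound on the penalty (\( \min(z,\tau) \le z \) and \( \min(z,\tau) \le \tau \)) with the quadratic bound \( z \le z^2/(2e) + e/2 \), valid for any \( e > 0 \). This majorize-minimize scheme is an illustrative assumption, not necessarily the algorithm proposed in the paper, and all function names (`prclust_mm`, `update_slacks`, `update_centers`, `solve_linear`) and parameters (`lam`, `tau`, `iters`, `eps`) are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def dist(u: List[float], v: List[float]) -> float:
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def prclust_objective(x: Matrix, mu: Matrix, lam: float, tau: float) -> float:
    """0.5 * sum ||x_i - mu_i||^2 + lam * sum_{i<j} min(||mu_i - mu_j||, tau)."""
    n = len(x)
    fit = 0.5 * sum(dist(x[i], mu[i]) ** 2 for i in range(n))
    pen = sum(min(dist(mu[i], mu[j]), tau) for i in range(n) for j in range(i + 1, n))
    return fit + lam * pen


def solve_linear(a: Matrix, b: Matrix) -> Matrix:
    """Solve a @ m = b (a: n x n, b: n x p) by Gaussian elimination with partial pivoting."""
    n, p = len(a), len(b[0])
    aug = [list(a[i]) + list(b[i]) for i in range(n)]  # augmented copy; inputs untouched
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + p):
                aug[r][c] -= f * aug[col][c]
    m = [[0.0] * p for _ in range(n)]
    for i in range(n - 1, -1, -1):  # back substitution, one right-hand-side column at a time
        for k in range(p):
            s = aug[i][n + k] - sum(aug[i][j] * m[j][k] for j in range(i + 1, n))
            m[i][k] = s / aug[i][i]
    return m


def update_slacks(mu: Matrix, lam: float, tau: float, eps: float) -> Dict[Tuple[int, int], float]:
    """Hypothetical slack update: one scalar per pair i < j, n(n-1)/2 in total."""
    n = len(mu)
    t = {}
    for i in range(n):
        for j in range(i + 1, n):
            d = dist(mu[i], mu[j])
            # DC step: min(z, tau) <= tau, so a pair farther apart than tau becomes a constant.
            # MM step: z <= z^2 / (2e) + e / 2 with e = max(d, eps); eps avoids division by zero.
            t[(i, j)] = lam / (2.0 * max(d, eps)) if d <= tau else 0.0
    return t


def update_centers(x: Matrix, t: Dict[Tuple[int, int], float]) -> Matrix:
    """Minimise 0.5 * sum ||x_i - mu_i||^2 + sum t_ij ||mu_i - mu_j||^2 (unconstrained, convex).

    Setting the gradient to zero gives (I + 2L) M = X, where L is the Laplacian of the weights t.
    """
    n = len(x)
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for (i, j), w in t.items():
        a[i][i] += 2.0 * w
        a[j][j] += 2.0 * w
        a[i][j] -= 2.0 * w
        a[j][i] -= 2.0 * w
    return solve_linear(a, x)


def prclust_mm(x: Matrix, lam: float, tau: float, iters: int = 100, eps: float = 1e-6) -> Matrix:
    """Alternate the slack update and the center update, starting from mu = x."""
    mu = [list(row) for row in x]
    for _ in range(iters):
        mu = update_centers(x, update_slacks(mu, lam, tau, eps))
    return mu


if __name__ == "__main__":
    x = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
    for row in prclust_mm(x, lam=1.0, tau=1.0):
        print([round(v, 3) for v in row])
```

On the toy data above the first three centers fuse at the mean of the first group, about (0.033, 0.033), and the last three at about (5.033, 5.033); because \( I + 2L \) has unit column sums, each center update preserves the sum of the centers. For *n* = 100 and *p* = 10, this sketch stores 4,950 slack weights per iteration, versus 10 × 4,950 = 49,500 slack variables for DC-CD and 10 × 9,900 = 99,000 for DC-ADMM under the counts above. Since \( I + 2L \) is symmetric positive definite, a practical version would replace the plain Gaussian elimination with a Cholesky or sparse solver.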

## Keywords

“Sum-of-norms” (SON) clustering, Penalized regression-based clustering (PRClust), DC-CD, DC-ADMM

## Notes

### Compliance with ethical standards

### Conflict of interest

Zohreh Zendehdel declares that she has no conflict of interest. Yahya Forghani declares that he has no conflict of interest.

### Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

## References

- Arthur D, Vassilvitskii S (2007) k-means++: the advantages of careful seeding. In: Proceedings of the eighteenth annual ACM-SIAM symposium on discrete algorithms. Society for Industrial and Applied Mathematics, pp 1027–1035
- Barati M, Jalali M, Forghani Y (2019) Alternating optimization to solve penalized regression-based clustering model. Expert Syst, e12462
- Bryant A, Cios K (2018) RNN-DBSCAN: a density-based clustering algorithm using reverse nearest neighbor density estimates. IEEE Trans Knowl Data Eng 30(6):1109–1121
- Capó M, Pérez A, Lozano JA (2017) An efficient approximation to the k-means clustering for massive data. Knowl-Based Syst 117:56–69
- Chen GK, Chi EC, Ranola JMO, Lange K (2015) Convex clustering: an attractive alternative to hierarchical clustering. PLoS Comput Biol 11(5):e1004228
- Cheng W, Zhang X, Pan F, Wang W (2016) HICC: an entropy splitting-based framework for hierarchical co-clustering. Knowl Inf Syst 46(2):343–367
- Chi EC, Lange K (2015) Splitting methods for convex clustering. J Comput Graph Stat 24(4):994–1013
- de Amorim RC (2015) Feature relevance in Ward’s hierarchical clustering using the \( L_p \) norm. J Classif 32(1):46–62
- Friedman J, Hastie T, Höfling H, Tibshirani R (2007) Pathwise coordinate optimization. Ann Appl Stat 1(2):302–332
- Gan G, Ng MK-P (2017) K-means clustering with outlier removal. Pattern Recognit Lett 90:8–14
- Han D, Agrawal A, Liao W-K, Choudhary A (2018) A fast DBSCAN algorithm with Spark implementation. In: Big data in engineering applications. Springer, Berlin, pp 173–192
- Hocking TD, Joulin A, Bach F, Vert J-P (2011) Clusterpath: an algorithm for clustering using convex fusion penalties. In: 28th international conference on machine learning, p 1
- Ienco D, Bordogna G (2018) Fuzzy extensions of the DBScan clustering algorithm. Soft Comput 22(5):1719–1730
- Le Thi Hoai A, Tao PD (1997) Solving a class of linearly constrained indefinite quadratic problems by DC algorithms. J Glob Optim 11(3):253–285
- Lindsten F, Ohlsson H, Ljung L (2011) Clustering using sum-of-norms regularization: with application to particle filter output computation. In: Statistical signal processing workshop (SSP), 2011 IEEE, pp 201–204. IEEE
- MacQueen J (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, vol. 1, no. 14. Oakland, CA, USA, pp 281–297
- Malsiner-Walli G, Frühwirth-Schnatter S, Grün B (2016) Model-based clustering based on sparse finite Gaussian mixtures. Stat Comput 26(1):303–324
- Pan W, Shen X, Liu B (2013) Cluster analysis: unsupervised learning via supervised learning with a non-convex penalty. J Mach Learn Res 14(1):1865–1889
- Panahi A, Dubhashi D, Johansson FD, Bhattacharyya C (2017) Clustering by sum of norms: stochastic incremental algorithm, convergence and cluster recovery. In: International conference on machine learning, pp 2769–2777
- Pelckmans K, De Brabanter J, Suykens J, De Moor B (2005) Convex clustering shrinkage. In: PASCAL workshop on statistics and optimization of clustering workshop
- Pham T, Dang H, Le T, Le TH (2017) Fast support vector clustering. Vietnam J Comput Sci 4(1):13–21
- Seidpisheh M, Mohammadpour A (2018) Hierarchical clustering of heavy-tailed data using a new similarity measure. Intell Data Anal 22(3):569–579
- Wu TT, Lange K (2008) Coordinate descent algorithms for lasso penalized regression. Ann Appl Stat 2(1):224–244
- Wu C, Kwon S, Shen X, Pan W (2016) A new algorithm and theory for penalized regression-based clustering. J Mach Learn Res 17(188):1–25