A novel heuristic algorithm to solve penalized regression-based clustering model

  • Shadi Hasanzadeh Tavakkoli
  • Yahya Forghani
  • Reza Sheibani
Methodologies and Application


Penalized regression-based clustering (PRClust) is an extension of the “sum-of-norms” clustering model. Three heuristic algorithms have previously been proposed for solving PRClust: (1) DC-CD, which combines difference-of-convex (DC) programming with a coordinate-wise descent (CD) algorithm; (2) DC-ADMM, which combines DC programming with the alternating direction method of multipliers (ADMM); and (3) ALT, which uses alternating optimization. DC-CD uses \( p \times n(n-1)/2 \) scalar slack variables, where n is the number of data points and p is the number of features. In each iteration of DC-CD, these slack variables and the cluster centers are updated by solving a second-order cone program (SOCP). DC-ADMM uses \( p \times n(n-1) \) scalar slack variables, and in each iteration these slack variables and the cluster centers are updated with a standard ADMM. In this paper, PRClust is first reformulated into an equivalent model, and a novel heuristic algorithm is then proposed to solve the reformulated model. The proposed algorithm needs only \( n(n-1)/2 \) scalar slack variables, far fewer than DC-CD and DC-ADMM, and updates them with a simple equation in each iteration. Updating the slack variables is therefore less time-consuming than in DC-CD and DC-ADMM. The proposed algorithm uses an unconstrained convex quadratic problem to update only the cluster centers, so this problem is much smaller than the SOCP that DC-CD uses to update both the cluster centers and the slack variables. ALT, in turn, updates the cluster centers with an SOCP, whereas the proposed algorithm solves an unconstrained convex quadratic problem with the same number of variables; solving such a problem takes less time than solving an SOCP of the same size. Experimental results on 12 datasets confirm that the proposed algorithm runs faster than DC-ADMM and DC-CD.
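The relationship between PRClust and sum-of-norms clustering, and the slack-variable counts quoted in the abstract, can be made concrete with a small sketch. It assumes the usual PRClust form, in which the sum-of-norms fusion penalty on \( \|\mu_i - \mu_j\|_2 \) is replaced by the truncated L1 penalty TLP(z; τ) = min(|z|, τ); the abstract itself does not state this objective, and all function names below are hypothetical. The code only evaluates the objective and counts slack variables. It does not implement DC-CD, DC-ADMM, ALT, or the authors' algorithm.

```python
import numpy as np


def tlp(z, tau):
    """Truncated L1 penalty TLP(z; tau) = min(|z|, tau) (assumed PRClust penalty)."""
    return np.minimum(np.abs(z), tau)


def prclust_objective(X, M, lam, tau):
    """Assumed PRClust objective (hypothetical helper):

        0.5 * sum_i ||x_i - mu_i||^2 + lam * sum_{i<j} TLP(||mu_i - mu_j||_2; tau)

    X holds the n data points and M the n per-point centers, both of shape (n, p).
    """
    fit = 0.5 * np.sum((X - M) ** 2)             # squared-error data-fit term
    n = X.shape[0]
    i, j = np.triu_indices(n, k=1)               # all n(n-1)/2 pairs with i < j
    dists = np.linalg.norm(M[i] - M[j], axis=1)  # ||mu_i - mu_j||_2 for each pair
    return fit + lam * np.sum(tlp(dists, tau))


def son_objective(X, M, lam):
    """Sum-of-norms objective: with tau = inf the truncation never binds."""
    return prclust_objective(X, M, lam, np.inf)


def slack_variable_counts(n, p):
    """Scalar slack-variable counts as stated in the abstract."""
    pairs = n * (n - 1) // 2                     # n(n-1) is always even
    return {
        "DC-CD": p * pairs,                      # p * n(n-1)/2
        "DC-ADMM": p * n * (n - 1),              # p * n(n-1)
        "proposed": pairs,                       # n(n-1)/2
    }
```

For example, with n = 100 data points and p = 10 features, `slack_variable_counts(100, 10)` gives 49,500 for DC-CD, 99,000 for DC-ADMM, and 4,950 for the proposed reformulation: a factor of p fewer than DC-CD and 2p fewer than DC-ADMM. Because the TLP caps each pairwise penalty at τ, centers that are already far apart are no longer pulled together. This is the sense in which PRClust extends sum-of-norms clustering, and it is also why its objective is non-convex.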


Keywords: “Sum-of-norms” (SON) clustering; Penalized regression-based clustering (PRClust); DC-CD; DC-ADMM


Compliance with ethical standards

Conflict of interest

Zohreh Zendehdel declares that she has no conflict of interest. Yahya Forghani declares that he has no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.


Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Islamic Azad University, Mashhad, Iran

Personalised recommendations