Nonsmooth Optimization Based Algorithms in Cluster Analysis

  • Adil M. Bagirov
  • Ehsan Mohebi


Cluster analysis is an important task in data mining. It deals with the problem of organizing a collection of objects into clusters based on a similarity measure, and various distance functions can be used to define that measure. Cluster analysis problems whose similarity measure is defined by the squared Euclidean distance, also known as minimum sum-of-squares clustering, have been studied extensively over the last five decades. However, problems with the L1 and L∞ norms have attracted less attention. In this chapter, we consider a nonsmooth nonconvex optimization formulation of cluster analysis problems. This formulation makes it easy to apply similarity measures defined by different distance functions. Moreover, an efficient incremental algorithm can be designed from this formulation to solve the clustering problems. We develop incremental algorithms for solving clustering problems where the similarity measure is defined using the L1, L2 and L∞ norms. We also consider different algorithms for solving the nonsmooth nonconvex optimization problems that arise in cluster analysis. The proposed algorithms are tested on several real-world data sets and compared with other similar algorithms.
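
The abstract describes clustering as the minimization of a nonsmooth, nonconvex cluster function that is solved incrementally. Below is a minimal Python sketch of that idea. It assumes the usual form of the objective, f_k(x^1, ..., x^k) = (1/m) Σ_{a ∈ A} min_j d(x^j, a), where d is the squared Euclidean, L1 or L∞ distance. All function names are hypothetical. The local refinement step is a swapped-in alternating descent (mean, coordinate-wise median or medoid updates), not the nonsmooth optimization solvers developed in the chapter. The starting point for each new center is also searched over data points only, which is a simplification.

```python
from statistics import median


def sq_l2(x, a):
    # Squared Euclidean distance (minimum sum-of-squares clustering).
    return sum((xi - ai) ** 2 for xi, ai in zip(x, a))


def l1(x, a):
    # L1 (city-block) distance.
    return sum(abs(xi - ai) for xi, ai in zip(x, a))


def linf(x, a):
    # L-infinity (Chebyshev) distance.
    return max(abs(xi - ai) for xi, ai in zip(x, a))


def cluster_function(centers, data, dist):
    # f_k = (1/m) * sum over points of the distance to the nearest center;
    # the inner min makes f_k nonsmooth and nonconvex.
    return sum(min(dist(c, a) for c in centers) for a in data) / len(data)


def within_cost(z, pts, dist):
    # Total distance from a candidate center z to the points of one cluster.
    return sum(dist(z, q) for q in pts)


def closed_form_center(pts, dist):
    # Known minimizers of within_cost: the mean for squared L2 and the
    # coordinate-wise median for L1. None for L-infinity (no simple formula).
    dim = len(pts[0])
    if dist is sq_l2:
        return tuple(sum(p[i] for p in pts) / len(pts) for i in range(dim))
    if dist is l1:
        return tuple(median(p[i] for p in pts) for i in range(dim))
    return None


def refine(centers, data, dist, max_iter=100):
    # Alternating descent: assign points to their nearest center, then move each
    # center to the best of {current center, cluster medoid, closed-form center}.
    # Neither step increases f_k.
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for a in data:
            j = min(range(len(centers)), key=lambda j: dist(centers[j], a))
            clusters[j].append(a)
        new_centers = []
        for c, pts in zip(centers, clusters):
            if not pts:
                new_centers.append(c)  # keep an empty cluster's center unchanged
                continue
            candidates = [c, min(pts, key=lambda p: within_cost(p, pts, dist))]
            cf = closed_form_center(pts, dist)
            if cf is not None:
                candidates.append(cf)
            # min() returns the first minimizer, so ties keep the current center.
            new_centers.append(min(candidates, key=lambda z: within_cost(z, pts, dist)))
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def incremental_clustering(data, k_max, dist):
    # Solve the 1-, 2-, ..., k_max-cluster problems in turn, reusing the
    # (k-1)-cluster solution to start the k-cluster problem.
    data = [tuple(a) for a in data]
    start = min(data, key=lambda p: within_cost(p, data, dist))
    centers = refine([start], data, dist)
    results = {1: (centers, cluster_function(centers, data, dist))}
    for k in range(2, k_max + 1):
        # Starting point for the k-th center: the data point that minimizes
        # f_k when appended to the current centers (searched over data only).
        y = min(data, key=lambda y: cluster_function(centers + [y], data, dist))
        centers = refine(centers + [y], data, dist)
        results[k] = (centers, cluster_function(centers, data, dist))
    return results
```

Take six points forming two well-separated groups, such as (0, 0), (0, 1), (1, 0) and (10, 10), (10, 11), (11, 10). For any of the three distances, `incremental_clustering(data, 2, dist)` places one center in each group. Swapping the distance function is the only change needed to move between the L1, L2 and L∞ problems, which mirrors the flexibility of the formulation described above.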


Keywords: Cluster analysis · Nonsmooth optimization · Nonconvex optimization · Partition clustering · Incremental algorithm · k-means algorithm · Similarity measure

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Faculty of Science, School of Science, Information Technology and Engineering, Federation University Australia, Victoria, Australia