Review of Basic Local Searches for Solving the Minimum Sum-of-Squares Clustering Problem

  • Thiago Pereira
  • Daniel Aloise
  • Jack Brimberg
  • Nenad Mladenović
Chapter
Part of the Springer Optimization and Its Applications book series (SOIA, volume 141)

Abstract

This paper reviews the well-known K-means, H-means, and J-means heuristics, and their variants, which are used to solve the minimum sum-of-squares clustering problem. We then develop two new local searches that combine these heuristics in nested and sequential structures, also referred to as variable neighborhood descent. To show how these local searches can be implemented within a metaheuristic framework, we apply the new heuristics in the local improvement step of two variable neighborhood search (VNS) procedures. Computational experiments suggest that this new and simple application of VNS is comparable to the state of the art. In addition, a very significant improvement (over 30%) in solution quality is obtained for the largest problem instance investigated, which contains 85,900 entities.
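For readers unfamiliar with the problem, the following is a minimal sketch of the minimum sum-of-squares objective and of a basic descent that alternates between assigning entities to their nearest centroid and recomputing centroids. It is an illustration only, not the authors' implementation: all function names are hypothetical, the chapter's own K-means, H-means, and J-means definitions may differ in detail, and empty clusters are simply left in place rather than repaired.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mssc_cost(points: List[Point], centroids: List[Point], labels: List[int]) -> float:
    """Sum of squared distances from each entity to its assigned centroid."""
    return sum(sq_dist(p, centroids[l]) for p, l in zip(points, labels))


def assign(points: List[Point], centroids: List[Point]) -> List[int]:
    """Assign every entity to the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
        for p in points
    ]


def update(points: List[Point], labels: List[int], centroids: List[Point]) -> List[Point]:
    """Move each centroid to the mean of its cluster.

    Assumption: an empty cluster keeps its old centroid (no degeneracy repair).
    """
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, l in zip(points, labels) if l == j]
        if not members:
            new_centroids.append(old)
        else:
            new_centroids.append(
                tuple(sum(m[d] for m in members) / len(members) for d in range(len(old)))
            )
    return new_centroids


def alternate_descent(points: List[Point], centroids: List[Point], max_iter: int = 100):
    """Alternate assignment and centroid update until the labels stop changing."""
    centroids = [tuple(c) for c in centroids]
    labels = assign(points, centroids)
    for _ in range(max_iter):
        centroids = update(points, labels, centroids)
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
    return centroids, labels, mssc_cost(points, centroids, labels)
```

A descent of this kind stops at a local minimum that depends on the starting centroids, which is why the abstract embeds such local searches in a VNS framework.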

Keywords

Clustering; Minimum sum-of-squares; VNS; K-means

Notes

Acknowledgements

Thiago Pereira is grateful to CAPES-Brazil. Daniel Aloise and Nenad Mladenović were partially supported by CNPq-Brazil grants 308887/2014-0 and 400350/2014-9. This research was also partially supported within the framework of grant number BR05236839, “Development of information technologies and systems for stimulation of personality’s sustainable development as one of the bases of development of digital Kazakhstan”.

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Thiago Pereira (1)
  • Daniel Aloise (2)
  • Jack Brimberg (3)
  • Nenad Mladenović (4, 5)

  1. Universidade Federal do Rio Grande do Norte, Natal, Brazil
  2. Department of Computer and Software Engineering, Polytechnique Montréal, Montreal, Canada
  3. Department of Mathematics and Computer Science, The Royal Military College of Canada, Kingston, Canada
  4. Emirates College of Technologies, Abu Dhabi, UAE
  5. Mathematical Institute, SASA, Serbia
