
Review of Basic Local Searches for Solving the Minimum Sum-of-Squares Clustering Problem

Chapter in: Open Problems in Optimization and Data Analysis

Part of the book series: Springer Optimization and Its Applications (SOIA, volume 141)

Abstract

This paper presents a review of the well-known K-means, H-means, and J-means heuristics, and their variants, used to solve the minimum sum-of-squares clustering problem. We then develop two new local searches that combine these heuristics in a nested and sequential structure, also referred to as variable neighborhood descent. To show how these local searches can be implemented within a metaheuristic framework, we apply them in the local improvement step of two variable neighborhood search (VNS) procedures. Computational experiments suggest that this new and simple application of VNS is comparable to the state of the art. In addition, a very significant improvement in solution quality (over 30%) is obtained for the largest problem instance investigated, which contains 85,900 entities.
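As a concrete reference point for the heuristics named above, the minimum sum-of-squares clustering (MSSC) objective and a basic K-means (Lloyd-style) local search can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' implementation; the helper names `mssc_cost` and `kmeans` are ours.

```python
import numpy as np

def mssc_cost(X, centers, labels):
    """MSSC objective: sum of squared Euclidean distances from each
    entity to the centroid of the cluster it is assigned to."""
    return float(((X - centers[labels]) ** 2).sum())

def kmeans(X, k, max_iter=100, seed=0):
    """Basic K-means local search: alternate assignment and centroid
    update steps until the centroids stop moving (a local minimum)."""
    rng = np.random.default_rng(seed)
    # Initialize centroids on k distinct entities chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Assignment step: attach each entity to its nearest centroid.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Update step: move each centroid to the mean of its cluster
        # (keeping the old centroid if a cluster becomes degenerate).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if (labels == j).any() else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

# Two well-separated pairs of points: the local search recovers them.
X = np.array([[0., 0.], [0., 1.], [10., 10.], [10., 11.]])
centers, labels = kmeans(X, 2)
print(mssc_cost(X, centers, labels))  # prints 1.0
```

H-means and J-means differ from this scheme only in the move they apply at each step (reassigning a single entity, or "jumping" a centroid to an unoccupied entity, respectively), which is what makes them natural candidates for nesting inside a variable neighborhood descent.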


References

  1. Aggarwal, C.C., Reddy, C.K.: Data Clustering: Algorithms and Applications. Chapman and Hall/CRC, Boca Raton (2013)

  2. Aloise, D., Deshpande, A., Hansen, P., Popat, P.: NP-hardness of Euclidean sum-of-squares clustering. Mach. Learn. 75(2), 245–248 (2009)

  3. Aloise, D., Hansen, P., Liberti, L.: An improved column generation algorithm for minimum sum-of-squares clustering. Math. Program. 131(1), 195–220 (2012)

  4. Aloise, D., Damasceno, N., Mladenović, N., Pinheiro, D.: On strategies to fix degenerate k-means solutions. J. Classif. 34, 165–190 (2017)

  5. Bagirov, A.M., Ordin, B., Ozturk, G., Xavier, A.E.: An incremental clustering algorithm based on hyperbolic smoothing. Comput. Optim. Appl. 61(1), 219–241 (2015)

  6. Belacel, N., Hansen, P., Mladenović, N.: Fuzzy J-Means: a new heuristic for fuzzy clustering. Pattern Recogn. 35(10), 2193–2200 (2002)

  7. Boussaïd, I., Lepagnot, J., Siarry, P.: A survey on optimization metaheuristics. Inf. Sci. 237, 82–117 (2013). Prediction, Control and Diagnosis using Advanced Neural Computations

  8. Brimberg, J., Mladenović, N.: Degeneracy in the multi-source Weber problem. Math. Program. 85(1), 213–220 (1999)

  9. Brimberg, J., Mladenović, N., Todosijević, R., Urosević, D.: Less is more: solving the max-mean diversity problem with variable neighborhood search. Inf. Sci. 382–383, 179–200 (2017). https://doi.org/10.1016/j.ins.2016.12.021. http://www.sciencedirect.com/science/article/pii/S0020025516320394

  10. Costa, L.R., Aloise, D., Mladenović, N.: Less is more: basic variable neighborhood search heuristic for balanced minimum sum-of-squares clustering. Inf. Sci. 415–416, 247–253 (2017). https://doi.org/10.1016/j.ins.2017.06.019. http://www.sciencedirect.com/science/article/pii/S0020025517307934

  11. Fahad, A., Alshatri, N., Tari, Z., Alamri, A., Khalil, I., Zomaya, A.Y., Foufou, S., Bouras, A.: A survey of clustering algorithms for big data: taxonomy and empirical analysis. IEEE Trans. Emerg. Top. Comput. 2(3), 267–279 (2014)

  12. Forgey, E.: Cluster analysis of multivariate data: efficiency vs. interpretability of classification. Biometrics 21(3), 768–769 (1965)

  13. Grötschel, M., Holland, O.: Solution of large-scale symmetric travelling salesman problems. Math. Program. 51(1), 141–202 (1991)

  14. Hansen, P., Mladenović, N.: Variable neighborhood search for the p-median. Locat. Sci. 5(4), 207–226 (1997)

  15. Hansen, P., Mladenović, N.: J-means: a new local search heuristic for minimum sum of squares clustering. Pattern Recogn. 34(2), 405–413 (2001)

  16. Hansen, P., Mladenović, N.: First vs. best improvement: an empirical study. Discret. Appl. Math. 154(5), 802–817 (2006). IV ALIO/EURO Workshop on Applied Combinatorial Optimization

  17. Hansen, P., et al.: Survey and comparison of initialization methods for k-means clustering. Unpublished manuscript

  18. Hansen, P., Jaumard, B., Mladenović, N.: Minimum sum of squares clustering in a low dimensional space. J. Classif. 15(1), 37–55 (1998)

  19. Hansen, P., Ruiz, M., Aloise, D.: A VNS heuristic for escaping local extrema entrapment in normalized cut clustering. Pattern Recogn. 45(12), 4337–4345 (2012)

  20. Hansen, P., Mladenović, N., Todosijević, R., Hanafi, S.: Variable neighborhood search: basics and variants. EURO J. Comput. Optim. 1–32 (2016). https://doi.org/10.1007/s13675-016-0075-x

  21. Jain, A.K.: Data clustering: 50 years beyond k-means. Pattern Recogn. Lett. 31(8), 651–666 (2010)

  22. Laszlo, M., Mukherjee, S.: A genetic algorithm using hyper-quadtrees for low-dimensional k-means clustering. IEEE Trans. Pattern Anal. Mach. Intell. 28(4), 533–543 (2006)

  23. Laszlo, M., Mukherjee, S.: A genetic algorithm that exchanges neighboring centers for k-means clustering. Pattern Recogn. Lett. 28(16), 2359–2366 (2007)

  24. Likas, A., Vlassis, N., Verbeek, J.J.: The global k-means clustering algorithm. Pattern Recogn. 36(2), 451–461 (2003)

  25. MacQueen, J., et al.: Some methods for classification and analysis of multivariate observations. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Oakland, vol. 1, pp. 281–297 (1967)

  26. Mladenović, N.: A variable neighborhood algorithm-a new metaheuristic for combinatorial optimization. In: Papers Presented at Optimization Days, vol. 12 (1995)

  27. Mladenović, N., Todosijević, R., Urosević, D.: Less is more: basic variable neighborhood search for minimum differential dispersion problem. Inf. Sci. 326, 160–171 (2016). https://doi.org/10.1016/j.ins.2015.07.044. http://www.sciencedirect.com/science/article/pii/S0020025515005526

  28. Ordin, B., Bagirov, A.M.: A heuristic algorithm for solving the minimum sum-of-squares clustering problems. J. Glob. Optim. 61(2), 341–361 (2015)

  29. Padberg, M., Rinaldi, G.: A branch-and-cut algorithm for the resolution of large-scale symmetric traveling salesman problems. SIAM Rev. 33(1), 60–100 (1991)

  30. Rahman, M.A., Islam, M.Z.: A hybrid clustering technique combining a novel genetic algorithm with k-means. Knowl. Based Syst. 71, 345–365 (2014)

  31. Reinelt, G.: TSPLIB—a traveling salesman problem library. ORSA J. Comput. 3(4), 376–384 (1991)

  32. Ruspini, E.H.: Numerical methods for fuzzy clustering. Inf. Sci. 2(3), 319–350 (1970)

  33. Santi, É., Aloise, D., Blanchard, S.J.: A model for clustering data from heterogeneous dissimilarities. Eur. J. Oper. Res. 253(3), 659–672 (2016)

  34. Selim, S.Z., Alsultan, K.: A simulated annealing algorithm for the clustering problem. Pattern Recogn. 24(10), 1003–1008 (1991)

  35. Silva, K., Aloise, D., de Souza, S.X., Mladenović, N.: Less is more: simplified Nelder-Mead method for large unconstrained optimization. Yugoslav J. Oper. Res. 28(2), 153–169 (2018). http://yujor.fon.bg.ac.rs/index.php/yujor/article/view/609

  36. Späth, H.: Cluster Analysis Algorithms for Data Reduction and Classification of Objects. Computers and Their Applications. E. Horwood, Chichester (1980)

  37. Turkensteen, M., Andersen, K.A.: A Tabu Search Approach to Clustering, pp. 475–480. Springer, Berlin (2009)

  38. Ward, J.H., Jr.: Hierarchical grouping to optimize an objective function. J. Am. Stat. Assoc. 58(301), 236–244 (1963)

  39. Whitaker, R.: A fast algorithm for the greedy interchange for large-scale clustering and median location problems. Inf. Syst. Oper. Res. 21(2), 95–108 (1983)

  40. Wishart, D.: 256. Note: An algorithm for hierarchical classifications. Biometrics 25(1), 165–170 (1969)

  41. Xavier, A.E., Xavier, V.L.: Solving the minimum sum-of-squares clustering problem by hyperbolic smoothing and partition into boundary and gravitational regions. Pattern Recogn. 44(1), 70–77 (2011)


Acknowledgements

Thiago Pereira is grateful to CAPES-Brazil. Daniel Aloise and Nenad Mladenović were partially supported by CNPq-Brazil grants 308887/2014-0 and 400350/2014-9. This research was also partially supported under grant number BR05236839, “Development of information technologies and systems for stimulation of personality’s sustainable development as one of the bases of development of digital Kazakhstan”.


Appendices

Appendix 1: Small Instances

See Tables 5, 6, 7, 8, 9, 10, 11, 12, 13, 14.

Table 5 Comparison of the mean behavior of 10 runs on the Ruspini data set for different values of M and two GVNS strategies; \(t_{max}=\dfrac {N}{2}\) s
Table 6 Comparison of the mean behavior of 10 runs on Grötschel and Holland’s 202-city coordinates data set for different values of M and two GVNS strategies; \(t_{max}=\dfrac {N}{2}\) s
Table 7 Comparison of the mean behavior of 10 runs on Grötschel and Holland’s 666-city coordinates data set for different values of M and two GVNS strategies; \(t_{max}=\dfrac {N}{2}\) s
Table 8 Comparison of the mean behavior of 10 runs on Padberg and Rinaldi’s hole-drilling data set for different values of M and two GVNS strategies; \(t_{max}=\dfrac {N}{2}\) s
Table 9 Comparison of the mean behavior of 10 runs on Fisher’s Iris data set for different values of M and two GVNS strategies; \(t_{max}=\dfrac {N}{2}\) s
Table 10 Comparison of the mean behavior of 10 runs on the glass identification data set for different values of M and two GVNS strategies; \(t_{max}=\dfrac {N}{2}\) s
Table 11 Comparison of the mean behavior of 10 runs on the body measurements data set for different values of M and two GVNS strategies; \(t_{max}=\dfrac {N}{2}\) s
Table 12 Comparison of the mean behavior of 10 runs on the Telugu Indian vowel sounds data set for different values of M and two GVNS strategies; \(t_{max}=\dfrac {N}{2}\) s
Table 13 Comparison of the mean behavior of 10 runs on the concrete compressive strength data set for different values of M and two GVNS strategies; \(t_{max}=\dfrac {N}{2}\) s
Table 14 Comparison of the mean behavior of 10 runs on the image segmentation data set for different values of M and two GVNS strategies; \(t_{max}=\dfrac {N}{2}\) s

Appendix 2: Medium and Large Instances

See Tables 15, 16, 17.

Table 15 Comparison of the mean behavior of 4 runs on Reinelt’s hole-drilling data set for different values of M, for two GVNS strategies and the smoothing clustering algorithm; \(t_{max}=\dfrac {N}{10}\) s
Table 16 Comparison of the mean behavior of 4 runs on the TSPLIB3038 data set for different values of M, for two GVNS strategies and the smoothing clustering algorithm; \(t_{max}=\dfrac {N}{10}\) s
Table 17 Comparison of the mean behavior of 4 runs on the Pla85900 data set for different values of M, for two GVNS strategies and the smoothing clustering algorithm; \(t_{max}=600\) s


Copyright information

© 2018 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Pereira, T., Aloise, D., Brimberg, J., Mladenović, N. (2018). Review of Basic Local Searches for Solving the Minimum Sum-of-Squares Clustering Problem. In: Pardalos, P., Migdalas, A. (eds) Open Problems in Optimization and Data Analysis. Springer Optimization and Its Applications, vol 141. Springer, Cham. https://doi.org/10.1007/978-3-319-99142-9_13
