Review of Basic Local Searches for Solving the Minimum Sum-of-Squares Clustering Problem
This paper presents a review of the well-known K-means, H-means, and J-means heuristics, and their variants, that are used to solve the minimum sum-of-squares clustering problem. We then develop two new local searches that combine these heuristics in a nested and in a sequential structure, an approach also referred to as variable neighborhood descent. To show how these local searches can be implemented within a metaheuristic framework, we apply the new heuristics in the local improvement step of two variable neighborhood search (VNS) procedures. Computational experiments suggest that this new and simple application of VNS is comparable to the state of the art. In addition, solution quality improves by more than 30% on the largest problem instance investigated, which contains 85,900 entities.
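For readers unfamiliar with the problem, the following is a minimal sketch of the minimum sum-of-squares objective and of a basic K-means descent (alternating assignment and centroid update). It is an illustration only, not the authors' implementation and not one of the new local searches; the function names (`sq_dist`, `assign`, `mssc_cost`, `update_centroids`, `k_means`) and the rule that an empty cluster keeps its previous centroid are assumptions made for this example.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points: List[Point], centroids: List[Point]) -> List[int]:
    """Assign each entity to its closest centroid (ties go to the lowest index)."""
    return [
        min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
        for p in points
    ]


def mssc_cost(points: List[Point], centroids: List[Point], labels: List[int]) -> float:
    """Sum of squared distances from each entity to its cluster centroid."""
    return sum(sq_dist(p, centroids[l]) for p, l in zip(points, labels))


def update_centroids(
    points: List[Point], labels: List[int], centroids: List[Point]
) -> List[List[float]]:
    """Move each centroid to the mean of its cluster.

    Assumption for this sketch: an empty cluster keeps its previous centroid.
    """
    new_centroids = []
    for j, c in enumerate(centroids):
        members = [p for p, l in zip(points, labels) if l == j]
        if members:
            new_centroids.append(
                [sum(m[d] for m in members) / len(members) for d in range(len(c))]
            )
        else:
            new_centroids.append(list(c))
    return new_centroids


def k_means(
    points: List[Point], centroids: List[Point], max_iter: int = 100
) -> Tuple[List[List[float]], List[int], float]:
    """Alternate centroid updates and reassignments until labels stop changing."""
    centroids = [list(c) for c in centroids]
    labels = assign(points, centroids)
    for _ in range(max_iter):
        centroids = update_centroids(points, labels, centroids)
        new_labels = assign(points, centroids)
        if new_labels == labels:  # local optimum for this neighborhood
            break
        labels = new_labels
    return centroids, labels, mssc_cost(points, centroids, labels)
```

Calling `k_means(points, initial_centroids)` returns the final centroids, the cluster label of each entity, and the objective value. The result is only a local optimum that depends on the initial centroids, which is the motivation for embedding such descents in a metaheuristic such as VNS.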
Keywords: Clustering, Minimum sum-of-squares, VNS, K-means
Thiago Pereira is grateful to CAPES-Brazil. Daniel Aloise and Nenad Mladenović were partially supported by CNPq-Brazil grants 308887/2014-0 and 400350/2014-9. This research was partially covered by the framework of the grant number BR05236839 “Development of information technologies and systems for stimulation of personality’s sustainable development as one of the bases of development of digital Kazakhstan”.