
Review of Basic Local Searches for Solving the Minimum Sum-of-Squares Clustering Problem

Chapter in: Open Problems in Optimization and Data Analysis

Part of the book series: Springer Optimization and Its Applications (SOIA, volume 141)

Abstract

This paper presents a review of the well-known K-means, H-means, and J-means heuristics, and their variants, used to solve the minimum sum-of-squares clustering problem. We then develop two new local searches that combine these heuristics in a nested and sequential structure, also referred to as variable neighborhood descent. To show how these local searches can be implemented within a metaheuristic framework, we apply them in the local improvement step of two variable neighborhood search (VNS) procedures. Computational experiments suggest that this new and simple application of VNS is comparable to the state of the art. In addition, a very significant improvement in solution quality (over 30%) is obtained for the largest problem instance investigated, which contains 85,900 entities.
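As a concrete reference point for the heuristics named above, the minimum sum-of-squares clustering (MSSC) objective and a basic K-means (Lloyd-style) local search can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' implementation; the helper names `mssc_cost` and `kmeans` are ours.

```python
import numpy as np

def mssc_cost(X, centers, labels):
    """MSSC objective: sum of squared Euclidean distances from each
    entity to the centroid of the cluster it is assigned to."""
    return float(((X - centers[labels]) ** 2).sum())

def kmeans(X, k, max_iter=100, seed=0):
    """Basic K-means local search: alternate assignment and centroid
    update steps until the centroids stop moving (a local minimum)."""
    rng = np.random.default_rng(seed)
    # Initialize centroids on k distinct entities chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Assignment step: attach each entity to its nearest centroid.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Update step: move each centroid to the mean of its cluster
        # (keeping the old centroid if a cluster becomes degenerate).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if (labels == j).any() else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

# Two well-separated pairs of points: the local search recovers them.
X = np.array([[0., 0.], [0., 1.], [10., 10.], [10., 11.]])
centers, labels = kmeans(X, 2)
print(mssc_cost(X, centers, labels))  # prints 1.0
```

H-means and J-means differ from this scheme only in the move they apply at each step (reassigning a single entity, or "jumping" a centroid to an unoccupied entity, respectively), which is what makes them natural candidates for nesting inside a variable neighborhood descent.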


References

  1. Aggarwal, C.C., Reddy, C.K.: Data Clustering: Algorithms and Applications. Chapman and Hall/CRC, Boca Raton (2013)

  2. Aloise, D., Deshpande, A., Hansen, P., Popat, P.: NP-hardness of Euclidean sum-of-squares clustering. Mach. Learn. 75(2), 245–248 (2009)

  3. Aloise, D., Hansen, P., Liberti, L.: An improved column generation algorithm for minimum sum-of-squares clustering. Math. Program. 131(1), 195–220 (2012)

  4. Aloise, D., Damasceno, N., Mladenović, N., Pinheiro, D.: On strategies to fix degenerate k-means solutions. J. Classif. 34, 165–190 (2017)

  5. Bagirov, A.M., Ordin, B., Ozturk, G., Xavier, A.E.: An incremental clustering algorithm based on hyperbolic smoothing. Comput. Optim. Appl. 61(1), 219–241 (2015)

  6. Belacel, N., Hansen, P., Mladenović, N.: Fuzzy J-Means: a new heuristic for fuzzy clustering. Pattern Recogn. 35(10), 2193–2200 (2002)

  7. Boussaïd, I., Lepagnot, J., Siarry, P.: A survey on optimization metaheuristics. Inf. Sci. 237, 82–117 (2013). Prediction, Control and Diagnosis using Advanced Neural Computations

  8. Brimberg, J., Mladenović, N.: Degeneracy in the multi-source Weber problem. Math. Program. 85(1), 213–220 (1999)

  9. Brimberg, J., Mladenović, N., Todosijević, R., Urosević, D.: Less is more: solving the max-mean diversity problem with variable neighborhood search. Inf. Sci. 382–383, 179–200 (2017). https://doi.org/10.1016/j.ins.2016.12.021. http://www.sciencedirect.com/science/article/pii/S0020025516320394

  10. Costa, L.R., Aloise, D., Mladenović, N.: Less is more: basic variable neighborhood search heuristic for balanced minimum sum-of-squares clustering. Inf. Sci. 415–416, 247–253 (2017). https://doi.org/10.1016/j.ins.2017.06.019. http://www.sciencedirect.com/science/article/pii/S0020025517307934

  11. Fahad, A., Alshatri, N., Tari, Z., Alamri, A., Khalil, I., Zomaya, A.Y., Foufou, S., Bouras, A.: A survey of clustering algorithms for big data: taxonomy and empirical analysis. IEEE Trans. Emerg. Top. Comput. 2(3), 267–279 (2014)

  12. Forgey, E.: Cluster analysis of multivariate data: efficiency vs. interpretability of classification. Biometrics 21(3), 768–769 (1965)

  13. Grötschel, M., Holland, O.: Solution of large-scale symmetric travelling salesman problems. Math. Program. 51(1), 141–202 (1991)

  14. Hansen, P., Mladenović, N.: Variable neighborhood search for the p-median. Locat. Sci. 5(4), 207–226 (1997)

  15. Hansen, P., Mladenović, N.: J-means: a new local search heuristic for minimum sum of squares clustering. Pattern Recogn. 34(2), 405–413 (2001)

  16. Hansen, P., Mladenović, N.: First vs. best improvement: an empirical study. Discret. Appl. Math. 154(5), 802–817 (2006). IV ALIO/EURO Workshop on Applied Combinatorial Optimization

  17. Hansen, P., et al.: Survey and comparison of initialization methods for k-means clustering. Unpublished manuscript

  18. Hansen, P., Jaumard, B., Mladenović, N.: Minimum sum of squares clustering in a low dimensional space. J. Classif. 15(1), 37–55 (1998)

  19. Hansen, P., Ruiz, M., Aloise, D.: A VNS heuristic for escaping local extrema entrapment in normalized cut clustering. Pattern Recogn. 45(12), 4337–4345 (2012)

  20. Hansen, P., Mladenović, N., Todosijević, R., Hanafi, S.: Variable neighborhood search: basics and variants. EURO J. Comput. Optim. 1–32 (2016). https://doi.org/10.1007/s13675-016-0075-x

  21. Jain, A.K.: Data clustering: 50 years beyond k-means. Pattern Recogn. Lett. 31(8), 651–666 (2010)

  22. Laszlo, M., Mukherjee, S.: A genetic algorithm using hyper-quadtrees for low-dimensional k-means clustering. IEEE Trans. Pattern Anal. Mach. Intell. 28(4), 533–543 (2006)

  23. Laszlo, M., Mukherjee, S.: A genetic algorithm that exchanges neighboring centers for k-means clustering. Pattern Recogn. Lett. 28(16), 2359–2366 (2007)

  24. Likas, A., Vlassis, N., Verbeek, J.J.: The global k-means clustering algorithm. Pattern Recogn. 36(2), 451–461 (2003)

  25. MacQueen, J., et al.: Some methods for classification and analysis of multivariate observations. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Oakland, vol. 1, pp. 281–297 (1967)

  26. Mladenović, N.: A variable neighborhood algorithm-a new metaheuristic for combinatorial optimization. In: Papers Presented at Optimization Days, vol. 12 (1995)

  27. Mladenović, N., Todosijević, R., Urosević, D.: Less is more: basic variable neighborhood search for minimum differential dispersion problem. Inf. Sci. 326, 160–171 (2016). https://doi.org/10.1016/j.ins.2015.07.044. http://www.sciencedirect.com/science/article/pii/S0020025515005526

  28. Ordin, B., Bagirov, A.M.: A heuristic algorithm for solving the minimum sum-of-squares clustering problems. J. Glob. Optim. 61(2), 341–361 (2015)

  29. Padberg, M., Rinaldi, G.: A branch-and-cut algorithm for the resolution of large-scale symmetric traveling salesman problems. SIAM Rev. 33(1), 60–100 (1991)

  30. Rahman, M.A., Islam, M.Z.: A hybrid clustering technique combining a novel genetic algorithm with k-means. Knowl. Based Syst. 71, 345–365 (2014)

  31. Reinelt, G.: TSPLIB—a traveling salesman problem library. ORSA J. Comput. 3(4), 376–384 (1991)

  32. Ruspini, E.H.: Numerical methods for fuzzy clustering. Inf. Sci. 2(3), 319–350 (1970)

  33. Santi, É., Aloise, D., Blanchard, S.J.: A model for clustering data from heterogeneous dissimilarities. Eur. J. Oper. Res. 253(3), 659–672 (2016)

  34. Selim, S.Z., Alsultan, K.: A simulated annealing algorithm for the clustering problem. Pattern Recogn. 24(10), 1003–1008 (1991)

  35. Silva, K., Aloise, D., de Souza, S.X., Mladenović, N.: Less is more: simplified Nelder-Mead method for large unconstrained optimization. Yugoslav J. Oper. Res. 28(2), 153–169 (2018). http://yujor.fon.bg.ac.rs/index.php/yujor/article/view/609

  36. Späth, H.: Cluster Analysis Algorithms for Data Reduction and Classification of Objects. Computers and Their Applications. E. Horwood, Chichester (1980)

  37. Turkensteen, M., Andersen, K.A.: A Tabu Search Approach to Clustering, pp. 475–480. Springer, Berlin (2009)

  38. Ward, J.H., Jr.: Hierarchical grouping to optimize an objective function. J. Am. Stat. Assoc. 58(301), 236–244 (1963)

  39. Whitaker, R.: A fast algorithm for the greedy interchange for large-scale clustering and median location problems. Inf. Syst. Oper. Res. 21(2), 95–108 (1983)

  40. Wishart, D.: 256. Note: An algorithm for hierarchical classifications. Biometrics 25(1), 165–170 (1969)

  41. Xavier, A.E., Xavier, V.L.: Solving the minimum sum-of-squares clustering problem by hyperbolic smoothing and partition into boundary and gravitational regions. Pattern Recogn. 44(1), 70–77 (2011)


Acknowledgements

Thiago Pereira is grateful to CAPES-Brazil. Daniel Aloise and Nenad Mladenović were partially supported by CNPq-Brazil grants 308887/2014-0 and 400350/2014-9. This research was also partially supported under grant number BR05236839, “Development of information technologies and systems for stimulation of personality’s sustainable development as one of the bases of development of digital Kazakhstan”.


Appendices

Appendix 1: Small Instances

See Tables 5, 6, 7, 8, 9, 10, 11, 12, 13, 14.

Table 5 Comparison of the mean behavior of 10 runs on the Ruspini data set for different values of M and two GVNS strategies; \(t_{max}=\dfrac {N}{2}\) s
Table 6 Comparison of the mean behavior of 10 runs on Grötschel and Holland’s 202-city coordinates data set for different values of M and two GVNS strategies; \(t_{max}=\dfrac {N}{2}\) s
Table 7 Comparison of the mean behavior of 10 runs on Grötschel and Holland’s 666-city coordinates data set for different values of M and two GVNS strategies; \(t_{max}=\dfrac {N}{2}\) s
Table 8 Comparison of the mean behavior of 10 runs on Padberg and Rinaldi’s hole-drilling data set for different values of M and two GVNS strategies; \(t_{max}=\dfrac {N}{2}\) s
Table 9 Comparison of the mean behavior of 10 runs on Fisher’s Iris data set for different values of M and two GVNS strategies; \(t_{max}=\dfrac {N}{2}\) s
Table 10 Comparison of the mean behavior of 10 runs on the glass identification data set for different values of M and two GVNS strategies; \(t_{max}=\dfrac {N}{2}\) s
Table 11 Comparison of the mean behavior of 10 runs on the body measurements data set for different values of M and two GVNS strategies; \(t_{max}=\dfrac {N}{2}\) s
Table 12 Comparison of the mean behavior of 10 runs on the Telugu Indian vowel sounds data set for different values of M and two GVNS strategies; \(t_{max}=\dfrac {N}{2}\) s
Table 13 Comparison of the mean behavior of 10 runs on the concrete compressive strength data set for different values of M and two GVNS strategies; \(t_{max}=\dfrac {N}{2}\) s
Table 14 Comparison of the mean behavior of 10 runs on the image segmentation data set for different values of M and two GVNS strategies; \(t_{max}=\dfrac {N}{2}\) s

Appendix 2: Medium and Large Instances

See Tables 15, 16, 17.

Table 15 Comparison of the mean behavior of 4 runs on Reinelt’s hole-drilling data set for different values of M, for two GVNS strategies and the smoothing clustering algorithm; \(t_{max}=\dfrac {N}{10}\) s
Table 16 Comparison of the mean behavior of 4 runs on the TSPLIB3038 data set for different values of M, for two GVNS strategies and the smoothing clustering algorithm; \(t_{max}=\dfrac {N}{10}\) s
Table 17 Comparison of the mean behavior of 4 runs on the Pla85900 data set for different values of M, for two GVNS strategies and the smoothing clustering algorithm; \(t_{max}=600\) s


Copyright information

© 2018 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Pereira, T., Aloise, D., Brimberg, J., Mladenović, N. (2018). Review of Basic Local Searches for Solving the Minimum Sum-of-Squares Clustering Problem. In: Pardalos, P., Migdalas, A. (eds) Open Problems in Optimization and Data Analysis. Springer Optimization and Its Applications, vol 141. Springer, Cham. https://doi.org/10.1007/978-3-319-99142-9_13
