The field of metaheuristics has produced a large number of algorithms for continuous, black-box optimization. In contrast, there are few standard benchmark problem sets, limiting our ability to gain insight into the empirical performance of these algorithms. Clustering problems have been used many times in the literature to evaluate optimization algorithms, but much of this work has occurred independently on different problem instances, and the varied experimental methodologies have produced results that are frequently incomparable, offering little insight into the difficulty of the problems used and no platform for comparing and evaluating algorithm performance. This paper discusses sum-of-squares clustering problems from the optimization viewpoint. Properties of the fitness landscape are analysed, and it is argued that these problems are highly suitable for algorithm benchmarking. A set of 27 problem instances (from 4-D to 40-D), based on three well-known datasets, is specified. Baseline experimental results are presented for the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) and several other standard algorithms. A web repository has also been created for this problem set to facilitate its future use for algorithm evaluation and comparison.
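To illustrate the formulation the abstract refers to, the sketch below shows how a sum-of-squares clustering instance can be posed as a continuous black-box objective: the decision vector is the flattened set of cluster-centre coordinates, and each data point contributes its squared distance to the nearest centre. This is a minimal, hedged sketch under standard assumptions about the minimum sum-of-squares clustering (MSSC) objective; the function name, array shapes, and the toy dataset are illustrative and not taken from the paper or its repository.

```python
import numpy as np

def mssc_objective(x, data, k):
    """Minimum sum-of-squares clustering (MSSC) objective.

    x    : flattened centre coordinates, shape (k * d,)
    data : n data points, shape (n, d)
    k    : number of cluster centres

    Each point contributes the squared Euclidean distance to its
    nearest centre, so the search space is continuous with
    dimension k * d (e.g. 10 centres in 4-D data gives a 40-D
    optimization problem).
    """
    d = data.shape[1]
    centres = np.asarray(x, dtype=float).reshape(k, d)
    # Squared distances from every point to every centre: shape (n, k).
    sq_dists = ((data[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    # Assign each point to its nearest centre and sum the costs.
    return sq_dists.min(axis=1).sum()

# Toy 1-D dataset with two obvious clusters (illustrative only).
data = np.array([[0.0], [0.1], [10.0], [10.1]])
# Placing the two centres at the cluster midpoints gives cost
# 4 * 0.05**2 = 0.01.
print(mssc_objective([0.05, 10.05], data, k=2))
```

Because the objective is an ordinary function of a real-valued vector, any continuous black-box optimizer (such as CMA-ES) can be applied to it directly, which is what makes these instances usable for benchmarking.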
Compliance with ethical standards
Conflict of interest
The author declares that he has no conflict of interest.