Data Clustering Using Grouping Hyper-heuristics

  • Anas Elhag
  • Ender Özcan
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10782)


Grouping problems are a class of computationally hard problems that require the optimal partitioning of a given set of items with respect to multiple, domain-dependent criteria. A recent study proposed a general-purpose selection hyper-heuristic search framework with reusable components, designed for the rapid development of grouping hyper-heuristics; that framework was previously tested only on the graph colouring problem domain. Extending the previous work, this study compares the performance of selection hyper-heuristics implemented within the framework, pairing various heuristic/operator selection and move acceptance methods for data clustering. The selection hyper-heuristic processes a single solution at any decision point and controls a fixed set of generic low-level heuristics designed specifically for grouping problems, based on a bi-objective formulation. An archive of high-quality solutions, capturing the trade-off between the number of clusters and the overall clustering error, is maintained during the search. The empirical results verify the effectiveness of a successful selection hyper-heuristic, winner of a recent hyper-heuristic challenge for data clustering, on a set of benchmark problem instances.


Keywords: Heuristic · Multiobjective optimisation · Reinforcement learning · Adaptive move acceptance
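The single-point search described in the abstract, in which a selection mechanism picks a low-level grouping heuristic, a move acceptance rule decides whether to keep the perturbed solution, and an archive records the trade-off between cluster count and clustering error, can be sketched as below. This is a minimal illustration, not the paper's actual method: the heuristic names, the uniform-random selection, and the accept-if-error-not-worse acceptance rule are simplified stand-ins (the paper pairs learning-based selection with adaptive acceptance), and the squared-error objective on 1-D data is an assumption for the toy example.

```python
import random

def objectives(data, labels):
    """Bi-objective evaluation used here for illustration:
    (number of clusters, total squared error to cluster centroids)."""
    clusters = {}
    for x, lab in zip(data, labels):
        clusters.setdefault(lab, []).append(x)
    error = 0.0
    for pts in clusters.values():
        centre = sum(pts) / len(pts)
        error += sum((x - centre) ** 2 for x in pts)
    return (len(clusters), error)

def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimisation)."""
    return a[0] <= b[0] and a[1] <= b[1] and a != b

def update_archive(archive, obj, labels):
    """Maintain a list of mutually non-dominated (objectives, solution) pairs."""
    if any(o == obj or dominates(o, obj) for o, _ in archive):
        return archive
    return [(o, s) for o, s in archive if not dominates(obj, o)] + [(obj, labels)]

# Three generic low-level grouping heuristics (hypothetical, simplified set).
def change_group(labels, rng):
    out = labels[:]
    out[rng.randrange(len(out))] = rng.choice(sorted(set(out)))
    return out

def split_group(labels, rng):
    out = labels[:]
    out[rng.randrange(len(out))] = max(out) + 1  # item forms its own cluster
    return out

def merge_groups(labels, rng):
    groups = sorted(set(labels))
    if len(groups) < 2:
        return labels[:]
    keep, gone = rng.sample(groups, 2)
    return [keep if lab == gone else lab for lab in labels]

def hyper_heuristic(data, iters=2000, seed=0):
    """Single-point selection hyper-heuristic: uniform heuristic selection,
    naive accept-if-error-not-worse move acceptance, Pareto archive."""
    rng = random.Random(seed)
    labels = [0] * len(data)              # start with everything in one cluster
    current = objectives(data, labels)
    archive = update_archive([], current, labels[:])
    heuristics = [change_group, split_group, merge_groups]
    for _ in range(iters):
        h = rng.choice(heuristics)        # heuristic selection step
        candidate = h(labels, rng)
        obj = objectives(data, candidate)
        archive = update_archive(archive, obj, candidate[:])
        if obj[1] <= current[1]:          # move acceptance step
            labels, current = candidate, obj
    return archive
```

Run on a toy 1-D dataset, the returned archive traces the bi-objective trade-off the abstract mentions: from a single all-encompassing cluster with high error down to zero-error singleton clusters.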



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. ASAP Research Group, School of Computer Science, University of Nottingham, Nottingham, UK
