Abstract
Clustering ensembles have emerged as a prominent method for improving the robustness, stability, and accuracy of unsupervised classification solutions. A clustering ensemble combines multiple partitions generated by different clustering algorithms into a single clustering solution. Genetic algorithms are known to be highly effective at solving optimization problems, including clustering. To date, significant progress has been made toward finding consensus clusterings that yield better results than existing clusterings. This paper presents a survey of genetic algorithms designed for clustering ensembles. It begins by introducing clustering ensembles and clustering ensemble algorithms. It then describes a number of proposed genetic-guided clustering ensemble algorithms, in particular their genotypes, fitness functions, and genetic operations. Next, the clustering accuracies of the genetic-guided clustering ensemble algorithms are compared. The paper concludes that using genetic algorithms in clustering ensembles improves clustering accuracy, and it identifies open questions for future research.
Cite this article
Ghaemi, R., Sulaiman, N.b., Ibrahim, H. et al. A review: accuracy optimization in clustering ensembles using genetic algorithms. Artif Intell Rev 35, 287–318 (2011). https://doi.org/10.1007/s10462-010-9195-5
DOI: https://doi.org/10.1007/s10462-010-9195-5