
Artificial Intelligence Review, Volume 35, Issue 4, pp 287–318

A review: accuracy optimization in clustering ensembles using genetic algorithms

  • Reza Ghaemi
  • Nasir bin Sulaiman
  • Hamidah Ibrahim
  • Norwati Mustapha
Article

Abstract

The clustering ensemble has emerged as a prominent method for improving the robustness, stability, and accuracy of unsupervised classification solutions. It combines multiple partitions generated by different clustering algorithms into a single clustering solution. Genetic algorithms are known for their strong ability to solve optimization problems, including clustering. To date, significant progress has been made toward finding consensus clusterings that yield better results than existing clustering algorithms. This paper presents a survey of genetic algorithms designed for clustering ensembles. It begins by introducing clustering ensembles and clustering ensemble algorithms. It then describes a number of proposed genetic-guided clustering ensemble algorithms, focusing on their genotypes, fitness functions, and genetic operators. Next, the clustering accuracies of these genetic-guided clustering ensemble algorithms are compared. The paper concludes that using genetic algorithms in clustering ensembles improves clustering accuracy, and it identifies open questions for future research.
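The abstract describes the general pattern: several base partitions are combined into one consensus partition, and a genetic algorithm searches for that consensus using a genotype, a fitness function, and genetic operators. The Python sketch below illustrates that pattern under assumptions of our own; it is not any specific algorithm from the surveyed literature. The assumed choices are: a label-vector genotype (one cluster label per object), a fitness equal to the negative disagreement between a candidate partition and the co-association matrix of the base partitions, tournament selection, uniform crossover, random-reset mutation, elitism, and an initial population seeded with the base partitions. All function names (`co_association`, `cost`, `canonical`, `ga_consensus`) are hypothetical.

```python
import random
from itertools import combinations


def co_association(partitions):
    """Fraction of base partitions that put each pair (i, j) in the same cluster."""
    n, m = len(partitions[0]), len(partitions)
    return {
        (i, j): sum(p[i] == p[j] for p in partitions) / m
        for i, j in combinations(range(n), 2)
    }


def cost(labels, coassoc):
    """Disagreement between a candidate partition and the co-association matrix (lower is better)."""
    return sum(abs(c - (labels[i] == labels[j])) for (i, j), c in coassoc.items())


def canonical(labels):
    """Relabel clusters in order of first appearance, e.g. [1, 1, 0] -> [0, 0, 1]."""
    mapping = {}
    return [mapping.setdefault(x, len(mapping)) for x in labels]


def ga_consensus(partitions, pop_size=30, generations=100, p_mut=0.1, seed=0):
    """Hypothetical GA-guided consensus: search label vectors that minimize `cost`."""
    rng = random.Random(seed)
    n = len(partitions[0])
    k = max(len(set(p)) for p in partitions)  # assumed number of consensus clusters
    coassoc = co_association(partitions)

    def fitness(ind):
        return -cost(ind, coassoc)

    # Initial population: the base partitions themselves, topped up with random label vectors.
    population = [canonical(p) for p in partitions]
    while len(population) < pop_size:
        population.append([rng.randrange(k) for _ in range(n)])

    def tournament():
        # Binary tournament selection: keep the fitter of two random individuals.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        elite = max(population, key=fitness)
        children = [elite[:]]  # elitism: the best individual always survives
        while len(children) < pop_size:
            mother, father = tournament(), tournament()
            # Uniform crossover on the label vectors.
            child = [m if rng.random() < 0.5 else f for m, f in zip(mother, father)]
            # Mutation: move an object to a random cluster with probability p_mut.
            child = [rng.randrange(k) if rng.random() < p_mut else g for g in child]
            children.append(child)
        population = children

    return canonical(max(population, key=fitness))


if __name__ == "__main__":
    base = [[0, 0, 0, 1, 1, 1], [1, 1, 1, 0, 0, 0], [0, 0, 1, 1, 1, 1]]
    print(ga_consensus(base))  # [0, 0, 0, 1, 1, 1]
```

Seeding the population with the base partitions and keeping the elite means the returned partition is never worse, under this cost, than the best base partition. The label-vector genotype is simple but redundant: `[0, 0, 1]` and `[1, 1, 0]` encode the same partition, which can make crossover disruptive; alternative genotypes are one way to mitigate this.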

Keywords

Accuracy, Clustering ensemble, Genetic algorithms, Unsupervised classification

Copyright information

© Springer Science+Business Media B.V. 2010

Authors and Affiliations

  • Reza Ghaemi (1, 2)
  • Nasir bin Sulaiman (2)
  • Hamidah Ibrahim (2)
  • Norwati Mustapha (2)

  1. CE Department, Islamic Azad University, Tehran, Iran
  2. Department of Computer Science, Faculty of Computer Science and Information Technology, Universiti Putra Malaysia (UPM), Selangor, Malaysia