Memetic Computing, Volume 4, Issue 1, pp 49–71

A novel fuzzy C-means algorithm to generate diverse and desirable cluster solutions used by genetic-based clustering ensemble algorithms

  • Reza Ghaemi (corresponding author)
  • Md. Nasir Sulaiman
  • Hamidah Ibrahim
  • Norwati Mustapha
Regular Research Paper

Abstract

Clustering ensembles are among the most actively discussed topics in machine learning today. A clustering ensemble combines multiple partitions generated by different clustering algorithms into a single clustering solution. Genetic algorithms are known for their strong ability to solve optimization problems, especially the clustering ensemble problem. Despite major contributions toward finding consensus cluster partitions with genetic algorithms, two issues have received little discussion: population initialization through generative mechanisms in genetic-based clustering ensemble algorithms, and the production of cluster partitions with favorable fitness values in the first phase of clustering ensembles. In this paper, a threshold fuzzy C-means algorithm, named TFCM, is proposed to address the diversity of clustering, one of the most common problems in clustering ensembles. Moreover, TFCM increases the fitness of cluster partitions and thereby improves the performance of genetic-based clustering ensemble algorithms. The average fitness of cluster partitions generated by TFCM is evaluated with three different objective functions and compared against that of other clustering algorithms. A simple genetic-based clustering ensemble algorithm, named SGCE, is also proposed, in which cluster partitions generated by TFCM and by other clustering algorithms serve as the initial population. The performance of SGCE is evaluated and compared across the different initial populations. Experimental results on eleven real-world datasets demonstrate that TFCM improves the fitness of cluster partitions and that the performance of SGCE is enhanced when its initial populations are generated by TFCM.

Keywords

Fuzzy C-means algorithm; Clustering ensemble; Genetic-based clustering ensemble; Genetic algorithms; Diversity of clustering; Clustering accuracy

Copyright information

© Springer-Verlag 2012

Authors and Affiliations

  • Reza Ghaemi (1), corresponding author
  • Md. Nasir Sulaiman (1)
  • Hamidah Ibrahim (1)
  • Norwati Mustapha (1)
  1. Department of Computer Science, Faculty of Computer Science and Information Technology, University Putra Malaysia, Selangor, Malaysia
