Rough Set Based Fuzzy K-Modes for Categorical Data

  • Indrajit Saha
  • Jnanendra Prasad Sarkar
  • Ujjwal Maulik
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7677)


To meet the growing demand for categorical data clustering, this paper proposes a new hybrid clustering algorithm, Rough Set Based Fuzzy K-Modes, which integrates the principles of rough sets and fuzzy sets. The lower and upper approximations of rough sets improve the handling of uncertainty, vagueness, and incompleteness in class definitions, while the membership function of fuzzy sets enables efficient handling of overlapping partitions. The superiority of the proposed method over state-of-the-art methods is demonstrated quantitatively on two artificial and two real-life categorical data sets. A statistical significance test has also been carried out to establish the significance of the clustering results.
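To make the two ingredients concrete, the following is a minimal Python sketch, not the authors' algorithm, whose details are not given on this page. It assumes the simple matching dissimilarity and the membership formula commonly used in fuzzy k-modes, and it borrows a threshold rule often seen in rough-fuzzy clustering to place an object in a lower approximation or in the boundary of two clusters. The function names, the fuzzifier m, and the threshold delta are illustrative assumptions.

```python
def matching_distance(x, mode):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(a != b for a, b in zip(x, mode))


def fuzzy_memberships(x, modes, m=2.0):
    """Fuzzy k-modes style membership of object x in each cluster (sums to 1)."""
    d = [matching_distance(x, z) for z in modes]
    zeros = [i for i, di in enumerate(d) if di == 0]
    if zeros:
        # x coincides with one or more modes: share full membership among them.
        return [1.0 / len(zeros) if i in zeros else 0.0 for i in range(len(modes))]
    exp = 1.0 / (m - 1.0)
    return [
        1.0 / sum((d[i] / d[j]) ** exp for j in range(len(modes)))
        for i in range(len(modes))
    ]


def rough_assign(u, delta=0.2):
    """Assumed rule (needs at least two clusters): if the two highest
    memberships differ by more than delta, the object goes to the lower
    approximation of the best cluster; otherwise it lies in the boundary,
    i.e. in the upper approximations of both clusters.
    Returns (lower_cluster_or_None, set_of_upper_clusters)."""
    order = sorted(range(len(u)), key=lambda i: u[i], reverse=True)
    best, second = order[0], order[1]
    if u[best] - u[second] > delta:
        return best, {best}
    return None, {best, second}


# Hypothetical modes and objects over three categorical attributes.
modes = [("red", "small", "round"), ("blue", "large", "square")]
clear_case = ("red", "small", "square")   # distances 1 and 2
ambiguous = ("red", "large", "oval")      # distances 2 and 2

for obj in (clear_case, ambiguous):
    u = fuzzy_memberships(obj, modes)
    print(obj, [round(v, 3) for v in u], rough_assign(u))
```

In this sketch the first object is closer to the first mode and lands in its lower approximation, while the second object is equally far from both modes and falls in the boundary region, which is the kind of overlapping case the abstract refers to.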


Categorical attributes, fuzzy clustering, rough set, statistical test





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Indrajit Saha (1)
  • Jnanendra Prasad Sarkar (1)
  • Ujjwal Maulik (1)
  1. Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
