International Journal of Fuzzy Systems, Volume 18, Issue 3, pp 339–348

Clustering of Mixed Data by Integrating Fuzzy, Probabilistic, and Collaborative Clustering Framework

  • Arkanath Pathak
  • Nikhil R. Pal


Clustering of numerical data is a well-researched problem, and so is clustering of categorical data. However, the literature on clustering data with mixed attributes is far less rich. For numerical data, fuzzy clustering, in particular the fuzzy c-means (FCM) algorithm, is very effective and popular, while for categorical data the use of mixture models is quite popular. In this paper, we propose a novel framework for clustering mixed data, which contain both numerical and categorical attributes. Our objective is to find the cluster substructures that are common to both the categorical and the numerical data. Our formulation is inspired by the FCM algorithm (for dealing with numerical data), mixture models (for dealing with categorical data), and the collaborative clustering framework (for aggregating the two); it is an integrated approach that judiciously uses all three components. We apply our algorithm to a few commonly used datasets and compare our results with those of some state-of-the-art methods.
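For readers unfamiliar with the numerical building block named above, the following is a minimal sketch of the standard fuzzy c-means iteration only. It is not the integrated framework proposed in the paper, and it covers neither the mixture-model nor the collaborative component. The function name `fuzzy_c_means` and its parameters (`m`, `iters`, `tol`, `seed`) are illustrative assumptions, as are the Euclidean distance and the random initialisation.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Textbook fuzzy c-means on a list of equal-length numeric tuples.

    Hypothetical helper for illustration; requires fuzzifier m > 1.
    Returns (centers, u), where u[i][k] is the membership of point i
    in cluster k and every row of u sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        s = sum(row)
        u.append([v / s for v in row])

    centers = []
    for _ in range(iters):
        # Centre update: mean of the points weighted by u_ik ** m.
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / total
                 for d in range(dim)]
            )

        # Membership update: u_ik proportional to d_ik ** (-2 / (m - 1)).
        shift = 0.0
        for i in range(n):
            dist = [math.dist(points[i], centers[k]) for k in range(c)]
            if min(dist) == 0.0:
                # Point coincides with a centre: split full membership
                # among the coinciding centres.
                zeros = [k for k in range(c) if dist[k] == 0.0]
                new_row = [1.0 / len(zeros) if k in zeros else 0.0
                           for k in range(c)]
            else:
                inv = [d ** (-2.0 / (m - 1.0)) for d in dist]
                s = sum(inv)
                new_row = [v / s for v in inv]
            shift = max(shift,
                        max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row

        # Stop once no membership changed by more than tol.
        if shift < tol:
            break
    return centers, u
```

Calling `fuzzy_c_means(points, 2)` on two well-separated groups of 2-D points returns two centres and a membership matrix whose rows each sum to 1; taking the largest entry in each row gives a hard assignment.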


Fuzzy clustering, Mixed data, Mixture models, Collaborative clustering



Copyright information

© Taiwan Fuzzy Systems Association and Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  1. Indian Institute of Technology Kharagpur, Kharagpur, India
  2. Electronics and Communication Sciences Unit, Indian Statistical Institute, Calcutta, India
