International Journal of Fuzzy Systems, Volume 18, Issue 3, pp 339–348

Clustering of Mixed Data by Integrating Fuzzy, Probabilistic, and Collaborative Clustering Framework

Article

Abstract

Clustering of numerical data is a well-researched problem, as is clustering of categorical data. However, the literature on clustering data with mixed attributes is far less extensive. For numerical data, fuzzy clustering, in particular the fuzzy c-means (FCM) algorithm, is effective and popular, while for categorical data, mixture models are widely used. In this paper, we propose a novel framework for clustering mixed data containing both numerical and categorical attributes. Our objective is to find the cluster substructures that are common to both the categorical and the numerical data. Our formulation is inspired by the FCM algorithm (for numerical data), mixture models (for categorical data), and the collaborative clustering framework (for aggregating the two); it is an integrated approach that judiciously uses all three components. We apply our algorithm to a few commonly used datasets and compare our results with those of some state-of-the-art methods.
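
For readers unfamiliar with the numerical component mentioned above, the following is a minimal sketch of the standard fuzzy c-means iteration (alternating centre and membership updates). It is not the integrated framework proposed in the paper; the function name `fuzzy_c_means`, its parameters, and the random initialisation are assumptions made purely for illustration.

```python
import random


def fuzzy_c_means(points, c, m=2.0, iters=200, tol=1e-6, seed=0):
    """Illustrative standard FCM on a list of equal-length numeric tuples.

    Returns (centers, u), where u[i][k] is the membership of point i
    in cluster k. The fuzzifier m must be greater than 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Assumed initialisation: random memberships, each row summing to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(iters):
        # Centre update: mean of the points weighted by u_ik ** m.
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / total
                 for d in range(dim)]
            )

        # Membership update: u_ik is proportional to
        # (squared distance to centre k) ** (-1 / (m - 1)).
        shift = 0.0
        for i in range(n):
            d2 = [sum((points[i][d] - centers[k][d]) ** 2 for d in range(dim))
                  for k in range(c)]
            if min(d2) == 0.0:
                # The point coincides with one or more centres:
                # split full membership among those centres.
                zeros = [k for k in range(c) if d2[k] == 0.0]
                new = [1.0 / len(zeros) if k in zeros else 0.0
                       for k in range(c)]
            else:
                inv = [dk ** (-1.0 / (m - 1.0)) for dk in d2]
                total = sum(inv)
                new = [v / total for v in inv]
            shift = max(shift, max(abs(a - b) for a, b in zip(new, u[i])))
            u[i] = new

        # Stop once no membership changed by more than tol.
        if shift < tol:
            break

    return centers, u
```

Calling `fuzzy_c_means(points, c=2)` on a small list of 2-D tuples returns the cluster centres together with a membership row per point; each row sums to 1, which is the constraint that distinguishes FCM memberships from hard assignments.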

Keywords

Fuzzy clustering, Mixed data, Mixture models, Collaborative clustering

Copyright information

© Taiwan Fuzzy Systems Association and Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  1. Indian Institute of Technology Kharagpur, Kharagpur, India
  2. Electronics and Communication Sciences Unit, Indian Statistical Institute, Calcutta, India
