Fuzzy rough clustering for categorical data

  • Shuliang Xu
  • Shenglan Liu
  • Jian Zhou
  • Lin Feng
Original Article


Unlabeled categorical data is common in many applications. Because categorical data has no geometric structure, discovering knowledge and patterns in unlabeled categorical data is an important problem. In this paper, a fuzzy rough clustering algorithm for categorical data is proposed. The proposed algorithm uses the partition induced by each attribute to calculate that attribute's granularity and introduces information granularity to measure the significance of each attribute. Unlike traditional clustering algorithms for categorical data, the proposed algorithm transforms a categorical data set into a numeric data set and applies a nonlinear dimension reduction algorithm to reduce the dimensionality of the data set. The proposed algorithm and the comparison algorithms are executed on real data sets. The experimental results show that the proposed algorithm outperforms the comparison algorithms on most of the data sets, indicating that it is an effective clustering algorithm for categorical data sets.
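The attribute-weighting idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so this example assumes the standard knowledge granularity from granular computing, GK(a) = Σ|X_i|² / |U|², where the X_i are the blocks of the partition induced by attribute a on the universe U. It also assumes, purely for illustration, that significance is proportional to 1 − GK(a). The function names are hypothetical and are not taken from the paper.

```python
def attribute_partition(column):
    """Group object indices by attribute value (the equivalence classes)."""
    blocks = {}
    for idx, value in enumerate(column):
        blocks.setdefault(value, []).append(idx)
    return list(blocks.values())


def granularity(column):
    """Knowledge granularity of one attribute: sum(|X_i|^2) / |U|^2.

    Assumed definition; a finer partition gives a smaller value.
    """
    n = len(column)
    return sum(len(block) ** 2 for block in attribute_partition(column)) / n ** 2


def attribute_weights(rows):
    """Illustrative significance weights: (1 - granularity), normalized to sum to 1."""
    columns = list(zip(*rows))
    raw = [1.0 - granularity(col) for col in columns]
    total = sum(raw)
    if total == 0:
        # Every attribute is constant: fall back to uniform weights.
        return [1.0 / len(raw)] * len(raw)
    return [r / total for r in raw]


# Toy data set: four objects described by two categorical attributes.
rows = [
    ("red", "small"),
    ("red", "large"),
    ("blue", "small"),
    ("green", "small"),
]
weights = attribute_weights(rows)
print(weights)
```

In this toy data set the first attribute splits the objects into finer blocks than the second, so under the assumed weighting it receives the larger weight. The weighting actually used in the paper may differ.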


Keywords: Cluster analysis; Rough set; Categorical data; Granular computing; Dimension reduction



This work was supported by the National Key Research and Development Program of China (Nos. 2017YFB1300200, 2017YFB1300203), the National Natural Science Fund of China (Nos. 61972064, 61672130, 61602082, 61627808, 91648205), the Open Program of State Key Laboratory of Software Architecture (No. SKLSAOP1701), the LiaoNing Revitalization Talents Program (No. XLYC1806006), the Fundamental Research Funds for the Central Universities (Nos. DUT19RC(3)012, DUT17RC(3)071) and the development of science and technology of Guangdong province special fund project (No. 2016B090910001). The authors are grateful to the editor and the anonymous reviewers for constructive comments that helped to improve the quality and presentation of this paper.



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian, China
  2. School of Innovation and Entrepreneurship, Dalian University of Technology, Dalian, China
