A dissimilarity measure for mixed nominal and ordinal attribute data in k-Modes algorithm

Yuan, Fang; Yang, Youlong; Yuan, Tiantian

doi:10.1007/s10489-019-01583-5

Published: 25 January 2020

Volume 50, pages 1498–1509 (2020)
Applied Intelligence

Abstract

Among existing clustering algorithms, k-Means is one of the most commonly used. As an extension of k-Means, the k-Modes algorithm replaces means with modes and has been widely applied to clustering categorical data. In practice, however, many data sets are of mixed type, containing categorical, ordinal and numerical attributes. Clustering mixed-type data has recently attracted much attention from the data mining research community, but most existing methods overlook ordinal attributes and do not establish an explicit similarity metric for them. In this paper, the limitations of some existing dissimilarity measures for the k-Modes algorithm on mixed ordinal and nominal data are analyzed using illustrative examples. Based on the idea of mining the order information carried by ordinal attributes, a new dissimilarity measure is proposed for clustering this type of data with the k-Modes algorithm. The distinguishing characteristic of the new measure is that it takes the ordinal information of ordinal attributes into account. A convergence study and a time complexity analysis of the k-Modes algorithm based on this new dissimilarity measure indicate that it can be used effectively on large data sets. Comparative experiments on nine real data sets from UCI show the effectiveness of the new dissimilarity measure.
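
The abstract does not state the proposed measure itself, so the following is only a minimal sketch of the general idea it describes: the classic k-Modes simple matching dissimilarity treats every mismatch alike, whereas an order-aware measure can let ordinal attributes contribute in proportion to how far apart their levels are. The function names, the `ordinal_levels` mapping, and the normalized rank difference used for ordinal attributes are all assumptions made for illustration and should not be read as the authors' formula.

```python
def simple_matching(x, y):
    """Classic k-Modes dissimilarity: count of attributes where x and y differ."""
    return sum(a != b for a, b in zip(x, y))


def mixed_dissimilarity(x, y, ordinal_levels):
    """Illustrative nominal/ordinal dissimilarity (an assumption, not the paper's measure).

    ordinal_levels maps an attribute index to its ordered list of categories.
    Nominal attributes contribute 0 or 1; ordinal attributes contribute
    |rank(a) - rank(b)| / (number of levels - 1), a value in [0, 1].
    """
    total = 0.0
    for j, (a, b) in enumerate(zip(x, y)):
        levels = ordinal_levels.get(j)
        if levels is None:
            # Nominal attribute: plain match / mismatch.
            total += float(a != b)
        else:
            # Ordinal attribute: distance grows with the gap between levels.
            total += abs(levels.index(a) - levels.index(b)) / (len(levels) - 1)
    return total


def nearest_mode(x, modes, ordinal_levels):
    """Assignment step of a k-Modes-style loop: index of the closest mode."""
    distances = [mixed_dissimilarity(x, m, ordinal_levels) for m in modes]
    return distances.index(min(distances))


# Hypothetical toy data: attribute 0 is nominal, attribute 1 is ordinal.
levels = {1: ["low", "medium", "high"]}
x, y, z = ("red", "low"), ("red", "medium"), ("red", "high")

print(simple_matching(x, y), simple_matching(x, z))
print(mixed_dissimilarity(x, y, levels), mixed_dissimilarity(x, z, levels))
print(nearest_mode(("red", "medium"), [("red", "low"), ("blue", "high")], levels))
```

On this toy data, simple matching cannot tell whether "low" is closer to "medium" or to "high", while the order-aware variant ranks "medium" as nearer. A full k-Modes loop would alternate this assignment step with a mode update until the assignments stop changing.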

Acknowledgments

We thank the anonymous reviewers for their helpful suggestions. This work was supported by the National Natural Science Foundation of China (61573266).

Author information

Authors and Affiliations

School of Mathematics and Statistics, Xidian University, Xi’an, 710071, People’s Republic of China
Fang Yuan & Youlong Yang
School of Management, Northwestern Polytechnical University, Xi’an, 710071, People’s Republic of China
Tiantian Yuan

Corresponding author

Correspondence to Fang Yuan.

About this article

Cite this article

Yuan, F., Yang, Y. & Yuan, T. A dissimilarity measure for mixed nominal and ordinal attribute data in k-Modes algorithm. Appl Intell 50, 1498–1509 (2020). https://doi.org/10.1007/s10489-019-01583-5

Issue Date: May 2020
