Attribute weights-based clustering centres algorithm for initialising K-modes clustering

Peng, Liwen; Liu, Yongguo

doi:10.1007/s10586-018-1889-5

Published: 16 February 2018

Volume 22, pages 6171–6179 (2019)

Cluster Computing

Abstract

The K-modes algorithm, a partitional clustering method, is popular and effective, and it handles categorical data. However, its performance depends heavily on the initial clustering centres. Random selection of the initial clustering centres commonly leads to non-repeatable clustering results. Hence, a suitable choice of the initial clustering centres is crucial to realising high-performance K-modes clustering. The present article develops an initialisation algorithm for K-modes. At initialisation, the distance between two instances is calculated after weighting the attributes of the instances. Many studies have shown that if clustering is based only on the distance or only on the density between instances, the clustering tends to revolve around a single centre or around outliers. Therefore, based on the attribute weights, we combine the distance and density measures to select the clustering centres. In experiments on several benchmark datasets from the UCI Machine Learning Repository, the new initialisation method outperformed the existing K-modes clustering methods.
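The abstract does not state how the attribute weights are computed or how distance and density are combined, so the following Python sketch is only an illustration under stated assumptions, not the authors' algorithm. It assumes entropy-based attribute weights, a weighted simple-matching distance, density defined as the average weighted similarity to all instances, and a "nearest-centre distance times density" score for picking each further centre. All function names are hypothetical.

```python
from collections import Counter
import math


def attribute_weights(data):
    """Hypothetical weighting: each attribute's share of the total value entropy."""
    n, m = len(data), len(data[0])
    entropies = []
    for j in range(m):
        counts = Counter(row[j] for row in data)
        entropies.append(-sum((c / n) * math.log(c / n) for c in counts.values()))
    total = sum(entropies)
    if total == 0:  # every attribute is constant: fall back to uniform weights
        return [1.0 / m] * m
    return [h / total for h in entropies]


def weighted_distance(x, y, weights):
    """Weighted simple-matching distance: sum of weights of mismatching attributes."""
    return sum(w for a, b, w in zip(x, y, weights) if a != b)


def density(i, data, weights):
    """Assumed density: mean weighted similarity of instance i to all instances."""
    return sum(1.0 - weighted_distance(data[i], row, weights) for row in data) / len(data)


def select_initial_centres(data, k):
    """Pick k initial centres by combining distance and density (assumed rule)."""
    weights = attribute_weights(data)
    dens = [density(i, data, weights) for i in range(len(data))]
    # First centre: the densest instance.
    chosen = [max(range(len(data)), key=lambda i: dens[i])]
    while len(chosen) < k:
        def score(i):
            # Far from every chosen centre AND located in a dense region.
            nearest = min(weighted_distance(data[i], data[c], weights) for c in chosen)
            return nearest * dens[i]
        candidates = [i for i in range(len(data)) if i not in chosen]
        chosen.append(max(candidates, key=score))
    return [data[i] for i in chosen]


if __name__ == "__main__":
    # Toy categorical data (made up for illustration only).
    toy = [
        ("red", "small", "round"),
        ("red", "small", "round"),
        ("red", "small", "oval"),
        ("blue", "large", "square"),
        ("blue", "large", "square"),
        ("blue", "large", "oval"),
    ]
    print(select_initial_centres(toy, 2))
```

Multiplying the two terms means a candidate scores highly only when it is both far from the centres already chosen and surrounded by similar instances, which is one simple way to avoid picking an outlier (distance alone) or several centres from the same dense region (density alone). The selection is deterministic for a given dataset, unlike random initialisation.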

Acknowledgements

This research is supported in part by the National Science and Technology Major Project of the Ministry of Science and Technology of China under Grant 2017ZX10105003-002, the National Key Research and Development Program of China under Grant 2017YFC1703900, and the Sichuan Science and Technology Program under Grant 2018PTDJ0084.

Author information

Authors and Affiliations

Knowledge and Data Engineering Laboratory of Chinese Medicine, School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu, China
Liwen Peng & Yongguo Liu

Corresponding author

Correspondence to Yongguo Liu.

About this article

Cite this article

Peng, L., Liu, Y. Attribute weights-based clustering centres algorithm for initialising K-modes clustering. Cluster Comput 22 (Suppl 3), 6171–6179 (2019). https://doi.org/10.1007/s10586-018-1889-5

Received: 17 November 2017
Revised: 13 January 2018
Accepted: 18 January 2018
Published: 16 February 2018
Issue Date: May 2019
DOI: https://doi.org/10.1007/s10586-018-1889-5
