Novel dynamic k-modes clustering of categorical and non categorical dataset with optimized genetic algorithm based feature selection

Multimedia Tools and Applications

Abstract

Clustering segregates a given dataset into homogeneous groups according to the provided features, with the aim of finding structure in a collection of unlabelled data. Cluster analysis is an unsupervised learning technique that identifies interesting patterns in data objects without class labels. The k-modes clustering algorithm is effective for clustering categorical data because it is easy to implement and can handle massive amounts of data. However, because its initial centroids are selected at random, it tends to converge to a local optimum. The main contribution of this paper is to evaluate the clustering performance of the proposed system on various datasets. The proposed method uses a genetic-based metaheuristic encircle algorithm to select enriched features, and a novel dynamic k-modes clustering based on dimensionality-reduced particle swarm optimization (PSO) to perform the clustering with better computational time. The encircling prey concept is incorporated to choose the fitness function and to overcome the limitations of the genetic algorithm in feature selection. The k-modes algorithm is integrated with the particle swarm optimization algorithm to update the initial centroids and obtain a global optimum solution. Several of the datasets used to evaluate the proposed work were found to yield low accuracy in previous work. A comparative analysis against state-of-the-art methods shows that the proposed approach performs better in terms of performance metrics such as F1 score, accuracy, and normalized mutual information (NMI).
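To make the local-optimum issue mentioned in the abstract concrete, the following is a minimal sketch of plain k-modes with random initial centroids. It is not the authors' dynamic k-modes or their PSO-based centroid update. It assumes records are tuples of categorical values and uses simple matching dissimilarity; the function names `matching_dissimilarity`, `column_modes`, and `k_modes` are illustrative choices, not taken from the paper.

```python
import random
from collections import Counter


def matching_dissimilarity(a, b):
    """Count the attribute positions where two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(records):
    """Return the most frequent category of each attribute column."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))


def k_modes(data, k, max_iter=100, seed=0):
    """Plain k-modes with randomly chosen initial modes (illustrative only).

    Assumes the data contains at least k distinct records.
    Returns (labels, modes, cost), where cost is the total
    dissimilarity between each record and its assigned mode.
    """
    rng = random.Random(seed)
    # Random initial centroids: the step that makes the result seed-dependent.
    modes = rng.sample(sorted(set(data)), k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach each record to its nearest mode.
        new_labels = [
            min(range(k), key=lambda j: matching_dissimilarity(row, modes[j]))
            for row in data
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: recompute each mode from its current members.
        for j in range(k):
            members = [row for row, lab in zip(data, labels) if lab == j]
            if members:  # keep the old mode if a cluster becomes empty
                modes[j] = column_modes(members)
    cost = sum(matching_dissimilarity(row, modes[lab])
               for row, lab in zip(data, labels))
    return labels, modes, cost


if __name__ == "__main__":
    toy = [
        ("red", "small", "round"), ("red", "small", "oval"),
        ("red", "medium", "round"), ("blue", "large", "square"),
        ("blue", "large", "oval"), ("blue", "medium", "square"),
    ]
    # Different seeds can converge to different costs; a simple
    # (non-PSO) mitigation is to keep the best of several restarts.
    best = min((k_modes(toy, k=2, seed=s) for s in range(5)),
               key=lambda result: result[2])
    print(best)
```

Because the final cost depends on the seed, restarting from several random initializations and keeping the lowest-cost run is a common baseline. The paper instead replaces the random choice with a search over centroids using particle swarm optimization, which this sketch does not attempt to reproduce.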


Author information

Corresponding author

Correspondence to G. Suryanarayana.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Suryanarayana, G., Prakash K, L., Mahesh, P.C.S. et al. Novel dynamic k-modes clustering of categorical and non categorical dataset with optimized genetic algorithm based feature selection. Multimed Tools Appl 81, 24399–24418 (2022). https://doi.org/10.1007/s11042-022-12126-5

  • DOI: https://doi.org/10.1007/s11042-022-12126-5
