Abstract
Clustering is considered one of the important methods in data mining. The K-means algorithm, one of the most common clustering methods, is highly sensitive to its initial cluster centers. Hence, selecting appropriate initial cluster centers improves the clustering that the algorithm produces. The present study aims to find suitable initial cluster centers for K-means. The initial cluster centers should be selected so that clusters with high separation and high density are obtained. Therefore, in this paper, finding initial cluster centers is formulated as a multi-objective optimization problem that maximizes both the distance between the initial cluster centers and the neighbor density of the initial cluster centers. Solving this problem with the MOPSO algorithm yields a set of candidate initial cluster centers. Hesitant fuzzy sets are then used to evaluate the clusters generated from each candidate by considering separation, cohesion, and the silhouette index. After that, the informational energy of hesitant fuzzy sets is used to rank the non-dominated particles in the Pareto optimal set and to select the initial cluster centers for starting the K-means algorithm. The proposed HFSMOOK-means method was compared with several clustering algorithms on common and widely used criteria. The results indicate that HFSMOOK-means performed well compared with the other algorithms on the majority of the datasets.
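As a rough illustration of the two objectives and of what "non-dominated" means here, the sketch below scores a few candidate sets of initial centers and keeps the Pareto front. It is not the authors' implementation: the abstract does not give the exact formulas, so the minimum pairwise distance between centers, the count-based neighbor density, and the `radius` parameter are all assumptions made for this example. The MOPSO search itself is not shown, and the candidates are written by hand.

```python
import math
from itertools import combinations

def min_center_distance(centers):
    # Assumed form of objective 1: smallest pairwise Euclidean
    # distance between the candidate centers (to be maximized).
    return min(math.dist(a, b) for a, b in combinations(centers, 2))

def neighbor_density(centers, points, radius):
    # Assumed form of objective 2: number of data points lying within
    # `radius` of each center, summed over centers (to be maximized).
    return sum(
        sum(1 for p in points if math.dist(c, p) <= radius)
        for c in centers
    )

def dominates(f, g):
    # Pareto dominance for maximization: f is no worse than g in every
    # objective and strictly better in at least one.
    return all(x >= y for x, y in zip(f, g)) and any(x > y for x, y in zip(f, g))

def non_dominated(candidates, points, radius):
    # Score every candidate, then keep those that no other candidate dominates.
    scored = [
        (c, (min_center_distance(c), neighbor_density(c, points, radius)))
        for c in candidates
    ]
    return [c for c, f in scored if not any(dominates(g, f) for _, g in scored)]

# Toy data: two small, well-separated groups of points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]

cand_a = [(0, 0), (10, 10)]    # one center inside each group
cand_b = [(0, 0), (0, 1)]      # both centers inside the same group
cand_d = [(-5, -5), (15, 15)]  # far apart, but in empty regions

front = non_dominated([cand_a, cand_b, cand_d], points, radius=1.5)
print(front)
```

On this toy data, `cand_b` is dominated by `cand_a` (same density, smaller separation), while `cand_a` and `cand_d` trade one objective against the other and both stay on the front. Choosing between such survivors is the step the abstract assigns to the hesitant fuzzy evaluation and the informational-energy ranking.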
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Rezaei, K., Rezaei, H. HFSMOOK-Means: An Improved K-Means Algorithm Using Hesitant Fuzzy Sets and Multi-objective Optimization. Arab J Sci Eng 45, 6241–6257 (2020). https://doi.org/10.1007/s13369-020-04620-5
DOI: https://doi.org/10.1007/s13369-020-04620-5