Abstract
The K-Means algorithm is a powerful tool for data analysis, but it faces several challenges when applied to large multi-feature data. Centroid initialization and centroid determination are two significant hurdles that can degrade its performance. To address these challenges, an enhanced K-Means algorithm, the multi-feature induced order K-Means algorithm (OWAK-Means), is developed. It combines a novel centroid initialization based on partial-order relations with a multi-feature induced ordered weighted average (MFIOWA) operator. By using a weighted iteration method based on partial-order relations, the OWAK-Means algorithm initializes centroids with greater precision. The MFIOWA operator is designed on the basis of database indexing theory and the Sigmoid weight function, which improves its information filtering ability. These techniques, together with an ordered weighted distance metric, make the OWAK-Means algorithm an effective tool for multi-feature data analysis. In a comparative analysis with variants of the K-Means algorithm, the OWAK-Means algorithm shows significant improvement in the adjusted Rand score, normalized mutual information, and purity. Statistical tests, comprehensive evaluation methods, and sensitivity analysis support the effectiveness and reliability of the OWAK-Means algorithm.
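To make the idea of an induced ordered weighted average concrete, the following is a minimal sketch of the generic induced OWA aggregation (values are reordered by an inducing variable, then combined with position weights). It is not the MFIOWA operator itself, whose definition is not given in this abstract. The function names `sigmoid_weights` and `induced_owa`, the `steepness` parameter, and the particular way the sigmoid is turned into normalized weights are all assumptions made for illustration.

```python
import math


def sigmoid_weights(n, steepness=1.0):
    """Hypothetical weight vector: sigmoid scores over positions, normalized to sum to 1.

    Earlier positions receive larger weights. The paper's actual Sigmoid
    weight function may differ; this form is an assumption.
    """
    mid = (n - 1) / 2
    raw = [1.0 / (1.0 + math.exp(steepness * (i - mid))) for i in range(n)]
    total = sum(raw)
    return [r / total for r in raw]


def induced_owa(inducing, values, weights):
    """Generic induced OWA: sort values by descending inducing variable,
    then return the weighted sum using position weights."""
    if not (len(inducing) == len(values) == len(weights)):
        raise ValueError("inducing, values and weights must have equal length")
    # Indices of the arguments, ordered from largest to smallest inducing value.
    order = sorted(range(len(values)), key=lambda i: inducing[i], reverse=True)
    return sum(w * values[i] for w, i in zip(weights, order))


# Example: aggregate three feature values, ordered by an illustrative inducing score.
inducing_scores = [0.2, 0.9, 0.5]
feature_values = [10.0, 20.0, 30.0]
aggregated = induced_owa(inducing_scores, feature_values, sigmoid_weights(3))
```

With these weights, the value paired with the largest inducing score (here 20.0) contributes most to the aggregate, which illustrates how an inducing variable can act as an information filter.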
Data availability
Data will be made available on request.
Acknowledgements
This work is supported in part by the Humanities and Social Science Planning Project of the Ministry of Education under Grant 22YJA880051, the Science and Technology Project of Jiangxi Provincial Education Department under Grant GJJ2200535, and the 18th Student Research Project of Jiangxi University of Finance and Economics under Grant 20231016140352946.
Author information
Contributions
Benting Wan: Conception and design of study, Acquisition of data, Analysis and/or interpretation of data, Writing – original draft, Writing – review & editing. Weikang Huang: Analysis and/or interpretation of data, Writing – original draft, Writing – review & editing. Bilivogui Pierre: Analysis and/or interpretation of data, Writing – review & editing. Youyu Cheng: Writing – review & editing. Shufen Zhou: Analysis and/or interpretation of data, Writing – review & editing.
Ethics declarations
Conflict of interest
All authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
About this article
Cite this article
Wan, B., Huang, W., Pierre, B. et al. K-Means algorithm based on multi-feature-induced order. Granul. Comput. 9, 45 (2024). https://doi.org/10.1007/s41066-024-00470-w
DOI: https://doi.org/10.1007/s41066-024-00470-w