K-Means algorithm based on multi-feature-induced order

Wan, Benting; Huang, Weikang; Pierre, Bilivogui; Cheng, Youyu; Zhou, Shufen

doi:10.1007/s41066-024-00470-w

ORIGINAL PAPER
Published: 09 April 2024

Granular Computing, Volume 9, article number 45 (2024)

Benting Wan¹, Weikang Huang¹, Bilivogui Pierre², Youyu Cheng¹ & Shufen Zhou¹

Abstract

The K-Means algorithm is a powerful tool for data analysis, but it faces several challenges when applied to large multi-feature data. Centroid initialization and centroid determination are two significant hurdles that can reduce its performance. To address these challenges, an enhanced K-Means algorithm, the multi-feature induced order K-Means algorithm (OWAK-Means), is developed. It combines a novel centroid initialization based on partial-order relations with a multi-feature induced ordered weighted average (MFIOWA) operator. By using a weighted iteration method based on partial-order relations, the OWAK-Means algorithm initializes centroids with greater precision. The MFIOWA operator is designed using database indexing theory and a Sigmoid weight function, which improves its information filtering ability. Together with an ordered weighted distance metric, these techniques make the OWAK-Means algorithm an effective tool for multi-feature data analysis. In a comparative analysis with variants of the K-Means algorithm, the OWAK-Means algorithm shows significant improvement in adjusted Rand score, normalized mutual information, and purity. Statistical tests, comprehensive evaluation methods, and sensitivity analysis show that the OWAK-Means algorithm is effective and reliable.
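
The abstract does not give the definition of the MFIOWA operator, so the following is only a minimal sketch of the general idea behind an induced ordered weighted average with Sigmoid-shaped weights. It is not the authors' operator. The function names `sigmoid_weights` and `induced_owa`, the `steepness` parameter, and the particular way the Sigmoid is turned into position weights are all assumptions made for illustration.

```python
import math


def sigmoid_weights(n, steepness=1.0):
    """Return n positive weights that sum to 1 and do not increase with position.

    Illustrative assumption (not taken from the paper): the raw weight at
    0-based position j is sigmoid(steepness * (mid - j)), where mid is the
    centre position, so earlier positions receive larger weights.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    mid = (n - 1) / 2
    raw = [1.0 / (1.0 + math.exp(-steepness * (mid - j))) for j in range(n)]
    total = sum(raw)
    return [r / total for r in raw]


def induced_owa(pairs, weights):
    """Aggregate (inducing_value, argument_value) pairs.

    The argument values are reordered by their inducing values (largest
    first), then combined as a weighted sum with position-based weights.
    """
    if len(pairs) != len(weights):
        raise ValueError("need exactly one weight per pair")
    ordered = sorted(pairs, key=lambda p: p[0], reverse=True)
    return sum(w * value for w, (_, value) in zip(weights, ordered))


if __name__ == "__main__":
    # Hypothetical data: (inducing value, feature value).
    pairs = [(0.9, 10.0), (0.1, 20.0), (0.5, 30.0)]
    weights = sigmoid_weights(len(pairs), steepness=2.0)
    print(weights)
    print(induced_owa(pairs, weights))
```

With `steepness=0` every position gets the same weight and the result reduces to the arithmetic mean; larger values of `steepness` concentrate the weight on the values with the largest inducing variables, which is one simple way to picture the "information filtering" role the abstract attributes to the Sigmoid weight function.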

Data availability

Data will be made available on request.

Acknowledgements

This work is supported in part by the Humanities and Social Science Planning Project of the Ministry of Education under Grant 22YJA880051, the Science and Technology Project of Jiangxi Provincial Education Department under Grant GJJ2200535, and the 18th Student Research Project of Jiangxi University of Finance and Economics under Grant 20231016140352946.

Author information

Authors and Affiliations

Department of Software and IoT, Jiangxi University of Finance and Economics, Nanchang, 330013, China
Benting Wan, Weikang Huang, Youyu Cheng & Shufen Zhou
School of Finance, Zhongnan University of Economics and Law, Wuhan, 430073, China
Bilivogui Pierre

Contributions

Benting Wan: Conception and design of study, Acquisition of data, Analysis and/or interpretation of data, Writing – original draft, Writing – review & editing. Weikang Huang: Analysis and/or interpretation of data, Writing – original draft, Writing – review & editing. Bilivogui Pierre: Analysis and/or interpretation of data, Writing – review & editing. Youyu Cheng: Writing – review & editing. Shufen Zhou: Analysis and/or interpretation of data, Writing – review & editing.

Corresponding author

Correspondence to Benting Wan.

Ethics declarations

Conflict of interest

All authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wan, B., Huang, W., Pierre, B. et al. K-Means algorithm based on multi-feature-induced order. Granul. Comput. 9, 45 (2024). https://doi.org/10.1007/s41066-024-00470-w

Received: 23 December 2023
Accepted: 05 March 2024
Published: 09 April 2024
DOI: https://doi.org/10.1007/s41066-024-00470-w
