Abstract
In hierarchical clustering, the most important factor is the choice of linkage method, which determines how the distance between clusters is calculated. This choice strongly affects not only the clustering quality but also the efficiency of the algorithm. However, traditional linkage methods do not consider the effect of the objects around cluster centers. Motivated by this, in this article we propose a novel linkage method, named k-centroid link, to provide a better solution than the traditional linkage methods. In the proposed k-centroid link method, the dissimilarity between two clusters is primarily defined as the average distance over all pairs formed by the k data objects closest to the centroid of one cluster and the k data objects closest to the centroid of the other. In the experimental studies, the proposed method was tested on 24 publicly available benchmark datasets. The results show that hierarchical clustering with the k-centroid link method can achieve better clustering quality than conventional linkage methods such as single link, complete link, average link, mean link, centroid link, and the Ward method.
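The linkage definition above can be made concrete with a short sketch. The Python code below is a minimal, hypothetical illustration, not the authors' implementation. It assumes Euclidean distance, assumes that a cluster with k or fewer objects uses all of its objects as representatives, and wraps the linkage in a naive agglomerative loop. The function names `centroid`, `k_nearest_to_centroid`, `k_centroid_link` and `agglomerate` are illustrative only.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def centroid(cluster: Sequence[Point]) -> Point:
    """Coordinate-wise mean of the points in a cluster."""
    dim = len(cluster[0])
    return tuple(sum(p[i] for p in cluster) / len(cluster) for i in range(dim))


def k_nearest_to_centroid(cluster: Sequence[Point], k: int) -> List[Point]:
    """The k points closest to the cluster centroid.

    Assumption: if the cluster has k or fewer points, all of them are used.
    """
    c = centroid(cluster)
    return sorted(cluster, key=lambda p: math.dist(p, c))[:k]


def k_centroid_link(a: Sequence[Point], b: Sequence[Point], k: int) -> float:
    """Mean Euclidean distance over all pairs (x, y), with x drawn from a's
    representatives and y drawn from b's representatives."""
    ra = k_nearest_to_centroid(a, k)
    rb = k_nearest_to_centroid(b, k)
    total = sum(math.dist(x, y) for x in ra for y in rb)
    return total / (len(ra) * len(rb))


def agglomerate(points: Sequence[Point], n_clusters: int, k: int) -> List[List[Point]]:
    """Naive agglomerative clustering: repeatedly merge the two clusters with the
    smallest k-centroid link until n_clusters remain. All linkages are recomputed
    at every step, which is simple but slow for large inputs."""
    clusters: List[List[Point]] = [[p] for p in points]
    while len(clusters) > n_clusters:
        best_d, best_i, best_j = math.inf, -1, -1
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = k_centroid_link(clusters[i], clusters[j], k)
                if d < best_d:
                    best_d, best_i, best_j = d, i, j
        # Merge cluster j into cluster i; j > i, so deleting j leaves index i valid.
        clusters[best_i] = clusters[best_i] + clusters[best_j]
        del clusters[best_j]
    return clusters


pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
print(agglomerate(pts, n_clusters=2, k=2))
# → [[(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)], [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]]
```

Two limiting cases help place the measure among the traditional methods. With k = 1, each cluster is represented by the single object nearest its centroid. When k is at least as large as both cluster sizes, every object is a representative and the measure coincides with average link. Restricting attention to objects near the centroid is what keeps a distant outlier from inflating the linkage distance.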
Ethics declarations
Conflict of Interests
The authors declare that they have no conflict of interest.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Dogan, A., Birant, D. K-centroid link: a novel hierarchical clustering linkage method. Appl Intell 52, 5537–5560 (2022). https://doi.org/10.1007/s10489-021-02624-8
DOI: https://doi.org/10.1007/s10489-021-02624-8