
K-centroid link: a novel hierarchical clustering linkage method

Applied Intelligence

Abstract

In hierarchical clustering, the most important factor is the choice of linkage method, which determines how the distance between clusters is calculated. This choice strongly affects not only the clustering quality but also the efficiency of the algorithm. However, traditional linkage methods do not consider the effect of the objects around cluster centers. Motivated by this, we propose a novel linkage method, named k-centroid link, intended to provide a better solution than the traditional linkage methods. In the proposed k-centroid link method, the dissimilarity between two clusters is mainly defined as the average distance over all pairs formed from k data objects in each cluster, namely the k objects closest to the centroid of that cluster. In the experimental studies, the proposed method was tested on 24 publicly available benchmark datasets. The results demonstrate that hierarchical clustering with the k-centroid link method can achieve better clustering quality than conventional linkage methods such as single link, complete link, average link, mean link, centroid link, and the Ward method.
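The dissimilarity described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `k_nearest_to_centroid` and `k_centroid_link` are hypothetical, Euclidean distance is an assumption, and so is the handling of clusters with fewer than k objects (all of their objects are used). The abstract says the dissimilarity is "mainly" defined this way, so the full article may specify additional cases that this sketch does not cover.

```python
import numpy as np

def k_nearest_to_centroid(cluster, k):
    """Return the (at most) k points of `cluster` closest to its centroid.

    Assumption: Euclidean distance; clusters smaller than k keep all points.
    """
    cluster = np.asarray(cluster, dtype=float)
    centroid = cluster.mean(axis=0)
    dists = np.linalg.norm(cluster - centroid, axis=1)
    order = np.argsort(dists, kind="stable")[:k]
    return cluster[order]

def k_centroid_link(cluster_a, cluster_b, k):
    """Average pairwise distance between the k most central points of each cluster."""
    a = k_nearest_to_centroid(cluster_a, k)
    b = k_nearest_to_centroid(cluster_b, k)
    # Broadcasting builds a (len(a), len(b)) matrix of Euclidean distances.
    pairwise = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    return float(pairwise.mean())

# Hypothetical toy data: the outlier (0, 10) is far from the centroid of
# cluster_a, so with k=1 only the most central point (0, 0) is compared.
cluster_a = [[0, 0], [1, 0], [-1, 0], [0, 10]]
cluster_b = [[5, 0]]
d = k_centroid_link(cluster_a, cluster_b, k=1)
```

In an agglomerative procedure, a function of this kind would be evaluated for each candidate pair of clusters at every merge step, and the pair with the smallest value would be merged. Because only the k most central objects of each cluster enter the average, objects far from the centroid have no direct influence on the dissimilarity.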



Author information


Corresponding author

Correspondence to Alican Dogan.

Ethics declarations

Conflict of Interests

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Dogan, A., Birant, D. K-centroid link: a novel hierarchical clustering linkage method. Appl Intell 52, 5537–5560 (2022). https://doi.org/10.1007/s10489-021-02624-8


  • DOI: https://doi.org/10.1007/s10489-021-02624-8
