Abstract
Clustering is a typical unsupervised data analysis method that divides a data set without label information into multiple clusters. Data within the same cluster are strongly associated, so clustering can serve as a preprocessing stage for other algorithms or as a basis for further association analysis. Clustering therefore plays an important role in a wide range of fields. Chameleon is a clustering algorithm that combines relative interconnectivity and relative closeness to find high-quality clusters of arbitrary shape. However, the hMETIS graph-partitioning algorithm it relies on is difficult to operate and can make the results uncertain. In addition, the user must specify the final number of clusters as a parameter for stopping the merging process, and this number is difficult to determine without prior information. To address these shortcomings, we propose a Chameleon algorithm based on mutual k-nearest neighbors (MChameleon). First, mutual k-nearest neighbors are used to generate sub-clusters directly, which removes the graph-partitioning step. Then, the concept of MC modularity is introduced to identify the final clustering result objectively. In experiments on artificial and UCI data sets, we compare the accuracy of MChameleon with that of the original Chameleon algorithm, the improved AChameleon algorithm, and the classic K-Means, DBSCAN, and BIRCH algorithms. The experimental results show that MChameleon has clear advantages and is feasible.
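The abstract only names the two ideas behind MChameleon, so the following Python sketch illustrates them under explicit assumptions rather than reproducing the authors' method. It assumes that sub-clusters are the connected components of the mutual k-nearest-neighbor graph built with Euclidean distance, and it uses the standard Newman-Girvan modularity as a stand-in for MC modularity, whose exact definition is not given here. All function names (`knn_sets`, `mutual_knn_edges`, `connected_components`, `modularity`) are hypothetical.

```python
import math


def knn_sets(points, k):
    """For each point index, the set of its k nearest neighbors (Euclidean, excluding itself).

    Ties in distance are broken by the smaller index.
    """
    n = len(points)
    neighbors = []
    for i in range(n):
        others = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        neighbors.append({j for _, j in others[:k]})
    return neighbors


def mutual_knn_edges(points, k):
    """Edges (i, j), i < j, where i is among j's k nearest neighbors AND j is among i's."""
    nbrs = knn_sets(points, k)
    return [(i, j) for i in range(len(points)) for j in nbrs[i] if i < j and i in nbrs[j]]


def connected_components(n, edges):
    """Label vertices by connected component (union-find).

    Assumption: each component of the mutual kNN graph is treated as one initial sub-cluster.
    Labels are numbered in order of first appearance by vertex index.
    """
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)
    roots = {}
    return [roots.setdefault(find(v), len(roots)) for v in range(n)]


def modularity(n, edges, labels):
    """Newman-Girvan modularity Q of a partition of an unweighted graph.

    Q = sum over communities c of (L_c / m - (d_c / (2m))^2), where m is the number of edges,
    L_c the number of edges inside c, and d_c the total degree of vertices in c.
    This is a stand-in, NOT the paper's MC modularity.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    degree = [0] * n
    intra = {}
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
        if labels[i] == labels[j]:
            intra[labels[i]] = intra.get(labels[i], 0) + 1
    deg_sum = {}
    for v in range(n):
        deg_sum[labels[v]] = deg_sum.get(labels[v], 0) + degree[v]
    return sum(intra.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in deg_sum.items())


if __name__ == "__main__":
    # Two well-separated unit squares.
    points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    edges = mutual_knn_edges(points, k=2)
    labels = connected_components(len(points), edges)
    print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1]
    print(modularity(len(points), edges, labels))  # 0.5
```

In a Chameleon-style merging loop, one could score the partition after each merge with such a modularity function and keep the highest-scoring one instead of fixing the number of clusters in advance. This is only one plausible reading of how a modularity criterion can replace the user-specified stopping parameter.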
Acknowledgements
This work is supported by the National Natural Science Foundation of China (Nos. 61672522 and 61976216).
Ethics declarations
Conflict of interest
The authors declare that they have no conflicts of interest related to this work.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Zhang, Y., Ding, S., Wang, L. et al. Chameleon algorithm based on mutual k-nearest neighbors. Appl Intell 51, 2031–2044 (2021). https://doi.org/10.1007/s10489-020-01926-7
DOI: https://doi.org/10.1007/s10489-020-01926-7