Abstract
To overcome the limitation that general clustering methods cannot process big data in main memory, this paper presents an effective multi-level synchronization clustering (MLSynC) method based on a “divide and collect” framework and a linear weighted Vicsek model. We also introduce two concrete implementations of the MLSynC method: a two-level framework algorithm and a recursive algorithm. The process of MLSynC differs from those of the SynC, ESynC, and SSynC algorithms. Theoretical analysis shows that the time complexity of MLSynC is lower than that of SSynC. Simulations and experiments on several kinds of data sets show that MLSynC not only achieves a better local synchronization effect than SynC but also requires fewer iterations and less time. Moreover, MLSynC requires less time than ESynC and SSynC and, provided the data set is partitioned properly, achieves almost the same local synchronization effect as they do. Further comparisons with several classical clustering algorithms demonstrate the clustering effectiveness of MLSynC.
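The abstract does not state the update rule or the partitioning scheme, so the following Python/NumPy sketch only illustrates the general two-level “divide and collect” idea under explicit assumptions. It assumes a weighted, Vicsek-style averaging step in which every point moves to the weighted mean of all points within distance `eps` (itself included), with each point's weight equal to the number of original points it represents. It also assumes a random partition into `n_parts` chunks and a collapse radius of `eps / 10` for merging synchronized points. All function names (`weighted_sync`, `collapse`, `two_level_sync_cluster`) and these choices are hypothetical, not the paper's definitions.

```python
import numpy as np


def weighted_sync(points, weights, eps, max_iter=100, tol=1e-6):
    """Assumed weighted Vicsek-style update: each point moves to the weighted
    mean of all points within distance eps (itself included)."""
    x = np.asarray(points, dtype=float).copy()
    for _ in range(max_iter):
        # Pairwise Euclidean distances (O(m^2) memory, hence the partitioning).
        d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
        # Neighbor indicator scaled by each neighbor's weight.
        nbr = (d <= eps) * weights[None, :]
        new_x = nbr @ x / nbr.sum(axis=1, keepdims=True)
        if np.max(np.abs(new_x - x)) < tol:  # local synchronization reached
            return new_x
        x = new_x
    return x


def collapse(x, weights, radius):
    """Merge points that synchronized to (almost) the same location into one
    weighted representative; returns representatives, weights, and labels."""
    labels = -np.ones(len(x), dtype=int)
    reps, rep_w = [], []
    for i in range(len(x)):
        if labels[i] >= 0:
            continue
        members = (labels < 0) & (np.linalg.norm(x - x[i], axis=1) <= radius)
        labels[members] = len(reps)
        reps.append(np.average(x[members], axis=0, weights=weights[members]))
        rep_w.append(weights[members].sum())
    return np.array(reps), np.array(rep_w), labels


def two_level_sync_cluster(data, eps, n_parts, seed=0):
    """Hypothetical two-level divide-and-collect synchronization clustering."""
    rng = np.random.default_rng(seed)
    n = len(data)
    parts = np.array_split(rng.permutation(n), n_parts)
    all_reps, all_w = [], []
    point_to_rep = np.empty(n, dtype=int)
    offset = 0
    for idx in parts:
        # "Divide": synchronize each partition independently (fits in memory).
        ones = np.ones(len(idx))
        x = weighted_sync(data[idx], ones, eps)
        reps, w, lab = collapse(x, ones, radius=eps / 10)
        point_to_rep[idx] = lab + offset
        offset += len(reps)
        all_reps.append(reps)
        all_w.append(w)
    reps = np.vstack(all_reps)
    w = np.concatenate(all_w)
    # "Collect": synchronize the weighted representatives of all partitions.
    x = weighted_sync(reps, w, eps)
    _, _, rep_labels = collapse(x, w, radius=eps / 10)
    # Map every original point to the cluster of its representative.
    return rep_labels[point_to_rep]


# Usage: two well-separated Gaussian blobs split into four partitions.
rng = np.random.default_rng(1)
a = rng.normal([0.0, 0.0], 0.1, size=(50, 2))
b = rng.normal([10.0, 10.0], 0.1, size=(50, 2))
data = np.vstack([a, b])
labels = two_level_sync_cluster(data, eps=1.0, n_parts=4)
```

Only the representatives, not the full data set, enter the collect step, which is what keeps the pairwise-distance matrix small. Re-partitioning the representatives whenever they still do not fit in memory would give a recursive variant in the spirit of the abstract's second implementation; that structure is likewise an assumption, not taken from the paper.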
Acknowledgments
This work was supported by projects from Natural Science Research in Colleges and Universities of Anhui Province of China (grant numbers: KJ2019ZD15, KJ2019A0158), the University Synergy Innovation Program of Anhui Province (grant number: GXXT-2019-002), Anhui Polytechnic University (grant number: 2018YQQ031), the Chongqing Cutting-edge and Applied Foundation Research Program (grant number: cstc2016jcyjA0521), Chongqing Three Gorges University (grant number: 16PY08), and the National Natural Science Foundation of China (grant number: 61976005). The authors thank the editors and the anonymous reviewers for their useful suggestions, which helped us improve this paper.
Ethics declarations
Conflict of interest
None.
Electronic supplementary material
ESM 1 (DOCX, 1.64 MB)
About this article
Cite this article
Chen, X., Qiu, Y. An effective multi-level synchronization clustering method based on a linear weighted Vicsek model. Appl Intell 50, 4063–4080 (2020). https://doi.org/10.1007/s10489-020-01767-4
DOI: https://doi.org/10.1007/s10489-020-01767-4