Abstract
This paper describes a large-scale data clustering algorithm that combines the Balanced Iterative Reducing and Clustering using Hierarchies algorithm (BIRCH) with the Artificial Immune Network clustering algorithm (aiNet). Compared with traditional clustering algorithms, aiNet adapts better to non-convex datasets and does not require the number of clusters to be given in advance. However, it is ill-suited to large-scale datasets because its evolution process takes a long time. In addition, the aiNet model is very sensitive to noise, which greatly restricts its application. BIRCH, by contrast, processes large-scale datasets efficiently, but like traditional clustering algorithms it cannot handle non-convex datasets and requires the number of clusters to be specified. Combining the two methods yields a new large-scale data clustering algorithm that inherits the advantages of both BIRCH and aiNet while overcoming their respective disadvantages.
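The abstract does not spell out how the two methods are combined. One plausible reading, assumed here and not confirmed by the abstract, is that BIRCH first compresses the raw data into a small set of subcluster summaries, its clustering features CF = (N, LS, SS) as defined in the original BIRCH paper, and the slower aiNet stage then operates on those summaries instead of on every point. The Python sketch below illustrates only that BIRCH summarization step, under simplifying assumptions: a flat list of CF entries stands in for BIRCH's CF tree, and `ClusteringFeature`, `summarize` and `threshold` are hypothetical names. The aiNet stage is not implemented.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


def _dist(p: Sequence[float], q: Sequence[float]) -> float:
    # Euclidean distance between two points of equal dimension.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


@dataclass
class ClusteringFeature:
    """BIRCH clustering feature CF = (N, LS, SS) summarizing one subcluster."""
    n: int            # number of points in the subcluster
    ls: List[float]   # linear sum of the points, per dimension
    ss: float         # sum of the squared norms of the points

    @classmethod
    def from_point(cls, x: Sequence[float]) -> "ClusteringFeature":
        return cls(1, list(x), sum(v * v for v in x))

    def merged(self, other: "ClusteringFeature") -> "ClusteringFeature":
        # CF additivity: merging two subclusters simply adds their components.
        return ClusteringFeature(
            self.n + other.n,
            [a + b for a, b in zip(self.ls, other.ls)],
            self.ss + other.ss,
        )

    def centroid(self) -> List[float]:
        return [v / self.n for v in self.ls]

    def radius(self) -> float:
        # R = sqrt(SS/N - ||LS/N||^2): root-mean-square distance to the centroid.
        c = self.centroid()
        var = self.ss / self.n - sum(v * v for v in c)
        return math.sqrt(max(var, 0.0))  # clamp tiny negative rounding errors


def summarize(points: Sequence[Sequence[float]], threshold: float) -> List[ClusteringFeature]:
    """One pass over the data; a flat CF list replaces BIRCH's CF tree (simplification)."""
    entries: List[ClusteringFeature] = []
    for x in points:
        cf = ClusteringFeature.from_point(x)
        if entries:
            # Index of the subcluster whose centroid is nearest to x.
            i = min(range(len(entries)), key=lambda k: _dist(entries[k].centroid(), x))
            candidate = entries[i].merged(cf)
            if candidate.radius() <= threshold:
                entries[i] = candidate  # absorb x into the existing subcluster
                continue
        entries.append(cf)  # x starts a new subcluster
    return entries


# Usage: six points collapse into two compact summaries; only these (not the
# raw points) would be handed to the second, aiNet-style clustering stage.
pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
cfs = summarize(pts, threshold=0.5)
print([cf.n for cf in cfs])  # [3, 3]
```

Because CFs are additive, each point is absorbed in constant time per comparison, which is what lets a BIRCH-style first pass scale to large datasets before a more expensive method runs on the much smaller set of centroids.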
Acknowledgment
This work was supported by the National Natural Science Foundation of China under Grant 61772399, Grant U170126, Grant 61773304, Grant 61672405 and Grant 61772400, the Program for Cheung Kong Scholars and Innovative Research Team in University Grant IRT_15R53, the Fund for Foreign Scholars in University Research and Teaching Programs (the 111 Project) Grant B07048, and the Major Research Plan of the National Natural Science Foundation of China Grant 91438201.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Li, Y., Liu, G., Li, P., Jiao, L. (2018). A Large-Scale Data Clustering Algorithm Based on BIRCH and Artificial Immune Network. In: Tan, Y., Shi, Y., Tang, Q. (eds) Advances in Swarm Intelligence. ICSI 2018. Lecture Notes in Computer Science, vol 10941. Springer, Cham. https://doi.org/10.1007/978-3-319-93815-8_32
DOI: https://doi.org/10.1007/978-3-319-93815-8_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93814-1
Online ISBN: 978-3-319-93815-8
eBook Packages: Computer Science, Computer Science (R0)