A Large-Scale Data Clustering Algorithm Based on BIRCH and Artificial Immune Network

Li, Yangyang; Liu, Guangyuan; Li, Peidao; Jiao, Licheng

doi:10.1007/978-3-319-93815-8_32

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10941)

Included in the following conference series:

International Conference on Swarm Intelligence

Abstract

This paper describes a large-scale data clustering algorithm that combines Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) with the Artificial Immune Network clustering algorithm (aiNet). Compared with traditional clustering algorithms, aiNet adapts better to non-convex datasets and does not require the number of clusters to be specified in advance. However, it is poorly suited to large-scale datasets because its evolution takes a long time, and the aiNet model is very sensitive to noise, which greatly restricts its application. BIRCH, in contrast, handles large-scale datasets well but, like traditional clustering algorithms, cannot deal with non-convex datasets and requires the number of clusters to be given. Combining the two methods yields a new large-scale data clustering algorithm that inherits the advantages of both BIRCH and aiNet while overcoming their respective disadvantages.
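The abstract does not spell out how the two methods are coupled, so the following is only a minimal sketch of the standard BIRCH clustering feature (CF) from the original BIRCH paper (Zhang, Ramakrishnan and Livny, 1996), not the combined algorithm proposed here. It illustrates why BIRCH scales to large datasets: a group of points is summarized by a count, a linear sum and a squared sum, and summaries of disjoint groups merge by simple addition. The names `CF` and `summarize` are hypothetical and chosen for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class CF:
    """Clustering feature: point count, per-dimension linear sum, scalar squared sum."""
    n: int
    ls: List[float]
    ss: float

    @classmethod
    def from_point(cls, p: Sequence[float]) -> "CF":
        # A single point is a CF with count 1.
        return cls(1, list(p), sum(x * x for x in p))

    def merge(self, other: "CF") -> "CF":
        # CF additivity: summaries of disjoint point sets add component-wise.
        return CF(
            self.n + other.n,
            [a + b for a, b in zip(self.ls, other.ls)],
            self.ss + other.ss,
        )

    def centroid(self) -> List[float]:
        return [x / self.n for x in self.ls]

    def radius(self) -> float:
        # RMS distance of the summarized points from the centroid:
        # sqrt(SS / N - ||LS / N||^2), clamped at 0 against rounding error.
        c = self.centroid()
        var = self.ss / self.n - sum(x * x for x in c)
        return math.sqrt(max(var, 0.0))


def summarize(points: Sequence[Sequence[float]]) -> CF:
    """Fold a non-empty sequence of points into one CF in a single pass."""
    cf = CF.from_point(points[0])
    for p in points[1:]:
        cf = cf.merge(CF.from_point(p))
    return cf
```

Because only the summaries are kept, a later and more expensive stage (aiNet in this paper, by assumption) could in principle operate on far fewer objects than the raw dataset contains.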

Notes

1. http://www.ics.uci.edu/~mlearn/MLRepository.html

Acknowledgment

This work was supported by the National Natural Science Foundation of China under Grant 61772399, Grant U170126, Grant 61773304, Grant 61672405 and Grant 61772400, the Program for Cheung Kong Scholars and Innovative Research Team in University Grant IRT_15R53, the Fund for Foreign Scholars in University Research and Teaching Programs (the 111 Project) Grant B07048, and the Major Research Plan of the National Natural Science Foundation of China Grant 91438201.

Author information

Authors and Affiliations

Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education of China, International Research Center for Intelligent Perception and Computation, School of Artificial Intelligence, Xidian University, Xi’an, 710071, China
Yangyang Li, Guangyuan Liu, Peidao Li & Licheng Jiao

Corresponding author

Correspondence to Yangyang Li.

Editor information

Editors and Affiliations

Peking University, Beijing, China
Ying Tan
Southern University of Science and Technology, Shenzhen, China
Yuhui Shi
Tongji University, Shanghai, China
Qirong Tang

About this paper

Cite this paper

Li, Y., Liu, G., Li, P., Jiao, L. (2018). A Large-Scale Data Clustering Algorithm Based on BIRCH and Artificial Immune Network. In: Tan, Y., Shi, Y., Tang, Q. (eds) Advances in Swarm Intelligence. ICSI 2018. Lecture Notes in Computer Science, vol 10941. Springer, Cham. https://doi.org/10.1007/978-3-319-93815-8_32

DOI: https://doi.org/10.1007/978-3-319-93815-8_32
Published: 16 June 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93814-1
Online ISBN: 978-3-319-93815-8
eBook Packages: Computer Science, Computer Science (R0)
