Abstract
AIDCOR is an artificial-immunity-inspired, density-based clustering algorithm that identifies crisp clusters with a high degree of accuracy, even when the input dataset contains clusters of varied shape, varied density distributions, low inter-cluster separation, and noise or outliers. The algorithm works in two phases: a data preprocessing module and a clustering module. The preprocessing phase is inspired by the artificial immune system and uses a novel approach of somatic hypermutation and affinity maturation with selective antigenic binding to reduce data redundancy while preserving the original data patterns. The clustering phase follows a density-based approach that forms clusters from the compressed dataset and, in doing so, inherently identifies outliers. We thoroughly analyze both the theoretical aspects and the experimental results of the proposed algorithm on a wide variety of real and synthetic datasets. The results of AIDCOR are compared with those of several current state-of-the-art algorithms, and we find that it gives much higher clustering accuracy for nearly all types of dataset. The time complexity of AIDCOR is subquadratic when an indexing data structure is used for nearest-neighbor search, and quadratic otherwise. AIDCOR needs three user-defined parameters for its operation; a heuristic method is also proposed to determine these parameters automatically.
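The two-phase structure described above (compress the data first, then run density-based clustering on the compressed set and flag what is left as outliers) can be illustrated with a minimal Python sketch. This is not the AIDCOR algorithm: the greedy radius-based merging in `compress` is a simple stand-in for the immune-inspired preprocessing, the expansion in `density_cluster` is a generic DBSCAN-style procedure, and the names `radius`, `eps`, and `min_weight` are hypothetical parameters chosen for illustration, not the three parameters the paper defines.

```python
from math import dist  # Euclidean distance, Python 3.8+


def compress(points, radius):
    """Phase 1 stand-in (assumption): greedily merge each point into the first
    representative within `radius`; otherwise make it a new representative."""
    reps, weights, owner = [], [], []
    for p in points:
        for i, r in enumerate(reps):
            if dist(p, r) <= radius:
                weights[i] += 1          # representative absorbs this point
                owner.append(i)
                break
        else:
            reps.append(p)
            weights.append(1)
            owner.append(len(reps) - 1)
    return reps, weights, owner


def density_cluster(reps, weights, eps, min_weight):
    """Phase 2 stand-in (assumption): DBSCAN-style expansion over weighted
    representatives. Representatives never reached keep the label -1."""
    n = len(reps)
    # Brute-force neighbour search: quadratic in the number of representatives.
    neighbours = [[j for j in range(n) if dist(reps[i], reps[j]) <= eps]
                  for i in range(n)]
    is_core = [sum(weights[j] for j in neighbours[i]) >= min_weight
               for i in range(n)]
    labels = [-1] * n
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not is_core[i]:
            continue
        labels[i] = cluster
        stack = [i]
        while stack:
            cur = stack.pop()
            if not is_core[cur]:
                continue                 # border representative: do not expand
            for j in neighbours[cur]:
                if labels[j] == -1:
                    labels[j] = cluster
                    stack.append(j)
        cluster += 1
    return labels


def two_phase_cluster(points, radius, eps, min_weight):
    """Compress, cluster the representatives, then map labels back."""
    reps, weights, owner = compress(points, radius)
    rep_labels = density_cluster(reps, weights, eps, min_weight)
    return [rep_labels[i] for i in owner]


# Toy data (made up for illustration): two dense groups and one isolated point.
demo_points = [(0, 0), (0.1, 0), (0, 0.1), (0.6, 0), (0.6, 0.1), (0.7, 0),
               (5, 5), (5.1, 5), (5, 5.1), (5.6, 5), (5.6, 5.1),
               (20, 20)]
demo_labels = two_phase_cluster(demo_points, radius=0.3, eps=1.0, min_weight=4)
print(demo_labels)
```

The neighbour search here is brute force, which corresponds to the quadratic case mentioned in the abstract; replacing it with a spatial index such as a k-d tree is the usual route to subquadratic behaviour, though how AIDCOR does this is not specified in the abstract.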
Cite this article
Paul, S.K., Bhaumik, P. AIDCOR: artificial immunity inspired density based clustering with outlier removal. Int. J. Mach. Learn. & Cyber. 9, 309–334 (2018). https://doi.org/10.1007/s13042-016-0499-x
DOI: https://doi.org/10.1007/s13042-016-0499-x