Pattern Analysis and Applications, Volume 20, Issue 1, pp 183–199

A fast and noise resilient cluster-based anomaly detection

  • Elnaz Bigdeli
  • Mahdi Mohammadi
  • Bijan Raahemi
  • Stan Matwin
Theoretical Advances


Abstract

Clustering, while systematically applied in anomaly detection, has a direct impact on the accuracy of the detection methods. Existing cluster-based anomaly detection methods are mainly based on spherical-shape clustering. In this paper, we focus on arbitrary-shape clustering methods to increase the accuracy of anomaly detection. However, since the main drawback of arbitrary-shape clustering is its high memory complexity, we propose to summarize clusters first. For this, we design an algorithm, called Summarization based on Gaussian Mixture Model (SGMM), to summarize clusters and represent them as Gaussian Mixture Models (GMMs). After the GMMs are constructed, incoming new samples are presented to the GMMs and their membership values are calculated, based on which the new samples are labeled as "normal" or "anomaly." Additionally, to address the issue of noise in the data, instead of labeling samples individually, they are clustered first, and then each cluster is labeled collectively. For this, we present a new approach, called Collective Probabilistic Anomaly Detection (CPAD), in which the distance between the incoming new samples and the existing SGMM summaries is calculated, and the new cluster is then labeled the same as the closest cluster. To measure the distance between two GMM-based clusters, we propose a modified version of the Kullback–Leibler divergence. We run several experiments to evaluate the performance of the proposed SGMM and CPAD methods and compare them against well-known algorithms including ABACUS, local outlier factor (LOF), and one-class support vector machine (SVM). The performance of SGMM is compared with ABACUS using the Dunn and DB metrics, and the results indicate that SGMM is superior at summarizing clusters. Moreover, the proposed CPAD method is compared with LOF and one-class SVM considering the performance criteria of (a) false alarm rate, (b) detection rate, and (c) memory efficiency. The experimental results show that the CPAD method is noise resilient and memory efficient, and that its accuracy is higher than that of the other methods.
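As a rough illustration of the two ingredients described above, the sketch below summarizes a cluster by a Gaussian, labels new samples by their density under the summaries, and computes a closed-form Gaussian-to-Gaussian KL divergence. This is a simplification for exposition only: the paper's SGMM fits a multi-component GMM per cluster, and its modified Kullback–Leibler measure compares whole GMMs (for which no closed form exists); the single-Gaussian summary, the `threshold` parameter, and all function names here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def fit_gaussian(X):
    # Summarize a cluster by its sample mean and covariance: a
    # one-component stand-in for the paper's GMM-based summaries.
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])  # regularized
    return mu, cov

def log_density(x, mu, cov):
    # Log-density of x under a multivariate Gaussian N(mu, cov).
    d = len(mu)
    diff = x - mu
    inv = np.linalg.inv(cov)
    return -0.5 * (d * np.log(2 * np.pi)
                   + np.log(np.linalg.det(cov))
                   + diff @ inv @ diff)

def label_sample(x, summaries, threshold):
    # A sample is "normal" if at least one cluster summary assigns it
    # a sufficiently high density; otherwise it is flagged as anomalous.
    # `threshold` is an illustrative tuning parameter.
    best = max(log_density(x, mu, cov) for mu, cov in summaries)
    return "normal" if best >= threshold else "anomaly"

def kl_gauss(mu0, cov0, mu1, cov1):
    # Closed-form KL divergence KL(N0 || N1) between two Gaussians,
    # usable as a building block when approximating distances between
    # GMM-based cluster summaries.
    d = len(mu0)
    inv1 = np.linalg.inv(cov1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(inv1 @ cov0) + diff @ inv1 @ diff - d
                  + np.log(np.linalg.det(cov1) / np.linalg.det(cov0)))
```

In the collective (CPAD-style) setting, one would first cluster the incoming batch, summarize each new cluster the same way, and then assign each new cluster the label of the nearest existing summary under the divergence.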


Keywords: Anomaly detection · Arbitrary shape clustering · Gaussian mixture model · Distribution distance



Copyright information

© Springer-Verlag London 2015

Authors and Affiliations

  • Elnaz Bigdeli (1)
  • Mahdi Mohammadi (2)
  • Bijan Raahemi (2)
  • Stan Matwin (3, 4)

  1. School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, Canada
  2. Knowledge Discovery and Data Mining Lab, Telfer School of Management, University of Ottawa, Ottawa, Canada
  3. Department of Computing, Dalhousie University, Halifax, Canada
  4. Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland
