A Comprehensive Survey of Anomaly Detection Algorithms

Annals of Data Science

Abstract

Anomaly (or outlier) detection is considered one of the vital applications of data mining. Anomalies are data points that differ dramatically from the rest of the data. In this survey, we present anomaly detection algorithms comprehensively and in an organized manner. We begin with the definition of an anomaly and then describe the essential elements of anomaly detection, such as the different types of anomalies, application domains, and evaluation measures. We group a total of 52 anomaly detection algorithms into seven categories according to their working mechanisms: algorithms based on statistics, density, distance, clustering, isolation, ensembles, and subspaces. For each category, we give the time complexity of each algorithm together with the general advantages and disadvantages of the category. Finally, we compare all the discussed anomaly detection algorithms in detail.
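
The evaluation measures mentioned above are usually computed from a ranking of anomaly scores. As a minimal sketch, assuming that higher scores mean more anomalous and that ground-truth labels are 1 (anomaly) and 0 (normal), the Python functions below compute three commonly used measures: ROC AUC (the probability that a randomly chosen anomaly is scored above a randomly chosen normal point, with ties counted as one half), precision at n, and average precision, taken here as the mean of precision at k over the ranks k where true anomalies appear (one common definition among several). The function names are illustrative and are not taken from the survey or from any library.

```python
from typing import List, Sequence


def _rank_by_score(scores: Sequence[float]) -> List[int]:
    """Indices of the points, most anomalous (highest score) first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random anomaly outranks a random normal point."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one anomaly and one normal point")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))


def precision_at_n(scores: Sequence[float], labels: Sequence[int], n: int) -> float:
    """Fraction of true anomalies among the n highest-scoring points."""
    if n < 1:
        raise ValueError("n must be at least 1")
    top = _rank_by_score(scores)[:n]
    return sum(labels[i] for i in top) / n


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mean of precision@k over the ranks k at which a true anomaly appears."""
    hits, total = 0, 0.0
    for k, i in enumerate(_rank_by_score(scores), start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / k
    return total / hits if hits else 0.0


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.3, 0.2, 0.1]  # toy detector output
    labels = [1, 0, 1, 0, 0]            # toy ground truth
    print(roc_auc(scores, labels), precision_at_n(scores, labels, 2),
          average_precision(scores, labels))
```

The pairwise loop in `roc_auc` is quadratic in the number of points and is written for clarity; a rank-based (Mann-Whitney) formulation gives the same value in O(n log n) time.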

Code availability

Not applicable.

Data Availability

Not applicable.

Notes

  1. Anomaly and outlier are widely used terms. In this work, we will use both terms interchangeably.

  2. Anomaly detection and outlier detection are widely used terms. In this paper, we use both terms interchangeably.

  3. The time complexity of this kind of algorithm can be reduced to \(O(n\log n)\) by using a good indexing structure, but such indexes are not feasible in high-dimensional spaces. We therefore report time complexities without such an index throughout the paper.

  4. Clustered anomalies are anomalies that form a small cluster of a few points outside the normal cluster.

  5. Some algorithms choose subspaces based on a statistical test (e.g., HiCS, CMI), while others choose them randomly (e.g., Zero++).

  6. Subspace-based anomaly detection algorithms must also search for subspaces, which takes additional time that depends on the search method. We report only the scoring time within a subspace.
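
Note 3 refers to algorithms whose running time, without an index structure, grows faster than \(O(n\log n)\), typically quadratically because every point is compared with every other point. As an illustration under that assumption, the Python sketch below scores each point by the Euclidean distance to its k-th nearest neighbour using brute-force search, in the spirit of the distance-based score of Ramaswamy et al. (reference 62). The function name and the toy data are hypothetical, and this is not presented as the survey's own implementation.

```python
import heapq
import math
from typing import List, Sequence


def knn_outlier_scores(data: Sequence[Sequence[float]], k: int) -> List[float]:
    """Score each point by the Euclidean distance to its k-th nearest neighbour.

    Brute-force search with no index: each of the n points is compared with
    the other n - 1 points, so the cost grows quadratically with n.
    Larger scores mean more anomalous.
    """
    n = len(data)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of points")
    scores = []
    for i in range(n):
        # Distances from point i to every other point (point i itself excluded).
        dists = (math.dist(data[i], data[j]) for j in range(n) if j != i)
        # nsmallest returns the k smallest in ascending order; the last is the k-th.
        scores.append(heapq.nsmallest(k, dists)[-1])
    return scores


if __name__ == "__main__":
    points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
    scores = knn_outlier_scores(points, k=1)
    print(scores)  # the isolated point (5, 5) receives the largest score
```

Replacing the inner loop with a query against a spatial index such as a k-d tree or an R-tree (references 71 to 73) is the kind of speed-up that note 3 describes; such indexes tend to lose their advantage as the number of dimensions grows.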

References

  1. Tien JM (2017) Internet of things, real-time decision making, and artificial intelligence. Ann Data Sci 4(2):149–178

  2. Ahmed M, Najmul Islam AKM (2020) Deep learning: hope or hype. Ann Data Sci 7(3):427–432

  3. Chandola V, Banerjee A, Kumar V (2007) Outlier detection: a survey. ACM Comput Surv 14:15

  4. Grubbs FE (1969) Procedures for detecting outlying observations in samples. Technometrics 11(1):1–21

  5. Hawkins DM (1980) Identification of outliers, vol 11. Springer, Berlin

  6. Barnett V, Lewis T (1984) Outliers in statistical data, 3rd edn. Wiley, New York

  7. Breunig MM, Kriegel HP, Ng RT, Sander J (2000) LOF: identifying density-based local outliers. In: Proceedings of the 2000 ACM SIGMOD international conference on management of data, SIGMOD ’00, Association for Computing Machinery, New York, NY, USA, pp 93–104

  8. Jiang MF, Tseng SS, Su CM (2001) Two-phase clustering process for outliers detection. Pattern Recogn Lett 22(6):691–700

  9. Hu T, Sung SY (2003) Detecting pattern-based outliers. Pattern Recogn Lett 24(16):3059–3068

  10. Aryal S, Baniya AA, Santosh KC (2019) Improved histogram-based anomaly detector with the extended principal component features. arXiv preprint arXiv:1909.12702

  11. Ahmed M (2018) Collective anomaly detection techniques for network traffic analysis. Ann Data Sci 5(4):497–512

  12. Hodge V, Austin J (2004) A survey of outlier detection methodologies. Artif Intell Rev 22(2):85–126

  13. Aggarwal CC (2017) An introduction to outlier analysis. Springer, Cham, pp 1–34

  14. Olson DL, Shi Y (2007) Introduction to business data mining, vol 10. McGraw-Hill/Irwin, New York

  15. Craswell N (2009) Precision at n. Springer, Boston, pp 2127–2128

  16. Zhang E, Zhang Y (2009) Average precision. Springer, Boston, pp 192–193

  17. Hand DJ, Till RJ (2001) A simple generalisation of the area under the ROC curve for multiple class classification problems. Mach Learn 45(2):171–186

  18. Goldstein M, Uchida S (2016) A comparative evaluation of unsupervised anomaly detection algorithms for multivariate data. PLoS ONE 11(4):1–31

  19. Campos GO, Zimek A, Sander J, Campello RJGB, Micenková B, Schubert E, Assent I, Houle ME (2016) On the evaluation of unsupervised outlier detection: measures, datasets, and an empirical study. Data Min Knowl Disc 30(4):891–927

  20. Shewhart WA (1930) Economic quality control of manufactured product. Bell Syst Tech J 9(2):364–389

  21. Rosner B (1983) Percentage points for a generalized ESD many-outlier procedure. Technometrics 25(2):165–172

  22. Liu J-P, Weng C-S (1991) Detection of outlying data in bioavailability/bioequivalence studies. Stat Med 10(9):1375–1389

  23. Surace C, Worden K, Tomlinson G (1997) A novelty detection approach to diagnose damage in a cracked beam. In: Proceedings of SPIE, the international society for optical engineering, pp 947–953

  24. Surace C, Worden K et al (1998) A novelty detection method to diagnose damage in structures: an application to an offshore platform. In: The eighth international offshore and polar engineering conference, International Society of Offshore and Polar Engineers

  25. Laurikkala J, Juhola M, Kentala E (2000) Informal identification of outliers in medical data. In: Fifth international workshop on intelligent data analysis in medicine and pharmacology, vol 1, pp 20–24

  26. Ye N, Chen Q (2001) An anomaly detection technique based on a chi-square statistic for detecting intrusions into information systems. Qual Reliab Eng Int 17(2):105–112

  27. Rousseeuw PJ, Leroy AM (2005) Robust regression and outlier detection, vol 589. Wiley, New York

  28. Horn PS, Feng L, Li Y, Pesce AJ (2001) Effect of outliers and nonhealthy individuals on reference interval estimation. Clin Chem 47(12):2137–2142

  29. Solberg HE, Lahti A (2005) Detection of outliers in reference distributions: performance of Horn’s algorithm. Clin Chem 51(12):2326–2332

  30. Dovoedo YH, Chakraborti S (2015) Boxplot-based outlier detection for the location-scale family. Commun Stat Simul Comput 44(6):1492–1513

  31. Gibbons RD (1994) Statistical methods for groundwater monitoring. Wiley, New York

  32. Javitz HS, Valdes A (1991) The SRI IDES statistical anomaly detector. In: Proceedings of 1991 IEEE computer society symposium on research in security and privacy, pp 316–326

  33. Gebski M, Wong RK (2007) An efficient histogram method for outlier detection. In: Ramamohanarao K, Krishna PR, Mohania M, Nantajeewarawat E (eds) Advances in databases: concepts, systems and applications. Springer, Berlin, pp 176–187

  34. Jiang X-B, Li G-Y, Lian S (2011) Outlier detection algorithm based on variable-width histogram for wireless sensor network. J Comput Appl 31(3):694–697

  35. Goldstein M, Dengel A (2012) Histogram-based outlier score (HBOS): a fast unsupervised anomaly detection algorithm. In: KI-2012: poster and demo track, pp 59–63

  36. Xie M, Hu J, Tian B (2012) Histogram-based online anomaly detection in hierarchical wireless sensor networks. In: 2012 IEEE 11th international conference on trust, security and privacy in computing and communications, pp 751–759

  37. Latecki LJ, Lazarevic A, Pokrajac D (2007) Outlier detection with kernel density functions. In: Perner P (ed) Machine learning and data mining in pattern recognition. Springer, Berlin, pp 61–75

  38. Oh JH, Gao J (2009) A kernel-based approach for detecting outliers of high-dimensional biological data. In: BMC bioinformatics, vol 10, Springer, p S7

  39. Gao J, Hu W, Zhang Z, Zhang X, Wu O (2011) RKOF: robust kernel-based local outlier detection. In: Huang JZ, Cao L, Srivastava J (eds) Advances in knowledge discovery and data mining. Springer, Berlin, pp 270–283

  40. Askari A, Yang F, Ghaoui LE (2018) Kernel-based outlier detection using the inverse Christoffel function

  41. Liu F, Yu Y, Song P, Fan Y, Tong X (2020) Scalable KDE-based top-n local outlier detection over large-scale data streams. Knowl Based Syst 204:106186

  42. Siegel AF, Morgan CJ (1988) Statistics and data analysis: an introduction, 2nd edn. Wiley, New York

  43. Zhang Y, Hamm NAS, Meratnia N, Stein A, van de Voort M, Havinga PJM (2012) Statistics-based outlier detection for wireless sensor networks. Int J Geogr Inf Sci 26(8):1373–1392

  44. Zimek A, Filzmoser P (2018) There and back again: outlier detection between statistical reasoning and data mining algorithms. WIREs Data Min Knowl Discov 8(6):e1280

  45. Tang J, Chen Z, Fu AWC, Cheung DW (2002) Enhancing effectiveness of outlier detections for low density patterns. In: Chen MS, Yu PS, Liu B (eds) Advances in knowledge discovery and data mining. Springer, Berlin, pp 535–548

  46. Kriegel H-P, Kröger P, Schubert E, Zimek A (2009) LoOP: local outlier probabilities. In: Proceedings of the 18th ACM conference on information and knowledge management, CIKM ’09, Association for Computing Machinery, New York, NY, USA, pp 1649–1652

  47. Papadimitriou S, Kitagawa H, Gibbons PB, Faloutsos C (2003) LOCI: fast outlier detection using the local correlation integral. In: Proceedings 19th international conference on data engineering (Cat. No. 03CH37405), pp 315–326

  48. Ren D, Wang B, Perrizo W (2004) RDF: a density-based outlier detection method using vertical data representation. In: Fourth IEEE international conference on data mining (ICDM’04), pp 503–506

  49. Jin W, Tung AKH, Han J, Wang W (2006) Ranking outliers using symmetric neighborhood relationship. In: Proceedings of the 10th Pacific-Asia conference on advances in knowledge discovery and data mining, PAKDD’06, Springer, Berlin, pp 577–593

  50. Fan H, Zaïane OR, Foss A, Wu J (2009) Resolution-based outlier factor: detecting the top-n most outlying data points in engineering data. Knowl Inf Syst 19(1):31–51

  51. Goldstein M (2012) FastLOF: an expectation-maximization based local outlier detection algorithm. In: Proceedings of the 21st international conference on pattern recognition (ICPR2012), pp 2282–2285

  52. Momtaz R, Mohssen N, Gowayyed MA (2013) DWOF: a robust density-based outlier detection approach. In: Sanches JM, Micó L, Cardoso JS (eds) Pattern recognition and image analysis. Springer, Berlin, pp 517–525

  53. Schubert E, Zimek A, Kriegel H-P (2014) Local outlier detection reconsidered: a generalized view on locality with applications to spatial, video, and network outlier detection. Data Min Knowl Disc 28(1):190–237

  54. Wells JR, Ting KM, Washio T (2014) LiNearN: a new approach to nearest neighbour density estimator. Pattern Recogn 47(8):2702–2720

  55. Campello RJGB, Moulavi D, Zimek A, Sander J (2015) Hierarchical density estimates for data clustering, visualization, and outlier detection. ACM Trans Knowl Discov Data 10(1):1–51

  56. Aryal S, Ting KM, Haffari G (2016) Revisiting attribute independence assumption in probabilistic unsupervised anomaly detection. In: Chau M, Wang GA, Chen H (eds) Intelligence and security informatics. Springer, Cham, pp 73–86

  57. Abdi H, Williams LJ (2010) Principal component analysis. WIREs Comput Stat 2(4):433–459

  58. Aggarwal CC (2017) Proximity-based outlier detection. Springer, Cham, pp 111–147

  59. Domingues R, Filippone M, Michiardi P, Zouaoui J (2018) A comparative evaluation of outlier detection algorithms: experiments and analyses. Pattern Recogn 74:406–421

  60. Knorr EM, Ng RT (1998) Algorithms for mining distance-based outliers in large datasets. In: Proceedings of the 24th international conference on very large data bases, VLDB ’98, Morgan Kaufmann Publishers Inc, San Francisco, CA, USA, pp 392–403

  61. Knorr EM, Ng RT, Tucakov V (2000) Distance-based outliers: algorithms and applications. VLDB J 8(3):237–253

  62. Ramaswamy S, Rastogi R, Shim K (2000) Efficient algorithms for mining outliers from large data sets. SIGMOD Rec 29(2):427–438

  63. Ghoting A, Parthasarathy S, Otey ME (2008) Fast mining of distance-based outliers in high-dimensional datasets. Data Min Knowl Disc 16(3):349–364

  64. Kriegel HP, Schubert M, Zimek A (2008) Angle-based outlier detection in high-dimensional data. In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’08, Association for Computing Machinery, New York, pp 444–452

  65. Wang B, Xiao G, Yu H, Yang X (2009) Distance-based outlier detection on uncertain data. In: 2009 Ninth IEEE international conference on computer and information technology, vol 1, pp 293–298

  66. Zhang K, Hutter M, Jin H (2009) A new local distance-based outlier detection approach for scattered real-world data. In: Theeramunkong T, Kijsirikul B, Cercone N, Ho T-B (eds) Advances in knowledge discovery and data mining. Springer, Berlin, pp 813–822

  67. Sugiyama M, Borgwardt K (2013) Rapid distance-based outlier detection via sampling. In: Burges CJC, Bottou L, Welling M, Ghahramani Z, Weinberger KQ (eds) Advances in neural information processing systems, vol 26, Curran Associates Inc, pp 467–475

  68. Radovanović M, Nanopoulos A, Ivanović M (2015) Reverse nearest neighbors in unsupervised distance-based outlier detection. IEEE Trans Knowl Data Eng 27(5):1369–1382

  69. Wang H, Bah MJ, Hammad M (2019) Progress in outlier detection techniques: a survey. IEEE Access 7:107964–108000

  70. Berchtold S, Keim DA, Kriegel H-P (1996) The X-tree: an index structure for high-dimensional data. In: Proceedings of the 22nd international conference on very large data bases, VLDB ’96, Morgan Kaufmann Publishers Inc, San Francisco, CA, USA, pp 28–39

  71. Guttman A (1984) R-trees: a dynamic index structure for spatial searching. SIGMOD Rec 14(2):47–57

  72. Sellis TK, Roussopoulos N, Faloutsos C (1987) The R+-tree: a dynamic index for multi-dimensional objects. In: Proceedings of the 13th international conference on very large data bases, VLDB ’87, Morgan Kaufmann Publishers Inc, San Francisco, CA, USA, pp 507–518

  73. Bentley JL (1975) Multidimensional binary search trees used for associative searching. Commun ACM 18(9):509–517

  74. Yu D, Sheikholeslami G, Zhang A (2002) FindOut: finding outliers in very large datasets. Knowl Inf Syst 4(4):387–412

  75. He Z, Xu X, Deng S (2003) Discovering cluster-based local outliers. Pattern Recogn Lett 24(9–10):1641–1650

  76. Jiang S, An Q (2008) Clustering-based outlier detection method. In: 2008 Fifth international conference on fuzzy systems and knowledge discovery, vol 2, pp 429–433

  77. Liu FT, Ting KM, Zhou Z (2008) Isolation forest. In: 2008 Eighth IEEE international conference on data mining, pp 413–422

  78. Liu FT, Ting KM, Zhou Z-H (2012) Isolation-based anomaly detection. ACM Trans Knowl Discov Data 6(1):1–39

  79. Liu FT, Ting KM, Zhou ZH (2010) On detecting clustered anomalies using SCiForest. In: Balcázar JL, Bonchi F, Gionis A, Sebag M (eds) Machine learning and knowledge discovery in databases. Springer, Berlin, pp 274–290

  80. Tan SC, Ting KM, Liu TF (2011) Fast anomaly detection for streaming data. In: Proceedings of the twenty-second international joint conference on artificial intelligence, vol 2, IJCAI’11, AAAI Press, pp 1511–1516

  81. Aryal S, Ting KM, Wells JR, Washio T (2014) Improving iForest with relative mass. In: Tseng VS, Ho TB, Zhou ZH, Chen ALP, Kao HY (eds) Advances in knowledge discovery and data mining. Springer, Cham, pp 510–521

  82. Bandaragoda TR, Ting KM, Albrecht D, Liu FT, Wells JR (2014) Efficient anomaly detection by isolation using nearest neighbour ensemble. In: 2014 IEEE International conference on data mining workshop, pp 698–705

  83. Bandaragoda TR, Ting KM, Albrecht D, Liu FT, Zhu Y, Wells JR (2018) Isolation-based anomaly detection using nearest-neighbor ensembles. Comput Intell 34(4):968–998

  84. Pang G, Ting KM, Albrecht D (2015) LeSiNN: detecting anomalies by identifying least similar nearest neighbours. In: 2015 IEEE international conference on data mining workshop (ICDMW), pp 623–630

  85. Zhang X, Dou W, He Q, Zhou R, Leckie C, Kotagiri R, Salcic Z (2017) LSHiForest: a generic framework for fast tree isolation based ensemble anomaly analysis. In: 2017 IEEE 33rd international conference on data engineering (ICDE), pp 983–994

  86. Aryal S (2018) Anomaly detection technique robust to units and scales of measurement. In: Phung D, Tseng VS, Webb GI, Ho B, Ganji M, Rashidi L (eds) Advances in knowledge discovery and data mining. Springer, Cham, pp 589–601

  87. Aryal S, Santosh KC, Dazeley R (2020) usfAD: a robust anomaly detector based on unsupervised stochastic forest. Int J Mach Learn Cybern 12:1–14

  88. Ting KM, Zhou G-T, Liu FT, Tan JSC (2010) Mass estimation and its applications. In: Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’10, Association for Computing Machinery, New York, NY, USA, pp 989–998

  89. Fernando TL, Webb GI (2017) Simusf: an efficient and effective similarity measure that is invariant to violations of the interval scale assumption. Data Min Knowl Disc 31(1):264–286

  90. Ting KM, Washio T, Wells JR, Aryal S (2017) Defying the gravity of learning curve: a characteristic of nearest neighbour anomaly detectors. Mach Learn 106(1):55–91

  91. Bandaragoda TR (2015) Isolation based anomaly detection: a re-examination. PhD thesis, Monash University

  92. Pevný T (2016) LODA: lightweight on-line detector of anomalies. Mach Learn 102(2):275–304

  93. Zhao Y, Hryniewicki MK (2018) DCSO: dynamic combination of detector scores for outlier ensembles. In: ACM SIGKDD ODD workshop, London, UK

  94. Zhao Y, Nasrullah Z, Hryniewicki MK, Li Z (2019) LSCP: locally selective combination in parallel outlier ensembles. In: Proceedings of the 2019 SIAM international conference on data mining, SDM 2019, Calgary, Canada, pp 585–593

  95. Aggarwal CC (2013) Outlier ensembles: position paper. SIGKDD Explor Newsl 14(2):49–58

  96. Aggarwal CC (2017) Outlier ensembles. Springer, Cham, pp 185–218

  97. Zimek A, Campello RJGB, Sander J (2014) Ensembles for unsupervised outlier detection: challenges and research questions a position paper. SIGKDD Explor Newsl 15(2):11–22

  98. Kriegel H-P, Kröger P, Schubert E, Zimek A (2009) Outlier detection in axis-parallel subspaces of high dimensional data. In: Theeramunkong T, Kijsirikul B, Cercone N, Ho TB (eds) Advances in knowledge discovery and data mining. Springer, Berlin, pp 831–838

  99. Agrawal A (2009) Local subspace based outlier detection. In: Ranka S, Aluru S, Buyya R, Chung Y-C, Dua S, Grama A, Gupta SKS, Kumar R, Phoha VV (eds) Contemporary computing. Springer, Heidelberg, pp 149–157

  100. Nguyen HV, Gopalkrishnan V, Assent I (2011) An unbiased distance-based outlier detection approach for high-dimensional data. In: Yu JX, Kim MH, Unland R (eds) Database systems for advanced applications. Springer, Berlin, pp 138–152

  101. Kriegel H, Kröger P, Schubert E, Zimek A (2012) Outlier detection in arbitrarily oriented subspaces. In: 2012 IEEE 12th international conference on data mining, pp 379–388

  102. Keller F, Müller E, Böhm K (2012) HiCS: high contrast subspaces for density-based outlier ranking. In: 2012 IEEE 28th international conference on data engineering, pp 1037–1048

  103. Nguyen HV, Müller E, Vreeken J, Keller F, Böhm K (2013) CMI: an information-theoretic contrast measure for enhancing subspace cluster and outlier detection. In: Proceedings of the 2013 SIAM international conference on data mining, SIAM, pp 198–206

  104. Pang G, Ting KM, Albrecht D, Jin H (2016) Zero++: harnessing the power of zero appearances to detect anomalies in large-scale data sets. J Artif Intell Res 57:593–620

  105. Aggarwal CC (2017) High-dimensional outlier detection: the subspace method. Springer, Cham, pp 149–184

  106. Zimek A, Schubert E, Kriegel H-P (2012) A survey on unsupervised outlier detection in high-dimensional numerical data. Stat Anal Data Min ASA Data Sci J 5(5):363–387

Acknowledgements

The authors would like to thank the anonymous reviewers for their valuable comments and suggestions to improve the manuscript.

Funding

No funding was received.

Author information

Contributions

DS conducted the systematic literature review and examined various outlier detection techniques. DS wrote the first draft of the manuscript. DS made significant contributions to the design and structure of the review. AT reviewed the work and edited the manuscript. All authors read and approved the final manuscript.

Ethics declarations

Conflict of interest

Not applicable.

Ethical approval

This article does not contain any studies with human participants by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Samariya, D., Thakkar, A. A Comprehensive Survey of Anomaly Detection Algorithms. Ann. Data. Sci. 10, 829–850 (2023). https://doi.org/10.1007/s40745-021-00362-9

  • DOI: https://doi.org/10.1007/s40745-021-00362-9
