Outlier and anomaly pattern detection on data streams

The Journal of Supercomputing

Abstract

A data stream is a sequence of data generated continuously over time. It is too large to be held in memory, and its underlying data distribution may change over time. Outlier detection aims to find data instances that deviate significantly from the underlying data distribution. While most outlier detection methods work in batch mode, where all data samples are available at once, the need for efficient outlier and anomaly pattern detection methods for data streams has increased. Outlier detection is performed at the level of individual instances, whereas anomaly pattern detection identifies a point in time at which the behavior of the data becomes unusual and departs from normal behavior. Concept drift detection methods, in turn, find a concept-changing point in the streaming data and try to adapt the model to the newly emerging pattern. In this paper, we provide a review of outlier detection, anomaly pattern detection, and concept drift detection for streaming data.
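
As a rough illustration of instance-level outlier detection under a memory bound, the Python sketch below flags values that lie more than a chosen number of standard deviations from the mean of a sliding window. It is not a method taken from the reviewed paper; the function name `stream_outliers`, the window size, the threshold, and the sample readings are all assumptions made for illustration.

```python
from collections import deque
from math import sqrt


def stream_outliers(stream, window=50, threshold=3.0):
    """Yield (index, value) for values far from the recent window's mean.

    Illustrative sketch only: `window` and `threshold` are assumed defaults,
    not parameters prescribed by any of the reviewed methods.
    """
    recent = deque(maxlen=window)  # bounded memory: keeps only the last `window` values
    for i, x in enumerate(stream):
        if len(recent) >= 2:
            mean = sum(recent) / len(recent)
            # Sample variance of the values currently in the window.
            var = sum((v - mean) ** 2 for v in recent) / (len(recent) - 1)
            std = sqrt(var)
            if std > 0 and abs(x - mean) > threshold * std:
                yield i, x
        # The value joins the window even if flagged, so the window
        # can follow a distribution that changes over time.
        recent.append(x)


# Hypothetical sensor readings with one obvious spike.
readings = [10.0, 10.1, 9.9, 10.0, 10.2, 9.8, 10.1, 50.0, 10.0, 9.9]
flagged = list(stream_outliers(readings))
print(flagged)
```

Because each value is scored against recent history only, the detector never needs the whole stream in memory. Detecting an anomaly pattern or a concept drift, by contrast, would require tracking how such scores or the model's error rate evolve over time rather than judging each instance in isolation.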

Acknowledgements

This research was supported by Korea Electric Power Corporation (Grant number: R18XA05).

Author information

Corresponding author

Correspondence to Cheong Hee Park.

Cite this article

Park, C.H. Outlier and anomaly pattern detection on data streams. J Supercomput 75, 6118–6128 (2019). https://doi.org/10.1007/s11227-018-2674-1
