Abstract
A data stream is a sequence of data generated continuously over time. A data stream is too large to be stored in memory, and its underlying data distribution may change over time. Outlier detection aims to find data instances that deviate significantly from the underlying data distribution. While most outlier detection methods work in batch mode, where all the data samples are available at once, the need for efficient outlier and anomaly pattern detection methods for data streams has increased. Outlier detection is performed at the level of individual instances, whereas anomaly pattern detection involves detecting a point in time at which the behavior of the data becomes unusual and differs from normal behavior. Concept drift detection methods, in turn, find the point at which the concept changes in the streaming data and try to adapt the model to the newly emerging pattern. In this paper, we provide a review of outlier detection, anomaly pattern detection, and concept drift detection for streaming data.
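The distinction between instance-level outlier detection and anomaly pattern detection can be illustrated with a minimal Python sketch. It is not a method taken from the reviewed literature: the sliding-window z-score rule, the function name `detect_stream`, and all parameter values are assumptions chosen only for illustration.

```python
from collections import deque
from statistics import mean, pstdev


def detect_stream(stream, window=20, z_thresh=3.0, rate_window=10, rate_thresh=0.5):
    """Yield (index, is_outlier, is_anomalous_pattern) for each value.

    Hypothetical illustration: a value is an outlier if its z-score against
    the last `window` inliers exceeds `z_thresh`; an anomalous pattern is
    flagged when the share of outliers among the last `rate_window` values
    reaches `rate_thresh`.
    """
    recent = deque(maxlen=window)      # bounded memory: last `window` inliers only
    flags = deque(maxlen=rate_window)  # outlier flags of the most recent values
    for i, x in enumerate(stream):
        is_outlier = False
        if len(recent) == window:      # score only once the window is full
            mu, sigma = mean(recent), pstdev(recent)
            is_outlier = sigma > 0 and abs(x - mu) / sigma > z_thresh
        if not is_outlier:
            recent.append(x)           # outliers do not update the reference window
        flags.append(is_outlier)
        is_pattern = (len(flags) == rate_window
                      and sum(flags) / rate_window >= rate_thresh)
        yield i, is_outlier, is_pattern


# Synthetic data (assumed): a stable stream with one spike, then a sustained shift.
normal = [float(i % 2) for i in range(40)]
single_spike = normal + [10.0] + normal[:19]
sustained_shift = normal + [10.0] * 10

for name, data in [("spike", single_spike), ("shift", sustained_shift)]:
    results = list(detect_stream(data))
    outliers = [i for i, is_out, _ in results if is_out]
    patterns = [i for i, _, is_pat in results if is_pat]
    print(name, "outliers:", outliers, "pattern flags:", patterns)
```

On this synthetic input, the isolated spike is reported as a single outlier and never raises the pattern flag, whereas the sustained shift produces a run of outliers that eventually does. In a setting with concept drift, such a flag would be the point at which the reference window or model is rebuilt; that adaptation step is omitted here.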
Acknowledgements
This research was supported by Korea Electric Power Corporation (Grant number: R18XA05).
Cite this article
Park, C.H. Outlier and anomaly pattern detection on data streams. J Supercomput 75, 6118–6128 (2019). https://doi.org/10.1007/s11227-018-2674-1
DOI: https://doi.org/10.1007/s11227-018-2674-1