Anomaly detection in large-scale data stream networks

Pham, Duc-Son; Venkatesh, Svetha; Lazarescu, Mihai; Budhaditya, Saha

doi:10.1007/s10618-012-0297-3

Published: 02 December 2012

Volume 28, pages 145–189, (2014)

Data Mining and Knowledge Discovery

Duc-Son Pham¹, Svetha Venkatesh², Mihai Lazarescu¹ & Saha Budhaditya²

Abstract

This paper addresses the anomaly detection problem in large-scale data mining applications using residual subspace analysis. We are specifically concerned with situations where the full data cannot be practically obtained due to physical limitations such as low bandwidth, limited memory, storage, or computing power. Motivated by the recent compressed sensing (CS) theory, we suggest a framework wherein random projection can be used to obtain compressed data, addressing the scalability challenge. Our theoretical contribution shows that the spectral property of the CS data is approximately preserved under such a projection, and thus the performance of spectral-based methods for anomaly detection is almost equivalent to the case in which the raw data is completely available. Our second contribution is the construction of the framework to use this result and detect anomalies in the compressed data directly, thus circumventing the problems of data acquisition in large sensor networks. We have conducted extensive experiments to detect anomalies in network and surveillance applications on large datasets, including the benchmark PETS 2007 and 83 GB of real footage from three public train stations. Our results show that our proposed method is scalable and, importantly, that its performance is comparable to conventional methods for anomaly detection when the complete data is available.
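
The idea summarised in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the synthetic one-dimensional "normal" subspace, the Gaussian projection matrix `phi`, the sizes, the helper names, and the crude threshold (ten times the median residual energy) are all assumptions made for illustration, and only the Python standard library is used. The sketch compresses each sample with a random projection, estimates the principal direction from the compressed samples by power iteration, and scores each time step by its squared residual outside that direction.

```python
import math
import random

def matvec_rows(rows, v):
    """Return the dot product of every row with v."""
    return [sum(r * c for r, c in zip(row, v)) for row in rows]

def top_component(samples, iters=30):
    """Power iteration on the sample covariance (1/T) X^T X.

    Returns (largest eigenvalue, unit eigenvector). The synthetic data below
    is zero-mean by construction, so centering is skipped (an assumption).
    """
    dim = len(samples[0])
    v = [1.0 / math.sqrt(dim)] * dim
    eigval = 0.0
    for _ in range(iters):
        proj = matvec_rows(samples, v)  # X v
        w = [sum(p * x[j] for p, x in zip(proj, samples)) / len(samples)
             for j in range(dim)]       # X^T (X v) / T
        eigval = math.sqrt(sum(c * c for c in w))
        v = [c / eigval for c in w]
    return eigval, v

def residual_energy(sample, v):
    """Squared norm of the part of `sample` orthogonal to the unit vector v."""
    score = sum(a * b for a, b in zip(sample, v))
    return sum((a - score * b) ** 2 for a, b in zip(sample, v))

# Hypothetical sizes: 100 sensors compressed to 30 measurements, 200 time steps.
rng = random.Random(0)
N_SENSORS, N_COMPRESSED, N_STEPS, ANOMALY_T = 100, 30, 200, 120

# Normal behaviour: one dominant direction u plus small sensor noise.
u = [rng.gauss(0, 1) for _ in range(N_SENSORS)]
u_norm = math.sqrt(sum(c * c for c in u))
u = [c / u_norm for c in u]
raw = []
for _ in range(N_STEPS):
    amplitude = rng.gauss(0, 10)
    raw.append([amplitude * c + rng.gauss(0, 0.1) for c in u])
raw[ANOMALY_T][7] += 15.0  # inject a spike on one sensor at one time step

# Random projection: entries N(0, 1/k), so squared norms are preserved on average.
phi = [[rng.gauss(0, 1 / math.sqrt(N_COMPRESSED)) for _ in range(N_SENSORS)]
       for _ in range(N_COMPRESSED)]
compressed = [matvec_rows(phi, x) for x in raw]

# Compare the leading eigenvalue before and after compression.
raw_eig, _ = top_component(raw)
cs_eig, cs_v = top_component(compressed)

# Residual analysis carried out on the compressed data only.
scores = [residual_energy(y, cs_v) for y in compressed]
threshold = 10 * sorted(scores)[len(scores) // 2]  # crude, illustrative threshold
flagged = [t for t, s in enumerate(scores) if s > threshold]

print("eigenvalue ratio (compressed / raw):", cs_eig / raw_eig)
print("flagged time steps:", flagged)
```

In this toy setting the leading eigenvalue of the compressed data tracks that of the raw data only up to a random distortion that shrinks as the number of projections grows, and the injected spike stands out in the residual scores even though the detector never sees the raw samples. A principled threshold, such as the Q-statistic of Jackson and Mudholkar (1979), would replace the median-based rule in practice.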

Notes

Though we only report these amounts of data in this paper, we note that the proposed method forms the core of a more complex commercial system that has been successfully tested on over a thousand hours of video, equivalent to hundreds of terabytes. For details, see http://www.icetana.com.au.

Author information

Authors and Affiliations

Department of Computing, Curtin University, Perth, WA, Australia
Duc-Son Pham & Mihai Lazarescu
Center for Pattern Recognition and Data Analytics (PRaDA), Deakin University, Geelong, VIC, Australia
Svetha Venkatesh & Saha Budhaditya

Corresponding author

Correspondence to Duc-Son Pham.

Additional information

Communicated by Eamonn Keogh.

About this article

Cite this article

Pham, DS., Venkatesh, S., Lazarescu, M. et al. Anomaly detection in large-scale data stream networks. Data Min Knowl Disc 28, 145–189 (2014). https://doi.org/10.1007/s10618-012-0297-3

Received: 06 October 2011
Accepted: 08 November 2012
Published: 02 December 2012
Issue Date: January 2014
DOI: https://doi.org/10.1007/s10618-012-0297-3
