Soft Computing, Volume 21, Issue 20, pp 5905–5917

Streaming data anomaly detection method based on hyper-grid structure and online ensemble learning

Focus

Abstract

This paper proposes a novel online anomaly detection method for streaming data. An improved \(L_{1}\) detection neighbor region optimizes the original hyper-grid-based anomaly detection method by reducing the number of neighbor regions that must be examined, while online ensemble learning adapts to the evolving distribution of streaming data and overcomes the difficulty of obtaining the optimal hyper-grid structure. The method is validated on one real-world dataset and two simulated datasets, and the experimental results are close to the optimal results.
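The abstract gives no algorithmic detail, so the following is only a rough illustration of the general idea and should not be read as the authors' algorithm. It assumes that a point is flagged when the cells within \(L_{1}\) (Manhattan) distance 1 of its own grid cell hold too few recent points, and that an ensemble of grids with different cell widths votes by simple majority. The class names, the sliding window, the `min_count` threshold, and the voting rule are all hypothetical choices made for this sketch.

```python
from collections import Counter, deque
from itertools import product
import math


def cell_of(point, width):
    """Map a point to the integer index of its hyper-grid cell."""
    return tuple(math.floor(x / width) for x in point)


def l1_neighbors(cell, radius=1):
    """Cells whose index lies within L1 (Manhattan) distance `radius` of `cell`.

    For radius 1 this yields 2d + 1 cells in d dimensions, compared with
    3**d cells for the full surrounding block.
    """
    offsets = product(range(-radius, radius + 1), repeat=len(cell))
    return [
        tuple(c + o for c, o in zip(cell, off))
        for off in offsets
        if sum(abs(o) for o in off) <= radius
    ]


class GridDetector:
    """One hyper-grid with a fixed cell width over a sliding window (assumed)."""

    def __init__(self, width, min_count, window):
        self.width = width
        self.min_count = min_count
        self.max_window = window
        self.window = deque()    # cells of the most recent points
        self.counts = Counter()  # number of recent points per cell

    def score_then_update(self, point):
        cell = cell_of(point, self.width)
        # Support = recent points in the L1 neighbor region of this cell.
        support = sum(self.counts[c] for c in l1_neighbors(cell))
        is_anomaly = support < self.min_count
        # Online update: add the new point, forget the oldest one if needed.
        self.window.append(cell)
        self.counts[cell] += 1
        if len(self.window) > self.max_window:
            old = self.window.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]
        return is_anomaly


class GridEnsemble:
    """Majority vote over grids with different cell widths (assumed rule)."""

    def __init__(self, widths, min_count, window):
        self.members = [GridDetector(w, min_count, window) for w in widths]

    def score_then_update(self, point):
        votes = [m.score_then_update(point) for m in self.members]
        return sum(votes) * 2 > len(votes)
```

Using several cell widths at once is one simple way to avoid committing to a single grid resolution, which is the difficulty the abstract attributes to finding the optimal hyper-grid structure. Note that in this sketch the first few points of a stream are always flagged, because the window is still empty.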

Keywords

Hyper-grid structure · Online ensemble learning · Anomaly detection · Streaming data

Notes

Acknowledgments

This study was funded by the National High Technology Research and Development Program of China (Grant No. 2011AA040103-7), the National Key Scientific Instrument and Equipment Development Project (Grant No. 2012YQ15008703), the Open Project of Top Key Discipline of Computer Software and Theory in Zhejiang Provincial (Grant No. ZC323014100), the Zhejiang Provincial Natural Science Foundation of China (Grant No. LY13F020015), the National Science Foundation of China (Grant No. 61104089), the Science and Technology Commission of Shanghai Municipality (Grant No. 11JC1404000), and the Shanghai Rising-Star Program (Grant No. 13QA1401600).

Compliance with ethical standards

Conflict of interest

The four authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Copyright information

© Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  1. Shanghai Key Laboratory of Power Station Automation Technology, School of Mechatronics Engineering and Automation, Shanghai University, Shanghai, China
  2. College of Mathematics, Physics and Information Engineering, Zhejiang Normal University, Jinhua, China
