A Fast kNN-Based Approach for Time Sensitive Anomaly Detection over Data Streams

  • Guangjun Wu
  • Zhihui Zhao
  • Ge Fu (corresponding author)
  • Haiping Wang (corresponding author)
  • Yong Wang
  • Zhenyu Wang
  • Junteng Hou
  • Liang Huang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11537)

Abstract

Anomaly detection is an important data mining method that aims to discover outliers, i.e., elements that deviate significantly from their expected behavior. A widely used criterion for determining outliers is the number of their neighboring elements, which are referred to as Nearest Neighbors (NN). Existing kNN-based Anomaly Detection (kNN-AD) algorithms cannot detect streaming outliers whose abnormal behavior is time sensitive, that is, which appears only in some time intervals. In this paper, we propose a fast kNN-based approach for Time Sensitive Anomaly Detection (kNN-TSAD), which can find outliers that exhibit different behavior characteristics, both normal and abnormal, within different time intervals. The core idea of our proposal is to combine the sliding window model with Locality Sensitive Hashing (LSH) to monitor the distribution of streaming elements, as well as the number of their Nearest Neighbors, as time progresses. We use an \(\epsilon \)-approximation scheme to implement the sliding window model and compute Nearest Neighbors on the fly. We conduct extensive experiments on three real-world data sets to evaluate our approach for time sensitive anomaly detection. The results show that our approach achieves a significant improvement in recall and precision for anomaly detection within different time intervals. In particular, our approach is two orders of magnitude faster for streaming anomaly detection than traditional kNN-based anomaly detection algorithms such as exact-Storm, approx-Storm, and MCOD, while using only 10% of the memory.
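The combination of a sliding window with LSH described above can be illustrated with a small sketch. This is not the kNN-TSAD algorithm itself: it assumes sign-random-projection hashing as the LSH family, a count-based window, and exact per-bucket counters in place of the \(\epsilon \)-approximation scheme. The class name `SlidingLshDetector` and the parameters `num_bits`, `window`, and `min_neighbors` are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter, deque


class SlidingLshDetector:
    """Toy detector: LSH bucket counts over a count-based sliding window.

    Assumption: elements that share an LSH signature are treated as
    neighbors, and exact counters stand in for an approximate window.
    """

    def __init__(self, dim, num_bits=4, window=100, min_neighbors=3, seed=0):
        rng = random.Random(seed)
        # One random hyperplane per signature bit (sign random projection).
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                       for _ in range(num_bits)]
        self.window = window
        self.min_neighbors = min_neighbors
        self.recent = deque()    # signatures of elements still in the window
        self.counts = Counter()  # signature -> number of live elements

    def signature(self, x):
        # Bit i is 1 when x lies on the non-negative side of hyperplane i.
        return tuple(int(sum(p * v for p, v in zip(plane, x)) >= 0.0)
                     for plane in self.planes)

    def observe(self, x):
        """Return True if x has too few neighbors in the current window."""
        # Expire the oldest element once the window is full.
        if len(self.recent) == self.window:
            old = self.recent.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]
        sig = self.signature(x)
        # Neighbors are counted before x itself is added to its bucket.
        is_outlier = self.counts[sig] < self.min_neighbors
        self.recent.append(sig)
        self.counts[sig] += 1
        return is_outlier


if __name__ == "__main__":
    det = SlidingLshDetector(dim=2, window=5, min_neighbors=2)
    a, b = (1.0, 1.0), (-1.0, -1.0)
    # The same point `a` is judged differently depending on what the
    # window currently holds, which is the time sensitive behavior.
    for point in [a, a, a, b, b, b, b, b, a]:
        print(point, det.observe(point))
```

In this sketch a point is flagged when fewer than `min_neighbors` live elements share its bucket, so an element that looked normal in one interval can be flagged again once its neighbors have expired from the window. A practical version would need several hash tables and a distance check within buckets, which are omitted here.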

Keywords

Anomaly detection, Data streams, LSH, Time sensitive

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Guangjun Wu (1)
  • Zhihui Zhao (1, 2)
  • Ge Fu (3), corresponding author
  • Haiping Wang (1), corresponding author
  • Yong Wang (1)
  • Zhenyu Wang (1)
  • Junteng Hou (1, 2)
  • Liang Huang (3)

  1. Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China
  2. School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China
  3. National Computer Network Emergency Response Technical Team/Coordination Center of China (CNCERT/CC), Beijing, China
