Strategies for data stream mining method applied in anomaly detection

Cluster Computing

Abstract

Anomaly detection, a method of intrusion detection, identifies anomalous behavior in order to protect network security. Data mining techniques have been integrated to improve the performance of anomaly detection, and some algorithms have been adapted specifically for this field. In our view, however, most data mining algorithms are analyzed on static data sets and ignore the influence of dynamic data streams. A data stream is a potentially unbounded, ordered sequence of data objects that arrive over time. The data objects cannot all be stored, so they must be processed in a single scan. The data distribution of a stream may change over time, a phenomenon called concept drift. These properties make data stream analysis different from analysis based on a static data set, and the analysis model must be updated promptly when concept drift occurs. In this paper, we summarize the characteristics of data streams, compare data streams with data sets, discuss the problems of data stream mining, and propose corresponding strategies.
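
The constraints named in the abstract (single-scan processing, bounded memory, and model updates under concept drift) can be illustrated with a small sketch. The code below is not the method proposed in the paper; it is a minimal example that assumes a univariate stream, scores each value with a running z-score maintained by Welford's one-pass algorithm, and uses a simple heuristic for drift: if more than a set fraction of a recent window is flagged, the model is rebuilt from the flagged values. The class name, parameters, and thresholds are all hypothetical.

```python
import math
from collections import deque


class StreamingZScoreDetector:
    """Illustrative one-pass anomaly detector with a naive drift reset.

    All names and default values here are assumptions for the sketch,
    not taken from the paper.
    """

    def __init__(self, threshold=3.0, window=20, drift_ratio=0.5, warmup=10):
        self.threshold = threshold      # z-score above which a value is flagged
        self.drift_ratio = drift_ratio  # flagged fraction that signals drift
        self.warmup = warmup            # values needed before scoring starts
        self.recent = deque(maxlen=window)  # bounded memory: (value, flag) pairs
        self.drift_count = 0
        self._reset()

    def _reset(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def _update(self, x):
        # Welford's algorithm: update mean and variance without storing the stream.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def process(self, x):
        """Score x against the current model, then update. Returns True if flagged."""
        is_anomaly = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            is_anomaly = std > 0 and abs(x - self.mean) / std > self.threshold

        self.recent.append((x, is_anomaly))
        flagged = [v for v, f in self.recent if f]
        window_full = len(self.recent) == self.recent.maxlen

        if window_full and len(flagged) / len(self.recent) > self.drift_ratio:
            # Most recent values disagree with the model: treat this as
            # concept drift and rebuild the model from the flagged values.
            self._reset()
            for v in flagged:
                self._update(v)
            self.recent.clear()
            self.drift_count += 1
        elif not is_anomaly:
            # Only values considered normal are absorbed into the model.
            self._update(x)
        return is_anomaly


detector = StreamingZScoreDetector()
stream = [9.0, 11.0] * 20 + [50.0] + [9.0, 11.0] * 15 + [99.0, 101.0] * 20
flags = [detector.process(x) for x in stream]
```

In this usage example, the isolated value 50.0 is flagged without changing the model, while the sustained shift to values near 100 is first flagged and then absorbed once the drift heuristic fires. A production detector would need a more principled drift test and multivariate features.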

Acknowledgements

This work was funded by the National Natural Science Foundation of China (61772282, 61772454, 61373134, 61402234). It was also supported by the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD), Postgraduate Research & Practice Innovation Program of Jiangsu Province (KYCX17_0901) and Jiangsu Collaborative Innovation Center on Atmospheric Environment and Equipment Technology (CICAEET). It was also funded by the open research fund of Key Lab of Broadband Wireless Communication and Sensor Network Technology (Nanjing University of Posts and Telecommunications), Ministry of Education. Professor Seungwook Min is the corresponding author.

Author information

Corresponding author

Correspondence to Seungwook Min.

Ethics declarations

Conflict of interest

Ruxia Sun declares that she has no conflict of interest. Sun Zhang declares that he has no conflict of interest. Chunyong Yin declares that he has no conflict of interest. Jin Wang declares that he has no conflict of interest. Seungwook Min declares that he has no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

About this article

Cite this article

Sun, R., Zhang, S., Yin, C. et al. Strategies for data stream mining method applied in anomaly detection. Cluster Comput 22, 399–408 (2019). https://doi.org/10.1007/s10586-018-2835-2

  • DOI: https://doi.org/10.1007/s10586-018-2835-2
