Distributed anomaly detection using concept drift detection based hybrid ensemble techniques in streamed network data

Cluster Computing

Abstract

Ever since the internet became part of everyday life, network security has been considered of utmost importance. Over the years, the research community and industry have devoted considerable time and energy to developing better and more secure mechanisms for communication on the internet. Among the many fields of study, the most prominent and ever-evolving one has been the analysis of network traffic for attack detection and mitigation. The advent of new technologies has accelerated the pace of network-based attacks, so novel, modified approaches are needed to cope with these trends. Distributed machine learning, supported by new tools and frameworks such as the RDD structure in Apache Spark, offers considerable scope for progress in this direction. Moreover, the dynamic nature of present-day network traffic, known as concept drift, calls for solutions that approach the problem from a different angle. In this paper, we therefore apply distributed machine learning based ensemble techniques to detect the presence of concept drift in network traffic and to detect network-based attacks. The work has been done in three parts. First, two classifiers, Random Forest and Logistic Regression, are used as level-0 learners, and a Support Vector Machine is used as the level-1 learner. Second, concept drift is handled using sliding-window-based K-means clustering. Third, ensemble-based techniques are used to detect attacks in the traffic. The experiments were performed on three datasets: the NSL-KDD dataset, the CIDDS-2017 dataset and a generated Testbed dataset. The tests were conducted on different machines while varying the number of executor cores to study latency in a distributed environment. The SVM-based blending model achieved an accuracy of 93% on NSL-KDD, 98% on CIDDS-2017 and 97% on the Testbed dataset.
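
The abstract does not say how the sliding-window K-means step decides that drift has occurred. As one possible reading, the sketch below clusters each window of the stream and compares its centroids with those of a reference window, flagging drift when a centroid has moved further than a threshold. The function names (`kmeans`, `centroid_shift`, `detect_drift`), the farthest-point initialisation, the centroid-displacement criterion and the `threshold` value are all illustrative assumptions rather than the authors' method, and the code is plain single-machine Python, not Spark.

```python
import math


def init_centroids(points, k):
    """Farthest-point initialisation (an assumption; chosen to be deterministic)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Pick the point whose nearest chosen centroid is farthest away.
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, iters=10):
    """Plain Lloyd's K-means over a list of equal-length tuples; returns k centroids."""
    centroids = init_centroids(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Each centroid becomes the mean of its members; an empty cluster
        # keeps its previous centroid.
        centroids = [
            tuple(sum(col) / len(members) for col in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids


def centroid_shift(reference, current):
    """Largest distance from any current centroid to its nearest reference centroid."""
    return max(min(math.dist(c, r) for r in reference) for c in current)


def detect_drift(stream, window_size, step, k=2, threshold=1.0):
    """Slide a window over the stream and report (start, shift, drifted) per window.

    The first window is the reference concept. When drift is flagged, the
    drifting window replaces the reference (a hypothetical adaptation rule).
    """
    reference = kmeans(stream[:window_size], k)
    results = []
    for start in range(step, len(stream) - window_size + 1, step):
        current = kmeans(stream[start:start + window_size], k)
        shift = centroid_shift(reference, current)
        drifted = shift > threshold
        results.append((start, shift, drifted))
        if drifted:
            reference = current
    return results
```

Here `stream` is assumed to be a list of numeric feature tuples in arrival order. Setting `step` equal to `window_size` gives non-overlapping windows, while a smaller `step` makes consecutive windows overlap. A suitable `threshold` depends on feature scaling and would need to be tuned on the data.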



Author information

Corresponding author

Correspondence to Meenal Jain.

About this article

Cite this article

Jain, M., Kaur, G. Distributed anomaly detection using concept drift detection based hybrid ensemble techniques in streamed network data. Cluster Comput 24, 2099–2114 (2021). https://doi.org/10.1007/s10586-021-03249-9

  • DOI: https://doi.org/10.1007/s10586-021-03249-9
