Distributed anomaly detection using concept drift detection based hybrid ensemble techniques in streamed network data

Jain, Meenal; Kaur, Gagandeep

doi:10.1007/s10586-021-03249-9


Published: 15 February 2021

Volume 24, pages 2099–2114 (2021)

Cluster Computing


Abstract

Ever since the internet became part of everyday life, providing network security has been considered of utmost importance. Over the years, the research community and industry have devoted considerable time and energy to better and more secure mechanisms for communication on the internet. Among the many fields of study, one of the most prominent and fastest evolving has been the analysis of network traffic for attack detection and mitigation. New technologies have increased the pace of network-based attacks, so modified approaches are needed to cope with these trends. Distributed machine learning, supported by tools and frameworks such as the RDD structure in Apache Spark, offers considerable scope for growth in this direction. Moreover, the dynamic nature of present-day network traffic, known as concept drift, calls for solutions from a different angle. In this paper we therefore work on distributed machine learning based ensemble techniques to detect the presence of concept drift in network traffic and to detect network-based attacks. The work has three parts. First, two classifiers, Random Forest and Logistic Regression, are used as level-0 learners, and a Support Vector Machine is used as the level-1 learner. Second, concept drift is handled with sliding-window-based K-means clustering. Third, ensemble-based techniques are used to detect attacks in the traffic. The experiments were performed on three datasets: NSL-KDD, CIDDS-2017 and a generated Testbed dataset. The tests were conducted on different machines while varying the number of executor cores to study time latency in a distributed environment. The SVM-based blending model achieved an accuracy of 93% on NSL-KDD, 98% on CIDDS-2017 and 97% on the Testbed dataset.
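The abstract names sliding-window-based K-means clustering as the drift-handling step but does not say how drift is actually flagged. The following is a minimal sketch of one possible reading, not the authors' implementation: each window of the stream is clustered, and drift is reported when a centroid has moved further than a threshold from the centroids of the previous window. The function names (`kmeans`, `centroid_shift`, `detect_drift`), the deterministic seeding, the threshold rule and the toy two-feature `stream` are all assumptions made for illustration.

```python
from math import dist


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means on a list of equal-length tuples.

    Assumption: centroids are seeded deterministically from evenly
    spaced points of the sorted input so that runs are reproducible.
    """
    ordered = sorted(points)
    step = max(1, len(ordered) // k)
    centroids = [ordered[min(i * step + step // 2, len(ordered) - 1)]
                 for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its cluster; an empty
        # cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return sorted(centroids)


def centroid_shift(old, new):
    """Largest distance from any new centroid to its nearest old centroid."""
    return max(min(dist(n, o) for o in old) for n in new)


def detect_drift(stream, window=8, step=8, k=2, threshold=1.0):
    """Return the start index of every window flagged as drifted.

    Hypothetical rule: a window is flagged when its centroids moved
    more than `threshold` relative to the previous window. Use
    step < window for overlapping (sliding) windows.
    """
    previous = None
    drift_starts = []
    for start in range(0, len(stream) - window + 1, step):
        centroids = kmeans(stream[start:start + window], k)
        if previous is not None and centroid_shift(previous, centroids) > threshold:
            drift_starts.append(start)
        previous = centroids
    return drift_starts


# Toy stream: two stable windows, then one cluster jumps from ~(5, 5) to ~(10, 10).
stable = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9),
          (0.1, 0.2), (4.9, 5.1), (0.0, 0.1), (5.0, 5.2)]
shifted = [(0.0, 0.0), (0.2, 0.1), (10.0, 10.0), (10.1, 9.9),
           (0.1, 0.2), (9.9, 10.1), (0.0, 0.1), (10.0, 10.2)]
stream = stable + stable + shifted

print(detect_drift(stream))  # [16]
```

In this toy run only the third window, starting at index 16, is flagged. In a Spark setting, clustering each window would more naturally be delegated to a distributed K-means implementation; the pure-Python version here is only meant to make the window-by-window comparison concrete.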



Author information

Authors and Affiliations

Department of Computer Science & Information Technology, Jaypee Institute of Information Technology, Noida Sector 62, Noida, 201309, India
Meenal Jain & Gagandeep Kaur


Corresponding author

Correspondence to Meenal Jain.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Jain, M., Kaur, G. Distributed anomaly detection using concept drift detection based hybrid ensemble techniques in streamed network data. Cluster Comput 24, 2099–2114 (2021). https://doi.org/10.1007/s10586-021-03249-9


Received: 24 February 2020
Revised: 26 January 2021
Accepted: 31 January 2021
Published: 15 February 2021
Issue Date: September 2021
DOI: https://doi.org/10.1007/s10586-021-03249-9
