Intrusion detection system over real-time data traffic using machine learning methods with feature selection approaches

Sah, Gulab; Banerjee, Subhasish; Singh, Sweety

doi:10.1007/s10207-022-00616-4

Regular contribution
Published: 06 October 2022

Volume 22, pages 1–27, (2023)

International Journal of Information Security

Abstract

An intrusion detection system (IDS) plays an important role in extracting and analysing network traffic to detect aberrant activity. However, emerging technologies such as cloud computing and the Internet of Things generate a large volume of traffic, which may carry irrelevant attributes that have no impact on classification or on the detection of attacks. It has therefore become an open challenge for researchers to extract meaningful data from huge amounts of traffic and to examine whether the selected features improve IDS performance. To address these issues, this research uses feature selection approaches (FSA) to remove irrelevant features and identify the important ones. Various classifiers are then compared to find the one that most improves the performance of the IDS detection engine on the NSL-KDD dataset. To validate the result, the best-performing classifier, paired with the most suitable feature selection technique (FST), is also applied to a real-time dataset, the combined CICIDS2017. The experimental results suggest that the subset of relevant features obtained under the proposed model (Decision Tree + Recursive Feature Elimination) can increase IDS performance, with average accuracy of 99.21% and 99.94% on the well-known NSL-KDD and CICIDS2017 datasets, respectively, while also minimizing computation cost.
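
As a rough illustration of the kind of pipeline the abstract describes, the sketch below pairs scikit-learn's `RFE` with a `DecisionTreeClassifier`. It is not the authors' code. The synthetic data from `make_classification`, the choice of 10 retained features, the 70/30 split, and the `random_state` values are all assumptions made for this example, and the accuracy it prints says nothing about NSL-KDD or CICIDS2017.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for traffic records (assumption: not NSL-KDD or CICIDS2017).
X, y = make_classification(n_samples=1000, n_features=40, n_informative=8,
                           n_redundant=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# RFE repeatedly fits the tree and drops the lowest-importance feature
# until n_features_to_select remain (10 is an arbitrary choice here).
selector = RFE(estimator=DecisionTreeClassifier(random_state=0),
               n_features_to_select=10, step=1)
selector.fit(X_train, y_train)

# Keep only the selected columns in both splits.
X_train_sel = selector.transform(X_train)
X_test_sel = selector.transform(X_test)

# Train the final classifier on the reduced feature subset only.
clf = DecisionTreeClassifier(random_state=0).fit(X_train_sel, y_train)
accuracy = accuracy_score(y_test, clf.predict(X_test_sel))

print("kept feature indices:", selector.get_support(indices=True))
print("test accuracy: %.4f" % accuracy)
```

Fitting the selector on the training split only, and then transforming the test split with it, keeps the held-out data from influencing which features are retained.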

Availability of data and material

The data and material that support the findings of this study are available from the corresponding author, Subhasish Banerjee, upon reasonable request. This research work utilizes the CICIDS2017 and NSL-KDD datasets which are publicly available online.

Code availability

The data and material that support the findings of this study are available from the corresponding author, Subhasish Banerjee, upon reasonable request.

Funding

This research did not receive any specific funding; it was carried out as part of the authors' employment and higher degree studies.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, National Institute of Technology, Arunachal Pradesh, Jote, India
Gulab Sah, Subhasish Banerjee & Sweety Singh

Contributions

All authors contributed to the study conception and design. Material preparation, data collection, and analysis were performed by GS, SB, and SS. The first draft of the manuscript was written by GS and SB, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Subhasish Banerjee.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Informed consent

This article does not contain any studies with human participants.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

1.1 Appendix A: the classifiers' parameter settings

The ML classifiers used in the experiment are listed in Table 17, along with their parameter settings. Based on the experimental results, the classifier and FST combination that performed well on the NSL-KDD dataset, i.e. DT + RFE, was selected and used to assess the combined CICIDS2017 dataset. Therefore, for the CICIDS2017 dataset, Table 17 provides only the DT classifier and its associated parameter settings.

Table 17 The ML classifiers and their parameter settings

1.2 Appendix B: the analysis of datasets generated at each phase of the experiment

The datasets (NSL-KDD and CICIDS2017) generated at each phase are shown in Table 18. Table 18 also shows that the standard dataset and the reduced dataset contain the same number of objects: applying FSA reduces only the number of attributes (columns) in the standard dataset, so the number of rows (objects) remains the same.

Table 18 The explanation of CICIDS2017 and NSL-KDD datasets generated at each phase
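
The point about rows and columns can be illustrated with a toy example. The values and the list of retained column indices below are made up for illustration and are not taken from either dataset; the sketch only shows that projecting each record onto a feature subset changes the column count and leaves the row count alone.

```python
# Toy dataset: 4 objects (rows) x 5 attributes (columns); values are made up.
standard = [
    [0, 1, 5, 0, 2],
    [1, 0, 3, 0, 7],
    [0, 0, 8, 1, 1],
    [1, 1, 2, 0, 4],
]

# Hypothetical output of a feature selection step: indices of retained columns.
selected_columns = [0, 2, 4]

# Project every row onto the retained columns; no row is added or removed.
reduced = [[row[i] for i in selected_columns] for row in standard]

print(len(standard), len(standard[0]))  # rows, columns before selection: 4 5
print(len(reduced), len(reduced[0]))    # rows, columns after selection: 4 3
```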

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Sah, G., Banerjee, S. & Singh, S. Intrusion detection system over real-time data traffic using machine learning methods with feature selection approaches. Int. J. Inf. Secur. 22, 1–27 (2023). https://doi.org/10.1007/s10207-022-00616-4

Accepted: 14 September 2022
Published: 06 October 2022
Issue Date: February 2023
DOI: https://doi.org/10.1007/s10207-022-00616-4
