
Intrusion detection system over real-time data traffic using machine learning methods with feature selection approaches

  • Regular contribution
International Journal of Information Security

Abstract

The intrusion detection system (IDS) plays an important role in extracting and analysing network traffic to detect aberrant activity. However, emerging technologies such as cloud computing and the Internet of Things generate large volumes of traffic, which may carry irrelevant attributes that have no impact on the classification or detection of attacks. It has therefore become an open challenge for researchers to extract meaningful data from huge amounts of traffic and to determine whether the selected features actually improve IDS performance. To address these issues, this research uses feature selection approaches (FSA) to remove irrelevant features and identify the important ones. Various classifiers were then evaluated to find the one that most improves the performance of the IDS detection engine on the NSL-KDD dataset. To validate this choice, the best-performing classifier, together with the most suitable feature selection technique (FST), was also applied to a real-time dataset, the combined CICIDS2017. The experimental results suggest that the subset of relevant features selected by the proposed model (Decision Tree + Recursive Feature Elimination) can improve IDS performance, with average accuracies of 99.21% and 99.94% on the well-known NSL-KDD and CICIDS2017 datasets, respectively, while also reducing computation cost.
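A minimal sketch of a Decision Tree + Recursive Feature Elimination pipeline in scikit-learn is shown below. It is not the authors' code: it assumes a synthetic dataset from `make_classification` in place of the preprocessed NSL-KDD or CICIDS2017 features, and the number of retained features (10) is a hypothetical choice. "Average accuracy" is taken here as the mean over 5-fold cross-validation, which may differ from how the paper averages its results.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a preprocessed IDS dataset (assumption: real
# NSL-KDD / CICIDS2017 records would first be encoded into numeric columns).
X, y = make_classification(
    n_samples=2000, n_features=40, n_informative=8,
    n_redundant=4, random_state=0,
)

# RFE repeatedly fits a decision tree, ranks features by the tree's
# feature_importances_, and drops the weakest one per step until 10 remain
# (n_features_to_select=10 is a hypothetical choice).
pipeline = Pipeline([
    ("rfe", RFE(DecisionTreeClassifier(random_state=0),
                n_features_to_select=10, step=1)),
    ("clf", DecisionTreeClassifier(random_state=0)),
])

# Feature selection runs inside each training fold, so the held-out fold
# never influences which features are kept.
scores = cross_val_score(pipeline, X, y, cv=5, scoring="accuracy")
print(f"mean accuracy: {scores.mean():.4f}")

# Refit on all data to inspect which columns survived selection.
pipeline.fit(X, y)
selected = pipeline.named_steps["rfe"].support_
print("selected feature indices:", selected.nonzero()[0].tolist())
```

Wrapping RFE and the classifier in one `Pipeline` is a design choice meant to avoid selection leakage: selecting features on the full dataset before cross-validation would let test folds shape the feature subset and could inflate the reported accuracy.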



Availability of data and material

The data and material that support the findings of this study are available from the corresponding author, Subhasish Banerjee, upon reasonable request. This work uses the CICIDS2017 and NSL-KDD datasets, which are publicly available online.

Code availability

The code that supports the findings of this study is available from the corresponding author, Subhasish Banerjee, upon reasonable request.


Funding

This research did not receive any specific funding; it was carried out as part of the authors' employment and higher-degree studies.

Author information


Contributions

All authors contributed to the study conception and design. Material preparation, data collection, and analysis were performed by GS, SB, and SS. The first draft of the manuscript was written by GS and SB, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Subhasish Banerjee.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Informed consent

This article does not contain any studies with human participants.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix


1.1 Appendix A: classifier parameter settings

The ML classifiers used in the experiment are listed in Table 17 along with their parameter settings.

Table 17 The ML classifiers and their parameter settings

Based on the experimental results, the classifier and FST combination that performed best on the NSL-KDD dataset, i.e. DT + RFE, was selected and used to assess the combined CICIDS2017 dataset. Therefore, for the CICIDS2017 dataset, Table 17 provides only the DT classifier and its parameter settings.
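Selecting the best classifier can be sketched as a cross-validated comparison in which every candidate sees features chosen by the same DT-based RFE step. The candidate list (DT, kNN, Gaussian naive Bayes, logistic regression), their near-default scikit-learn settings, the helper `with_rfe`, and the synthetic data are all hypothetical; the classifiers and parameters actually used are those given in Table 17.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption, not NSL-KDD).
X, y = make_classification(n_samples=1500, n_features=30,
                           n_informative=6, random_state=1)

# Hypothetical candidate set; scale-sensitive models get a StandardScaler.
candidates = {
    "DT": DecisionTreeClassifier(random_state=1),
    "kNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "NB": GaussianNB(),
    "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
}


def with_rfe(clf):
    """Hypothetical helper: prepend the same DT-based RFE step to a classifier."""
    return make_pipeline(
        RFE(DecisionTreeClassifier(random_state=1), n_features_to_select=8),
        clf,
    )


# Mean 5-fold accuracy for each classifier on the RFE-selected features.
results = {}
for name, clf in candidates.items():
    scores = cross_val_score(with_rfe(clf), X, y, cv=5, scoring="accuracy")
    results[name] = scores.mean()

best = max(results, key=results.get)
for name, acc in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{name:4s} {acc:.4f}")
print("best:", best)
```

Which classifier wins on this synthetic data says nothing about the paper's results; the sketch only shows the shape of the comparison.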

1.2 Appendix B: analysis of the datasets generated at each phase of the experiment

The NSL-KDD and CICIDS2017 datasets generated at each phase are shown in Table 18.

Table 18 Description of the CICIDS2017 and NSL-KDD datasets generated at each phase

Table 18 also shows that the standard dataset and the reduced dataset contain the same number of objects (rows). Applying the FSA removes only attributes (columns) from the standard dataset, so the number of rows remains unchanged.
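A small illustration of why the row count is preserved: a feature selection technique such as RFE returns a column subset of its input matrix, so only the second dimension shrinks. The 500 × 20 random matrix, the label rule, and the choice of 5 retained attributes are hypothetical.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)
# Hypothetical "standard dataset": 500 objects (rows) x 20 attributes (columns).
X_standard = rng.normal(size=(500, 20))
# Labels depend only on the first three attributes; the rest are irrelevant.
y = (X_standard[:, 0] + X_standard[:, 1] - X_standard[:, 2] > 0).astype(int)

# Keep 5 attributes; fit_transform returns only the selected columns.
rfe = RFE(DecisionTreeClassifier(random_state=0), n_features_to_select=5)
X_reduced = rfe.fit_transform(X_standard, y)

print("standard:", X_standard.shape)  # (500, 20)
print("reduced: ", X_reduced.shape)   # (500, 5)
```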

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Sah, G., Banerjee, S. & Singh, S. Intrusion detection system over real-time data traffic using machine learning methods with feature selection approaches. Int. J. Inf. Secur. 22, 1–27 (2023). https://doi.org/10.1007/s10207-022-00616-4



  • DOI: https://doi.org/10.1007/s10207-022-00616-4
