A tree-based stacking ensemble technique with feature selection for network intrusion detection

Applied Intelligence

Abstract

Several studies have used machine learning algorithms to develop intrusion detection systems (IDS), which differentiate anomalous behaviours from the normal activities of network systems. Because automated data collection is now easy and the volume of collected data on network traffic and activities has grown accordingly, the complexity of intrusion analysis is increasing exponentially. A particular issue is that, owing to statistical and computational limitations, a single classifier may not perform well on large-scale data of the kind found in modern IDS contexts. Ensemble methods have been explored in the literature for such big data contexts. Although they are more complicated and require additional computation, the literature notes that ensemble methods can achieve better accuracy than single classifiers in various large-scale data classification contexts, and it is therefore of interest to explore how ensemble approaches perform in IDS. In this research, we introduce a tree-based stacking ensemble technique (SET) and test the effectiveness of the proposed model on two intrusion datasets (NSL-KDD and UNSW-NB15). We further incorporate feature selection techniques to select the most relevant features for the proposed SET. A comprehensive performance analysis shows that our proposed model identifies normal and anomalous network traffic better than other existing IDS models. This indicates the potential of our proposed system for cybersecurity in the Internet of Things (IoT) and large-scale networks.
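
As an illustration of the general idea only, the following is a minimal sketch of a tree-based stacking ensemble combined with feature selection, written with scikit-learn. It is not the authors' implementation: the abstract does not name the base learners, the meta-learner, or the feature selection method, so the choices here (a decision tree, a random forest, and gradient boosting as base learners, a shallow decision tree as meta-learner, `SelectKBest` with an ANOVA F-score, `k_features=10`, and the helper name `build_stacking_ids`) are all assumptions. Synthetic data from `make_classification` stands in for NSL-KDD or UNSW-NB15.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier


def build_stacking_ids(k_features=10, seed=0):
    """Hypothetical helper: feature selection followed by a stack of tree models."""
    # Assumed base learners; the abstract only says the ensemble is tree-based.
    base_learners = [
        ("dt", DecisionTreeClassifier(max_depth=8, random_state=seed)),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=seed)),
        ("gb", GradientBoostingClassifier(n_estimators=50, random_state=seed)),
    ]
    # The meta-learner is trained on cross-validated predictions of the base learners.
    stack = StackingClassifier(
        estimators=base_learners,
        final_estimator=DecisionTreeClassifier(max_depth=4, random_state=seed),
        cv=5,
    )
    # Assumed selector: keep the k features with the highest ANOVA F-score.
    return Pipeline([
        ("select", SelectKBest(score_func=f_classif, k=k_features)),
        ("stack", stack),
    ])


# Synthetic binary data (normal vs. anomalous) as a stand-in for a real IDS dataset.
X, y = make_classification(
    n_samples=1000, n_features=40, n_informative=8, n_redundant=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

model = build_stacking_ids(k_features=10).fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
n_selected = int(model.named_steps["select"].get_support().sum())
print(f"selected {n_selected} features, hold-out accuracy {accuracy:.3f}")
```

Placing the selector inside the pipeline means it is fitted on the training split only, so the held-out data does not influence which features are kept. On a real intrusion dataset, categorical fields such as protocol or service would first need to be encoded numerically.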


Author information

Correspondence to Mamunur Rashid.

About this article

Cite this article

Rashid, M., Kamruzzaman, J., Imam, T. et al. A tree-based stacking ensemble technique with feature selection for network intrusion detection. Appl Intell 52, 9768–9781 (2022). https://doi.org/10.1007/s10489-021-02968-1

  • DOI: https://doi.org/10.1007/s10489-021-02968-1
