Abstract
Several studies have used machine learning algorithms to develop intrusion detection systems (IDS), which differentiate anomalous behaviour from the normal activities of network systems. Because network traffic and activity data are now easy to collect automatically, the volume of collected data has grown, and the complexity of intrusion analysis is increasing exponentially. In particular, owing to statistical and computational limitations, a single classifier may not perform well on the large-scale data found in modern IDS contexts. Ensemble methods have been explored in the literature for such big data contexts. Although they are more complicated and require additional computation, the literature notes that ensemble methods can achieve better accuracy than single classifiers in various large-scale classification contexts, so it is worth exploring how ensemble approaches perform in IDS. In this research, we introduce a tree-based stacking ensemble technique (SET) and test the effectiveness of the proposed model on two intrusion datasets (NSL-KDD and UNSW-NB15). We further incorporate feature selection techniques into the proposed SET to select the most relevant features. A comprehensive performance analysis shows that our proposed model identifies normal and anomalous network traffic better than other existing IDS models. This indicates the potential of the proposed system for cybersecurity in the Internet of Things (IoT) and large-scale networks.
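To make the general idea of a tree-based stacking ensemble with feature selection concrete, the following is a minimal sketch using scikit-learn. It is not the authors' SET implementation: the abstract does not specify the base learners, meta-learner, or feature selection method, so the choices here (decision tree, random forest and extra-trees base learners, a logistic regression meta-learner, and ANOVA F-test selection via `SelectKBest` with `k=15`) are assumptions for illustration. The data are synthetic from `make_classification`, standing in for a binary normal-versus-anomaly dataset; they are not NSL-KDD or UNSW-NB15.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier, StackingClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a binary normal (0) vs. anomaly (1) traffic dataset.
# Assumption: NOT NSL-KDD or UNSW-NB15, only illustrative data.
X, y = make_classification(n_samples=2000, n_features=40, n_informative=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Tree-based base learners (assumed choice; the paper's exact learners may differ).
base_learners = [
    ("dt", DecisionTreeClassifier(random_state=0)),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("et", ExtraTreesClassifier(n_estimators=100, random_state=0)),
]

model = Pipeline([
    # Keep the 15 features with the highest ANOVA F-scores (assumed selector and k).
    ("select", SelectKBest(score_func=f_classif, k=15)),
    # Base-learner predictions from 5-fold cross-validation feed the meta-learner.
    ("stack", StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,
    )),
])

# Feature selection is fitted on the training split only, inside the pipeline.
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
acc = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {acc:.3f}")
```

Placing the selector inside the `Pipeline` keeps feature scores from being computed on test data. The `cv=5` argument makes the meta-learner train on out-of-fold base-learner predictions, which is the core of stacked generalization.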
Cite this article
Rashid, M., Kamruzzaman, J., Imam, T. et al. A tree-based stacking ensemble technique with feature selection for network intrusion detection. Appl Intell 52, 9768–9781 (2022). https://doi.org/10.1007/s10489-021-02968-1