Abstract
Over the last decade, network infrastructures have evolved constantly while attacks and attack vectors have become increasingly sophisticated. Network traffic exposes many different features that can be used to identify attacks. Machine learning techniques are particularly well suited to large and varied datasets, which are crucial for developing an accurate intrusion detection system, and can therefore support the considerable challenge that intrusion detection represents. In this work, several feature selection and ensemble methods are applied to the recent CICIDS2017 dataset in order to develop valid models that detect intrusions as soon as they occur. Using permutation importance, the original 69 features in the dataset were reduced to only 10, which reduces model execution time and leads to faster intrusion detection systems. The reduced dataset was evaluated using the Random Forest algorithm, and the results show that the optimized dataset maintains a high detection rate.
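The feature-reduction idea described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes scikit-learn, uses synthetic data from `make_classification` as a stand-in for CICIDS2017, and the sizes `N_FEATURES` and `N_KEEP`, the split ratios, and the forest settings are all hypothetical choices (the paper goes from 69 features to 10).

```python
# Illustrative sketch only: synthetic data stands in for CICIDS2017,
# and all sizes and hyperparameters below are assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

N_FEATURES = 20  # hypothetical; the paper starts from 69 features
N_KEEP = 5       # hypothetical; the paper keeps 10 features

X, y = make_classification(
    n_samples=1000, n_features=N_FEATURES, n_informative=5,
    n_redundant=0, random_state=0,
)

# Three-way split: train the model, rank features on a validation set,
# and report accuracy on a test set that played no part in the ranking.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, random_state=0, stratify=y
)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=0, stratify=y_rest
)

# Baseline Random Forest trained on every feature.
full_model = RandomForestClassifier(n_estimators=50, random_state=0)
full_model.fit(X_train, y_train)
acc_full = accuracy_score(y_test, full_model.predict(X_test))

# Permutation importance: shuffle one column at a time and measure how
# much the validation score drops; larger drops mean more important features.
result = permutation_importance(
    full_model, X_val, y_val, n_repeats=5, random_state=0
)
ranking = result.importances_mean.argsort()[::-1]
top_idx = ranking[:N_KEEP]

# Retrain on the selected columns only and compare against the baseline.
reduced_model = RandomForestClassifier(n_estimators=50, random_state=0)
reduced_model.fit(X_train[:, top_idx], y_train)
acc_reduced = accuracy_score(y_test, reduced_model.predict(X_test[:, top_idx]))

print("selected feature indices:", sorted(top_idx.tolist()))
print("accuracy, all features:", round(acc_full, 3))
print("accuracy, reduced set:", round(acc_reduced, 3))
```

Ranking on a held-out validation split rather than the training data is a design choice made here so that the importances reflect generalization instead of memorized patterns; whether the paper does the same is not stated in the abstract.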
This work has received funding from the European Union’s H2020 research and innovation programme under the SAFECARE Project, grant agreement no. 787002.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Reis, B., Maia, E., Praça, I. (2020). Selection and Performance Analysis of CICIDS2017 Features Importance. In: Benzekri, A., Barbeau, M., Gong, G., Laborde, R., Garcia-Alfaro, J. (eds) Foundations and Practice of Security. FPS 2019. Lecture Notes in Computer Science(), vol 12056. Springer, Cham. https://doi.org/10.1007/978-3-030-45371-8_4
DOI: https://doi.org/10.1007/978-3-030-45371-8_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-45370-1
Online ISBN: 978-3-030-45371-8
eBook Packages: Computer Science (R0)