Selection and Performance Analysis of CICIDS2017 Features Importance

Reis, Bruno; Maia, Eva; Praça, Isabel

doi:10.1007/978-3-030-45371-8_4

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 12056)

Included in the following conference series:

International Symposium on Foundations and Practice of Security


Abstract

During the last decade, network infrastructures have been in constant evolution. At the same time, attacks and attack vectors have become increasingly sophisticated. Hence, networks contain many different features that can be used to identify attacks. Machine learning techniques are particularly useful for dealing with large and varied datasets, which are crucial to developing an accurate intrusion detection system, so they can help address the considerable challenge that intrusion detection represents. In this work, several feature selection and ensemble methods are applied to the recent CICIDS2017 dataset in order to develop valid models that detect intrusions as soon as they occur. Using permutation importance, the original 69 features in the dataset were reduced to only 10, which reduces model execution time and leads to faster intrusion detection systems. The reduced dataset was evaluated using the Random Forest algorithm, and the results show that the optimized dataset maintains a high detection rate.
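The feature-reduction step named in the abstract, permutation importance, can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic, `toy_model` is a hypothetical stand-in for a trained classifier (the paper evaluates with Random Forest), and the function names are chosen here for illustration only. The idea is to shuffle one feature column at a time and record how much the model's accuracy drops; features whose shuffling barely changes accuracy are candidates for removal.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows where the model's prediction matches the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=5, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle only column j, leaving every other feature untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def top_k_features(importances, k):
    """Indices of the k features with the largest importance, best first."""
    order = sorted(range(len(importances)),
                   key=lambda j: importances[j], reverse=True)
    return order[:k]


# Synthetic data (assumption): feature 0 decides the label, 1 and 2 are noise.
data_rng = random.Random(42)
X = [[data_rng.random() for _ in range(3)] for _ in range(200)]
y = [int(row[0] > 0.5) for row in X]


def toy_model(row):
    """Hypothetical stand-in for a trained classifier such as Random Forest."""
    return int(row[0] > 0.5)


importances = permutation_importance(toy_model, X, y)
selected = top_k_features(importances, k=1)
print(importances, selected)
```

In this toy setting only feature 0 carries signal, so it is the one retained. The paper applies the same kind of ranking to keep 10 of the 69 CICIDS2017 features; in practice the importance would normally be measured on held-out data rather than on the training set.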

This work has received funding from European Union’s H2020 research and innovation programme under SAFECARE Project, grant agreement no. 787002.



Author information

Authors and Affiliations

GECAD - Research Group on Intelligent Engineering and Computing for Advanced Innovation and Development, School of Engineering of the Polytechnic of Porto (ISEP), Porto, Portugal
Bruno Reis, Eva Maia & Isabel Praça


Corresponding author

Correspondence to Eva Maia.

Editor information

Editors and Affiliations

Université Paul Sabatier (CNRS IRIT), Toulouse, France
Abdelmalek Benzekri
Carleton University, Ottawa, ON, Canada
Michel Barbeau
University of Waterloo, Waterloo, ON, Canada
Guang Gong
Université Paul Sabatier (CNRS IRIT), Toulouse, France
Romain Laborde
Telecom SudParis, IMT, Palaiseau, France
Joaquin Garcia-Alfaro


Cite this paper

Reis, B., Maia, E., Praça, I. (2020). Selection and Performance Analysis of CICIDS2017 Features Importance. In: Benzekri, A., Barbeau, M., Gong, G., Laborde, R., Garcia-Alfaro, J. (eds) Foundations and Practice of Security. FPS 2019. Lecture Notes in Computer Science, vol 12056. Springer, Cham. https://doi.org/10.1007/978-3-030-45371-8_4


DOI: https://doi.org/10.1007/978-3-030-45371-8_4
Published: 17 April 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-45370-1
Online ISBN: 978-3-030-45371-8
eBook Packages: Computer Science, Computer Science (R0)
