Selection and Performance Analysis of CICIDS2017 Features Importance

  • Conference paper
Foundations and Practice of Security (FPS 2019)

Abstract

Over the last decade, network infrastructures have evolved constantly while attacks and attack vectors have become increasingly sophisticated. Network traffic therefore exposes many different features that can be used to identify attacks. Machine learning techniques are particularly well suited to large and varied datasets, which are crucial for developing an accurate intrusion detection system, and can thus support the considerable challenge that intrusion detection represents. In this work, several feature selection and ensemble methods are applied to the recent CICIDS2017 dataset in order to develop valid models that detect intrusions as soon as they occur. Using permutation importance, the original 69 features in the dataset were reduced to only 10, which reduces model execution time and leads to faster intrusion detection systems. The reduced dataset was evaluated using the Random Forest algorithm, and the results show that the optimized dataset maintains a high detection rate.
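As an illustration of the permutation importance idea named in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes synthetic data (two informative features and four noise features, produced by the hypothetical helper `make_data`) and uses a simple nearest-centroid classifier as a stand-in for Random Forest so that the example needs only the Python standard library. Because permutation importance is model-agnostic, a Random Forest could take the classifier's place. The importance of a feature is estimated as the average drop in validation accuracy after that feature's column is shuffled.

```python
import random


def fit_centroids(X, y):
    """Fit a nearest-centroid classifier: one mean vector per class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict(centroids, row):
    """Return the class whose centroid is nearest (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])),
    )


def accuracy(centroids, X, y):
    return sum(predict(centroids, r) == t for r, t in zip(X, y)) / len(y)


def permutation_importance(centroids, X, y, n_repeats=5, seed=0):
    """Mean accuracy drop per feature when that feature's column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(centroids, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the label
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(centroids, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def make_data(n, seed):
    """Hypothetical synthetic data: features 0 and 1 are informative, 2 to 5 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        informative = [rng.gauss(3.0 * label, 1.0) for _ in range(2)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(4)]
        X.append(informative + noise)
        y.append(label)
    return X, y


X_train, y_train = make_data(400, seed=1)
X_val, y_val = make_data(400, seed=2)  # importance is measured on held-out data

model = fit_centroids(X_train, y_train)
importances = permutation_importance(model, X_val, y_val)

# Keep the top_k features with the largest accuracy drop.
top_k = 2
selected = sorted(
    range(len(importances)), key=lambda j: importances[j], reverse=True
)[:top_k]
print("importances:", [round(v, 3) for v in importances])
print("selected feature indices:", selected)
```

In this sketch the reduced dataset would consist of the columns listed in `selected`. Reducing 69 features to 10 as described in the abstract would correspond to `top_k = 10` on the real dataset.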

This work has received funding from the European Union’s H2020 research and innovation programme under the SAFECARE project, grant agreement no. 787002.

Author information

Corresponding author

Correspondence to Eva Maia.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Reis, B., Maia, E., Praça, I. (2020). Selection and Performance Analysis of CICIDS2017 Features Importance. In: Benzekri, A., Barbeau, M., Gong, G., Laborde, R., Garcia-Alfaro, J. (eds) Foundations and Practice of Security. FPS 2019. Lecture Notes in Computer Science, vol 12056. Springer, Cham. https://doi.org/10.1007/978-3-030-45371-8_4

  • DOI: https://doi.org/10.1007/978-3-030-45371-8_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-45370-1

  • Online ISBN: 978-3-030-45371-8

  • eBook Packages: Computer Science, Computer Science (R0)
