Errors in the CICIDS2017 Dataset and the Significant Differences in Detection Performances It Makes

Lanvin, Maxime; Gimenez, Pierre-François; Han, Yufei; Majorczyk, Frédéric; Mé, Ludovic; Totel, Éric

doi:10.1007/978-3-031-31108-6_2

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13857)

Included in the following conference series:

International Conference on Risks and Security of Internet and Systems

Abstract

Among the difficulties encountered in building datasets to evaluate intrusion detection tools, a tricky part is labelling the events as malicious or benign. Correct labelling is paramount for the quality of the evaluation of intrusion detection systems, yet practitioners often take the labels as ground truth and rarely verify them. Another difficulty lies in capturing the network packets correctly. If the capture is faulty, the characteristics of the network flows generated from it can be altered and lead to false results. In this paper, we present several flaws we identified in the labelling of the CICIDS2017 dataset and in the traffic capture, such as packet misordering, packet duplication, and attacks that were performed but not correctly labelled. Finally, we assess the impact of the corresponding corrections on the evaluation of supervised intrusion detection approaches.
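
To make the two capture flaws named in the abstract concrete, the following is a minimal sketch of how duplicated and misordered packets might be flagged in a list of already-parsed packet records. It is not the authors' method. The `Packet` record, its fields, and the helper names `find_duplicates` and `find_misordered` are hypothetical, and the duplicate test assumes that two packets with the same endpoints, sequence number, and payload length are copies, which would also flag genuine TCP retransmissions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Packet:
    # Hypothetical, simplified packet record (not a real capture format).
    timestamp: float   # capture time in seconds
    src: str           # source address
    dst: str           # destination address
    seq: int           # TCP sequence number
    payload_len: int   # payload size in bytes


def find_duplicates(packets: List[Packet]) -> List[int]:
    """Return indices of packets whose (src, dst, seq, payload_len) was already seen.

    Assumption: identical endpoints, sequence number, and length mean a copy.
    Real retransmissions match this rule too, so results need manual review.
    """
    seen = set()
    duplicates = []
    for index, packet in enumerate(packets):
        key = (packet.src, packet.dst, packet.seq, packet.payload_len)
        if key in seen:
            duplicates.append(index)
        else:
            seen.add(key)
    return duplicates


def find_misordered(packets: List[Packet]) -> List[int]:
    """Return indices of packets stored after a packet with a later timestamp."""
    return [
        index
        for index in range(1, len(packets))
        if packets[index].timestamp < packets[index - 1].timestamp
    ]


# Toy capture: index 2 repeats index 1, index 3 is stored out of time order.
capture = [
    Packet(0.000, "10.0.0.1", "10.0.0.2", 100, 50),
    Packet(0.010, "10.0.0.1", "10.0.0.2", 150, 50),
    Packet(0.010, "10.0.0.1", "10.0.0.2", 150, 50),
    Packet(0.005, "10.0.0.2", "10.0.0.1", 900, 0),
]

duplicate_indices = find_duplicates(capture)
misordered_indices = find_misordered(capture)
print(duplicate_indices, misordered_indices)
```

Either flaw can change flow-level features such as packet counts or inter-arrival times, which is why a check of this kind would run before flow generation rather than after it.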

This work has been partly realised thanks to a doctoral grant from Creach Labs (DGA, Brittany Region).

Author information

Authors and Affiliations

Inria, Univ. Rennes, IRISA, Rennes, France
Yufei Han & Ludovic Mé
CentraleSupélec, Univ. Rennes, IRISA, Rennes, France
Maxime Lanvin & Pierre-François Gimenez
DGA-MI, Univ. Rennes, IRISA, Rennes, France
Frédéric Majorczyk
Samovar, Télécom SudParis, Institut Polytechnique de Paris, Palaiseau, France
Éric Totel

Corresponding author

Correspondence to Maxime Lanvin.

Editor information

Editors and Affiliations

University of Sfax, Sfax, Tunisia
Slim Kallel
University of Sfax, Sfax, Tunisia
Mohamed Jmaiel
Queen's University, Kingston, ON, Canada
Mohammad Zulkernine
University of Sfax, Sfax, Tunisia
Ahmed Hadj Kacem
Polytechnique Montréal, Montréal, QC, Canada
Frédéric Cuppens
Polytechnique Montréal, Montréal, QC, Canada
Nora Cuppens

About this paper

Cite this paper

Lanvin, M., Gimenez, PF., Han, Y., Majorczyk, F., Mé, L., Totel, É. (2023). Errors in the CICIDS2017 Dataset and the Significant Differences in Detection Performances It Makes. In: Kallel, S., Jmaiel, M., Zulkernine, M., Hadj Kacem, A., Cuppens, F., Cuppens, N. (eds) Risks and Security of Internet and Systems. CRiSIS 2022. Lecture Notes in Computer Science, vol 13857. Springer, Cham. https://doi.org/10.1007/978-3-031-31108-6_2

DOI: https://doi.org/10.1007/978-3-031-31108-6_2
Published: 14 May 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-31107-9
Online ISBN: 978-3-031-31108-6
eBook Packages: Computer Science, Computer Science (R0)
