Transferability of machine learning models learned from public intrusion detection datasets: the CICIDS2017 case study

Catillo, Marta; Del Vecchio, Andrea; Pecchia, Antonio; Villano, Umberto

doi:10.1007/s11219-022-09587-0

Published: 19 March 2022

Volume 30, pages 955–981, (2022)

Software Quality Journal

Marta Catillo (ORCID: orcid.org/0000-0002-5025-7969), Andrea Del Vecchio, Antonio Pecchia & Umberto Villano

Abstract

Intrusion detection is a primary concern in any modern computer system due to the ever-growing number of intrusions. Machine learning represents an effective solution to detect and prevent network intrusions. Many existing intrusion detection approaches rely on machine learning models trained on individual public datasets and achieve detection accuracy close to 1. These high-performing detectors depend strongly on the training data, which may not be representative of real-life production environments. This paper explores this proposition in the context of denial of service attacks. Different intrusion detectors trained on CICIDS2017 (an established public dataset widely used as a benchmark) are tested against an unseen, although closely related, dataset. The test dataset is based on the same mixture of denial of service attacks as CICIDS2017, plus some additional variants. The results indicate that the perfect detection figures obtained in the context of a public dataset may not transfer in practice.
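
The train-on-one-dataset, test-on-another protocol described in the abstract can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the single numeric feature, the hand-written flow values, and the one-threshold detector are hypothetical stand-ins, not the paper's data, features, or models. The sketch only shows how a detector that looks perfect on its source data can miss attack variants in a related but unseen dataset.

```python
from typing import List, Tuple

# A flow is (feature value, label); label 1 = attack, 0 = benign.
# The feature is hypothetical (think of it as a per-flow packet rate).
Flow = Tuple[float, int]


def predict(x: float, threshold: float) -> int:
    """Flag a flow as an attack when its feature exceeds the threshold."""
    return 1 if x > threshold else 0


def accuracy(data: List[Flow], threshold: float) -> float:
    """Fraction of flows whose predicted label matches the true label."""
    return sum(predict(x, threshold) == y for x, y in data) / len(data)


def recall(data: List[Flow], threshold: float) -> float:
    """Fraction of true attacks that the detector flags."""
    attacks = [x for x, y in data if y == 1]
    return sum(predict(x, threshold) for x in attacks) / len(attacks)


def fit_threshold(train: List[Flow]) -> float:
    """Pick the midpoint cut that maximizes accuracy on the training flows."""
    values = sorted({x for x, _ in train})
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return max(candidates, key=lambda t: accuracy(train, t))


# Illustrative "source" dataset: attacks are cleanly separated from benign flows.
SOURCE: List[Flow] = [(1, 0), (2, 0), (3, 0), (4, 0),
                      (10, 1), (11, 1), (12, 1), (13, 1)]

# Illustrative "unseen" dataset: same benign profile, but two attack variants
# (values 3 and 5) fall in the range the detector learned to treat as benign.
TARGET: List[Flow] = [(1, 0), (2, 0), (3, 0), (4, 0),
                      (11, 1), (12, 1), (3, 1), (5, 1)]

threshold = fit_threshold(SOURCE)
print(threshold)                    # 7.0
print(accuracy(SOURCE, threshold))  # 1.0
print(accuracy(TARGET, threshold))  # 0.75
print(recall(TARGET, threshold))    # 0.5
```

In this toy setting the detector scores perfectly on the data it was fitted to, yet it flags only half of the attacks in the second dataset. The point of the sketch is that this gap is invisible when evaluation is confined to the source dataset.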

Acknowledgements

Andrea Del Vecchio contributed to this work at the time he was hosted by the Department of Engineering at the University of Sannio under support by the “Orio Carlini” 2020 GARR Consortium Fellowship.

Author information

All authors contributed equally to this work.

Authors and Affiliations

Dipartimento di Ingegneria, Università degli Studi del Sannio, Pal.zo Bosco Lucarelli C.so Garibaldi 107, Benevento, 82100, Italy
Marta Catillo, Andrea Del Vecchio, Antonio Pecchia & Umberto Villano

Corresponding author

Correspondence to Marta Catillo.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Catillo, M., Del Vecchio, A., Pecchia, A. et al. Transferability of machine learning models learned from public intrusion detection datasets: the CICIDS2017 case study. Software Qual J 30, 955–981 (2022). https://doi.org/10.1007/s11219-022-09587-0

Accepted: 15 February 2022
Published: 19 March 2022
Issue Date: December 2022
DOI: https://doi.org/10.1007/s11219-022-09587-0
