Transferability of machine learning models learned from public intrusion detection datasets: the CICIDS2017 case study

Abstract

Intrusion detection is a primary concern in any modern computer system due to the ever-growing number of intrusions. Machine learning represents an effective solution to detect and prevent network intrusions. Many existing intrusion detection approaches rely on machine learning models trained on individual public datasets and achieve detection accuracy close to 1. These high-performing detectors depend strongly on their training data, which may not be representative of real-life production environments. This paper explores this proposition in the context of denial of service attacks. Different intrusion detectors trained on CICIDS2017 (an established public dataset widely used as a benchmark) are tested against an unseen, although closely related, dataset. The test dataset is based on the same mixture of denial of service attacks found in CICIDS2017, plus some additional variants. The results indicate that the perfect detection figures obtained on a public dataset may not transfer in practice.
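
The evaluation protocol outlined in the abstract (train a detector on one dataset, then test it on a related but unseen one) can be illustrated with a minimal sketch. This is not the authors' method, data, or results. It assumes a single hypothetical flow feature, fully synthetic values, and a toy threshold detector, all introduced here for illustration only. It merely shows how a detector can score perfectly on held-out data from its own source distribution and still miss attack variants whose traffic profile differs.

```python
import random


def make_flows(rng, n, benign_range, attack_range):
    """Return (feature, label) pairs; label 1 = attack, 0 = benign."""
    benign = [(rng.uniform(*benign_range), 0) for _ in range(n)]
    attack = [(rng.uniform(*attack_range), 1) for _ in range(n)]
    return benign + attack


def fit_threshold(flows):
    """Toy detector: threshold halfway between the two class means."""
    benign = [x for x, y in flows if y == 0]
    attack = [x for x, y in flows if y == 1]
    return (sum(benign) / len(benign) + sum(attack) / len(attack)) / 2


def recall(threshold, flows):
    """Fraction of attack flows flagged (feature above the threshold)."""
    attacks = [x for x, y in flows if y == 1]
    return sum(x > threshold for x in attacks) / len(attacks)


rng = random.Random(0)

# Hypothetical single feature (say, a packet rate); every value is synthetic.
source_train = make_flows(rng, 500, (0, 40), (60, 100))
source_test = make_flows(rng, 500, (0, 40), (60, 100))  # same distribution
target_test = make_flows(rng, 500, (0, 40), (10, 50))   # shifted attack traffic

threshold = fit_threshold(source_train)
source_recall = recall(threshold, source_test)
target_recall = recall(threshold, target_test)

print(f"recall on held-out source data: {source_recall:.2f}")
print(f"recall on unseen target data:   {target_recall:.2f}")
```

In this toy setting the gap arises purely because the target attacks overlap the benign range learned from the source data. A real study would use flow features extracted from captured traffic and several learning algorithms rather than a single threshold.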

Acknowledgements

Andrea Del Vecchio contributed to this work while hosted by the Department of Engineering at the University of Sannio, with support from the “Orio Carlini” 2020 GARR Consortium Fellowship.

Author information

Corresponding author

Correspondence to Marta Catillo.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Catillo, M., Del Vecchio, A., Pecchia, A. et al. Transferability of machine learning models learned from public intrusion detection datasets: the CICIDS2017 case study. Software Qual J (2022). https://doi.org/10.1007/s11219-022-09587-0

  • DOI: https://doi.org/10.1007/s11219-022-09587-0

Keywords

  • Denial of service
  • Machine learning
  • Transfer learning
  • Intrusion detection
  • Public intrusion datasets