Successful intrusion detection with a single deep autoencoder: theory and practice

  • Research
  • Software Quality Journal

Abstract

Intrusion detection is a key topic in computer security. Due to the ever-increasing number of network attacks, several accurate anomaly-based techniques have been proposed for intrusion detection, typically relying on pattern recognition through machine learning. Many proposals rely on autoencoders because of their capability to analyze complex, high-dimensional, and large-scale data. They capitalize on composite architectures and accurate learning approaches, possibly in combination with sophisticated feature selection techniques. However, because of their high complexity and the lack of transferability of their impressive intrusion detection results, they are hardly ever used in production environments. This paper is developed around the intuition that such complexity is not necessarily justified, because a single autoencoder is enough to obtain similar, if not better, intrusion detection results compared to related proposals. The extensive study presented here addresses the effect of the seed, an in-depth investigation of the training loss, and feature selection across different hardware platforms. The best practices presented, regarding set-up and training, threshold setting, and the possible use of feature selection techniques for performance improvement, can be valuable for any future work on the use of autoencoders for intrusion detection.
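
The abstract names threshold setting as one of the practices that matter. As a minimal sketch of how an anomaly detector based on reconstruction error can flag traffic, the Python below assumes that an already trained autoencoder has produced reconstructions, and it sets the threshold at a percentile of the errors measured on benign validation data. The percentile rule, the function names, and all numeric values are illustrative assumptions, not the authors' procedure or results.

```python
import math


def reconstruction_error(x, x_hat):
    """Mean squared error between a sample and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def percentile_threshold(benign_errors, percent=99):
    """Nearest-rank percentile of the errors seen on benign validation data.

    Assumption: the threshold is chosen from benign traffic only, so that
    roughly (100 - percent)% of benign samples would be flagged.
    """
    ordered = sorted(benign_errors)
    rank = max(1, math.ceil(percent * len(ordered) / 100))
    return ordered[rank - 1]


def is_anomalous(error, threshold):
    """A sample is flagged when its reconstruction error exceeds the threshold."""
    return error > threshold


# Made-up reconstruction errors for ten benign validation samples.
benign_errors = [0.010, 0.012, 0.008, 0.011, 0.009,
                 0.013, 0.010, 0.015, 0.007, 0.012]
threshold = percentile_threshold(benign_errors, percent=90)

# A sample the (hypothetical) autoencoder reconstructs poorly.
suspicious = reconstruction_error([1.0, 0.0], [0.0, 0.0])
print(threshold, suspicious, is_anomalous(suspicious, threshold))
```

A lower percentile flags more benign traffic as anomalous, while a higher one risks missing attacks whose reconstruction error is only moderately large, so the value would normally be tuned on held-out data.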

Data availability

Publicly available datasets were analyzed in this study. The datasets can be found at the URLs mentioned in the paper.

Notes

  1. The seed is the initial point of the sequence of values generated by the pseudorandom number generator (PRNG).

  2. https://github.com/ahlashkari/CICFlowMeter

  3. https://downloads.distrinet-research.be/WTMC2021/tools_datasets.html

  4. http://idsdata.ding.unisannio.it/tools.html

  5. This can be obtained by calling tf.config.threading.set_inter_op_parallelism_threads(1) and tf.config.threading.set_intra_op_parallelism_threads(1), but doing so is detrimental to training time.

  6. https://github.com/NVIDIA/framework-determinism

  7. http://idsdata.ding.unisannio.it/
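
Notes 1 and 5 can be illustrated with a short Python sketch. The first function uses only the standard library to show that a PRNG started from the same seed yields the same sequence. The second applies the two threading calls quoted in note 5 together with `tf.random.set_seed`. The helper names are hypothetical, TensorFlow is assumed to be installed for the second function, and the threading calls must run before TensorFlow executes any operation.

```python
import random


def seeded_sequence(seed, n=5):
    """Return the first n values of a PRNG started from the given seed."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


def configure_single_threaded_tf(seed):
    """Fix the seeds and force single-threaded op execution in TensorFlow.

    Returns False when TensorFlow is not installed. Hypothetical helper:
    it only wraps the calls quoted in note 5 plus seed setting.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return False
    # Must be called before the TensorFlow runtime is initialized.
    tf.config.threading.set_inter_op_parallelism_threads(1)
    tf.config.threading.set_intra_op_parallelism_threads(1)
    # Seed Python's and TensorFlow's global generators.
    random.seed(seed)
    tf.random.set_seed(seed)
    return True


# Same seed, same sequence; a different seed gives a different one.
print(seeded_sequence(42) == seeded_sequence(42))
print(seeded_sequence(42) == seeded_sequence(43))
```

Calling `configure_single_threaded_tf(42)` at the very start of a training script would apply the settings; as note 5 warns, single-threaded execution trades training speed for repeatability.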

Author information

Contributions

Marta Catillo, Antonio Pecchia, and Umberto Villano contributed equally to this work.

Corresponding author

Correspondence to Marta Catillo.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Catillo, M., Pecchia, A. & Villano, U. Successful intrusion detection with a single deep autoencoder: theory and practice. Software Qual J 32, 95–123 (2024). https://doi.org/10.1007/s11219-023-09636-2

  • DOI: https://doi.org/10.1007/s11219-023-09636-2
