Abstract
Intrusion detection is a key topic in computer security. Due to the ever-increasing number of network attacks, several accurate anomaly-based techniques have been proposed for intrusion detection, wherein pattern recognition through machine learning techniques is typically used. Many proposals rely on the use of autoencoders, due to their capability to analyze complex, high-dimensional, and large-scale data. They capitalize on composite architectures and accurate learning approaches, possibly in combination with sophisticated feature selection techniques. However, due to their high complexity and lack of transferability of the impressive intrusion detection results, they are hardly ever used in production environments. This paper is developed around the intuition that complexity is not necessarily justified because a single autoencoder is enough to obtain similar, if not better, intrusion detection results compared to related proposals. The wide study presented here addresses the effect of the seed, a deep investigation on the training loss, and feature selection across the use of different hardware platforms. The best practices presented, regarding set-up and training, threshold setting, and possible use of feature selection techniques for performance improvement, can be valuable for any future work on the use of autoencoders for successful intrusion detection purposes.
Similar content being viewed by others
Data availability
Publicly available datasets were analyzed in this study. The datasets can be found at the URLs mentioned in the paper.
Notes
The seed is the initial point of the sequence of values generated by the pseudorandom number generator (PRNG).
This can be obtained by calling tf.config.threading.set_inter_op_parallelism_threads(1) and tf.config.threading.set_intra_op_parallelism_threads(1), but this is detrimental for learning times.
References
Apruzzese, G., Pajola, L., & Conti, M. (2022). The cross-evaluation of machine learning-based network intrusion detection systems. IEEE Transactions on Network and Service Management, 19, 5152–5169.
Binbusayyis, A., & Vaiyapuri, T. (2020). Comprehensive analysis and recommendation of feature evaluation measures for intrusion detection. Heliyon, 6, e04262.
Cai, J., Luo, J., Wang, S., & Yang, S. (2018). Feature selection in machine learning: A new perspective. Neurocomputing, 300, 70–79.
Catillo, M., Rak, M., & Villano, U. (2019). Discovery of DoS attacks by the ZED-IDS anomaly detector. Journal of High Speed Networks, 25, 349–365.
Catillo, M., Rak, M., & Villano, U. (2020). 2L-ZED-IDS: A two-level anomaly detector for multiple attack classes. In Web, artificial intelligence and network applications (pp. 687–696). Springer International Publishing.
Catillo, M., Del Vecchio, A., Ocone, L., Pecchia, A., & Villano, U. (2021a). USB-IDS-1: A public multilayer dataset of labeled network flows for IDS evaluation. In Proc. International Conference on Dependable Systems and Networks Workshops (pp. 1–6). IEEE.
Catillo, M., Pecchia, A., Rak, M., & Villano, U. (2021b). Demystifying the role of public intrusion datasets: A replication study of DoS network traffic data. Computers & Security, 108,
Catillo, M., Del Vecchio, A., Pecchia, A., & Villano, U. (2022a). Transferability of machine learning models learned from public intrusion detection datasets: The CICIDS2017 case study. Software Quality Journal, 30, 955–981.
Catillo, M., Pecchia, A., & Villano, U. (2022b). Simpler is better: On the use of autoencoders for intrusion detection. In Quality of information and communications technology (pp. 223–238). Springer International Publishing.
Chandola, V., Banerjee, A., & Kumar, V. (2009). Anomaly detection: A survey. ACM Comput. Surv., 41, 15.
Chandrashekar, G., & Sahin, F. (2014). A survey on feature selection methods. Computers & Electrical Engineering, 40, 16–28.
de Carvalho Bertoli, G., Junior, Alves Pereira, L., Saotome, O., & dos Santos, A. L. (2023). Generalizing intrusion detection for heterogeneous networks: A stacked-unsupervised federated learning approach. Computers & Security, 127, 103106.
Dina, A. S., & Manivannan, D. (2021). Intrusion detection based on machine learning techniques in computer networks. Internet of Things, 16, 100462.
Engelen, G., Rimmer, V., & Joosen, W. (2021). Troubleshooting an intrusion detection dataset: The CICIDS2017 case study. In Proc. Security and Privacy Workshops (pp. 7–12). IEEE.
Jiang, J., Han, G., Liu, L., Shu, L., & Guizani, M. (2020). Outlier detection approaches based on machine learning in the Internet-of-Things. IEEE Wireless Communications, 27, 53–59.
Kilincer, I., Ertam, F., & Sengur, A. (2021). Machine learning methods for cyber security intrusion detection: Datasets and comparative study. Computer Networks, 188, 107840.
Kramer, M. A. (1991). Nonlinear principal component analysis using autoassociative neural networks. AIChE Journal, 37, 233–243.
Kshirsagar, D., & Kumar, S. (2021). An efficient feature reduction method for the detection of DoS attack. ICT Express, 7, 371–375.
Kunang, Y. N., Nurmaini, S., Stiawan, D., Zarkasi, A., Firdaus, & Jasmir (2018). Automatic features extraction using autoencoder in intrusion detection system. In Proc. International Conference on Electrical Engineering and Computer Science (pp. 219–224). IEEE.
Li, J., Cheng, K., Wang, S., Morstatter, F., Trevino, R. P., Tang, J., & Liu, H. (2018). Feature selection: A data perspective. ACM Comput. Surv., 50, 1–45.
Liu, F. T., Ting, K. M., & Zhou, Z. (2008). Isolation forest. In Proc. International Conference on Data Mining (pp. 413–422). IEEE.
Maciá-Fernández, G., Camacho, J., Magán-Carrión, R., García-Teodoro, P., & Therón, R. (2017). UGR’16: A new dataset for the evaluation of cyclostationarity-based network IDSs. Computer & Security, 73, 411–424.
Maseer, Z. K., Yusof, R., Bahaman, N., Mostafa, S. A., & Foozy, C. F. M. (2021). Benchmarking of machine learning for anomaly based intrusion detection systems in the CICIDS2017 dataset. IEEE Access, 9, 22351–22370.
Meidan, Y., Bohadana, M., Mathov, Y., Mirsky, Y., Shabtai, A., Breitenbacher, D., & Elovici, Y. (2018). N-BaIoT-network-based detection of IoT botnet attacks using deep autoencoders. IEEE Pervasive Computing, 17, 12–22.
Mirsky, Y., Doitshman, T., Elovici, Y., & Shabtai, A. (2018). Kitsune: An ensemble of autoencoders for online network intrusion detection. In Proc. International Conference of Network and Distributed System Security Symposium.
Moustafa, N., & Slay, J. (2015). UNSW-NB15: A comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). In Proc. International Conference Military Communications and Information Systems Conference (pp. 1–6). IEEE.
Panigrahi, R., Borah, S., Bhoi, A. K., Ijaz, M. F., Pramanik, M., Jhaveri, R. H., & Chowdhary, C. L. (2021). Performance assessment of supervised classifiers for designing intrusion detection systems: A comprehensive review and recommendations for future research. Mathematics, 9, 690.
Ring, M., Wunderlich, S., Scheuring, D., Landes, D., & Hotho, A. (2019). A survey of network-based intrusion detection data sets. Computer & Security, 86, 147–167.
Roesch, M. (1999). Snort - Lightweight intrusion detection for networks. In Proc. International USENIX Conference on System Administration (p. 229-238). USENIX Association.
Rosay, A., Carlier, F., Cheval, E., & Leroux, P. (2021). From CIC-IDS2017 to LYCOS-IDS2017: A corrected dataset for better performance. In Proc. International Conference on Web Intelligence (pp. 570–575). ACM.
Sharafaldin, I., Lashkari, A. H., & Ghorbani., A. A. (2018). Toward generating a new intrusion detection dataset and intrusion traffic characterization. In Proc. International Conference on Information Systems Security and Privacy (pp. 108–116). SciTePress.
Solorio-Fernández, S., Carrasco-Ochoa, J. A., & Martìnez-Trinidad, J. F. (2020). A review of unsupervised feature selection methods. Artificial Intelligence Review, 53, 907–948.
Taher, K. A., Mohammed Yasin Jisan, B., & Rahman, M. M. (2019). Network intrusion detection using supervised machine learning technique with feature selection. In Proc. International Conference on Robotics, Electrical and Signal Processing Techniques (pp. 643–646). IEEE.
Verkerken, M., D’Hooge, L., Wauters, T., Volckaert, B., & De Turck, F. (2021). Towards model generalization for intrusion detection: Unsupervised machine learning techniques. Journal of Network and Systems Management, 30, 12.
Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., & Manzagol, P. A. (2010). Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. Journal of Machine Learning Research, 11, 3371–3408.
Wei-Chao, L., Shih-Wen, K., & Chih-Fong, T. (2015). CANN: An intrusion detection system based on combining cluster centers and nearest neighbors. Knowledge-Based Systems, 78, 13–21.
Wohlin, C., Runeson, P., Höst, M., Ohlsson, M. C., Regnell, B., & Wesslén, A. (2000). Experimentation in software engineering: An introduction. Kluwer Academic.
Wu, J., Wu, Y., Niu, N., & Zhou, M. (2021). MHCPDP: Multi-source heterogeneous cross-project defect prediction via multi-source transfer learning and autoencoder. IEEE Pervasive Computing, 29, 405–430.
XuKui, L., Wei, C., Qianru, Z., & Lifa, W. (2020). Building auto-encoder intrusion detection system based on random forest feature selection. Computers & Security, 95, 101851.
Zhang, Y., Lee, W., & Huang, Y. (2003). Intrusion detection techniques for mobile wireless networks. Wireless Networks, 9, 545–556.
Zhong, Y., Chen, W., Wang, Z., Chen, Y., Wang, K., Li, Y., Yin, X., Shi, X., Yang, J., & Li, K. (2020). HELAD: A novel network anomaly detection model based on heterogeneous ensemble learning. Computer Networks, 169, 107049.
Author information
Authors and Affiliations
Contributions
Marta Catillo, Antonio Pecchia, and Umberto Villano contributed equally to this work.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Marta Catillo, Antonio Pecchia and Umberto Villano contributed equally to this work.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Catillo, M., Pecchia, A. & Villano, U. Successful intrusion detection with a single deep autoencoder: theory and practice. Software Qual J 32, 95–123 (2024). https://doi.org/10.1007/s11219-023-09636-2
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11219-023-09636-2