Machine learning-based intrusion detection: feature selection versus feature extraction

Ngo, Vu-Duc; Vuong, Tuan-Cuong; Van Luong, Thien; Tran, Hung

doi:10.1007/s10586-023-04089-5

Published: 05 July 2023

Cluster Computing (2023)

Abstract

The Internet of Things (IoT) plays an important role in many sectors, such as smart cities, smart agriculture, smart healthcare, and smart manufacturing. However, IoT devices are highly vulnerable to cyber-attacks, which may result in security breaches and data leakage. To prevent these attacks, a variety of machine learning-based network intrusion detection methods for IoT networks have been developed. These methods often rely on either feature extraction or feature selection to reduce the dimension of the input data before it is fed into machine learning models. The reduction aims to make the detection complexity low enough for real-time operation, which is vital in any intrusion detection system. This paper provides a comprehensive comparison of these two feature reduction methods for intrusion detection in terms of precision, recall, detection accuracy, and runtime complexity, using the modern UNSW-NB15 dataset and both binary and multiclass classification. In general, feature selection not only provides better detection performance but also requires less training and inference time than feature extraction, especially as the number of reduced features K increases. However, feature extraction is much more reliable than feature selection when K is very small, such as \(K=4\). Additionally, feature extraction is less sensitive than feature selection to changes in K, and this holds for both binary and multiclass classification. Based on this comparison, we provide a guideline for selecting a suitable intrusion detection type for each specific scenario, as detailed in Table 14 at the end of Sect. 4. Such a comparison between feature selection and feature extraction over UNSW-NB15, together with such a theoretical guideline, has been overlooked in the literature.
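To make the distinction between the two reduction strategies concrete, the following is a minimal sketch in plain Python. The abstract does not name the specific algorithms compared, so the choices here are assumptions for illustration only: Pearson-correlation ranking stands in for feature selection, PCA (computed by power iteration with deflation) stands in for feature extraction, and the names `select_features`, `extract_features`, `binary_metrics`, `X_toy`, and `y_toy` are hypothetical. The toy data is made up and is not drawn from UNSW-NB15.

```python
import math
import random


def column(X, j):
    """Return column j of a row-major matrix given as a list of lists."""
    return [row[j] for row in X]


def pearson(a, b):
    """Pearson correlation of two sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def select_features(X, y, k):
    """Feature selection (assumed method): keep the k ORIGINAL columns
    whose absolute correlation with the label is largest."""
    scores = [abs(pearson(column(X, j), y)) for j in range(len(X[0]))]
    keep = sorted(range(len(scores)), key=lambda j: -scores[j])[:k]
    return sorted(keep)


def extract_features(X, k, iters=200, seed=0):
    """Feature extraction (assumed method): project centred data onto the
    top-k principal directions, yielding k NEW features per sample."""
    n, d = len(X), len(X[0])
    means = [sum(column(X, j)) / n for j in range(d)]
    Z = [[row[j] - means[j] for j in range(d)] for row in X]
    cov = [[sum(z[a] * z[b] for z in Z) / (n - 1) for b in range(d)]
           for a in range(d)]
    rng = random.Random(seed)
    comps = []
    for _ in range(k):
        v = [rng.gauss(0, 1) for _ in range(d)]
        for _ in range(iters):  # power iteration toward the top eigenvector
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        lam = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d))
                  for a in range(d))
        # Deflate: remove the found direction so the next one can emerge.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)]
               for a in range(d)]
        comps.append(v)
    return [[sum(z[j] * c[j] for j in range(d)) for c in comps] for z in Z]


def binary_metrics(y_true, y_pred):
    """Precision, recall, and accuracy for labels in {0, 1} (1 = attack)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, (tp + tn) / len(pairs)


# Made-up toy data: column 0 tracks the label, columns 1 and 2 mostly do not.
X_toy = [[0.1, 5.0, 1.0], [0.2, 5.1, 0.0], [0.0, 4.9, 1.0],
         [0.9, 5.0, 0.0], [1.0, 5.1, 1.0], [0.8, 4.9, 0.0]]
y_toy = [0, 0, 0, 1, 1, 1]

print("selected column indices:", select_features(X_toy, y_toy, 2))
print("first projected sample:", extract_features(X_toy, 2)[0])
```

Selection returns indices into the original feature set, so the reduced inputs remain interpretable and cost nothing to compute at inference time beyond indexing. Extraction returns new features that mix all original columns, so every incoming sample needs a projection step. Under these assumptions, that difference is one plausible source of the runtime gap the abstract reports.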

Data availability

The paper does not include any supporting data.

Notes

Several recent works that apply deep learning and blockchain to secure IoT networks can be found in [5,6,7], covering healthcare systems, unmanned aerial vehicles, and Android malware.
Several matrix factorization-based dimensionality reduction methods were developed for gene expression analysis in [21, 22].

References

[1] Al-Fuqaha, A., Guizani, M., Mohammadi, M., Aledhari, M., Ayyash, M.: Internet of Things: a survey on enabling technologies, protocols, and applications. IEEE Commun. Surv. Tutor. 17(4), 2347–2376 (2015)
[2] Chaabouni, N., Mosbah, M., Zemmari, A., Sauvignac, C., Faruki, P.: Network intrusion detection for IoT security based on learning techniques. IEEE Commun. Surv. Tutor. 21(3), 2671–2701 (2019)
[3] Kumar, P., Kumar, R., Garg, S., Kaur, K., Zhang, Y., Guizani, M.: A secure data dissemination scheme for IoT-based e-health systems using AI and blockchain. In: GLOBECOM 2022 - 2022 IEEE Global Communications Conference, 2022, pp. 1397–1403. IEEE (2022)
[4] Mishra, P., Varadharajan, V., Tupakula, U., Pilli, E.S.: A detailed investigation and analysis of using machine learning techniques for intrusion detection. IEEE Commun. Surv. Tutor. 21(1), 686–728 (2019)
[5] Kumar, R., Kumar, P., Aloqaily, M., Aljuhani, A.: Deep learning-based blockchain for secure zero touch networks. IEEE Commun. Mag. 61(2), 96–102 (2022)
[6] Kumar, P., Kumar, R., Gupta, G.P., Tripathi, R., Jolfaei, A., Islam, A.N.: A blockchain-orchestrated deep learning approach for secure data transmission in IoT-enabled healthcare system. J. Parallel Distrib. Comput. 172, 69–83 (2023)
[7] D’Angelo, G., Palmieri, F., Robustelli, A., Castiglione, A.: Effective classification of Android Malware families through dynamic features and neural networks. Connect. Sci. 33(3), 786–801 (2021)
[8] Ambusaidi, M.A., He, X., Nanda, P., Tan, Z.: Building an intrusion detection system using a filter-based feature selection algorithm. IEEE Trans. Comput. 65(10), 2986–2998 (2016)
[9] KDD Cup 1999 data. http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html. Accessed 10 Oct 2022
[10] Tavallaee, M., Bagheri, E., Lu, W., Ghorbani, A.A.: A detailed analysis of the KDD CUP 99 data set. In: IEEE Symposium on Computational Intelligence for Security and Defense Applications, 2009, pp. 1–6 (2009)
[11] Song, J., Takakura, H., Okabe, Y., Eto, M., Inoue, D., Nakao, K.: Statistical Analysis of Honeypot Data and Building of Kyoto 2006+ Dataset for NIDS Evaluation, pp. 29–36. Association for Computing Machinery, New York (2011)
[12] Amiri, F., Yousefi, M.R., Lucas, C., Shakery, A., Yazdani, N.: Mutual information-based feature selection for intrusion detection systems. J. Netw. Comput. Appl. 34(4), 1184–1199 (2011)
[13] Khammassi, C., Krichen, S.: A GA-LR wrapper approach for feature selection in network intrusion detection. Comput. Secur. 70, 255–277 (2017)
[14] Aslahi-Shahri, B.M., Rahmani, R., Chizari, M., Maralani, A., Eslami, M., Golkar, M.J., Ebrahimi, A.: A hybrid method consisting of GA and SVM for intrusion detection system. Neural Comput. Appl. 27(6), 1669–1676 (2016)
[15] Moustafa, N., Slay, J.: UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). In: Military Communications and Information Systems Conference (MilCIS), 2015, pp. 1–6 (2015)
[16] Moustafa, N., Slay, J.: A hybrid feature selection for network intrusion detection systems: central points. arXiv e-prints (2017). arXiv:1707.05505
[17] Tama, B.A., Comuzzi, M., Rhee, K.-H.: TSE-IDS: a two-stage classifier ensemble for intelligent anomaly-based intrusion detection system. IEEE Access 7, 94497–94507 (2019)
[18] Alazzam, H., Sharieh, A., Sabri, K.E.: A feature selection algorithm for intrusion detection system based on Pigeon inspired optimizer. Expert Syst. Appl. 148, 113249 (2020)
[19] Moustafa, N., Slay, J.: The evaluation of network anomaly detection systems: statistical analysis of the UNSW-NB15 data set and the comparison with the KDD99 data set. Inf. Sec. J. Glob. Perspect. 25(1–3), 18–31 (2016)
[20] Moustafa, N., Turnbull, B., Choo, K.-K.R.: An ensemble intrusion detection technique based on proposed statistical flow features for protecting network traffic of Internet of Things. IEEE Internet Things J. 6(3), 4815–4830 (2019)
[21] Saberi-Movahed, F., Rostami, M., Berahmand, K., Karami, S., Tiwari, P., Oussalah, M., Band, S.S.: Dual regularized unsupervised feature selection based on matrix factorization and minimum redundancy with application in gene selection. Knowl. Based Syst. 256, 109884 (2022)
[22] Azadifar, S., Rostami, M., Berahmand, K., Moradi, P., Oussalah, M.: Graph-based relevancy–redundancy gene selection method for cancer diagnosis. Comput. Biol. Med. 147, 105766 (2022)
[23] Xu, X., Wang, X.: An adaptive network intrusion detection method based on PCA and support vector machines. In: Proceedings of the First International Conference on Advanced Data Mining and Applications, 2005, pp. 696–703 (2005)
[24] Liu, G., Yi, Z., Yang, S.: A hierarchical intrusion detection model based on the PCA neural networks. Neurocomputing 70(7–9), 1561–1568 (2007)
[25] Kuang, F., Xu, W., Zhang, S.: A novel hybrid KPCA and SVM with GA model for intrusion detection. Appl. Soft Comput. 18, 178–184 (2014)
[26] Sharafaldin, I., Habibi Lashkari, A., Ghorbani, A.A.: Toward generating a new intrusion detection dataset and intrusion traffic characterization. In: Proceedings of the 4th International Conference on Information Systems Security and Privacy (ICISSP), 2018, pp. 108–116 (2018)
[27] Abdulhammed, R., Faezipour, M., Musafer, H., Abuzneid, A.: Efficient network intrusion detection using PCA-based dimensionality reduction of features. In: International Symposium on Networks, Computers and Communications (ISNCC), 2019, pp. 1–6 (2019)
[28] Qi, L., Yang, Y., Zhou, X., Rafique, W., Ma, J.: Fast anomaly identification based on multiaspect data streams for intelligent intrusion detection toward secure Industry 4.0. IEEE Trans. Ind. Inform. 18(9), 6503–6511 (2022)
[29] Tan, Z., Jamdagni, A., He, X., Nanda, P.: Network intrusion detection based on LDA for payload feature selection. In: IEEE GLOBECOM Workshops, 2010, pp. 1545–1549 (2010)
[30] Pajouh, H.H., Javidan, R., Khayami, R., Dehghantanha, A., Choo, K.-K.R.: A two-layer dimension reduction and two-tier classification model for anomaly-based intrusion detection in IoT backbone networks. IEEE Trans. Emerg. Top. Comput. 7(2), 314–323 (2019)
[31] Pajouh, H.H., Dastghaibyfard, G., Hashemi, S.: Two-tier network anomaly detection model: a machine learning approach. J. Intell. Inf. Syst. 48(1), 61–74 (2017)
[32] Yan, B., Han, G.: Effective feature extraction via stacked sparse autoencoder to improve intrusion detection system. IEEE Access 6, 41238–41248 (2018)
[33] Khan, F.A., Gumaei, A., Derhab, A., Hussain, A.: A novel two-stage deep learning model for efficient network intrusion detection. IEEE Access 7, 30373–30385 (2019)
[34] Popoola, S.I., Adebisi, B., Hammoudeh, M., Gui, G., Gacanin, H.: Hybrid deep learning for botnet attack detection in the Internet-of-Things networks. IEEE Internet Things J. 8(6), 4944–4956 (2021)
[35] Zhou, X., Hu, Y., Liang, W., Ma, J., Jin, Q.: Variational LSTM enhanced anomaly detection for industrial big data. IEEE Trans. Ind. Inform. 17(5), 3469–3477 (2021)
[36] Dao, T.-N., Lee, H.: Stacked autoencoder-based probabilistic feature extraction for on-device network intrusion detection. IEEE Internet Things J. 9(16), 14438–14451 (2022)
[37] D’Angelo, G., Palmieri, F.: Network traffic classification using deep convolutional recurrent autoencoder neural networks for spatial–temporal features extraction. J. Netw. Comput. Appl. 173, 102890 (2021)
[38] Hall, M.A.: Correlation-based feature selection for machine learning. PhD Dissertation, The University of Waikato (1999)
[39] Kotsiantis, S.B., et al.: Data preprocessing for supervised learning. Int. J. Comput. Electr. Autom. Control Inf. Eng (2006). https://doi.org/10.5281/zenodo.1082415

Acknowledgements

This work was supported in part by the SSF Framework Grant Serendipity and R&D Project of Brighter Gates AB, Sweden.

Funding

Not applicable.

Author information

Authors and Affiliations

Research and Development Center, MobiFone Corporation, Hanoi, 11312, Vietnam
Vu-Duc Ngo
School of Electronics and Electrical Engineering, Hanoi University of Science and Technology, Hanoi, 11657, Vietnam
Vu-Duc Ngo
Faculty of Computer Science, Phenikaa University, Hanoi, 12116, Vietnam
Tuan-Cuong Vuong, Thien Van Luong & Hung Tran

Contributions

V-DN and T-CV wrote the main manuscript. TVL and HT reviewed and corrected the manuscript.

Corresponding author

Correspondence to Hung Tran.

Ethics declarations

Conflict of interest

All authors declare that they do not have any conflict of interest.

Ethical Approval

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ngo, VD., Vuong, TC., Van Luong, T. et al. Machine learning-based intrusion detection: feature selection versus feature extraction. Cluster Comput (2023). https://doi.org/10.1007/s10586-023-04089-5

Received: 02 December 2022
Revised: 05 June 2023
Accepted: 11 June 2023
Published: 05 July 2023
DOI: https://doi.org/10.1007/s10586-023-04089-5
