Machine learning-based intrusion detection: feature selection versus feature extraction

Cluster Computing

Abstract

The Internet of Things (IoT) plays an important role in many sectors, such as smart cities, smart agriculture, smart healthcare, and smart manufacturing. However, IoT devices are highly vulnerable to cyber-attacks, which may result in security breaches and data leakage. To prevent these attacks effectively, a variety of machine learning-based network intrusion detection methods for IoT networks have been developed; these often rely on either feature extraction or feature selection to reduce the dimension of the input data before it is fed into the machine learning models. The goal is to keep the detection complexity low enough for real-time operation, which is vital in any intrusion detection system. This paper provides a comprehensive comparison between these two feature reduction approaches for intrusion detection in terms of several performance metrics, namely precision, recall, detection accuracy, and runtime complexity, on the modern UNSW-NB15 dataset for both binary and multiclass classification. In general, the feature selection method not only provides better detection performance but also lower training and inference time than its feature extraction counterpart, especially as the number of reduced features K increases. However, the feature extraction method is much more reliable than its selection counterpart, particularly when K is very small, such as \(K=4\). Additionally, feature extraction is less sensitive to changes in K than feature selection, and this holds for both binary and multiclass classification. Based on this comparison, we provide a useful guideline for selecting a suitable intrusion detection type for each specific scenario, as detailed in Table 14 at the end of Sect. 4.
Note that such a comparison between feature selection and feature extraction on UNSW-NB15, as well as such a theoretical guideline, has been overlooked in the literature.
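
To make the difference between the two feature reduction approaches concrete, here is a minimal Python sketch using NumPy on synthetic data (not UNSW-NB15, and not the authors' implementation). As an illustrative assumption, feature selection keeps the K original columns with the largest absolute Pearson correlation with the label, and feature extraction projects the centred data onto its first K principal components (PCA computed via SVD). The helper names `select_top_k_by_correlation` and `pca_extract` and the synthetic label rule are hypothetical.

```python
import numpy as np


def select_top_k_by_correlation(X, y, k):
    """Feature selection (hypothetical helper, illustrative criterion):
    keep the k original columns with the largest |Pearson correlation| with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Epsilon guards against division by zero for constant columns.
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
    scores = np.abs(Xc.T @ yc) / denom
    idx = np.argsort(scores)[::-1][:k]  # indices of the k highest scores
    return idx, X[:, idx]


def pca_extract(X, k):
    """Feature extraction (hypothetical helper): project centred data onto
    its first k principal components; each new feature mixes all inputs."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, sorted by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]  # shape (k, d)
    return components, Xc @ components.T  # projected data, shape (n, k)


rng = np.random.default_rng(0)
n, d, k = 500, 20, 4
X = rng.normal(size=(n, d))
# Synthetic binary label that depends only on features 3 and 7 (assumption).
y = (X[:, 3] + 0.5 * X[:, 7] > 0).astype(float)

sel_idx, X_sel = select_top_k_by_correlation(X, y, k)
components, X_ext = pca_extract(X, k)
print("selected columns:", sorted(sel_idx.tolist()))
print("shapes:", X_sel.shape, X_ext.shape)
```

Both outputs have K columns, but they differ structurally: the selected features are a subset of the raw inputs, so at inference time only those K columns need to be measured, whereas each PCA feature is a linear combination of all d inputs and therefore still requires the full input vector and a projection onto the K components.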

Data availability

The paper does not include any supporting data.

Notes

  1. Note that several recent works that apply deep learning and blockchain to secure IoT networks, in the fields of healthcare systems, unmanned aerial vehicles, and Android malware, can be found in [5, 6, 7].

  2. Note that several matrix factorization-based dimensionality reduction methods were developed for gene expression analysis in [21, 22].

References

  1. Al-Fuqaha, A., Guizani, M., Mohammadi, M., Aledhari, M., Ayyash, M.: Internet of Things: a survey on enabling technologies, protocols, and applications. IEEE Commun. Surv. Tutor. 17(4), 2347–2376 (2015)

  2. Chaabouni, N., Mosbah, M., Zemmari, A., Sauvignac, C., Faruki, P.: Network intrusion detection for IoT security based on learning techniques. IEEE Commun. Surv. Tutor. 21(3), 2671–2701 (2019)

  3. Kumar, P., Kumar, R., Garg, S., Kaur, K., Zhang, Y., Guizani, M.: A secure data dissemination scheme for IoT-based e-health systems using AI and blockchain. In: GLOBECOM 2022—2022 IEEE Global Communications Conference, 2022, pp. 1397–1403. IEEE (2022)

  4. Mishra, P., Varadharajan, V., Tupakula, U., Pilli, E.S.: A detailed investigation and analysis of using machine learning techniques for intrusion detection. IEEE Commun. Surv. Tutor. 21(1), 686–728 (2019)

  5. Kumar, R., Kumar, P., Aloqaily, M., Aljuhani, A.: Deep learning-based blockchain for secure zero touch networks. IEEE Commun. Mag. 61(2), 96–102 (2022)

  6. Kumar, P., Kumar, R., Gupta, G.P., Tripathi, R., Jolfaei, A., Islam, A.N.: A blockchain-orchestrated deep learning approach for secure data transmission in IoT-enabled healthcare system. J. Parallel Distrib. Comput. 172, 69–83 (2023)

  7. D’Angelo, G., Palmieri, F., Robustelli, A., Castiglione, A.: Effective classification of Android Malware families through dynamic features and neural networks. Connect. Sci. 33(3), 786–801 (2021)

  8. Ambusaidi, M.A., He, X., Nanda, P., Tan, Z.: Building an intrusion detection system using a filter-based feature selection algorithm. IEEE Trans. Comput. 65(10), 2986–2998 (2016)

  9. KDD Cup 1999 data. http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html. Accessed 10 Oct 2022

  10. Tavallaee, M., Bagheri, E., Lu, W., Ghorbani, A.A.: A detailed analysis of the KDD CUP 99 data set. In: IEEE Symposium on Computational Intelligence for Security and Defense Applications, 2009, pp. 1–6 (2009)

  11. Song, J., Takakura, H., Okabe, Y., Eto, M., Inoue, D., Nakao, K.: Statistical Analysis of Honeypot Data and Building of Kyoto 2006+ Dataset for NIDS Evaluation, pp. 29–36. Association for Computing Machinery, New York (2011)

  12. Amiri, F., Yousefi, M.R., Lucas, C., Shakery, A., Yazdani, N.: Mutual information-based feature selection for intrusion detection systems. J. Netw. Comput. Appl. 34(4), 1184–1199 (2011)

  13. Khammassi, C., Krichen, S.: A GA-LR wrapper approach for feature selection in network intrusion detection. Comput. Secur. 70, 255–277 (2017)

  14. Aslahi-Shahri, B.M., Rahmani, R., Chizari, M., Maralani, A., Eslami, M., Golkar, M.J., Ebrahimi, A.: A hybrid method consisting of GA and SVM for intrusion detection system. Neural Comput. Appl. 27(6), 1669–1676 (2016)

  15. Moustafa, N., Slay, J.: UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). In: Military Communications and Information Systems Conference (MilCIS), 2015, pp. 1–6 (2015)

  16. Moustafa, N., Slay, J.: A hybrid feature selection for network intrusion detection systems: central points. arXiv e-prints (2017). arXiv:1707.05505

  17. Tama, B.A., Comuzzi, M., Rhee, K.-H.: TSE-IDS: a two-stage classifier ensemble for intelligent anomaly-based intrusion detection system. IEEE Access 7, 94497–94507 (2019)

  18. Alazzam, H., Sharieh, A., Sabri, K.E.: A feature selection algorithm for intrusion detection system based on Pigeon inspired optimizer. Expert Syst. Appl. 148, 113249 (2020)

  19. Moustafa, N., Slay, J.: The evaluation of network anomaly detection systems: statistical analysis of the UNSW-NB15 data set and the comparison with the KDD99 data set. Inf. Sec. J. Glob. Perspect. 25(1–3), 18–31 (2016)

  20. Moustafa, N., Turnbull, B., Choo, K.-K.R.: An ensemble intrusion detection technique based on proposed statistical flow features for protecting network traffic of Internet of Things. IEEE Internet Things J. 6(3), 4815–4830 (2019)

  21. Saberi-Movahed, F., Rostami, M., Berahmand, K., Karami, S., Tiwari, P., Oussalah, M., Band, S.S.: Dual regularized unsupervised feature selection based on matrix factorization and minimum redundancy with application in gene selection. Knowl. Based Syst. 256, 109884 (2022)

  22. Azadifar, S., Rostami, M., Berahmand, K., Moradi, P., Oussalah, M.: Graph-based relevancy–redundancy gene selection method for cancer diagnosis. Comput. Biol. Med. 147, 105766 (2022)

  23. Xu, X., Wang, X.: An adaptive network intrusion detection method based on PCA and support vector machines. In: Proceedings of the First International Conference on Advanced Data Mining and Applications, 2005, pp. 696–703 (2005)

  24. Liu, G., Yi, Z., Yang, S.: A hierarchical intrusion detection model based on the PCA neural networks. Neurocomputing 70(7–9), 1561–1568 (2007)

  25. Kuang, F., Xu, W., Zhang, S.: A novel hybrid KPCA and SVM with GA model for intrusion detection. Appl. Soft Comput. 18, 178–184 (2014)

  26. Sharafaldin, I., Habibi Lashkari, A., Ghorbani, A.A.: Toward generating a new intrusion detection dataset and intrusion traffic characterization. In: Proceedings of the 4th International Conference on Information Systems Security and Privacy—ICISSP, 2018, pp. 108–116 (2018)

  27. Abdulhammed, R., Faezipour, M., Musafer, H., Abuzneid, A.: Efficient network intrusion detection using PCA-based dimensionality reduction of features. In: International Symposium on Networks, Computers and Communications (ISNCC), 2019, pp. 1–6 (2019)

  28. Qi, L., Yang, Y., Zhou, X., Rafique, W., Ma, J.: Fast anomaly identification based on multiaspect data streams for intelligent intrusion detection toward secure Industry 4.0. IEEE Trans. Ind. Inform. 18(9), 6503–6511 (2022)

  29. Tan, Z., Jamdagni, A., He, X., Nanda, P.: Network intrusion detection based on LDA for payload feature selection. In: IEEE GLOBECOM Workshops, 2010, pp. 1545–1549 (2010)

  30. Pajouh, H.H., Javidan, R., Khayami, R., Dehghantanha, A., Choo, K.-K.R.: A two-layer dimension reduction and two-tier classification model for anomaly-based intrusion detection in IoT backbone networks. IEEE Trans. Emerg. Top. Comput. 7(2), 314–323 (2019)

  31. Pajouh, H.H., Dastghaibyfard, G., Hashemi, S.: Two-tier network anomaly detection model: a machine learning approach. J. Intell. Inf. Syst. 48(1), 61–74 (2017)

  32. Yan, B., Han, G.: Effective feature extraction via stacked sparse autoencoder to improve intrusion detection system. IEEE Access 6, 41238–41248 (2018)

  33. Khan, F.A., Gumaei, A., Derhab, A., Hussain, A.: A novel two-stage deep learning model for efficient network intrusion detection. IEEE Access 7, 30373–30385 (2019)

  34. Popoola, S.I., Adebisi, B., Hammoudeh, M., Gui, G., Gacanin, H.: Hybrid deep learning for botnet attack detection in the Internet-of-Things networks. IEEE Internet Things J. 8(6), 4944–4956 (2021)

  35. Zhou, X., Hu, Y., Liang, W., Ma, J., Jin, Q.: Variational LSTM enhanced anomaly detection for industrial big data. IEEE Trans. Ind. Inform. 17(5), 3469–3477 (2021)

  36. Dao, T.-N., Lee, H.: Stacked autoencoder-based probabilistic feature extraction for on-device network intrusion detection. IEEE Internet Things J. 9(16), 14438–14451 (2022)

  37. D’Angelo, G., Palmieri, F.: Network traffic classification using deep convolutional recurrent autoencoder neural networks for spatial–temporal features extraction. J. Netw. Comput. Appl. 173, 102890 (2021)

  38. Hall, M.A.: Correlation-based feature selection for machine learning. PhD Dissertation, The University of Waikato (1999)

  39. Kotsiantis, S.B., et al.: Data preprocessing for supervised learning. Int. J. Comput. Electr. Autom. Control Inf. Eng. (2006). https://doi.org/10.5281/zenodo.1082415

Acknowledgements

This work was supported in part by the SSF Framework Grant Serendipity and the R&D Project of Brighter Gates AB, Sweden.

Funding

Not applicable.

Author information

Contributions

V-DN and T-CV wrote the main manuscript. TVL and HT reviewed and corrected the manuscript.

Corresponding author

Correspondence to Hung Tran.

Ethics declarations

Conflict of interest

All authors declare that they do not have any conflict of interest.

Ethical Approval

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ngo, VD., Vuong, TC., Van Luong, T. et al. Machine learning-based intrusion detection: feature selection versus feature extraction. Cluster Comput (2023). https://doi.org/10.1007/s10586-023-04089-5

  • DOI: https://doi.org/10.1007/s10586-023-04089-5
