
Towards a Standard Feature Set for Network Intrusion Detection System Datasets

Abstract

Network Intrusion Detection Systems (NIDSs) are important tools for protecting computer networks against increasingly frequent and sophisticated cyber attacks. Recently, considerable research effort has been dedicated to the development of Machine Learning (ML) based NIDSs. As in any ML-based application, the availability of high-quality datasets is critical for the training and evaluation of ML-based NIDSs. One of the key problems with the currently available NIDS datasets is the lack of a standard feature set. Because each publicly available dataset uses its own unique, proprietary set of features, it is virtually impossible to compare the performance of ML-based traffic classifiers across datasets, and hence to evaluate the ability of these systems to generalise across different network scenarios. To address this limitation, this paper proposes and evaluates standard NIDS feature sets based on the NetFlow network meta-data collection protocol and system. We evaluate and compare two NetFlow-based feature set variants: a version with 12 features and another with 43 features. For our evaluation, we converted four widely used NIDS datasets (UNSW-NB15, BoT-IoT, ToN-IoT, CSE-CIC-IDS2018) into new variants with our proposed NetFlow-based feature sets. Using an Extra Tree classifier, we compared the classification performance of the NetFlow-based feature sets with that of the proprietary feature sets provided with the original datasets. While the smaller feature set cannot match the classification performance of the proprietary feature sets, the larger set with 43 NetFlow features surprisingly achieves consistently higher classification performance than the original feature sets, each of which was tailored to its respective NIDS dataset.
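The abstract does not list the fields of the smaller feature set, so the following is only a sketch under stated assumptions of what "converting a dataset into NetFlow features" involves: packets are aggregated into bidirectional flow records keyed by the 5-tuple. The `Packet` type, the direction convention, and the 12-name `FEATURES` list (named after nProbe's NetFlow template elements such as `IN_BYTES`, `TCP_FLAGS`, and `FLOW_DURATION_MILLISECONDS`) are illustrative assumptions, not the authors' conversion pipeline.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Packet:
    """Hypothetical parsed packet; field names are illustrative assumptions."""
    ts_ms: int          # capture timestamp in milliseconds
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int       # IP protocol number (6 = TCP, 17 = UDP)
    length: int         # packet length in bytes
    tcp_flags: int = 0  # TCP flag bits of this packet (0 for non-TCP)


# Assumed 12-field selection, named after nProbe NetFlow template elements.
FEATURES = [
    "IPV4_SRC_ADDR", "L4_SRC_PORT", "IPV4_DST_ADDR", "L4_DST_PORT",
    "PROTOCOL", "L7_PROTO", "IN_BYTES", "OUT_BYTES",
    "IN_PKTS", "OUT_PKTS", "TCP_FLAGS", "FLOW_DURATION_MILLISECONDS",
]

FlowKey = Tuple[str, int, str, int, int]


def aggregate_flows(packets: List[Packet]) -> List[dict]:
    """Group packets into bidirectional flows keyed by the 5-tuple.

    The first packet of a flow fixes its direction: packets in that
    direction count as IN_* (src->dst), reverse packets as OUT_* (dst->src).
    """
    flows: Dict[FlowKey, dict] = {}
    for p in sorted(packets, key=lambda pkt: pkt.ts_ms):
        fwd = (p.src_ip, p.src_port, p.dst_ip, p.dst_port, p.protocol)
        rev = (p.dst_ip, p.dst_port, p.src_ip, p.src_port, p.protocol)
        if rev in flows:
            rec, side = flows[rev], "OUT"
        else:
            if fwd not in flows:
                flows[fwd] = {
                    "IPV4_SRC_ADDR": p.src_ip, "L4_SRC_PORT": p.src_port,
                    "IPV4_DST_ADDR": p.dst_ip, "L4_DST_PORT": p.dst_port,
                    "PROTOCOL": p.protocol,
                    "L7_PROTO": 0.0,  # placeholder: real probes infer it via deep packet inspection
                    "IN_BYTES": 0, "OUT_BYTES": 0, "IN_PKTS": 0, "OUT_PKTS": 0,
                    "TCP_FLAGS": 0, "_first": p.ts_ms, "_last": p.ts_ms,
                }
            rec, side = flows[fwd], "IN"
        rec[side + "_BYTES"] += p.length
        rec[side + "_PKTS"] += 1
        rec["TCP_FLAGS"] |= p.tcp_flags  # cumulative OR over both directions
        rec["_last"] = p.ts_ms
    records = []
    for rec in flows.values():
        rec["FLOW_DURATION_MILLISECONDS"] = rec["_last"] - rec["_first"]
        # Emit only the standard fields, in a fixed order shared by every dataset.
        records.append({name: rec[name] for name in FEATURES})
    return records
```

Real flow exporters also terminate flows on idle and active timeouts and on TCP FIN/RST; this sketch keeps a single record per 5-tuple for brevity.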
The proposed NetFlow-based NIDS feature set, together with the four benchmark datasets made available to the research community, allows a fair comparison of ML-based network traffic classifiers across different NIDS datasets. We believe that a standard feature set is critical for a more rigorous and thorough evaluation of ML-based NIDSs, and that it can help bridge the gap between academic research and the practical deployment of such systems.
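A shared feature schema means the same classifier code can be trained and scored on every converted dataset without per-dataset feature engineering. The abstract names an Extra Tree classifier but not its implementation or hyperparameters, so the following is a minimal, dependency-free sketch of the Extremely Randomized Trees idea under assumed settings: at each node, `k` randomly chosen non-constant features (default √d) each receive one uniformly random cut-point, the candidate with the lowest weighted Gini impurity is kept, trees grow until leaves are pure, every tree sees the full training set, and the ensemble predicts by majority vote. `ExtraTrees`, `build_tree`, and the toy flow values are hypothetical.

```python
import math
import random
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Node:
    label: Optional[int] = None     # set only on leaves
    feature: int = -1               # split feature index (internal nodes)
    threshold: float = 0.0          # go left if row[feature] < threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def gini(labels: List[int]) -> float:
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels: List[int]) -> int:
    return Counter(labels).most_common(1)[0][0]


def build_tree(X: List[Sequence[float]], y: List[int], rng: random.Random, k: int) -> Node:
    if len(set(y)) == 1:                    # pure node: stop growing
        return Node(label=y[0])
    features = list(range(len(X[0])))
    rng.shuffle(features)
    best, tried = None, 0
    for f in features:
        if tried == k:
            break
        lo, hi = min(r[f] for r in X), max(r[f] for r in X)
        if lo == hi:                        # constant in this node: draw another feature
            continue
        tried += 1
        t = rng.uniform(lo, hi)             # random cut-point, not an optimised one
        left = [i for i, r in enumerate(X) if r[f] < t]
        right = [i for i, r in enumerate(X) if r[f] >= t]
        if not left or not right:
            continue
        score = (len(left) * gini([y[i] for i in left])
                 + len(right) * gini([y[i] for i in right])) / len(y)
        if best is None or score < best[0]:
            best = (score, f, t, left, right)
    if best is None:                        # no usable split: majority-label leaf
        return Node(label=majority(y))
    _, f, t, left, right = best
    return Node(
        feature=f, threshold=t,
        left=build_tree([X[i] for i in left], [y[i] for i in left], rng, k),
        right=build_tree([X[i] for i in right], [y[i] for i in right], rng, k),
    )


def predict_one(node: Node, row: Sequence[float]) -> int:
    while node.label is None:
        node = node.left if row[node.feature] < node.threshold else node.right
    return node.label


class ExtraTrees:
    def __init__(self, n_estimators: int = 25, k: Optional[int] = None, seed: int = 0):
        self.n_estimators = n_estimators
        self.k = k
        self.rng = random.Random(seed)
        self.trees: List[Node] = []

    def fit(self, X: List[Sequence[float]], y: List[int]) -> "ExtraTrees":
        k = self.k or max(1, int(math.sqrt(len(X[0]))))
        # Unlike a random forest, each tree is grown on the full training set (no bootstrap).
        self.trees = [build_tree(X, y, self.rng, k) for _ in range(self.n_estimators)]
        return self

    def predict(self, X: List[Sequence[float]]) -> List[int]:
        return [majority([predict_one(t, row) for t in self.trees]) for row in X]


if __name__ == "__main__":
    # Toy flows as [IN_BYTES, OUT_PKTS]; label 0 = benign, 1 = attack.
    X = [[100, 2], [120, 3], [90, 2], [110, 1], [5000, 40], [5200, 50], [4800, 45], [5100, 38]]
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    model = ExtraTrees(n_estimators=15, seed=0).fit(X, y)
    print(model.predict([[105, 2], [5050, 42]]))  # → [0, 1]
```

For real experiments a library implementation such as scikit-learn's `ExtraTreesClassifier` would normally be used; the sketch only makes the randomised split rule explicit.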


Figs. 1–4 (images not included)


Author information

Correspondence to Mohanad Sarhan.

Ethics declarations

Conflict of Interests

The authors declare that they have no conflict of interest.

About this article

Cite this article

Sarhan, M., Layeghy, S. & Portmann, M. Towards a Standard Feature Set for Network Intrusion Detection System Datasets. Mobile Netw Appl (2021). https://doi.org/10.1007/s11036-021-01843-0

Keywords

  • Machine learning
  • NetFlow
  • Network intrusion detection system