Skip to main content

The Effect of Sampling Methods on the CICIDS2017 Network Intrusion Data Set

  • Conference paper
  • First Online:
IT Convergence and Security

Part of the book series: Lecture Notes in Electrical Engineering ((LNEE,volume 782))

Abstract

Handling unbalanced intrusion detection data sets are difficult as minority intrusion classes may not be easy to detect. One of the possible causes of the problem is the characteristic of learning algorithms that usually favour majority classes in data sets. The contribution of this study is to improve the detection rate for intrusions in the unbalanced CICIDS2017 data set by using sampling techniques. We evaluated Random Under-Sampling (RUS), Synthetic Minority Over-sampling Technique (SMOTE) and the combination of RUS and SMOTE. After applying the sampling techniques, we performed intrusion detection and used the accuracy plus True Positive Rate (TPR) as the evaluation metrics for the detection results. The results showed that RUS gave the best detection performance overall. Besides, 12 out of the 15 classes, including some hard-to-detect minority classes, were detected with result improvement.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 299.00
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 379.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info
Hardcover Book
USD 379.99
Price excludes VAT (USA)
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Sharafaldin I, Lashkari AH, Ghorbani AA (2018) Toward generating a new intrusion detection dataset and intrusion traffic characterisation. In: ICISSP 2018–Proc. 4th Int. Conf. Inf. Syst. Secur. Priv. 2018-Janua. pp 108–116

    Google Scholar 

  2. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 16:321–357

    Article  Google Scholar 

  3. Kalid SN, Ng K, Tong G, Khor K (2020) A multiple classifiers system for anomaly detection in credit card data with unbalanced and overlapped classes. IEEE Access 8:28210–28221

    Article  Google Scholar 

  4. Weiss G (2004) Mining with rarity: a unifying framework. SIGKDD Explor 6:7–19

    Article  Google Scholar 

  5. Krawczyk B (2016) Learning from imbalanced data: open challenges and future directions. Prog Artif Intell 5:221–232

    Article  Google Scholar 

  6. Das B, Krishnan NC, Cook DJ (2014) Handling imbalanced and overlapping classes in smart environments prompting dataset

    Google Scholar 

  7. Drummond C, Holte RC (2003) C4.5, class imbalance, and cost sensitivity : why under-sampling beats over-sampling

    Google Scholar 

  8. Sáez JA, Luengo J, Stefanowski J, Herrera F (2014) Managing borderline and noisy examples in imbalanced classification by combining smote with ensemble filtering. Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics). 8669 LNCS. pp 61–68

    Google Scholar 

  9. Dal Pozzolo A, Caelen O, Bontempi G (2010) Comparison of balancing techniques for unbalanced datasets. Mach Learn Gr Univ Libr Bruxelles Belgium 16:732–735

    Google Scholar 

  10. Seiffert C, Khoshgoftaar TM, Van Hulse J (2009) Hybrid sampling for imbalanced data. Integr Comput Aided Eng 16:193–210

    Article  Google Scholar 

  11. Abdulrahman AA, Ibrahem MK (2019) Evaluation of DDoS attacks detection in a new intrusion dataset based on classification algorithms. Iraqi J Inform Commun Technol 1:49–55

    Article  Google Scholar 

  12. Toupas P, Chamou D, Giannoutakis KM, Drosou A, Tzovaras D (2019) An intrusion detection system for multi-class classification based on deep neural networks. In: Proc.–18th IEEE Int. Conf. Mach. Learn. Appl. ICMLA 2019. pp 1253–1258

    Google Scholar 

  13. Yong Y (2012) The research of imbalanced data set of sample sampling method based on K-Means cluster and genetic algorithm. Energy Procedia 17:164–170

    Article  Google Scholar 

  14. Zhang Y, Chen XU, Jin LEI, Wang X, Guo DA (2019) Network intrusion detection: based on deep hierarchical network and original flow data. IEEE Access 7:37004–37016

    Article  Google Scholar 

Download references

Acknowledgements

This work was supported by the UTAR Research Grant (IPSR/RMC/UTARRF/2019-C2/K02) provided by the Universiti Tunku Abdul Rahman, Malaysia.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Yan-Bing Ho .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Ho, YB., Yap, WS., Khor, KC. (2021). The Effect of Sampling Methods on the CICIDS2017 Network Intrusion Data Set. In: Kim, H., Kim, K.J. (eds) IT Convergence and Security. Lecture Notes in Electrical Engineering, vol 782. Springer, Singapore. https://doi.org/10.1007/978-981-16-4118-3_4

Download citation

  • DOI: https://doi.org/10.1007/978-981-16-4118-3_4

  • Published:

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-16-4117-6

  • Online ISBN: 978-981-16-4118-3

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics