The Effectiveness of Sampling Methods for the Imbalanced Network Intrusion Detection Data Set

  • Conference paper
Recent Advances on Soft Computing and Data Mining

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 287)

Abstract

One of the countermeasures taken by security experts against network attacks is to implement Intrusion Detection Systems (IDS) in computer networks. Researchers often use the de facto network intrusion detection data set, KDD Cup 1999, to evaluate proposed IDS in the context of data mining. However, the imbalanced class distribution of the data set leads to a rare class problem, which causes low detection (classification) rates for the rare classes, particularly R2L and U2R. Two sampling methods commonly used to mitigate the rare class problem were evaluated in this research: (1) under-sampling and (2) over-sampling. However, these two methods were less effective in mitigating the problem. The reasons for this performance are presented in this paper.
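
To make the two sampling methods named in the abstract concrete, the following is a minimal Python sketch of random under-sampling and random over-sampling by duplication. It is not the authors' procedure, which this preview does not describe. The function names, the `target` parameter, and the toy class counts are assumptions made purely for illustration; only the class names R2L and U2R come from the abstract.

```python
import random
from collections import Counter


def group_by_class(records, labels):
    """Group records into lists keyed by their class label."""
    by_class = {}
    for rec, lab in zip(records, labels):
        by_class.setdefault(lab, []).append(rec)
    return by_class


def random_under_sample(records, labels, target, seed=0):
    """Keep at most `target` randomly chosen records per class (hypothetical helper)."""
    rng = random.Random(seed)
    out = []
    for lab, recs in group_by_class(records, labels).items():
        kept = recs if len(recs) <= target else rng.sample(recs, target)
        out.extend((r, lab) for r in kept)
    return out


def random_over_sample(records, labels, target, seed=0):
    """Duplicate random records until each class has at least `target` (hypothetical helper)."""
    rng = random.Random(seed)
    out = []
    for lab, recs in group_by_class(records, labels).items():
        extra = [rng.choice(recs) for _ in range(max(0, target - len(recs)))]
        out.extend((r, lab) for r in recs + extra)
    return out


# Toy imbalanced data set: the class counts are invented, not taken from KDD Cup 1999.
labels = ["normal"] * 1000 + ["dos"] * 4000 + ["r2l"] * 10 + ["u2r"] * 5
records = list(range(len(labels)))  # stand-ins for feature vectors

under = random_under_sample(records, labels, target=100)
over = random_over_sample(records, labels, target=100)

under_counts = Counter(lab for _, lab in under)
over_counts = Counter(lab for _, lab in over)
print("under-sampled:", dict(under_counts))
print("over-sampled: ", dict(over_counts))
```

In this sketch, under-sampling shrinks the majority classes but leaves the rare classes with the same few distinct records, while over-sampling by duplication raises their counts without adding new information. Synthetic over-sampling techniques are a different approach and are not shown here.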

Author information

Corresponding author

Correspondence to Kok-Chin Khor.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Khor, KC., Ting, CY., Phon-Amnuaisuk, S. (2014). The Effectiveness of Sampling Methods for the Imbalanced Network Intrusion Detection Data Set. In: Herawan, T., Ghazali, R., Deris, M. (eds) Recent Advances on Soft Computing and Data Mining. Advances in Intelligent Systems and Computing, vol 287. Springer, Cham. https://doi.org/10.1007/978-3-319-07692-8_58

  • DOI: https://doi.org/10.1007/978-3-319-07692-8_58

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-07691-1

  • Online ISBN: 978-3-319-07692-8

  • eBook Packages: Engineering, Engineering (R0)
