The Effectiveness of Sampling Methods for the Imbalanced Network Intrusion Detection Data Set

Khor, Kok-Chin; Ting, Choo-Yee; Phon-Amnuaisuk, Somnuk

doi:10.1007/978-3-319-07692-8_58

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 287)

Abstract

One countermeasure that security experts take against network attacks is to deploy Intrusion Detection Systems (IDS) in computer networks. Researchers often use the de facto network intrusion detection data set, KDD Cup 1999, to evaluate proposed IDS in the context of data mining. However, the imbalanced class distribution of the data set leads to a rare class problem, which causes low detection (classification) rates for the rare classes, particularly R2L and U2R. This research evaluated two sampling methods commonly used to mitigate the rare class problem: (1) under-sampling and (2) over-sampling. However, both methods had limited effectiveness in mitigating the problem. The reasons for this performance are presented in this paper.
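
The two sampling methods named in the abstract can be illustrated with a minimal sketch. It assumes plain random resampling on a toy data set; the function names (`under_sample`, `over_sample`), the `target` parameter, and the class counts are hypothetical and are not taken from the paper, whose exact sampling setup is not shown in this preview.

```python
import random
from collections import Counter, defaultdict


def group_by_class(records):
    """Group (features, label) records by their class label."""
    groups = defaultdict(list)
    for features, label in records:
        groups[label].append((features, label))
    return groups


def under_sample(records, target, seed=0):
    """Random under-sampling: keep at most `target` records per class."""
    rng = random.Random(seed)
    out = []
    for rows in group_by_class(records).values():
        out.extend(rng.sample(rows, target) if len(rows) > target else rows)
    return out


def over_sample(records, target, seed=0):
    """Random over-sampling: duplicate records until each class has `target`."""
    rng = random.Random(seed)
    out = []
    for rows in group_by_class(records).values():
        out.extend(rows)
        if len(rows) < target:
            # Draw duplicates with replacement from the existing records.
            out.extend(rng.choices(rows, k=target - len(rows)))
    return out


# Toy data with made-up class counts (assumption, not the KDD Cup 1999 counts).
toy = (
    [((i,), "normal") for i in range(1000)]
    + [((i,), "R2L") for i in range(20)]
    + [((i,), "U2R") for i in range(5)]
)

under = under_sample(toy, target=20)
over = over_sample(toy, target=100)
print(Counter(label for _, label in under))
print(Counter(label for _, label in over))
```

Under-sampling shrinks the majority class and so discards information, while random over-sampling only repeats existing rare records and adds no new information; both are generic properties of these techniques rather than findings quoted from the paper.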

Author information

Authors and Affiliations

Faculty of Computing and Informatics, Multimedia University, Jalan Multimedia, 63100, Cyberjaya, Selangor, Malaysia
Kok-Chin Khor & Choo-Yee Ting
Faculty of Business and Computing, Brunei Institute of Technology, Mukim Gadong A, BE1410, Brunei Darussalam
Somnuk Phon-Amnuaisuk

Corresponding author

Correspondence to Kok-Chin Khor.

Editor information

Editors and Affiliations

Department of Information System Faculty of Comp. Sci. & Info. Tech., University of Malaya, Kuala Lumpur, Malaysia
Tutut Herawan
Faculty of Comp. Sci. and Info. Tech, Universiti Tun Hussein Onn Malaysia, Parit Raja, Malaysia
Rozaida Ghazali
Faculty of Comp. Sci. and Info. Tech., Universiti Tun Hussein Onn Malaysia, Parit Raja, Malaysia
Mustafa Mat Deris

About this paper

Cite this paper

Khor, KC., Ting, CY., Phon-Amnuaisuk, S. (2014). The Effectiveness of Sampling Methods for the Imbalanced Network Intrusion Detection Data Set. In: Herawan, T., Ghazali, R., Deris, M. (eds) Recent Advances on Soft Computing and Data Mining. Advances in Intelligent Systems and Computing, vol 287. Springer, Cham. https://doi.org/10.1007/978-3-319-07692-8_58

DOI: https://doi.org/10.1007/978-3-319-07692-8_58
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07691-1
Online ISBN: 978-3-319-07692-8
eBook Packages: Engineering, Engineering (R0)
