Investigating the Effective Use of Machine Learning Algorithms in Network Intruder Detection Systems

Al-Mandhari, Intisar S.; Guan, L.; Edirisinghe, E. A.

doi:10.1007/978-3-030-03405-4_10

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 887)

Included in the following conference series:

Future of Information and Communication Conference

Abstract

Research into the use of machine learning techniques for network intrusion detection, especially research carried out on the popular public dataset KDD cup 99, has become commonplace during the past decade. The recent popularity of cloud-based computing and the realization of the associated risks are the main reasons for this research thrust. The proposed research demonstrates that machine learning algorithms can be effectively used to enhance the performance of existing intrusion detection systems despite the high misclassification rates reported in the literature. This paper reports on an empirical investigation to determine the underlying causes of the poor performance of some well-known machine learning classifiers, especially when learning from minor classes/attacks. The main factor is that the KDD cup 99 dataset, which is used in most of the existing research, is imbalanced due to the nature of the intrusion detection domain, i.e. some attacks are rare and some are very frequent. There is therefore a significant imbalance amongst the classes in the dataset. Depending on the number of classes in the dataset, the imbalanced dataset issue can be considered a binary problem or a multi-class problem. Most researchers focus on binary classification because multi-class classification is more complex. In the research proposed in this paper, we consider the problem as a multi-class classification task. The paper investigates the use of different machine learning algorithms in order to overcome the common misclassification problems faced by researchers who used the imbalanced KDD cup 99 dataset for their investigations. Recommendations are made as to which classifier is best for the classification of imbalanced data.
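
To illustrate why class imbalance can hide poor performance on minor classes, the following is a minimal Python sketch and not the authors' method. The label counts and class names are invented for illustration (they are not KDD cup 99 figures), and the helper names `imbalance_ratio` and `per_class_recall` are hypothetical. It shows a trivial classifier that always predicts the majority class scoring high overall accuracy while detecting none of the rare classes.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Ratio of the most frequent class count to the least frequent one."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def per_class_recall(y_true, y_pred):
    """Fraction of each true class that was predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {cls: hits[cls] / totals[cls] for cls in totals}


# Toy multi-class labels, invented for illustration only
# (assumption: these are NOT the real KDD cup 99 class counts).
y_true = ["dos"] * 90 + ["normal"] * 8 + ["u2r"] * 2

# A trivial classifier that always predicts the majority class.
y_pred = ["dos"] * len(y_true)

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recall = per_class_recall(y_true, y_pred)
macro_recall = sum(recall.values()) / len(recall)

print("imbalance ratio:", imbalance_ratio(y_true))
print("accuracy:", accuracy)
print("per-class recall:", recall)
print("macro-averaged recall:", macro_recall)
```

In this toy setting the overall accuracy looks strong while the recall for the two minority classes is zero, which is one reason per-class or macro-averaged metrics are commonly preferred when evaluating classifiers on imbalanced multi-class data.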

Author information

Authors and Affiliations

Department of Computer Science, Loughborough University, Loughborough, UK
Intisar S. Al-Mandhari, L. Guan & E. A. Edirisinghe

Corresponding author

Correspondence to Intisar S. Al-Mandhari.

Editor information

Editors and Affiliations

Faculty of Science and Engineering, Saga University, Saga, Japan
Kohei Arai
The Science and Information (SAI) Organization, Bradford, West Yorkshire, UK
Supriya Kapoor
The Science and Information (SAI) Organization, New Delhi, India
Rahul Bhatia

About this paper

Cite this paper

Al-Mandhari, I.S., Guan, L., Edirisinghe, E.A. (2019). Investigating the Effective Use of Machine Learning Algorithms in Network Intruder Detection Systems. In: Arai, K., Kapoor, S., Bhatia, R. (eds) Advances in Information and Communication Networks. FICC 2018. Advances in Intelligent Systems and Computing, vol 887. Springer, Cham. https://doi.org/10.1007/978-3-030-03405-4_10

DOI: https://doi.org/10.1007/978-3-030-03405-4_10
Published: 27 December 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-03404-7
Online ISBN: 978-3-030-03405-4
eBook Packages: Engineering, Engineering (R0)
