Abstract
Research into the use of machine learning techniques for network intrusion detection, especially research carried out on the popular public dataset KDD Cup 99, has become commonplace during the past decade. The recent popularity of cloud-based computing and the realization of the associated risks are the main reasons for this research thrust. The proposed research demonstrates that machine learning algorithms can be used effectively to enhance the performance of existing intrusion detection systems, despite the high misclassification rates reported in the literature. This paper reports on an empirical investigation into the underlying causes of the poor performance of some well-known machine learning classifiers, especially when learning from minor classes/attacks. The main factor is that the KDD Cup 99 dataset, which is used in most of the existing research, is imbalanced due to the nature of the intrusion detection domain: some attacks are rare and some are very frequent. There is therefore a significant imbalance amongst the classes in the dataset. Depending on the number of classes in the dataset, the imbalanced dataset issue can be treated as a binary problem or a multi-class problem. Most researchers focus on binary classification because multi-class classification is more complex. In the research proposed in this paper, we treat the problem as a multi-class classification task. The paper investigates the use of different machine learning algorithms to overcome the common misclassification problems faced by researchers who have used the imbalanced KDD Cup 99 dataset in their investigations. Recommendations are made as to which classifier is best for the classification of imbalanced data.
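The effect of class imbalance on reported performance can be illustrated with a minimal Python sketch. It is not the paper's method or data: the class names, the counts, and the helper functions `accuracy` and `per_class_recall` are assumptions chosen only to show how a classifier that always predicts the most frequent class can score high overall accuracy while missing every instance of the rare classes.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred):
    """Return {class: recall} for every class that appears in y_true."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / totals[c] for c in totals}


# Toy labels invented for illustration (NOT taken from KDD Cup 99):
# one very frequent class and two rare ones.
y_true = ["dos"] * 90 + ["normal"] * 8 + ["u2r"] * 2

# Hypothetical classifier that always predicts the majority class.
y_pred = ["dos"] * len(y_true)

acc = accuracy(y_true, y_pred)
recalls = per_class_recall(y_true, y_pred)

# Macro-averaged recall weights every class equally, so rare classes count.
macro_recall = sum(recalls.values()) / len(recalls)

print(f"accuracy     = {acc:.2f}")
for label, r in sorted(recalls.items()):
    print(f"recall[{label}] = {r:.2f}")
print(f"macro recall = {macro_recall:.2f}")
```

On this toy data the overall accuracy is high even though the two rare classes are never detected, which is why per-class measures are usually reported alongside accuracy when a multi-class dataset is imbalanced.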
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Al-Mandhari, I.S., Guan, L., Edirisinghe, E.A. (2019). Investigating the Effective Use of Machine Learning Algorithms in Network Intruder Detection Systems. In: Arai, K., Kapoor, S., Bhatia, R. (eds) Advances in Information and Communication Networks. FICC 2018. Advances in Intelligent Systems and Computing, vol 887. Springer, Cham. https://doi.org/10.1007/978-3-030-03405-4_10
DOI: https://doi.org/10.1007/978-3-030-03405-4_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-03404-7
Online ISBN: 978-3-030-03405-4
eBook Packages: Engineering, Engineering (R0)