Advertisement

Investigating the Effective Use of Machine Learning Algorithms in Network Intruder Detection Systems

  • Intisar S. Al-MandhariEmail author
  • L. Guan
  • E. A. Edirisinghe
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 887)

Abstract

Research into the use of machine learning techniques for network intrusion detection, especially carried out with respect to the popular public dataset, KDD cup 99, have become commonplace during the past decade. The recent popularity of cloud-based computing and the realization of the associated risks are the main reasons for this research thrust. The proposed research demonstrates that machine learning algorithms can be effectively used to enhance the performance of existing intrusion detection systems despite the high misclassification rates reported in the literature. This paper reports on an empirical investigation to determine the underlying causes of the poor performance of some of the well-known machine learning classifiers. Especially when learning from minor classes/attacks. The main factor is that the KDD cup 99 dataset, which is popularly used in most of the existing research, is an imbalanced dataset due to the nature of the specific intrusion detection domain, i.e. some attacks being rare and some being very frequent. Therefore, there is a significant imbalance amongst the classes in the dataset. Based on the number of the classes in the dataset, the imbalance dataset issue can be considered a binary problem or a multi-class problem. Most of the researchers focus on conducting a binary class classification as conducting a multi-class classification is complex. In the research proposed in this paper, we consider the problem as a multi-class classification task. The paper investigates the use of different machine learning algorithms in order to overcome the common misclassification problems that have been faced by researchers who used the imbalance KDD cup 99 dataset for their investigations. Recommendations are made as for which classifier is best for the classification of imbalanced data.

Keywords

Cloud computing Data mining Intrusion detection Imbalanced dataset 

References

  1. 1.
    Modi, C., Patel, D., Borisaniya, B., Patel, A., et al.: A survey on security issues and solutions at different layers of Cloud computing. J. Supercomput. 63(2), 561–592 (2013)CrossRefGoogle Scholar
  2. 2.
    Chen, Y., Sion, R.: On securing untrusted clouds with cryptography. Science 109–114 (2010)Google Scholar
  3. 3.
  4. 4.
    Naiping, S.N.S., Genyuan, Z.G.Z.: A study on intrusion detection based on data mining. In: International Conference of Information Science and Management Engineering, ISME, vol. 1, pp. 8–15 (2010)Google Scholar
  5. 5.
    Almutairi, A.: Intrusion detection using data mining techniquesGoogle Scholar
  6. 6.
    McHugh, J.: Testing intrusion detection systems: a critique of the 1998 and 1999 DARPA intrusion detection system evaluations as performed by Lincoln Laboratory. ACM Trans. Inf. Syst. Secur. 3(4), 262–294 (2000)CrossRefGoogle Scholar
  7. 7.
    Tavallaee, M., et al.: A detailed analysis of the KDD CUP 99 data set. In: IEEE Symposium on Computational Intelligence for Security and Defense Applications, CISDA 2009, (Cisda), pp. 1–6 (2009)Google Scholar
  8. 8.
    Tavallaee, M.: An Adaptive Intrusion Detection System. Sdstate.Edu. (2011)Google Scholar
  9. 9.
    Thomas, C., Balakrishnan, N.: Performance enhancement of intrusion detection systems using advances in sensor fusion. In: 11th International Conference on Information Fusion, pp. 1–7 (2008)Google Scholar
  10. 10.
    Tran, T., et al.: Network intrusion detection using machine learning and voting techniques. In: Machine Learning, pp. 7–10 (2011). http://cdn.intechweb.org/pdfs/10441.pdf
  11. 11.
    Tsai, C.H., Chang, L.C., Chiang, H.C.: Forecasting of ozone episode days by cost-sensitive neural network methods. Sci. Total Environ. 407(6), 2124–2135 (2009).  https://doi.org/10.1016/j.scitotenv.2008.12.007CrossRefGoogle Scholar
  12. 12.
    Troesch, M., Walsh, I.: Machine learning for network intrusion detection, pp. 1–5 (2014)Google Scholar
  13. 13.
    Juma, S., et al.: Machine learning techniques for intrusion detection system: a review. J. Theor. Appl. Inf. Theor. 72(3), 422–429 (2015). http://research.ijcaonline.org/volume119/number3/pxc3903678.pdf
  14. 14.
    Panda, M., et al.: Network intrusion detection system: a machine learning approach. Intell. Decis. Technol. 5(4), 347–356 (2011). http://dx.doi.org/10.3233/IDT-20110117%5Cnhttp://iospress.metapress.com/content/911371h6266k5h4p/CrossRefGoogle Scholar
  15. 15.
    Kubat, M.:. Neural networks: a comprehensive foundation by Simon Haykin, Macmillan, 1994. The Knowledge Engineering Review 13(4), pp. 409–412 (1999). ISBN 0-02-352781-7Google Scholar
  16. 16.
    LeCun, Y.A., et al.: Efficient backprop. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 7700 (2012)Google Scholar
  17. 17.
    Engen, V.: Machine learning for network based intrusion detection. Int. J. (2010)Google Scholar
  18. 18.
    López, V., et al.: An insight into classification with imbalanced data: empirical results and current trends on using data intrinsic characteristics. Inf. Sci. 250, 113–141 (2013).  https://doi.org/10.1016/j.ins.2013.07.007CrossRefGoogle Scholar
  19. 19.
    Weiss, G.M.: Mining with rarity: a unifying framework. SIGKDD Explor. 6(1), 7–19 (2004)CrossRefGoogle Scholar
  20. 20.
    Chawla, N.V.: Data mining for imbalanced datasets: an overview. In: Data Mining and Knowledge Discovery Handbook, pp. 853–867 (2005). http://link.springer.com/10.1007/0-387-25465-X_40
  21. 21.
    Kotsiantis, S., Kanellopoulos, D., Pintelas, P.: Handling imbalanced datasets: a review. Science 30(1), 25–36 (2006). http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.96.9248&rep=rep1&type=pdf
  22. 22.
    Weiss, G.M., Provost, F.: Learning when training data are costly: the effect of class distribution on tree induction. J. Artif. Intell. Res. 19, 315–354 (2003)CrossRefGoogle Scholar
  23. 23.
    Barandela, R., et al.: Strategies for learning in class imbalance problems.pdf. Pattern Recog. 36, 849–851 (2003)CrossRefGoogle Scholar
  24. 24.
    Barandela, R., Sánchez, J.S., Valdovinos, R.M.: New applications of ensembles of classifiers. Pattern Anal. Appl. 6(3), 245–256 (2003)MathSciNetCrossRefGoogle Scholar
  25. 25.
    Ducange, P., Lazzerini, B., Marcelloni, F.: Multi-objective genetic fuzzy classifiers for imbalanced and cost-sensitive datasets. Soft. Comput. 14(7), 713–728 (2010)CrossRefGoogle Scholar
  26. 26.
    Lin, W.J., Chen, J.J.: Class-imbalanced classifiers for high-dimensional data. Brief. Bioinform. 14(1), 13–26 (2013)CrossRefGoogle Scholar
  27. 27.
    Wang, J.: Advanced attack tree based intrusion detection (2012)Google Scholar
  28. 28.
    Wang, J., et al.: Extract minimum positive and maximum negative features for imbalanced binary classification. Pattern Recogn. 45(3), 1136–1145 (2012).  https://doi.org/10.1016/j.patcog.2011.09.004CrossRefGoogle Scholar
  29. 29.
    Batuwita, R., Palade, V.: Class imbalance learning methods for support vector. imbalanced learning: foundations, algorithms, applications, pp. 83–100 (2013)CrossRefGoogle Scholar
  30. 30.
    García-Pedrajas, N., et al.: Class imbalance methods for translation initiation site recognition in DNA sequences. Knowl. Based Syst. 25(1), 22–34 (2012)CrossRefGoogle Scholar
  31. 31.
    Domingos, P.: MetaCost: a general method for making classifiers cost-sensitive. In: Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, vol. 55, pp. 155–164 (1999). http://portal.acm.org/citation.cfm?id=312129.312220&type=series
  32. 32.
    Zhou, Z., Member, S., Liu, X.: Training cost-sensitive neural networks with methods addressing the class imbalance problem. IEEE Trans. Knowl. Data Eng. 18(1), 63–77 (2006)CrossRefGoogle Scholar
  33. 33.
    Błaszczyński, J., et al.: Integrating selective pre-processing of imbalanced data with Ivotes ensemble. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), LNAI, vol. 6086, pp. 148–157 (2010)CrossRefGoogle Scholar
  34. 34.
    Chawla, N.V., et al.: SMOTEBoost: improving prediction. In: Lecture Notes in Computer Science, vol. 2838, pp.107–119 (2003)Google Scholar
  35. 35.
    Chawla, N.V.: C4.5 and imbalanced data sets: investigating the effect of sampling method, probabilistic estimate, and decision tree structure. In: Proceedings of the International Conference on Machine Learning, Workshop Learning from Imbalanced Data Set II (2003). http://www.site.uottawa.ca:4321/~nat/Workshop2003/chawla.pdf
  36. 36.
    Seiffert, C., Khoshgoftaar, T.M., Van Hulse, J., et al.: RUSBoost: a hybrid approach to alleviating class imbalance. IEEE Trans. Syst. Man Cybern. Part A Syst. Hum. 40(1), 185–197 (2010)CrossRefGoogle Scholar
  37. 37.
    Batuwita, R., Palade, V.: Efficient resampling methods for training support vector machines with imbalanced datasets. In: Proceedings of the International Joint Conference on Neural Networks (2010)Google Scholar
  38. 38.
    Fernandez, A., et al.: A study of the behaviour of linguistic fuzzy rule based classification systems in the framework of imbalanced data-sets. Fuzzy Sets Syst. 159(18), 2378–2398 (2008)MathSciNetCrossRefGoogle Scholar
  39. 39.
    Fernández, A., del Jesus, M.J., Herrera, F.: On the 2-tuples based genetic tuning performance for fuzzy rule based classification systems in imbalanced data-sets. Inf. Sci. 180(8), 1268–1291 (2010).  https://doi.org/10.1016/j.ins.2009.12.014MathSciNetCrossRefGoogle Scholar
  40. 40.
    Weiss, G., McCarthy, K., Zabar, B.: Cost-sensitive learning vs. sampling: which is best for handling unbalanced classes with unequal error costs? Dmin, pp. 1–7 (2007). http://storm.cis.fordham.edu/~gweiss/papers/dmin07-weiss.pdf
  41. 41.
    Japkowicz, N.: The class imbalance problem: significance and strategies. In: Proceedings of the 2000 International Conference on Artificial Intelligence, pp. 111–117 (2000)Google Scholar
  42. 42.
    Van Hulse, J.: An empirical comparison of repetitive undersampling techniques, pp. 29–34 (2009)Google Scholar
  43. 43.
    Chawla, N.V., et al.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002)CrossRefGoogle Scholar
  44. 44.
    Burez, J., Van den Poel, D.: Handling class imbalance in customer churn prediction. Expert Syst. Appl. 36(3), 4626–4636 (2009).  https://doi.org/10.1016/j.eswa.2008.05.027CrossRefGoogle Scholar
  45. 45.
    Adamu Teshome, D., Rao, V.S.: A cost sensitive machine learning approach for intrusion detection. Glob. J. Comput. Sci. Technol. 14(6) (2014)Google Scholar
  46. 46.
    Choudhury, S., Bhowal, A.: Comparative analysis of machine learning algorithms along with classifiers for network intrusion detection. In: International Conference on Smart Technologies and Management for Computing, Communication, Controls, Energy and Materials (ICSTM), (May), pp. 89–95 (2015). http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=7225395
  47. 47.
    He, H., Garcia, E.A.: Learning from imbalanced data. IEEE Trans. Knowl. Data Eng. 21(9), 1263–1284 (2009)CrossRefGoogle Scholar
  48. 48.
    Mohammad, M.N., Sulaiman, N., Muhsin, O.A.: A novel Intrusion Detection System by using intelligent data mining in WEKA environment. Procedia Comput. Sci. 3, 1237–1242 (2011).  https://doi.org/10.1016/j.procs.2010.12.198CrossRefGoogle Scholar
  49. 49.
    Depren, O., Topallar, M., Anarim E., Ciliz, M.K.: An intelligent intrusion detection system for anomaly and misuse detection in computer networks. Expert Syst. Appl., 29, 713–722 (2005)CrossRefGoogle Scholar
  50. 50.
    Kohavi, R.: A study of cross-validation and bootstrap for accuracy estimation and model selection. In: International Joint Conference on Artificial Intelligence (IJCAI), (1995)Google Scholar

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Intisar S. Al-Mandhari
    • 1
    Email author
  • L. Guan
    • 1
  • E. A. Edirisinghe
    • 1
  1. 1.Department of Computer ScienceLoughborough UniversityLoughboroughUK

Personalised recommendations