Investigating the Effective Use of Machine Learning Algorithms in Network Intruder Detection Systems

  • Conference paper
Advances in Information and Communication Networks (FICC 2018)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 887)

Abstract

Research into the use of machine learning techniques for network intrusion detection, particularly with respect to the popular public dataset KDD Cup 99, has become commonplace during the past decade. The recent popularity of cloud-based computing and the realization of its associated risks are the main reasons for this research thrust. The proposed research demonstrates that machine learning algorithms can be used effectively to enhance the performance of existing intrusion detection systems, despite the high misclassification rates reported in the literature. This paper reports on an empirical investigation into the underlying causes of the poor performance of some well-known machine learning classifiers, especially when learning from minor classes (rare attacks). The main factor is that the KDD Cup 99 dataset, which is used in most existing research, is imbalanced owing to the nature of the intrusion detection domain: some attacks are rare and others are very frequent, so there is a significant imbalance among the classes in the dataset. Depending on the number of classes in the dataset, the imbalanced-data issue can be treated as a binary problem or a multi-class problem. Most researchers focus on binary classification because multi-class classification is more complex. In this paper, we treat the problem as a multi-class classification task. The paper investigates the use of different machine learning algorithms to overcome the common misclassification problems faced by researchers who used the imbalanced KDD Cup 99 dataset in their investigations. Recommendations are made as to which classifier is best for the classification of imbalanced data.
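The effect of class imbalance on a multi-class task can be illustrated with a minimal sketch. It is not the authors' experiment: the class names and counts are hypothetical toy values, and the "classifier" is a deliberately degenerate one that assigns every rare-class record to the majority class. The point is only that overall accuracy can look high while recall on the minor classes is zero.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of records whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred):
    """Return {class: recall} for every class present in y_true."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / totals[c] for c in totals}


def macro_recall(y_true, y_pred):
    """Unweighted mean of per-class recall; each class counts equally."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


# Hypothetical, heavily imbalanced class counts (illustrative only).
counts = {"dos": 900, "normal": 90, "probe": 8, "u2r": 2}
y_true = [label for label, n in counts.items() for _ in range(n)]

# Assumed degenerate classifier: frequent classes are predicted correctly,
# every rare-class record is assigned to the majority class "dos".
y_pred = [t if t in ("dos", "normal") else "dos" for t in y_true]

acc = accuracy(y_true, y_pred)
recalls = per_class_recall(y_true, y_pred)
macro = macro_recall(y_true, y_pred)

print(f"accuracy     = {acc:.3f}")
print(f"macro recall = {macro:.3f}")
for label, r in recalls.items():
    print(f"recall[{label}] = {r:.3f}")
```

In this toy setting the accuracy is 0.99 while the macro-averaged recall is 0.5, which is why per-class measures are usually reported alongside accuracy when the minor classes matter.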

Author information

Corresponding author

Correspondence to Intisar S. Al-Mandhari.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Al-Mandhari, I.S., Guan, L., Edirisinghe, E.A. (2019). Investigating the Effective Use of Machine Learning Algorithms in Network Intruder Detection Systems. In: Arai, K., Kapoor, S., Bhatia, R. (eds) Advances in Information and Communication Networks. FICC 2018. Advances in Intelligent Systems and Computing, vol 887. Springer, Cham. https://doi.org/10.1007/978-3-030-03405-4_10

  • DOI: https://doi.org/10.1007/978-3-030-03405-4_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-03404-7

  • Online ISBN: 978-3-030-03405-4

  • eBook Packages: Engineering, Engineering (R0)
