ABC-BSRF: Artificial Bee Colony and Borderline-SMOTE RF Algorithm for Intrusion Detection System on Data Imbalanced Problem

  • Conference paper
Proceedings of International Conference on Computational Intelligence and Data Engineering

Abstract

Machine learning techniques support better decision-making and achieve strong results in Intrusion Detection Systems (IDSs). However, even prominent classifiers that perform very well on the majority classes are biased toward them. In particular, class imbalance produces inaccurate results because the minority classes are not classified properly, which incurs misclassification costs. This paper addresses the classification problems caused by imbalanced data using recent techniques. To overcome this difficulty and improve model performance, a hybrid algorithm named Artificial Bee Colony Borderline-SMOTE on Random Forests (ABC-BSRF) is proposed. It combines the Artificial Bee Colony (ABC) algorithm for feature selection and Borderline-SMOTE for oversampling, with Random Forests as the classifier. The results are compared with individual classifiers such as Support Vector Machines (SVMs), Decision Trees, and K-Nearest Neighbors (KNN). Experiments on the KDD Cup 99 dataset show that the proposed method effectively resolves the class imbalance problem: its ROC curve, F1 score, precision, recall, and AUC compare favorably with those of the traditional methods.
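
The abstract names ABC as the feature-selection stage but gives no details of its encoding or fitness function. The following is a minimal sketch, not the author's implementation, assuming a binary mask (one bit per feature) and a caller-supplied fitness function where higher is better, for example the cross-validated F1 score of a classifier trained on the selected columns. The names `abc_feature_selection`, `fitness`, `n_sources`, `limit` and `n_iter` are hypothetical. It follows the employed, onlooker and scout phases of Karaboga's ABC, but replaces the continuous neighbour update with a single-bit flip, which is only one simple binary adaptation among several.

```python
import random
from typing import Callable, List, Optional, Tuple

Mask = List[int]  # one bit per feature: 1 = keep the feature, 0 = drop it


def abc_feature_selection(
    n_features: int,
    fitness: Callable[[Mask], float],  # higher is better (assumption)
    n_sources: int = 10,  # food sources, one per employed bee (hypothetical default)
    limit: int = 5,       # failed trials before a source is abandoned
    n_iter: int = 50,
    seed: int = 0,
) -> Tuple[Mask, float]:
    """Hypothetical binary ABC wrapper for feature selection (a sketch)."""
    rng = random.Random(seed)

    def random_mask() -> Mask:
        m = [rng.randint(0, 1) for _ in range(n_features)]
        if not any(m):  # never evaluate an empty feature subset
            m[rng.randrange(n_features)] = 1
        return m

    def neighbour(m: Mask) -> Mask:
        # Assumed binary simplification of ABC's v = x + phi * (x - x_k): flip one bit.
        c = m[:]
        j = rng.randrange(n_features)
        c[j] = 1 - c[j]
        return c if any(c) else m[:]

    sources = [random_mask() for _ in range(n_sources)]
    scores = [fitness(s) for s in sources]
    trials = [0] * n_sources
    best_mask: Optional[Mask] = None
    best_score = float("-inf")

    def record(i: int) -> None:
        # Remember the best subset seen so far, before scouts can discard it.
        nonlocal best_mask, best_score
        if scores[i] > best_score:
            best_mask, best_score = sources[i][:], scores[i]

    def try_improve(i: int) -> None:
        cand = neighbour(sources[i])
        f = fitness(cand)
        if f > scores[i]:  # greedy selection between old and new source
            sources[i], scores[i], trials[i] = cand, f, 0
            record(i)
        else:
            trials[i] += 1

    for i in range(n_sources):
        record(i)

    for _ in range(n_iter):
        # Employed-bee phase: one local search step per food source.
        for i in range(n_sources):
            try_improve(i)
        # Onlooker phase: sources picked with probability proportional to
        # (shifted) fitness, so better sources receive more search effort.
        low = min(scores)
        weights = [s - low + 1e-9 for s in scores]
        for _ in range(n_sources):
            try_improve(rng.choices(range(n_sources), weights=weights)[0])
        # Scout phase: abandon exhausted sources and re-initialise them at random.
        for i in range(n_sources):
            if trials[i] > limit:
                sources[i] = random_mask()
                scores[i] = fitness(sources[i])
                trials[i] = 0
                record(i)

    assert best_mask is not None
    return best_mask, best_score


def toy_fitness(m: Mask) -> float:
    # Hypothetical fitness: features 0 and 2 are informative; each selected feature costs 0.1.
    return (m[0] + m[2]) - 0.1 * sum(m)


if __name__ == "__main__":
    print(abc_feature_selection(6, toy_fitness, seed=1))
```

In a real IDS pipeline, `toy_fitness` would be replaced by a model-based score on the training data; the bit-flip move keeps the sketch short, while published binary ABC variants use other update rules.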

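The oversampling stage uses Borderline-SMOTE (Han et al., 2005), which, unlike plain SMOTE, synthesizes new samples only from minority points that lie near the class border. The sketch below is a plain-Python version of the borderline-1 variant; the function names and the defaults for `m`, `k` and `n_new` are hypothetical, and in practice a library implementation such as imbalanced-learn's `BorderlineSMOTE` would normally be used. A minority point is labeled DANGER when at least half, but not all, of its `m` nearest neighbours in the full training set belong to other classes; points whose neighbours all come from other classes are treated as noise, points with fewer than `m/2` such neighbours are safe, and neither group is oversampled.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = List[float]


def nearest(p: Sequence[float], pool: Sequence[Sequence[float]], k: int, skip: int = -1) -> List[int]:
    """Indices of the k points in `pool` closest to `p` (Euclidean), ignoring index `skip`."""
    ranked = sorted((math.dist(p, q), i) for i, q in enumerate(pool) if i != skip)
    return [i for _, i in ranked[:k]]


def borderline_smote1(
    X: List[Point],
    y: List[int],
    minority: int,
    m: int = 5,      # neighbours used to label minority points safe / danger / noise
    k: int = 5,      # minority neighbours used for interpolation
    n_new: int = 1,  # synthetic samples per DANGER point (hypothetical default)
    seed: int = 0,
) -> Tuple[List[Point], List[int]]:
    """Sketch of Borderline-SMOTE1; returns the original data plus synthetic minority samples."""
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    min_pts = [X[i] for i in min_idx]

    # Step 1: find the DANGER set among the minority samples.
    danger = []  # positions within min_pts
    for pos, i in enumerate(min_idx):
        m_other = sum(1 for j in nearest(X[i], X, m, skip=i) if y[j] != minority)
        if m / 2 <= m_other < m:  # border point; m_other == m is noise, < m/2 is safe
            danger.append(pos)

    # Step 2: interpolate between each DANGER point and a random minority neighbour.
    synthetic: List[Point] = []
    for pos in danger:
        neigh = nearest(min_pts[pos], min_pts, k, skip=pos)
        if not neigh:
            continue
        p = min_pts[pos]
        for _ in range(n_new):
            q = min_pts[rng.choice(neigh)]
            gap = rng.random()  # uniform in [0, 1)
            synthetic.append([a + gap * (b - a) for a, b in zip(p, q)])

    return X + synthetic, y + [minority] * len(synthetic)
```

Oversampling is normally applied to the training split only, so that synthetic points do not leak into the evaluation data; the Random Forest would then be trained on the returned `X`, `y`.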

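The abstract argues that classifiers are biased toward the majority classes, which is why the evaluation relies on precision, recall, F1, ROC and AUC. The minimal sketch below computes these for a single positive (minority) class; the function names are hypothetical, and scikit-learn provides equivalent metric functions. AUC is computed as the probability that a randomly chosen positive is scored higher than a randomly chosen negative, with ties counted as one half, which equals the area under the ROC curve.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[float, float, float]:
    """Precision, recall and F1 for one class; 0.0 where a ratio is undefined."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def roc_auc(y_true: Sequence[int], scores: Sequence[float], positive: int = 1) -> float:
    """Area under the ROC curve via pairwise ranking (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical data: 95 normal records, 5 attacks, and a classifier
    # that always predicts "normal" (class 0).
    y_true = [0] * 95 + [1] * 5
    y_pred = [0] * 100
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    print(accuracy)                                      # 0.95
    print(precision_recall_f1(y_true, y_pred))           # (0.0, 0.0, 0.0)
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The demo shows why accuracy alone is misleading on imbalanced data: a classifier that never detects an attack still reaches 95% accuracy, while its minority-class recall and F1 are zero.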

Author information

Correspondence to Pullagura Indira Priyadarsini.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Priyadarsini, P.I. (2021). ABC-BSRF: Artificial Bee Colony and Borderline-SMOTE RF Algorithm for Intrusion Detection System on Data Imbalanced Problem. In: Chaki, N., Pejas, J., Devarakonda, N., Rao Kovvur, R.M. (eds) Proceedings of International Conference on Computational Intelligence and Data Engineering. Lecture Notes on Data Engineering and Communications Technologies, vol 56. Springer, Singapore. https://doi.org/10.1007/978-981-15-8767-2_2