Abstract
Machine learning techniques support better decision-making and achieve strong results in Intrusion Detection Systems (IDSs). However, even prominent classifiers that perform very well on the majority classes tend to be biased toward them. When the data are imbalanced, the minority classes are not learned properly, which produces inaccurate results and incurs misclassification costs. This paper addresses the classification problems caused by imbalanced data using recently developed techniques. To overcome this difficulty and improve model performance, an algorithm named Artificial Bee Colony Borderline SMOTE on Random Forests (ABC-BSRF) is proposed. It combines Artificial Bee Colony (ABC) optimization for feature selection with Borderline SMOTE on Random Forests for oversampling. The results are compared with individual classifiers such as Support Vector Machines (SVMs), Decision Trees, and K-Nearest Neighbor (KNN). Experiments on the KDD Cup 99 dataset indicate that the proposed approach handles the imbalanced-data problem effectively. The ROC curve, F1 score, precision, recall, and AUC show noticeable improvements over the traditional methods.
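The oversampling step named in the abstract can be illustrated with a minimal sketch of Borderline-SMOTE in its basic form: minority samples whose nearest neighbours are mostly, but not entirely, majority samples are treated as borderline, and synthetic points are interpolated between each borderline sample and its nearest minority neighbours. This is not the paper's ABC-BSRF implementation. The function name `borderline_smote`, its parameters (`m`, `k`, `n_new`, `seed`), and the toy 2D data are all assumptions made for illustration, and the ABC feature selection and Random Forest stages are omitted.

```python
import math
import random

def borderline_smote(minority, majority, m=3, k=2, n_new=2, seed=0):
    """Illustrative Borderline-SMOTE sketch (hypothetical helper, not the paper's code).

    Returns the indices of borderline ("danger") minority samples and the
    synthetic minority points generated from them.
    """
    rng = random.Random(seed)
    # Tag every point with a flag: False = minority, True = majority.
    labelled = [(p, False) for p in minority] + [(q, True) for q in majority]

    danger = []
    for i, p in enumerate(minority):
        # m nearest neighbours of p in the whole training set, excluding p itself.
        others = [pair for j, pair in enumerate(labelled) if j != i]
        neighbours = sorted(others, key=lambda pair: math.dist(p, pair[0]))[:m]
        n_major = sum(is_major for _, is_major in neighbours)
        # All neighbours majority: treated as noise. Fewer than half: safe.
        # At least half but not all: borderline, so it gets oversampled.
        if m / 2 <= n_major < m:
            danger.append(i)

    synthetic = []
    for i in danger:
        p = minority[i]
        # k nearest neighbours of p among the other minority samples only.
        mates = sorted(
            (q for j, q in enumerate(minority) if j != i),
            key=lambda q: math.dist(p, q),
        )[:k]
        for _ in range(n_new):
            q = rng.choice(mates)
            gap = rng.random()  # interpolation factor in [0, 1)
            synthetic.append(tuple(a + gap * (b - a) for a, b in zip(p, q)))
    return danger, synthetic

# Toy data (assumed): a dense majority cluster and a small minority class,
# two of whose samples sit next to the majority cluster.
majority = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
            (2.0, 0.0), (0.0, 2.0), (2.0, 1.0), (1.0, 2.0)]
minority = [(2.0, 2.0), (2.0, 3.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0), (6.0, 6.0)]

danger, synthetic = borderline_smote(minority, majority)
print("borderline minority indices:", danger)
print("synthetic samples:", synthetic)
```

Only the two minority samples adjacent to the majority cluster are oversampled here; the four samples deep inside the minority region are left alone, which is the point of restricting SMOTE to the class boundary. In a full pipeline the augmented training set would then be passed to the classifier.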
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Priyadarsini, P.I. (2021). ABC-BSRF: Artificial Bee Colony and Borderline-SMOTE RF Algorithm for Intrusion Detection System on Data Imbalanced Problem. In: Chaki, N., Pejas, J., Devarakonda, N., Rao Kovvur, R.M. (eds) Proceedings of International Conference on Computational Intelligence and Data Engineering. Lecture Notes on Data Engineering and Communications Technologies, vol 56. Springer, Singapore. https://doi.org/10.1007/978-981-15-8767-2_2
DOI: https://doi.org/10.1007/978-981-15-8767-2_2
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-8766-5
Online ISBN: 978-981-15-8767-2
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)