1 Introduction

1.1 Background and motivation

With the rapid development of information technologies such as smart devices, the internet of things (IoT), cloud computing and big data, the number of devices connected to the internet has increased more than ever. Networks are growing larger, and the risk of cyberattacks rises as networks become more difficult to monitor. A cyberattack begins with reconnaissance of the target and concludes with exploiting weaknesses to complete a malicious task [1]. Such attacks cause an intrusion on the system, defined as unauthorized access that compromises the confidentiality, integrity and availability (CIA) of computer or network resources [2]. Many new cyberattacks have emerged in recent years, such as distributed denial of service and denial of service, brute force, botnet and cross-site scripting attacks [3], raising more serious concern than ever in cybersecurity. Many cloud servers and hosts have come under attack and become botnets and vectors for Bitcoin Trojan malware. According to the Internet Security Threat Report (ISTR), malware is found in one out of every thirteen web queries; e-mail spam has risen to more than 55%, internet threats to 46% and ransomware to 46% [4], with cyberattack damages costing $200,000 in 2019 according to CNBC [5]. Because of this expansion of networks and cyberattacks, a system to detect these attacks and provide network security is essential, namely the intrusion detection system (IDS).

An IDS, in its most basic form, is software that monitors various sources and detects intrusions on a system. IDSs have proved to be an effective approach to detecting intrusions and have caught the attention of many researchers. They can detect known and unknown threats and intrusions by monitoring traffic data in computer systems and issuing alerts when these threats are detected [6]. According to the data sources monitored, IDSs are classified into host IDSs (HIDS) and network IDSs (NIDS). A HIDS monitors data from logs, system calls, etc., but does not monitor network traffic, and thus it can detect internal attacks that do not involve the network [7]. A NIDS, on the other hand, collects and monitors data directly from the network through monitoring equipment such as switches, routers and other network devices, and thus can detect many types of network attacks.

According to the detection approach, IDSs are classified into misuse and anomaly detection. In misuse detection (also known as signature-based detection), any suspicious access is compared against a database of all known attacks, and intrusions are detected on that basis. This successfully detects previously known attacks but fails to detect novel zero-day attacks. In addition, the database of known attacks must be updated regularly to keep up with the ever-increasing number of new attacks. Anomaly-based intrusion detection systems, on the other hand, are capable of detecting both network and computer intrusions by monitoring system activity and classifying it as either normal or anomalous. The classification is based on heuristics or rules rather than comparison against signatures. This gives them the theoretical potential to detect zero-day attacks, which has attracted interest from many research fields.

Anomaly detection in intrusion detection systems dates back to 1980, when Anderson [8] proposed monitoring the system to detect anomalies. Since then, many techniques have been developed to implement anomaly-based network intrusion detection, including computing-based, data mining-based, statistical, machine learning, cognitive-based, knowledge-based and user intention identification-based techniques [9]. Machine learning in particular has shown the ability to distinguish normal from anomalous traffic, as shown in [10]. However, with the growth in network traffic and attack types, traditional machine learning models (shallow learners) no longer keep up with the required performance [11]. To meet this large-scale demand, deep learning (DL), a branch of machine learning, is now being used in NIDS. Studies have shown that deep learning outperforms traditional machine learning models in detecting anomalies due to its ability to extract information from massive amounts of data [12]. Deep learning approaches used in NIDS include deep neural networks (DNN), convolutional neural networks (CNN), long short-term memory (LSTM) and recurrent neural networks (RNN) [13]. A detailed literature review on machine learning- and deep learning-based NIDS is provided in Sect. 2.

1.2 Challenges

NIDSs based on deep learning models have proven to be an improvement over machine learning models, achieving higher accuracy. However, they fail to detect attacks with little traffic due to class imbalances in the benchmark datasets. State-of-the-art intrusion detection benchmark datasets are imbalanced: normal traffic far outnumbers attack traffic (simulating real-world network traffic), and even among the attack types, some attacks appear much more often than others. This makes it difficult for the NIDS to detect certain types of attacks and lowers its overall performance, manifesting as an increased false alarm rate and a decreased detection rate. Recent NIDS research has not given enough attention to the problem of imbalanced data despite its negative effect on the NIDS's attack detection accuracy [14].

1.3 Contributions

This research aims to address the issue of class imbalance to improve the detection rate of minority classes in NIDSs based on deep learning models. This is done by proposing a hybrid data resampling algorithm consisting of oversampling using Adaptive Synthetic Sampling (ADASYN) and undersampling using TomekLink. ADASYN is an oversampling technique that creates artificial samples of the minority classes, while TomekLink is an undersampling technique that cleans up the redundant samples ADASYN produces. A detailed explanation of the proposed method is given in Sect. 3. The main contributions of this paper can be summarized as follows:

  • Addressing the class imbalances issue by using deep learning in combination with data resampling techniques. Both oversampling and undersampling techniques are applied to the dataset to increase the detection rate of minority classes. Oversampling is done using ADASYN which creates artificial data samples of the minority classes, while TomekLink is used to undersample any redundant data samples.

  • Building four deep learning models to study the effect of data resampling on them and compare them with each other and with previous works. Those models include multi-layer perceptron (MLP), DNN, CNN and CNN-BLSTM. The effect of data resampling is observed on all models, and the improvement in false alarm rate and detection rate is noted.

  • Testing the proposed data resampling method with the different deep learning models on the benchmark NSL-KDD dataset. The NSL-KDD is the most widely used NIDS dataset [15], while also suffering from the class imbalance problem.

  • Evaluating model performance by computing well-known performance metrics and comparing the models with each other and with related state-of-the-art models to show the superiority of the proposed models.

2 Literature review

With the increasing need to detect network intrusions, research into NIDS is gaining more and more attention. In addition, much progress has been achieved in both machine learning and deep learning techniques and their applications in anomaly-based NIDS. In this section, we review the most recent approaches in both, focusing on techniques applied to the benchmark NSL-KDD dataset, which is divided into KDDTrain and KDDTest for training and testing, respectively.

2.1 Machine learning NIDS

Machine learning has been used to detect anomalies in network traffic, including novel zero-day attacks whose signatures are not known in advance. The most common machine learning models used to detect anomalies in the literature are support vector machines (SVM), decision trees (DT) and random forests (RF). The authors of [16] compare the accuracies of different machine learning models in classifying the KDDTest dataset into two classes (normal and anomaly). Reported results show that SVM achieved an accuracy of 69.52%, RF 80.67% and the J48 decision tree 81.67%, all when tested on KDDTest. The authors of [17] proposed machine learning-based intrusion detection systems to classify traffic into normal, DoS, probe, R2L and U2R. Their tests achieved an accuracy of 75.22% using a DT algorithm, compared to 73.26% using SVM and 62.73% using RF, all on KDDTest. In [18], an IDS was proposed based on preprocessing, feature selection, clustering and then classification. A fuzzy rule-based system was used to analyze the features, followed by a decision tree to select important features. The data are then clustered using K-means to reduce the number of training samples and thereby lower the computational and processing complexity. An SVM classifier is then used to categorize the network intrusions. This achieves an average accuracy of 97% on the NSL-KDD dataset. In [19], Firat et al. applied SVM, K-nearest neighbors (KNN) and DT algorithms to the NSL-KDD dataset, evaluating by splitting the KDDTrain dataset into train and test sets. Building on this work, Soheily et al. [20] proposed a hybrid NIDS based on K-means and RF (KM-RF) and evaluated it on the NSL-KDD dataset. Similar work using an enhanced KNN algorithm along with the local outlier factor (LOF) was done by the authors of [21] and tested on the CICIDS2017 dataset, predicting zero-day attacks with an accuracy of 92.74%.

Although machine learning techniques achieve good detection accuracies, they struggle in real-world network environments, because traditional machine learning relies heavily on feature engineering to extract information from network traffic [12]. Deep learning, on the other hand, can extract and learn features from the data thanks to its deep structure [22], making it more suitable for massive amounts of data.

2.2 Deep learning NIDS

Deep learning techniques are typically used for complex tasks like image recognition and natural language processing. This makes them well suited to a complex task like intrusion detection, especially for intrusions that have not been seen before. Many researchers have studied the application of deep learning to intrusion detection and shown that it outperforms other techniques.

In [12], a recurrent neural network (RNN) model was proposed and evaluated on the NSL-KDD dataset. It achieved accuracies of 83.28% and 81.29% in binary and multi-class classification, respectively, thus outperforming traditional machine learning models. In [13], long short-term memory (LSTM) layers were used to build a model evaluated on the same dataset, achieving a higher accuracy of 93.88%. The authors of [23] used a sequential feedforward neural network (SEQ-FFNN) to detect and classify attack packets, evaluating the model on the KDDTrain dataset and achieving accuracies of 98.97% with Tanh-activated layers and 99.59% with the Sigmoid-activated model. In [24], a multi-layer perceptron with particle swarm optimization (MLP-PSO) model was proposed and shown to achieve an accuracy of 83.27% on binary classification. The authors of [25] suggested a BAT-MC model for NIDS using NSL-KDD that outputs a 122-feature vector and uses a min-max scaler in the data preprocessing layer. The traffic data are then converted into traffic images in the multiple convolution layers stage, which contains three two-dimensional convolutional layers that extract spatial features, followed by a bidirectional long short-term memory (BLSTM) layer, which connects forward and backward LSTMs and learns the time-series features in the data packets. Each data packet produces a packet vector, and these vectors form a network flow vector. Attention layers are then used to learn features of the network flow vectors and pay more attention to key features. These enhancements achieved 84.25% accuracy on KDDTest in five-category classification. The authors of [26] implemented a 2D CNN model and achieved an accuracy of 86.95% on binary classification. In [27], a CNN model was evaluated after splitting the KDDTrain dataset, achieving an accuracy of 98.63%, the highest among the compared models. In [28], an intrusion detection system based on a fusion convolutional neural network (FCNN) for feature extraction and a stacked ensemble (SE) for classification was proposed. The FCNN combines a 1D CNN with a 2D CNN to extract features, producing a new dataset of 256 features. The classification was then carried out by an SE learner combining K-nearest neighbors (KNN), decision trees (DT) and Naive Bayes (NB). Experiments on the NSL-KDD dataset with tenfold validation yielded an average accuracy of 98.9%. In [29], an IDS based on LSTMs and gated recurrent units (GRU), called LSTM-GRU, was proposed. Its steps included feature selection using the Pearson correlation method after scaling the data with a MinMaxScaler, followed by the model predicting the attack. It was evaluated on the CICIDS2018 dataset by splitting it into train and test sets with tenfold testing; the highest average accuracy recorded was 99.76%. Another IDS, proposed in [30], is based on a gated recurrent neural network (GRU-RNN) on a software-defined network (SDN) and is called DeepIDS. Tested on the NSL-KDD dataset, it achieved accuracies of 80.7% and 90% in binary detection for a DNN and the GRU-RNN, respectively. In [31], feature selection was implemented using the sequence forward selection (SFS) algorithm and a decision tree (DT) model before anomaly detection, which was carried out using RNN, LSTM and GRU models. This IDS achieves a highest average accuracy of 92%.

Deep learning techniques show a clear improvement over traditional machine learning models, achieving very high accuracies. However, these high accuracies are partly due to the class imbalances in the NSL-KDD dataset: most models detect the majority classes easily, which yields high accuracy, while failing to detect the minority classes.

2.3 Class imbalances

When a class is underrepresented in a dataset, the dataset is imbalanced. Detecting the minority class becomes difficult, which lowers the performance of the intrusion detection system [32]. The NSL-KDD dataset has far more normal samples than attack samples, as in a real-world network. This gives rise to the class imbalance problem, one of the most common challenges in intrusion detection [11]. In [33], the performance of DT and RF was improved by using CatBoost with random oversampling and undersampling; evaluated on the CIC-IDS-2018 dataset, this achieved an accuracy of 91.95% in multi-class classification. In [34], an improved NSGA-III feature selection algorithm called I-NSGA-III was proposed for solving the imbalance problem in NSL-KDD; a better detection rate was reported, but not a higher accuracy. In [35], random oversampling was applied to the minority classes and random undersampling to the majority class in the NSL-KDD dataset to improve intrusion detection; however, random oversampling is known to cause overfitting [36], and only the accuracy was reported. In [37], two tree-based classifiers and one deep learning-based classifier were tested with different sampling rates, showing that sampling improves the detection of different classes. In [38], Zhang et al. proposed SMOTE combined with edited nearest neighbors (SMOTE-ENN) and a deep neural network (DNN) algorithm, evaluated on the NSL-KDD dataset. A cost-sensitive deep learning model was combined with ensemble algorithms to deal with the imbalance problem in [39]. That IDS consists of three layers: the first uses a cost-sensitive deep neural network to separate normal traffic from suspicious traffic; suspicious traffic is then fed into an XGBoost model to classify it into the majority class or a collection of minority classes; the collection of minority classes is then classified into a particular minority class using a random forest. This achieves high detection rates for both majority and minority attacks, but it consumes heavy resources and takes a long time. A deep metric learning approach combining autoencoders and triplet networks to detect attacks was proposed in [40]; the triplet network is trained to obtain embedding vectors of the network flows and detect the presence of malicious activity. However, it achieved good results mainly on binary classification of the NSL-KDD, with an accuracy of 93.5%. In [41], seven machine learning models were tested on the NSL-KDD dataset for binary classification, including k-means clustering, k-NN, RF, IF, b-XGBoost, DNN and CNN; the b-XGBoost and DNN models were the top performers. For multi-class classification, two classifiers, m-XGBoost and Siamese-NN, were evaluated, and m-XGBoost was chosen. Tested on the NSL-KDD dataset, the model reported an accuracy of 80% in binary classification and an average F-score of 70% in multi-class classification. In [42], oversampling was done using SMOTE, and the oversampled data were then undersampled using TomekLink to remove redundant data samples. This was evaluated on many datasets, including NSL-KDD, with stratified tenfold cross-validation, and classification was carried out with an LSTM model, achieving an accuracy of 99%.

Related works in NIDS that consider the imbalance problem greatly improve NIDS performance compared to works that do not. However, few of the related works evaluated their methods on the NSL-KDD dataset; moreover, to the best of our knowledge, none of the previous studies have analyzed the combination of the ADASYN and TomekLink algorithms for oversampling and undersampling, respectively.

3 Proposed method

This paper aims to deal with the class imbalance problems of the NSL-KDD dataset in order to achieve not only good accuracy but also an improved detection rate for the minority classes. This is achieved by mitigating the class imbalance problem and then proceeding with intrusion detection.

To deal with the imbalance problem, a data preprocessing method is proposed, consisting of oversampling the minority classes using Adaptive Synthetic Sampling (ADASYN) followed by undersampling using TomekLink to remove redundant samples generated by ADASYN. The basic idea of ADASYN is to use a weighted distribution over the minority class examples according to their level of difficulty in learning: more synthetic data are generated for minority class examples that are harder to learn than for those that are easier to learn [43]. Some of the generated samples may overlap with the majority classes, which is why undersampling is applied after oversampling. Undersampling is done using TomekLinks, which deletes some majority class samples to remove redundancies. This method is shown to improve the performance of the NIDS, and to the best of our knowledge, the ADASYN+TomekLinks combination has not been tested in the previous literature. After the resampling, classification is carried out by several deep learning models, and their performances are compared. The models are trained on both binary and multi-class classification problems, and their performance is evaluated on the KDDTest dataset and on a held-out part of the KDDTrain dataset; in both cases, the test dataset remains unseen by the model during training. A detailed explanation of the dataset, the data preprocessing and the proposed architectures is included in this section. A diagram of the proposed architecture is shown in Fig. 1.

Fig. 1 Proposed architecture

3.1 Dataset description

The NSL-KDD is a publicly available dataset proposed by Tavallaee et al. in [44] as an improvement over the well-known KDD'99 dataset. It is available to researchers as a benchmark for evaluating different intrusion detection methods. It comprises KDDTrain and KDDTest sets, which are partitioned into different difficulty levels. The NSL-KDD fixed many shortcomings of the KDD'99 dataset by providing the following advantages:

  • It does not include redundant records in the train set, so that the training would not be biased toward a certain record [44].

  • It does not include duplicate records, which helps the training to be more accurate [44].

  • The selected records of each difficulty level are proportional to the original KDD'99 dataset so as not to bias the training [44].

  • The number of records in the train and test sets is not very large, which makes it affordable to run experiments on the whole dataset without needing to select a portion of the dataset randomly. This enables a more accurate comparison between intrusion detection algorithms [44].

The dataset contains 41 attributes plus one column for the classification label. There are five labels: one for normal traffic, while the attacks are grouped into four labels (DoS, Probe, R2L and U2R) according to attack type, as shown in Table 1. An overview of each classification label is given below:

  • Normal: This refers to normal traffic, and it is considered as the majority class of the dataset [45].

  • Denial of Service Attack (DoS): This is an attack that aims to make a system extremely busy and consume its memory resources to hinder its ability to accept normal requests, preventing users’ access to the system [45].

  • Probing Attack (Probe): This attack is used to gather information about a network in order for its initiator to exploit the network’s vulnerabilities [45].

  • Remote to Local Attack (R2L): This is an attempt by an intruder to get access to a network as a user [45].

  • User to Root Attack (U2R): This attack starts as a regular user account on the network and exploits vulnerabilities to gain root access to the system [45].

Table 1 NSL-KDD dataset attack types [45]

The 41 attributes of the dataset are shown in Table 2. Three attributes are non-numerical; the remaining 38 are all numerical [45]. The non-numerical attributes are protocol-type, which takes 3 values; Service, which can take 70 different values; and Flag, which has 11 different values, as shown in Table 3 [45].

Table 2 NSL-KDD dataset features [45]
Table 3 Non-numerical features values [45]

The distribution of the labels in the train set and the test set is shown in Table 4. The distribution of the train and test sets can be further visualized in Fig. 2.

Table 4 Distribution of samples for each class in NSL-KDD dataset
Fig. 2 Distribution of samples for each class in NSL-KDD dataset

As shown in both Table 4 and Fig. 2, the NSL-KDD dataset suffers from an obvious class imbalance problem. This is intended to simulate real network traffic, where normal traffic far exceeds attack traffic. However, it allows a NIDS to achieve good accuracy simply by detecting the normal traffic, and it can negatively affect the detection rate of some attacks. The percentages of normal, DoS, Probe, R2L and U2R traffic in the KDDTrain dataset are 53.45%, 36.45%, 9.25%, 0.78% and 0.04%, respectively. Attack traffic is much rarer than normal traffic, especially the R2L and U2R attacks, which are more difficult to detect [46]. This motivates resampling the training dataset using ADASYN and TomekLinks.

3.2 Data preprocessing

Before feeding the data to a deep learning model, data preprocessing is essential to make it understandable and learnable by the model. The NSL-KDD dataset is clean, without any empty or Not a Number (NaN) entries; NaN entries are not understood by a machine learning model and would otherwise have to be removed before training. In this work, data preprocessing therefore consists of two steps: encoding and normalization.

3.2.1 Encoding

Deep learning models only understand numeric values, which is why the non-numeric attributes have to be encoded to be understood by the model. In this work, the three non-numeric attributes were converted into numerical ones using one-hot encoding, the most common algorithm for handling multi-labeled data due to its simplicity [47]. One-hot encoding converts each categorical value into a new column and assigns a binary value of 1 or 0: each value is represented as a binary vector that is all zeros except for a 1 at the index marking that value, as shown in Fig. 3 [47]. This transformation increases the number of attributes from 41 to 122. The classification labels are also one-hot encoded, converting the single label column into five columns representing the normal class and the four attack types. The encoding is performed with the OneHotEncoder class from the scikit-learn library [48]. Various encoding techniques were tested in this work, and one-hot encoding was chosen as it provided the best results.
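To make the step concrete, below is a minimal sketch of the encoding using scikit-learn, assuming the NSL-KDD features have been loaded into a pandas DataFrame `train_df`; the column names for the three categorical attributes are hypothetical, since the raw files carry no header row.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical column names for the three non-numeric attributes.
categorical_cols = ["protocol_type", "service", "flag"]

encoder = OneHotEncoder(handle_unknown="ignore")
one_hot = encoder.fit_transform(train_df[categorical_cols]).toarray()

# Replace the 3 categorical columns with their one-hot expansion
# (3 + 70 + 11 = 84 binary columns), growing the 41 attributes to 122.
one_hot_df = pd.DataFrame(
    one_hot,
    columns=encoder.get_feature_names_out(categorical_cols),
    index=train_df.index,
)
train_encoded = pd.concat(
    [train_df.drop(columns=categorical_cols), one_hot_df], axis=1
)
```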

Fig. 3 One-hot encoding [49]

3.2.2 Normalization

Scaling the data is an important part of data preprocessing in deep learning; it maps the values into a specific range, which helps the deep learning model and speeds up training. Scaling can be achieved using normalization; one of the most common normalizers is the MinMaxScaler from the scikit-learn library, which was used in this work. Normalization is done by subtracting the column minimum from each value and dividing by the range (max value minus min value), as shown in Eq. 1 [50]. Several normalizers were tested in the course of this research; MinMaxScaler was ultimately chosen for providing the best results.

$$\begin{aligned} x_{scaled} = \frac{x - \min (x)}{\max (x) - \min (x)} \end{aligned}$$
(1)

After applying one-hot encoding and normalization to the train dataset, two approaches are implemented. The first is to train on the entire KDDTrain dataset and use KDDTest as the test dataset. The second is to split the KDDTrain dataset 75%/25%, using the 75% part for training and the 25% part for testing. In both cases, the test dataset consists of samples not seen during training.
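A minimal sketch of the scaling step and the two evaluation setups, assuming `X_train`/`y_train` hold the encoded KDDTrain features and class labels and `X_test` the encoded KDDTest features; the stratified split and the random seed are our assumptions, not stated in the text.

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()

# Approach 1: train on all of KDDTrain, test on KDDTest.
X_train_scaled = scaler.fit_transform(X_train)  # fit on training data only
X_test_scaled = scaler.transform(X_test)        # reuse the training min/max

# Approach 2: hold out 25% of KDDTrain as the unseen test set.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_train_scaled, y_train, test_size=0.25, stratify=y_train, random_state=42
)
```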

3.2.3 Class balancing

As shown in the dataset description subsection, the NSL-KDD dataset suffers from the class imbalance problem: normal traffic comprises 53.45% of the dataset, while U2R traffic comprises only 0.04%. This problem creates a falsely high accuracy, because the model learns to classify the majority class while ignoring the minority classes. Achieving high accuracies that do not correctly reflect the performance of the model is called the accuracy paradox [51]. The imbalance problem is commonly dealt with by oversampling and undersampling [52]. In this work, an approach building on [42] is proposed: the dataset is oversampled using ADASYN instead of SMOTE and the output is undersampled using TomekLinks, with the addition of dropping the normal traffic in the multi-class classification case. This proposed data resampling method will be referred to as ADASYN+TomekLinks.

3.2.3.1 Dropping normal traffic

In the case of multi-class classification, the normal traffic is dropped from both the train and test datasets. This lets the model focus on classifying the attack type without confusing attacks with normal samples. It is done under the assumption of a two-stage NIDS that first identifies an attack and then identifies its type. The distribution of the dataset samples after dropping normal traffic is shown in Fig. 4.
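As a sketch, assuming the label column is named `label` and normal samples are tagged `"normal"` (both hypothetical names), this step reduces to a simple filter:

```python
# Keep only attack samples for the multi-class (attack-type) stage.
attack_train_df = train_df[train_df["label"] != "normal"].copy()
attack_test_df = test_df[test_df["label"] != "normal"].copy()
```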

Fig. 4 Distribution of samples in each class in NSL-KDD dataset after dropping normal traffic

3.2.3.2 ADASYN

The ADASYN algorithm generates synthetic samples according to the level of difficulty in learning the samples of each minority class: more synthetic samples are generated for minority classes that are relatively harder to learn due to their small counts. Samples are generated along the line segments between a minority sample and its k nearest minority-class neighbors [53]. The ADASYN procedure is detailed in Algorithm 1. However, ADASYN does not take into account that neighboring examples may come from other classes, which can create an overlap between classes; this is addressed by undersampling. Although ADASYN was designed as an improvement over SMOTE, the literature comparing the two does not unanimously favor either [53]. The distribution of the dataset samples after dropping normal traffic and oversampling is shown in Fig. 5.

Algorithm 1 The ADASYN algorithm (figure)
Fig. 5 Distribution of samples in each class in NSL-KDD dataset after dropping normal traffic and oversampling

3.2.3.3 TomekLinks

A Tomek link is a pair of samples belonging to different classes that are each other's nearest neighbor [54]. The algorithm iterates over the minority class samples and finds the neighbor with the lowest Euclidean distance; if this nearest neighbor belongs to the majority class, the pair forms a Tomek link and the majority sample is removed [54]. The presence of such pairs makes it ambiguous to distinguish between the classes, so removing them helps in training the model [55].

3.2.3.4 ADASYN+TomekLinks

In this paper, a combination of ADASYN and TomekLinks is used to resample the data. TomekLinks is applied after ADASYN to remove the noise generated by the oversampling, removing neighboring samples of different classes to make the learning process easier. The proposed algorithm is shown in Algorithm 2. The distribution of the dataset samples after dropping normal traffic and oversampling+undersampling is shown in Fig. 6, followed by a minimal code sketch of the combined resampling.

Algorithm 2 The proposed ADASYN+TomekLinks resampling (figure)
Fig. 6 Distribution of samples in each class in NSL-KDD dataset after dropping normal traffic and oversampling+undersampling
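The combined resampling can be sketched with the imbalanced-learn library, applied to the training set only; the neighbor count and seed are assumed values, and `sampling_strategy="all"` (removing both samples of each link) is one possible configuration rather than the paper's stated choice.

```python
from imblearn.over_sampling import ADASYN
from imblearn.under_sampling import TomekLinks

# Step 1: adaptively oversample the minority classes with synthetic samples.
adasyn = ADASYN(n_neighbors=5, random_state=42)
X_over, y_over = adasyn.fit_resample(X_tr, y_tr)

# Step 2: remove Tomek links to clean up overlapping/boundary samples
# introduced by the oversampling.
tomek = TomekLinks(sampling_strategy="all")
X_resampled, y_resampled = tomek.fit_resample(X_over, y_over)
```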

The distribution of the dataset in all preprocessing stages is shown in Table 5. Note that the above data resampling algorithms are applied only to the training dataset and not to the test dataset; applying them to the test dataset would generate similar patterns in both, resulting in overestimated results [56].

Table 5 Distribution of samples in each class in KDDTrain

3.3 Model architectures

Several model architectures were used in this work; we now describe these deep learning models. The models are based on the MLP, DNN, CNN and CNN-BLSTM architectures, chosen for their proven high performance in many domains [57,58,59].

3.3.1 Multi-layer perceptron (MLP)

The multi-layer perceptron is a feedforward neural network consisting of three kinds of layers: the input layer, the output layer and the hidden layers. The input layer receives the input signal to be processed by the network; the output layer performs the classification and produces the output. Any number of hidden layers can be placed between the input and output layers; they are the engine of the computation between the two. Being feedforward means that data flow from the input layer to the output layer. MLP neurons are trained using the backpropagation algorithm. Uses of MLPs include pattern classification, recognition, prediction and approximation [60].

In this work, an enhanced MLP model is suggested. It is a simple and fast model consisting of an input layer, an output layer and two hidden layers. After several trials and experiments, the numbers of neurons in the two hidden layers were set to 102 and 50, respectively, the values found to provide the highest accuracy. The activation function in both hidden layers is ReLU. For the output layer, the activation function is either a sigmoid or a softmax, depending on whether the task is binary or multi-class classification. No dropout is used in this model. The details of the model are shown in Table 6. Finally, the model was compiled with the Adam optimizer and categorical cross-entropy or binary cross-entropy as the loss function. The hyperparameters of the model are shown in Table 7.

Table 6 MLP model layers
Table 7 MLP model hyperparameters
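A minimal Keras sketch of this MLP under the stated layer sizes; the output layer switches between sigmoid/binary cross-entropy and softmax/categorical cross-entropy depending on the task, and the input width of 122 follows the one-hot encoded feature count.

```python
from tensorflow.keras import layers, models

def build_mlp(input_dim=122, n_classes=2):
    binary = n_classes == 2
    model = models.Sequential([
        layers.Input(shape=(input_dim,)),
        layers.Dense(102, activation="relu"),  # first hidden layer
        layers.Dense(50, activation="relu"),   # second hidden layer
        layers.Dense(1 if binary else n_classes,
                     activation="sigmoid" if binary else "softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy" if binary else "categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```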

3.3.2 Deep neural network (DNN)

The second model has a more complex architecture than the first and will be referred to as the DNN (deep neural network) model. It consists of an input layer, an output layer and two hidden layers. After several experiments, the numbers of neurons in the hidden layers were set to 1024 and 768. The activation function for all hidden layers is ReLU, and a dropout layer with a probability of 0.01 follows each hidden layer. For the output layer, the activation function is either a sigmoid or a softmax, depending on whether the task is binary or multi-class classification. The details of the model are shown in Table 8. Finally, the model was compiled with the Adam optimizer and categorical cross-entropy or binary cross-entropy as the loss function. The hyperparameters of the model are shown in Table 9.

Table 8 DNN model layers
Table 9 DNN model hyperparameters
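The DNN variant differs only in its width and dropout; a sketch under the same assumptions as the MLP:

```python
from tensorflow.keras import layers, models

def build_dnn(input_dim=122, n_classes=2):
    binary = n_classes == 2
    model = models.Sequential([
        layers.Input(shape=(input_dim,)),
        layers.Dense(1024, activation="relu"),
        layers.Dropout(0.01),                  # dropout after each hidden layer
        layers.Dense(768, activation="relu"),
        layers.Dropout(0.01),
        layers.Dense(1 if binary else n_classes,
                     activation="sigmoid" if binary else "softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy" if binary else "categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```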
3.3.3 Convolutional neural networks (CNN)

The convolutional neural network (CNN) is a deep learning model widely used in computer vision. CNN is composed of a convolutional layer, pooling layer and dense layer. While the two-dimensional CNN has been successfully applied in image recognition, the one dimensional CNN (1D CNN) is more suitable for sequence data [61].

In this work, we use a 1D CNN model to implement the intrusion detection system. We design a model with seven layers, including the input and output layers: three 1D convolutional layers and two fully connected layers for classification. Each convolutional layer is followed by a 1D max pooling layer to reduce the spatial size, thereby reducing complexity and avoiding overfitting, and each layer is followed by a 10% dropout layer. The numbers of convolutional filters are 62, 62 and 124 with kernel sizes of 2, 4 and 8, respectively. The max pooling layers have pool sizes of 2, 4 and 8 with a stride of 1. The two fully connected layers consist of 256 and 5 neurons. All of these values were found through experiments and trials. The details of the model are shown in Table 10. Finally, the model was compiled with the Adam optimizer and categorical cross-entropy or binary cross-entropy as the loss function. The hyperparameters of the model are shown in Table 11.

Table 10 CNN model layers
Table 11 CNN model hyperparameters
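A sketch of the 1D CNN under one reading of the description, where each convolutional layer has its own pooling and dropout; treating the 122 features as a length-122 sequence and using same-padding are our assumptions.

```python
from tensorflow.keras import layers, models

def build_cnn(input_len=122, n_classes=5):
    model = models.Sequential([layers.Input(shape=(input_len, 1))])
    # Three conv blocks: filters 62/62/124, kernels 2/4/8, pools 2/4/8 (stride 1).
    for filters, kernel, pool in [(62, 2, 2), (62, 4, 4), (124, 8, 8)]:
        model.add(layers.Conv1D(filters, kernel, activation="relu", padding="same"))
        model.add(layers.MaxPooling1D(pool_size=pool, strides=1))
        model.add(layers.Dropout(0.1))         # 10% dropout after each block
    model.add(layers.Flatten())
    model.add(layers.Dense(256, activation="relu"))
    model.add(layers.Dense(n_classes, activation="softmax"))
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```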
3.3.4 Recurrent neural networks (RNN)

Recurrent neural networks (RNNs) are a class of neural networks that can deal with sequential data [62]. They extend a traditional neural network with looped-back connections in the hidden layers that can reuse data from previous time steps. A major problem hindering the learning process of RNNs is vanishing and exploding gradients, which led to the development of long short-term memory (LSTM) [63]. The LSTM model was proposed by Hochreiter et al. [63] to solve the vanishing gradient problem of RNNs; it is composed of a cell, an input gate, an output gate and a forget gate. Bidirectional long short-term memory (BLSTM) is a more advanced application of LSTM in which two LSTMs are used, one processing the input in the forward direction and the other in the backward direction, so that the BLSTM output can depend on both the next and previous time steps.

In this work, we use a CNN-BLSTM model to implement the intrusion detection system. We design a model with six layers, including the input and output layers: three 1D convolutional layers and a BLSTM layer. Each convolutional layer is followed by a 1D max pooling layer to reduce the spatial size, thereby reducing complexity and avoiding overfitting, and each layer is followed by a 10% dropout layer. The numbers of convolutional filters are 40, 60 and 80 with kernel sizes of 2, 3 and 4, respectively. The max pooling layers have pool sizes of 2, 3 and 4 with a stride of 1. All of these values were found through experiments and trials. The details of the model are shown in Table 12. Finally, the model was compiled with the Adam optimizer and categorical cross-entropy or binary cross-entropy as the loss function. The hyperparameters of the model are shown in Table 13.

Table 12 CNN-BLSTM model layers
Table 13 CNN-BLSTM model hyperparameters
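A sketch of the CNN-BLSTM under the same assumptions; the BLSTM unit count is not stated in the text, so 64 here is purely an assumed value.

```python
from tensorflow.keras import layers, models

def build_cnn_blstm(input_len=122, n_classes=5, lstm_units=64):
    model = models.Sequential([layers.Input(shape=(input_len, 1))])
    # Three conv blocks: filters 40/60/80, kernels 2/3/4, pools 2/3/4 (stride 1).
    for filters, kernel, pool in [(40, 2, 2), (60, 3, 3), (80, 4, 4)]:
        model.add(layers.Conv1D(filters, kernel, activation="relu", padding="same"))
        model.add(layers.MaxPooling1D(pool_size=pool, strides=1))
        model.add(layers.Dropout(0.1))
    # Bidirectional LSTM reads the pooled feature sequence in both directions.
    model.add(layers.Bidirectional(layers.LSTM(lstm_units)))
    model.add(layers.Dense(n_classes, activation="softmax"))
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```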

4 Experiments and results

In this section, several experiments are conducted to evaluate the performance of the proposed models with the ADASYN+TomekLink data resampling method. First, the evaluation metrics are introduced and defined. Second, the performance of the proposed models is compared with and without data resampling, as well as against each other and against state-of-the-art intrusion detection models. The experiments show that our models outperform state-of-the-art models in anomaly detection.

4.1 Experiment setup

The models were implemented using TensorFlow and Keras on the Google Colab platform. The experiments ran on a Windows 10 machine with an Nvidia GeForce GTX 1050. All data resampling is done on the training set only. The test dataset is a set of samples never seen by the model, either the KDDTest dataset or the 25% split of the KDDTrain dataset. The models were trained for up to 500 epochs with an early stopping condition monitoring the val_accuracy metric (to maximize) with a patience of 20.
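A sketch of the training loop as described, reusing the `build_mlp` sketch above; the validation split, batch size and weight-restoration flag are assumed values not stated in the text.

```python
from tensorflow.keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor="val_accuracy", mode="max",
                           patience=20, restore_best_weights=True)

model = build_mlp(input_dim=122, n_classes=5)
history = model.fit(X_resampled, y_resampled,   # resampled training data only
                    validation_split=0.1,       # assumed validation fraction
                    epochs=500, batch_size=128, # batch size assumed
                    callbacks=[early_stop])
```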

4.2 Evaluation metrics

A table commonly used in evaluating supervised machine learning models is the confusion matrix, which shows the predicted classes versus the actual classes [64]. It enables the calculation of several metrics for evaluating model performance, based on the following terminology:

  • True Positive (TP): The data instances correctly predicted to be positive.

  • False Negative (FN): The data instances wrongly predicted to be negative.

  • True Negative (TN): The data instances correctly predicted to be negative.

  • False Positive (FP): The data instances wrongly predicted to be positive.

The first and most intuitive metric is accuracy, calculated as shown in Eq. 2 [28]:

$$\begin{aligned} Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \end{aligned}$$
(2)

When the data are imbalanced, accuracy does not provide the best evaluation of a model. Therefore, other metrics are often used in addition: precision, recall and the F-score. Precision, also known as the positive predictive value, is the number of true positive results divided by the number of all positive results, including those identified incorrectly; it is calculated as shown in Eq. 3 [28]. Recall is the number of true positive results divided by the number of all samples that should have been identified as positive, computed using Eq. 4 [28]. The F-score is the harmonic mean of precision and recall, calculated according to Eq. 5 [65].

$$\begin{aligned} Precision = \frac{TP}{TP + FP} \end{aligned}$$
(3)
$$\begin{aligned} Recall = \frac{TP}{TP + FN} \end{aligned}$$
(4)
$$\begin{aligned} Fscore = \frac{2 \cdot precision \cdot recall}{precision + recall} \end{aligned}$$
(5)

According to the metrics definitions, the objective here is to maximize accuracy, recall, precision and F-score.
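As a sketch, these metrics can be computed from the model's predictions with scikit-learn, assuming `model`, `X_test_scaled` and one-hot `y_test` from the earlier sketches; the macro average shown here is our assumption for the multi-class case, since the averaging mode is not stated.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Undo the one-hot encoding of predictions and ground-truth labels.
y_pred = np.argmax(model.predict(X_test_scaled), axis=1)
y_true = np.argmax(y_test, axis=1)

print(confusion_matrix(y_true, y_pred))
print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred, average="macro"))
print("recall   :", recall_score(y_true, y_pred, average="macro"))
print("F-score  :", f1_score(y_true, y_pred, average="macro"))
```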

4.3 Experimental results

The experiments are divided into two main parts. In the first part, the proposed models were trained on the KDDTrain dataset and tested on the KDDTest dataset; this part will be referred to as the NoSplit part. In the second part, the proposed models were trained on 75% of the KDDTrain dataset and tested on the remaining 25%; this part will be referred to as the Split part. All proposed models were tested in both parts on both binary and multi-class classification. The effect of data resampling is observed on all the proposed models, and their performance is then compared with each other. Moreover, the proposed models' performance is compared with recent NIDS works in the literature.

The baseline performance of every model without data resampling is compared with the performance of the same model using oversampling (ADASYN) and oversampling+undersampling (ADASYN+TomekLink), as well as with random over- and undersampling. The evaluation metrics of the MLP, DNN, CNN and CNN-BLSTM models in the binary classification case are shown in Tables 14, 15, 16 and 17, respectively.

Table 14 MLP performance metrics in binary classification
Table 15 DNN performance metrics in binary classification
Table 16 CNN performance metrics in binary classification
Table 17 CNN-BLSTM performance metrics in binary classification

From Tables 14, 15, 16 and 17, the effect of the proposed ADASYN+TomekLinks algorithm on binary classification can be observed. In most models, an improvement in accuracy, precision, recall and F-score alike can be seen. Among the proposed models, the CNN model achieves the highest accuracy: 87.8% in the NoSplit case and 99.8% in the Split case.

Furthermore, the evaluation metrics of the MLP, DNN, CNN and CNN-BLSTM models in the multi-class classification case are shown in Tables 18, 19, 20 and 21, respectively.

Table 18 MLP performance metrics in multi-class classification
Table 19 DNN performance metrics in multi-class classification
Table 20 CNN performance metrics in multi-class classification
Table 21 CNN-BLSTM performance metrics in multi-class classification

The effect of the proposed ADASYN+TomekLinks algorithm on the multi-class case can be observed from Tables 18, 19, 20 and 21. In most cases, an improvement in accuracy, precision, recall and F-score can be observed with oversampling alone. This is due to dropping the normal traffic before the resampling process, which keeps the model from confusing attacks with normal traffic and decreases the number of TomekLinks present. Among the proposed models, the MLP model achieves the highest accuracy: 87.25% in the NoSplit case and 99.9% in the Split case.

4.4 Discussion

The experimental results show the following:

Fig. 7 CNN binary classification confusion matrix

Fig. 8 Proposed CNN versus binary classifiers

  1. Among the proposed binary models, the CNN model achieves the best accuracy, precision, recall and F-score. The CNN extracts important features from the inputs, which helps in detecting attacks. The non-sequential nature of the inputs limits the usefulness of RNNs, and the increased complexity of the DNN model was found only to cause overfitting. The confusion matrix is shown in Fig. 7 and the metrics in Table 16. In Fig. 8, its performance is compared with other recent binary classification NIDSs.

  2. Among the multi-class models, the MLP model achieves the best accuracy, precision, recall and F-score. This shows that added model complexity does not benefit the classification of attack types: increasing the complexity in the case of the DNN causes overfitting, and because there is no sequential flow in the data, RNNs do not increase accuracy. A deeper look into the metrics is provided in Table 22, and the confusion matrix is shown in Fig. 9. Table 22 shows that the MLP model successfully detects the minority attacks with good accuracy, precision and recall. Its performance is compared with recent multi-class NIDSs in Fig. 10.

  3. In both binary and multi-class detection, the proposed ADASYN+TomekLink (or ADASYN alone) combined with deep learning improves accuracy, precision and recall over the baseline models. This shows that data resampling techniques are a promising approach to solving the class imbalance problem in intrusion detection.

  4. Dropping the normal traffic before training the attack-type classification model improves the detection rate of minority classes, because the model no longer confuses an attack sample with a normal traffic sample.

  5. A two-stage NIDS can be implemented using both the binary and multi-class models: the first stage detects the presence of an attack as opposed to normal traffic, and the second stage identifies the type of the detected attack so that it can be dealt with.

Fig. 9 MLP multi-class classification confusion matrix

Fig. 10 Proposed MLP versus multi-class classifiers

Table 22 MLP performance metrics in multi-class classification on different classes

5 Conclusion

This paper proposes a framework to solve the class imbalance problem in the NSL-KDD dataset and improve the detection rate of minority class attacks. Data resampling techniques combined with deep learning were proposed to solve the problem: the data were first oversampled using ADASYN and then undersampled using TomekLinks to remove redundant data samples. This resampling technique was used with four deep learning models based on the MLP, DNN, CNN and CNN-BLSTM architectures. Based on the experimental results, the ADASYN+TomekLinks technique increases the detection rate of minority classes compared with using no data resampling. Moreover, in binary classification, the proposed CNN model achieved an accuracy of 99.8% and a detection rate of 99%, better than state-of-the-art binary classifiers. Furthermore, in multi-class classification, the proposed MLP model achieved an accuracy of 99.9% and a detection rate of 99%, also better than state-of-the-art multi-class classifiers. These results provide a promising direction for NIDS and for improving the detection rate of minority classes in imbalanced datasets. They also open up the possibility of implementing a two-stage NIDS that exploits the high accuracies of the binary and multi-class classifiers using different models. Future work recommendations are given below.

5.1 Future work

The IDS proposed in this paper provides promising results. However, the detection rates of the minority classes can be improved even further, which is how this work can be extended: by using different data resampling techniques, different combinations of oversampling and undersampling, and different deep learning architectures. This work can also be tested on other, more imbalanced intrusion detection datasets such as CIC-IDS2017 or UNSW-NB15, both of which have more classes than the NSL-KDD dataset.