1 Introduction

The use of computers to share information over networks is increasing rapidly, and the number of security threats is growing with it. Computer networks suffer from security vulnerabilities, and the number of network intrusions rises daily. Any unauthorized access to a system that violates the availability, confidentiality, or integrity of its data is an attack. Applications running on diverse platforms therefore pose a serious risk to the security of web servers [4]. For these reasons, intrusion detection systems are widely deployed to examine events in computers and networks. A system that automatically monitors network traffic to efficiently distinguish malicious attacks from normal traffic is termed an Intrusion Detection System (IDS) [18, 43].

The continuous growth of network attacks poses a severe challenge to building adaptable cybersecurity techniques. The probability of data loss, hacking, and infringement has grown with the development and modernization of the Internet [22]. IDSs have therefore become an essential component of information security, protecting computer networks from attacks. The role of a Network Intrusion Detection System (NIDS) is to actively supervise the network while detecting malicious activities [2, 4]. As network traffic grows larger, designing and implementing a NIDS that achieves this objective becomes increasingly difficult [59].

One problem of machine learning-based NIDS is missing intrusions (attacks) while learning from imbalanced data [9]. Attacks fall into four main categories: Denial of Service (DoS), probe, User-to-Root (U2R) and Remote-to-Local (R2L) [35]. DoS and probe attacks can be identified easily by carefully investigating the connection statistics at the vulnerable host machine, but low-frequency attacks such as U2R and R2L are very difficult to detect. This is because U2R and R2L occur rarely and their samples are far fewer (i.e., they are extreme minority classes) compared with the DoS and probe attacks. Data imbalance therefore poses serious issues for intrusion detection performance [33].

Data imbalance is a classification problem in data mining and machine learning that arises when certain minority classes have far fewer instances than the majority classes. The skewed class distribution can make the classifier produce poor prediction performance for the minority classes. Several studies have proposed solutions for two-class imbalanced learning [6, 7, 13, 26]. However, two-class classification does not cover all real-world applications, since imbalanced multi-class data occurs increasingly often. Multi-class data imbalance is more difficult to handle than the two-class case because multi-minority and multi-majority classes complicate learning from imbalanced data [53].

In the domain of NIDS, studies have suggested different approaches to solve the class imbalance problem and achieve better prediction performance [23, 39, 40, 43, 44, 57, 60, 61]. However, the detection accuracy of U2R and R2L attacks is still low and needs to be improved.

The exact ratio of class distribution in multi-class classification has also not been sufficiently studied in NIDS. Although the optimal ratio of majority to minority classes has been suggested as 65% to 35% for two-class classification [20], there is no clear justification for the optimal class ratio in multi-class classification that yields the best prediction accuracy.

Retaining the natural distribution of classes in multi-class datasets is another issue to consider while improving overall accuracy. Replicating minority class instances to balance them with the majority class is not always practical. For example, in NIDS, U2R and R2L attacks occur rarely and have far fewer instances than the normal, DoS and probe classes. Replicating these classes until they have an equal, or nearly equal, number of instances as the majority classes is impractical in real-world applications. Furthermore, the high computational time of learning from imbalanced multi-class data also needs to be addressed in NIDS.

Class balancing methods such as Random Over Sampling (ROS) may cause over-fitting because the new instances are replications of original instances [53]. The Synthetic Minority Oversampling Technique (SMOTE) reduces the risk of over-fitting but can increase over-generalization. In an imbalanced multi-class dataset, wrongly generated synthetic instances can cause the region of a minority class to spread into the region of a majority class. This over-generalization can reduce the learnability of minority class instances [6, 62]. In addition, there is no clear agreement in SMOTE on what percentage of instances should be added to the minority classes during multi-class balancing; that is, by what percentage should each minority class be increased, relative to the size of the majority class, to obtain the optimal class distribution?

To build a robust machine learning model for NIDS, the problem of data imbalance needs to be considered. Several issues should be examined, both theoretically and empirically, to understand in detail how imbalanced data affects attack detection. Multi-class data balancing approaches that make machine learning algorithms perform better than learning from skewed class distributions need to be explored in NIDS. The range of class-distribution ratios to use when balancing the original multi-class dataset is another topic that needs to be studied [13].

The performance of machine learning models is largely evaluated using predictive accuracy. A dataset is considered imbalanced if its classification categories are not roughly uniformly distributed. When the data is imbalanced, it is difficult to obtain accurate classification results because classifiers favor the majority classes and fail to detect the minority classes correctly [16].

The motivation of our research is to propose a machine learning-based NIDS framework that integrates the SMMO data balancing approach with collaborative feature selection for multi-class data. The framework may help balance the ratio of classes in a multi-class dataset and improve the detection accuracy of the extreme minority classes, i.e., U2R and R2L, in NIDS. Furthermore, the framework integrates multi-class data balancing with a collaborative feature selection method to obtain the essential features that help improve the general prediction performance of the NIDS.

This paper proposes Synthetic Multi-minority Oversampling (SMMO) data balancing integrated with a collaborative feature selection (CoFS) approach for NIDS. In our work, the relevant features are selected through the collaboration of the Correlation-based Feature Selection (CFS) technique and an evolutionary search algorithm. This work aims to help the NIDS improve the detection accuracy of low-frequency attacks, i.e., the extreme minority classes U2R and R2L, by achieving an optimal class distribution and selecting important features.

In the domain of NIDS, even though several approaches exist, to the best of our knowledge further studies are needed to justify the optimal class ratio for multi-class classification. More specifically, there are no sufficient studies showing the exact class-distribution ratio that yields the best prediction accuracy for the extreme minority classes in NIDS. Our work is novel in that we apply an SMMO data balancing approach to improve the classes’ degree of imbalance in a multi-class NIDS dataset. To our knowledge, no existing approach applies the CoFS method to select the essential features from a dataset balanced with SMMO in NIDS. This approach helps the NIDS improve both the general prediction performance and the detection accuracy of the extreme minority classes by balancing the multi-class distribution while selecting important features.

The contributions of this research to the domain of network intrusion detection systems are summarized as follows.

  1.

    We propose SMMO, integrated with the CoFS approach, to improve the degree of imbalance in multi-class (rather than two-class) data. The proposed approach computes the difference between the number of instances in the majority class and in each minority class; this difference determines the number of synthetic instances to generate for each minority class during multi-class balancing.

  2.

    To select the essential features from the balanced dataset, we apply a collaborative feature selection approach in which the CFS technique collaborates with an evolutionary search algorithm.

  3.

    We perform extensive experiments to evaluate the effectiveness of the proposed approach. The experimental results show that SMMO with CoFS (i.e., SMMO-CoFS) produces a significant improvement in the detection accuracy for the extreme minority classes (i.e., U2R and R2L) as well as in the general prediction accuracy on a multi-class intrusion dataset when compared to several other techniques.

The remaining parts of the paper are organized as follows. Section 2 summarizes related works on class balancing in the domain of network intrusion detection systems. In Sect. 3, the details of the proposed SMMO-CoFS method are presented. In Sect. 4, the experiments and results are analyzed. In Sect. 5, the experimental results and their implications are discussed. In Sect. 6, we conclude our work and outline future work.

2 Related Works

Various machine learning techniques have been published on data balancing for NIDS, such as the Synthetic Minority Oversampling Technique (SMOTE) [39, 40, 44, 57, 60], Sample Selection (SS) [5], the Difficult Set Sampling Technique (DSSTE) algorithm [28] and a Data Generative Model (DGM) [9]. Deep learning techniques have also been applied to address data imbalance in NIDS, such as an unsupervised deep learning method based on Generative Adversarial Networks (GAN) [23], Deep Reinforcement Learning (DRL) [29], the Imbalanced Generative Adversarial Network (IGAN) [14], and Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) models [31].

SMOTE has been applied in several studies to address data imbalance in NIDS by integrating it with various machine learning techniques. Yan et al. [57] proposed a Region Adaptive (RA) SMOTE with random forest, Back Propagation Neural Network (BPNN) and Support Vector Machine (SVM) classifiers. Zhang et al. [60] applied SMOTE combined with Edited Nearest Neighbors (SMOTE-ENN) and a Convolutional Neural Network (CNN) to balance skewed network traffic. Parsaei et al. [39] employed a hybrid approach that combines cluster center and nearest neighbor (CANN) with SMOTE for balancing the dataset. Sun et al. [46] proposed integrating SMOTE with the Neighborhood Cleaning Rule (NCL) for balancing the data. Liu et al. [28] proposed a Difficult Set Sampling Technique (DSSTE) algorithm to tackle the class imbalance problem: they use the Edited Nearest Neighbor (ENN) algorithm to divide the imbalanced training set into a difficult set and an easy set, and the K-Means algorithm to compress the majority samples in the difficult set. Even though these studies improved general detection accuracy, their performance for the extreme minority classes such as U2R and R2L is still very low. In addition, SMOTE can increase the problem of over-generalization even though it decreases the risk of over-fitting, and this over-generalization can decrease the learnability of minority class instances [6, 62].

Despite the increasing popularity of deep learning, only a few empirical works exist in the domain of NIDS that address the problem of data imbalance. Lee et al. [23] suggested a Generative Adversarial Networks (GAN) model for solving class imbalance in NIDS by generating new virtual data similar to the existing data. Lopez-Martin et al. [29] proposed Radial Basis Function Neural Networks (RBFNN) for optimizing parameters within a reinforcement learning framework; the RBFNN classifier plays the role of the policy network in a Deep Reinforcement Learning (DRL) training scheme to handle classification problems with unbalanced datasets and noisy data. Huang et al. [14] proposed an Imbalanced Generative Adversarial Network (IGAN) to tackle the class imbalance problem in NIDS, introducing an imbalanced data filter and convolutional layers to the GAN to generate new representative instances for the minority classes. Meliboev et al. [31] applied deep learning architectures, a single CNN and a combination with LSTM, together with the SMOTE data balancing technique to improve classification performance.

Work on data imbalance in NIDS has so far focused mainly on the extreme minority classes (i.e., U2R and R2L) during multi-class classification. To increase the number of minority class instances, Phetlasy et al. [40] applied SMOTE with a sequential classifier, which improved sensitivity and accuracy. Abebe and Lalitha [48] synthetically oversampled the U2R and R2L attacks by 800% to improve the detection accuracy of the minority classes using random forest classifiers; however, it is not clear how they decided on this oversampling percentage. Seo et al. [44] randomly generated several tuples using SMOTE to optimize the ratios of the rare classes (U2R, R2L, and probe); each instance in the test dataset was assigned to their support vector regression model, and the best SMOTE ratios were chosen. Khor et al. [19] investigated the effectiveness of under-sampling and over-sampling methods for the rare-class problem in an intrusion dataset. Their study confirmed that the two methods are less effective because of the overlapping classes in the dataset: the major learning algorithms could not generalize R2L and U2R since learning algorithms generally favor the dominant classes, and the overlapping classes could also create difficulties in the feature selection phase. Rodda et al. [43] analyzed the class imbalance problem in intrusion datasets by evaluating the effect of class imbalance on the benchmark NSL-KDD dataset with four popular classification techniques (Naïve Bayes, BayesNet, J48 and random forest); none of the models achieved adequate performance for the rare class U2R.

However, the above studies still do not explain to what extent the multi-minority classes should be oversampled relative to the majority class. The detection accuracy of the extreme minority classes remains low and needs to be improved. Furthermore, they did not consider a collaborative feature selection method to acquire the essential features.

3 The Proposed Framework

The purpose of our proposed SMMO-CoFS framework is to improve the class distribution of an imbalanced multi-class dataset in a network intrusion detection system. After the data is balanced using SMMO, the CoFS approach is applied to select important features. The integration of SMMO with CoFS is used to improve the detection accuracy of the extreme minority classes in NIDS, which mainly include the R2L and U2R attacks. The scheme of the proposed framework is shown in Fig. 1 and its high-level description is given in Pseudo-code 1.

Fig. 1
figure 1

The scheme of the proposed framework

3.1 SMMO-CoFS

Even though SMOTE helps balance binary-class datasets (i.e., two categories), it has limitations in handling skewed multi-class datasets (i.e., multiple categories), which may contain multiple minority and multiple majority classes. Therefore, to achieve a balanced distribution of instances in NIDS with a multi-class dataset, we propose SMMO-CoFS. In our proposed method, SMMO calculates the difference between the number of instances of the majority class and each remaining minority class; this difference determines the number of synthetic instances to generate. Each minority class is then iteratively oversampled using the synthetically generated instances. Once the class distribution of the multi-class dataset has been improved, CoFS selects the important features that help enhance the general prediction performance of the NIDS. As a result, SMMO-CoFS improves the detection accuracy of the extreme minority classes.
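
As a bird's-eye view, the minimal Python sketch below chains the two stages. The function names smmo_oversample and cofs_select are hypothetical helpers (sketched later in Sects. 3.1 and 3.2), not an API defined in this paper.

# Minimal outline of the SMMO-CoFS pipeline; both helpers are sketched later in this section.
def smmo_cofs(X, y):
    X_bal, y_bal = smmo_oversample(X, y)      # Sect. 3.1: balance the multi-class data
    selected = cofs_select(X_bal, y_bal)      # Sect. 3.2: CFS merit + evolutionary search
    return X_bal[:, selected], y_bal, selected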

SMMO uses the following steps to obtain a balanced multi-class dataset without affecting its natural class distribution [15]; a simplified code sketch of these steps is given after the list.

  1.

    Identify the majority class and its number of samples. The NSL-KDD dataset contains five classes: normal, Denial of Service (DoS), probe, User-to-Root (U2R) and Remote-to-Local (R2L). Among these classes, the normal class is the majority class with the maximum number of samples.

  2.

    Determine the weight of class imbalance between the majority class and the remaining minority classes.

    $$\begin{aligned} w_{l}=\frac{M}{(L)\left( X_{l}\right) }, \end{aligned}$$
    (1)

    where \(w_{l}\) is the weight of class imbalance between the \(l\textrm{th}\) minority class and the majority class (\(w_{l} < \tau\), where \(\tau\) is the threshold of class imbalance), M is the total number of samples in the training dataset, L is the number of unique target classes and \(X_{l}\) is the number of samples of the \(l\textrm{th}\) minority class.

  3.

    Determine the number of synthetic data instances to be created for each minority class:

    $$\begin{aligned} G_{l} = J - X_{l}, \end{aligned}$$
    (2)

    where \(G_{l}\) is the number of synthetic data instances to be created for the \(l\textrm{th}\) minority class, J is the number of majority class instances, and \(X_{l}\) is the number of instances of the \(l\textrm{th}\) minority class.

  4.

    Compute the ratio for each instance \(x_{i}\) over its K nearest neighbors, which we find based on the Euclidean distance:

    $$\begin{aligned} r_{i}=q_{i} / K, i=1, \ldots , X, \end{aligned}$$
    (3)

    where \(q_{i}\) is the number of instances among the K nearest neighbors of \(x_{i}\) that belong to the majority class, \(r_{i}\) \(\in\) [0, 1].

  5.

    Normalize \(r_{i}\) according to the density distribution \(\hat{r}_{i}=r_{i} / \sum _{i=1}^{X} r_{i}\), (\(\sum _{i=1}^{X} \hat{r}_{i} = 1\)).

  6.

    Determine the number of synthetic data instances \(m_{i}\) to be created for each minority instance \(x_{i}\):

    $$\begin{aligned} m_{i}=\hat{r}_{i} \times G_{l}. \end{aligned}$$
    (4)
  7.

    Create synthetic data instances \(S_{m_{i}}\) for each minority class data instance \(x_{i}\):

    $$\begin{aligned} S_{m_{i}}=x_{i}+\left( x_{\eta i}-x_{i}\right) \times \delta , \end{aligned}$$
    (5)

    where \(x_{\eta i}\) is a randomly selected instance from the K nearest neighbors of \(x_{i}\), and \(\delta\) is a vector in which each element is a random number, \(\delta\) \(\in\) [0, 1].

  8.

    To decide the acceptance of the new synthetic samples, compare the distance of the new synthetic instances between the majority and the old minority class samples. Suppose that Q = {\({Q_{1}, Q_{2}, \dots , Q_{m}}\)} is the set of the new synthetic instances, and \(Q_{i}^{(j)}\) is the \(j\textrm{th}\) feature value of \(Q_{i}\), j \(\in\) [1, n], where n is the number of features. Let \(S_{J} = \{{S_{J1}, S_{J2}, \dots , S_{JZ}}\)} be the set of majority samples and \(S_{ml} = \{{S_{ml1}, S_{ml2}, \dots , S_{mly}}\)} be the set of samples of the \(l\textrm{th}\) minority class, then we calculate the distance between \(Q_{i}\) and \(S_{Ja}\) (i.e., \(D_\textrm{majority}(Q_{i}, S_{Ja})\)). We also calculate the distance between \(Q_{i}\) and \(S_{mlb}\) (i.e., \(D_\textrm{minority}(Q_{i}, S_{mlb})\)). For each of the newly generated synthetic instances iteratively, the distance is calculated as

    $$\begin{aligned}&\begin{aligned} D_{ \textrm{majority }}\left( {Q}_{i}, S_{Ja}\right) =\\ \sum _{j=1}^{n} \sqrt{\left( {Q}_{i}^{(j)}-S_{Ja}^{(j)}\right) ^{2}}, a \in [1, Z], \end{aligned} \end{aligned}$$
    (6)
    $$\begin{aligned}&\begin{aligned} D_{ \textrm{minority }}\left( {Q}_{i}, S_{mlb}\right) =\\ \sum _{j=1}^{n} \sqrt{\left( {Q}_{i}^{(j)}-S_{mlb}^{(j)}\right) ^{2}}, b \in [1, y], \end{aligned} \end{aligned}$$
    (7)

    where \(D_{ \textrm{majority }}\left( {Q}_{i}, S_{Ja}\right)\) is the distance between the new synthetic instance \(Q_{i}\) and the majority instance \(S_{Ja}\), and \(D_{\textrm{minority }}\left( {Q}_{i}, S_{mlb}\right)\) is the distance between the new synthetic instance \(Q_{i}\) and the old minority instance \(S_{mlb}\) of the \(l\textrm{th}\) minority class. The distances are calculated for all newly created synthetic data instances against each minority class. If the minimum value of \(D_\textrm{minority}\) is less than the minimum value of \(D_\textrm{majority}\), the new synthetic instance is accepted; otherwise, it is rejected.

  9.

    Determine the percentage of synthetic samples to be added for each minority class. We use the ceiling function to obtain an integral multiple of 100, defined as

    $$\begin{aligned} {A}_{l} = \left\lceil \frac{S_{m_{i}}}{100} \right\rceil , \end{aligned}$$
    (8)

    where \({A}_{l}\) is the number of class samples to be added for the \(l\textrm{th}\) minority class.

  10.

    Iteratively oversample each minority class using the generated synthetic instances

    $$\begin{aligned} S_{m_{l}}=X_{l}+A_{l} \times \delta , \end{aligned}$$
    (9)

    where \(S_{m_{l}}\) is the \(l\textrm{th}\) minority class, which is synthetically oversampled, and \(X_{l}\) is the old number of instances for the \(l\textrm{th}\) minority class.
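
The following is a simplified Python sketch of the steps above (Eqs. 1-9), assuming purely numeric, already-encoded features. The function name, the parameters k and seed, and the omission of the percentage rounding in steps 9-10 are our own illustrative choices, not details prescribed by the paper.

import numpy as np
from collections import Counter
from sklearn.neighbors import NearestNeighbors

def smmo_oversample(X, y, k=5, seed=0):
    """Simplified SMMO sketch over numeric features X and labels y."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    counts = Counter(y)
    majority = max(counts, key=counts.get)                  # step 1: majority class
    J = counts[majority]
    X_maj = X[y == majority]
    nn_all = NearestNeighbors(n_neighbors=k + 1).fit(X)     # neighbours within the whole dataset
    X_parts, y_parts = [X], [y]
    for cls in (c for c in counts if c != majority):
        X_l = X[y == cls]
        G_l = J - counts[cls]                               # Eq. (2): synthetic budget for this class
        _, idx = nn_all.kneighbors(X_l)
        r = np.array([(y[i[1:]] == majority).mean() for i in idx])                 # Eq. (3)
        r_hat = r / r.sum() if r.sum() > 0 else np.full(len(r), 1.0 / len(r))      # step 5
        m = np.rint(r_hat * G_l).astype(int)                # Eq. (4)
        nn_min = NearestNeighbors(n_neighbors=min(k, len(X_l))).fit(X_l)
        _, idx_min = nn_min.kneighbors(X_l)
        accepted = []
        for i, m_i in enumerate(m):
            for _ in range(m_i):
                neighbour = X_l[rng.choice(idx_min[i])]     # Eq. (5): interpolate towards a neighbour
                q = X_l[i] + (neighbour - X_l[i]) * rng.random(X.shape[1])
                # Step 8 (Eqs. 6-7 reduce to Manhattan distances): accept q only if it lies
                # closer to its own minority class than to the majority class.
                if np.abs(q - X_l).sum(axis=1).min() < np.abs(q - X_maj).sum(axis=1).min():
                    accepted.append(q)
        if accepted:                                        # step 10: add the accepted instances
            X_parts.append(np.array(accepted))
            y_parts.append(np.full(len(accepted), cls, dtype=y.dtype))
    return np.vstack(X_parts), np.concatenate(y_parts)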

3.2 Collaborative Feature Selection (CoFS)

In the proposed framework, we use the collaboration of an Evolutionary Algorithm (EA) and the Correlation-based Feature Selection (CFS) technique to select the important features from the dataset whose class distribution has been improved (i.e., the balanced dataset). Using EA, we search for the best solution from a population of attribute subsets. CFS then evaluates the obtained solution subsets to select the most important feature subset for classification.

The advantage of EA over other search methods is that it allows better solutions to emerge from the best prior solutions, improving the selection of features over time. EA creates new, fitter individuals by combining different solutions across generations and extracting the best genes (features) from each one [12, 34]. Additionally, the worth of a subset of attributes is evaluated using the CFS technique, which considers the individual predictive ability of each feature along with the degree of redundancy between features.

Collaborative feature selection uses the following steps to select useful features from the balanced dataset; a compact code sketch of this search is given after the list.

  1.

    Encode the features (candidate solutions to the problem) in the EA as fixed-length bit vectors, where \(x(i)=1\) denotes that the EA selects the \(i\textrm{th}\) feature and \(x(i)=0\) means the opposite.

  2.

    Randomly initialize the EA so that the initial population consists of completely random feature subsets.

  3.

    Determine the fitness function (evaluation function) to check how fit a solution is. It evaluates how close a given solution subset is to the optimum number of features for NIDS. To assess the fitness of each feature from the solution subsets, we use the CFS subset evaluation method. The feature that is highly correlated with the class is the fittest. It is described as follows [55].

    $$\begin{aligned} f_{t}=\frac{n r_{c f}}{\sqrt{n+n(n-1) r_{f f}}}, \end{aligned}$$
    (10)

    where \(f_{t}\) is the fitness score of a solution subset (chromosome) in the population, n is the number of features in the subset, \(r_{cf}\) is the mean of the class–feature correlations and \(r_{ff}\) is the mean of the feature–feature correlations within the subset.

  4.

    Determine the selection operation for the EA. In this research, we use binary tournament selection. Its efficiency on both non-parallel and parallel architectures makes it preferable to random and rank selection techniques; it does not require sorting the whole population, and its selection pressure can be adjusted easily. In T-way tournament selection, T individuals are selected from the population at random and the best of them becomes a parent; the same process is repeated to select the next parent [1, 41].

  5.

    Determine the crossover operation for the EA. We use single-point crossover since it is one of the simplest crossover operators: a random crossover point is selected, and the tails of the two parents are swapped to produce new offspring. The probability of crossover can be described as [50, 56]:

    $$\begin{aligned} P_{c}=\left\{ \begin{array}{c} \frac{P_{c 1}\left( f_{t_{\max }}-F\right) }{f_{t_{\max }}-f_{t_{\min }}}, \quad F \ge f_{t_{\text{avg}}} \\ P_{c 1}, \quad F<f_{t_{\text{avg}}}, \end{array}\right. \end{aligned}$$
    (11)

    where \(P_{c}\) is the probability of crossover, F is the fitness value of the fitter (more adaptable) parent employed for crossover, \(f_{t_\textrm{min}}\) is the smallest fitness value in the population, \(f_{t_\textrm{max}}\) is the greatest fitness value in the population, \(P_{c1}\) is a constant within the range (0, 1) and \(f_{t_\textrm{avg}}\) is the average fitness of the population.

  6.

    Determine the mutation operation for EA. We use a bit-flip mutation operator to optimize functions over binary strings. The bit-flip mutation acts independently on each bit in a solution. This operator changes the value of the bit (i.e., from 0 to 1 and vice versa) with the probability of mutation \(P_{m}\) and it is described as [45]:

    $$\begin{aligned} P_{m}=\left\{ \begin{array}{c} \frac{P_{m 1}\left( f_{t \max }-F\right) }{f_{t_{\max }}-f_{t_{\min }}}, \quad F \ge f_{t_{{\text {avg}}}} \\ P_{m 1}, \quad F<f_{t_{\text{ avg } }}, \end{array}\right. \end{aligned}$$
    (12)

    where F is the fitness value of the fitter (more adaptable) parent employed for mutation, \(f_{t_\textrm{min}}\) is the smallest fitness value in the population, \(f_{t_\textrm{max}}\) is the greatest fitness value in the population, \(P_{m1}\) is a constant within the range (0, 1) and \(f_{t_\textrm{avg}}\) is the average fitness of the population.

  7.

    Determine the survivor selection technique to identify the best subset of solutions (features). We use elitism-based survivor selection (generational replacement) since it is effective at improving the performance of the EA. Elitism-based survivor selection determines which individuals are discarded and which are kept in the next generation; the best solutions (feature subsets) are always kept [58].
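
A compact Python sketch of this search is given below. The population size, generation count, fixed crossover and mutation rates, and the use of Pearson correlation as a stand-in for the CFS correlation measure are illustrative simplifications, not settings reported in this paper.

import numpy as np

def cfs_merit(X, y, mask):
    """CFS merit of Eq. (10) for the feature subset encoded by a boolean mask
    (Pearson correlation is used here as a simple stand-in for the CFS measure)."""
    idx = np.flatnonzero(mask)
    if idx.size == 0:
        return 0.0
    y_num = np.unique(y, return_inverse=True)[1].astype(float)         # class labels as numbers
    r_cf = np.mean([abs(np.nan_to_num(np.corrcoef(X[:, j], y_num)[0, 1])) for j in idx])
    if idx.size == 1:
        r_ff = 0.0
    else:
        c = np.nan_to_num(np.abs(np.corrcoef(X[:, idx], rowvar=False)))
        r_ff = (c.sum() - np.trace(c)) / (idx.size * (idx.size - 1))    # mean feature-feature correlation
    n = idx.size
    return n * r_cf / np.sqrt(n + n * (n - 1) * r_ff)

def cofs_select(X, y, pop=30, gens=40, p_c=0.6, p_m=0.02, seed=0):
    """Evolutionary search over feature masks with the CFS merit as fitness."""
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]
    P = rng.integers(0, 2, size=(pop, n_feat)).astype(bool)             # step 2: random initialisation
    fit = np.array([cfs_merit(X, y, ind) for ind in P])

    def tournament():                                                   # step 4: binary tournament
        i, j = rng.integers(0, pop, 2)
        return P[i] if fit[i] >= fit[j] else P[j]

    for _ in range(gens):
        children = [P[fit.argmax()].copy()]                             # step 7: elitism keeps the best
        while len(children) < pop:
            a, b = tournament(), tournament()
            child = a.copy()
            if rng.random() < p_c:                                      # step 5: single-point crossover
                cut = rng.integers(1, n_feat)
                child[cut:] = b[cut:]
            flip = rng.random(n_feat) < p_m                             # step 6: bit-flip mutation
            child[flip] = ~child[flip]
            children.append(child)
        P = np.array(children)
        fit = np.array([cfs_merit(X, y, ind) for ind in P])
    return np.flatnonzero(P[fit.argmax()])                              # indices of the selected features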

3.3 Deployment of SMMO-CoFS-Based NIDS

Even though Machine Learning (ML)-based network intrusion detection systems can take network traffic as input to predict network behavior, such systems still cannot be directly deployed in real-time network environments, because ML-based models do not include packet sniffers that capture network traffic in real time. A network sniffer (packet sniffer) is a traffic monitoring and analysis tool that can identify (sniff out) the data flowing over the network in real time. To achieve real-time detection, the developed ML-based network intrusion detection systems have to work with packet sniffers, such as Spark, Bro, and Snort [25].

Fig. 2
figure 2

The scheme of SMMO-CoFS-based NIDS deployment

When our proposed SMMO-CoFS-based NIDS is deployed, the packet sniffer captures the network traffic, which is then pre-processed using SMMO-CoFS. The processed data is used as input for the classifier's prediction. Finally, the classifier predicts one of the five target classes: normal traffic or one of the four attacks (i.e., DoS, probe, R2L and U2R). Based on the report, the network administrators decide on the required network intrusion prevention mechanisms. The deployment of the SMMO-CoFS-based NIDS is shown in Fig. 2.
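
The sketch below illustrates such a deployment loop under the assumption that the sniffer feed has already been parsed into NSL-KDD-style connection records; extract_features, clf, selected_idx and the alerting behaviour are hypothetical placeholders, not components defined by this paper.

# Illustrative real-time detection loop; all names below are placeholders.
def monitor(traffic_stream, extract_features, clf, selected_idx):
    """Classify each captured connection record and alert on predicted attacks."""
    for record in traffic_stream:                    # records produced by the packet sniffer
        x = extract_features(record)                 # pre-processing into NSL-KDD-style features
        label = clf.predict([x[selected_idx]])[0]    # prediction on the CoFS-selected features
        if label != "normal":                        # one of DoS, probe, R2L or U2R
            print(f"ALERT: suspected {label} traffic - notify the network administrator")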

4 Experiments and Result Analysis

4.1 Benchmark Dataset

We conduct our experiments using the Network Security Laboratory-Knowledge Discovery and Data Mining (NSL-KDD) dataset to evaluate the proposed method. NSL-KDD is an effective benchmark that helps researchers compare different intrusion detection methods despite the lack of public datasets for network-based IDSs. The dataset comprises 41 features and five classes labeled as attack types (intrusions) and normal. The intrusions are categorized into four attack classes: Denial of Service (DoS), probe, User-to-Root (U2R), and Remote-to-Local (R2L) [30, 37, 47].
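
As an aside, a minimal way to load the training file with pandas might look like the following; the file name is assumed to be a local copy of the public KDDTrain+ file, whose records carry the 41 features followed by the attack label (and a difficulty score), with individual attack names that must be grouped into the four categories above.

import pandas as pd

# Assumed local copy of the NSL-KDD training file (41 features, label, difficulty score).
df = pd.read_csv("KDDTrain+.txt", header=None)
X = df.iloc[:, :41].values        # the 41 features (some are categorical and need encoding)
labels = df.iloc[:, 41].values    # raw labels such as 'normal', 'neptune', 'guess_passwd', ...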

Pseudo-code 1 (figure a): high-level description of the proposed SMMO-CoFS framework

4.2 Experimental Framework

In our experiment, we partition the data using 10-fold cross-validation to test the proposed method's prediction performance. We then build models with four learners: random forest, J48, BayesNet, and AdaBoostM1. We also carry out the Wilcoxon statistical significance test to evaluate whether the SMMO-CoFS method's performance improvement over the other methods is significant. After conducting extensive experiments, we also compare the proposed method's performance with methods previously proposed by other researchers.
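
For illustration, the evaluation protocol can be sketched with scikit-learn stand-ins for the four WEKA learners (CART for J48 and Gaussian naive Bayes for BayesNet); the actual experiments were run with the tools listed below, so this snippet only mirrors the evaluation design, and X_sel, y_bal stand for the SMMO-CoFS-processed data.

from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import StratifiedKFold, cross_val_score

learners = {
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "J48 (CART stand-in)": DecisionTreeClassifier(random_state=0),
    "BayesNet (naive Bayes stand-in)": GaussianNB(),
    "AdaBoostM1": AdaBoostClassifier(random_state=0),
}
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)   # 10-fold cross-validation
for name, clf in learners.items():
    scores = cross_val_score(clf, X_sel, y_bal, cv=cv, scoring="f1_micro")
    print(f"{name}: micro-F1 = {scores.mean():.3f} (+/- {scores.std():.3f})")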

RapidMiner Studio Professional 9.6 [32], WEKA version 3.8 [52], KEEL (Knowledge Extraction based on Evolutionary Learning) software tool [49], OriginPro 2018 [38], and IBM SPSS Statistic 23 [36] are adopted to create and examine the empirical results.

4.3 Evaluation Metrics

4.3.1 Prediction Performance Test

Table 1 The description of notations

Performance in our experiment is evaluated using the following metrics: True Positive Rate (TPR), Precision (Pr), Recall (Rc), F measure (F1), Receiver Operating Characteristic (ROC) curve and time (s). The degree of imbalance (mild, moderate and extreme) and the proportion of the minority classes are also considered. Since the classification is multi-class, for each class l, with L the number of classes and \(\mu\) denoting the micro average, the evaluation metrics are calculated as follows [17, 18]. The notations used in the formulas are described in Table 1.

$$\begin{aligned} (\textrm{TPR})_{l}&= \frac{ (\textrm{TP})_{l} }{ (\textrm{TP})_{l} + (\textrm{FN})_{l} } \end{aligned}$$
(13)
$$\begin{aligned} (\textrm{FPR})_{l}&= \frac{ (\textrm{FP})_{l} }{ (\textrm{TN})_{l} + (\textrm{FP})_{l} } \end{aligned}$$
(14)
$$\begin{aligned} (\textrm{Pr})_{l}&= \frac{ (\textrm{TP})_{l} }{ (\textrm{TP})_{l} + (\textrm{FP})_{l} } \end{aligned}$$
(15)
$$\begin{aligned} (\textrm{Rc})_{l}&= \frac{ (\textrm{TP})_{l} }{ (\textrm{TP})_{l} + (\textrm{FN})_{l} } \end{aligned}$$
(16)
$$\begin{aligned} (\textrm{Acc})_\textrm{avg}&= \frac{ \sum _ { l = 1 } ^ { L } \frac{ (\textrm{TP}) _ { l } + (\textrm{TN}) _ { l } }{ (\textrm{TP}) _ { l } + (\textrm{TN}) _ { l } + (\textrm{FP}) _ { l } + (\textrm{FN}) _ { l } } }{ L } \end{aligned}$$
(17)
$$\begin{aligned} (\textrm{Pr})_{\mu }&= \frac{ \sum _ { l = 1 } ^ { L } (\textrm{TP}) _ { l } }{ \sum _{ l = 1 } ^ { L } \left( (\textrm{TP}) _ { l } + (\textrm{FP}) _ { l } \right) } \end{aligned}$$
(18)
$$\begin{aligned} (\textrm{TPR})_{\mu }&= \frac{ \sum _ { l = 1 } ^ { L } (TP) _ { l } }{ \sum _{ l = 1 } ^ { L } \left( (\textrm{TP}) _ { l } + (\textrm{FN}) _ { l } \right) } \end{aligned}$$
(19)
$$\begin{aligned} (\textrm{Rc})_{\mu }&= \frac{ \sum _ { l = 1 } ^ { L } (TP) _ { l } }{ \sum _{ l = 1 } ^ { L } \left( (\textrm{TP}) _ { l } + (\textrm{FN}) _ { l } \right) } \end{aligned}$$
(20)
$$\begin{aligned} (F1) _{\mu }&= \frac{ 2 * { (\textrm{Pr})} _ { \mu } * {(\textrm{Rc}) } _ { \mu } }{{(\textrm{Pr})} _ { \mu } + {(\textrm{Rc})} _ { \mu } }. \end{aligned}$$
(21)
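
For concreteness, the micro-averaged quantities of Eqs. (18)-(21) can be computed directly from a multi-class confusion matrix; the counts below are arbitrary illustrative numbers, not results from our experiments.

import numpy as np

cm = np.array([[50,  2,  1],    # rows: true class, columns: predicted class
               [ 3, 40,  2],
               [ 1,  1, 10]])
TP = np.diag(cm)
FP = cm.sum(axis=0) - TP
FN = cm.sum(axis=1) - TP
pr_micro = TP.sum() / (TP.sum() + FP.sum())                   # Eq. (18)
rc_micro = TP.sum() / (TP.sum() + FN.sum())                   # Eqs. (19)-(20)
f1_micro = 2 * pr_micro * rc_micro / (pr_micro + rc_micro)    # Eq. (21)
print(pr_micro, rc_micro, f1_micro)                           # all equal 100/110 for this matrix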

4.3.2 Statistical Significance Test

To prove that the performance improvement of the SMMO-CoFS method is statistically significant, we use the widely used Wilcoxon statistical significance test [8, 24]. The test compares our method against the original data, CoFS (no data balancing) and the SMOTE method. We also run a statistical test to show that our proposed method significantly improves on the results of existing related research works.

The Wilcoxon signed-rank test is a nonparametric statistical test that ranks the differences in performance between two methods, ignores the signs, and compares the rank sums of the positive and negative differences. Let \(d_{i}\) be the difference between the performance scores of the two methods on the \(i\textrm{th}\) classification model; the differences are ranked according to their absolute values, and average ranks are assigned in case of ties. Let \(R^{+}\) be the sum of ranks on which the second method outperformed the first, and \(R^{-}\) the sum of ranks for the opposite; then

$$\begin{aligned} R^{+}=\sum _{d_{i}>0} {\text {rank}}\left( d_{i}\right) +\frac{1}{2} \sum _{d_{i}=0} {\text {rank}}\left( d_{i}\right) \end{aligned}$$
(22)

and

$$\begin{aligned} R^{-}=\sum _{d_{i}<0} {\text {rank}}\left( d_{i}\right) +\frac{1}{2} \sum _{d_{i}=0} {\text {rank}}\left( d_{i}\right) . \end{aligned}$$
(23)

Ranks of \(d_{i} = 0\) are split evenly between the two sums; if there is an odd number of them, one is ignored. The Wilcoxon test statistic \(\tau\) is the smaller of the two sums, i.e., \(\tau = \min (R^{+},R^{-})\), evaluated at a significance level of \(\alpha = 0.05\).
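
The test itself is available in SciPy; a hedged example, with placeholder TPR values rather than our measured results, is shown below.

from scipy.stats import wilcoxon

# Placeholder per-classifier U2R TPR values for two methods (illustrative only).
tpr_smmo_cofs = [0.80, 0.74, 0.69, 0.77]
tpr_original  = [0.20, 0.18, 0.25, 0.22]
stat, p_value = wilcoxon(tpr_smmo_cofs, tpr_original)   # two-sided statistic = min(R+, R-)
print(f"tau = {stat:.1f}, p = {p_value:.4f}")           # compare p against alpha = 0.05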

4.4 Result Analysis

Table 2 Distribution of instances in NSL-KDD Dataset for each class
Fig. 3
figure 3

Distribution of classes for original dataset and after applying Synthetic Multi-minority Oversample (SMMO)

4.4.1 Analysis of Degree of Imbalance for Multi-minority Classes

Table 2 and Fig. 3 show the distribution of instances for the five classes (i.e., normal, DoS, probe, U2R and R2L) in the original training dataset and after applying the proposed SMMO-CoFS method. As shown in Table 2, the degree of imbalance of the minority classes in the original training dataset is mild for DoS, moderate for probe, and extreme for U2R and R2L [21, 42]. After the SMMO-CoFS method is applied, the distribution of instances for each class improves. As shown in Table 2 and Fig. 3, the proportion of the DoS class decreases slightly from 36.45% to 36%, while the proportions of probe, U2R and R2L increase from 9.25% to 9.6%, from 0.04% to 0.6%, and from 0.78% to 1.3%, respectively. As a result, the degree of imbalance of the R2L class improves from extreme to moderate. The proportion of the majority (normal) class decreases from 53.45% to 52.5%, which improves the minority classes' degree of imbalance in the dataset while keeping the natural distribution of attacks.

4.4.2 Analysis of Recall and F measure

Table 3 Performance evaluation of SMMO-CoFS method in terms of TPR, Pr, recall, F1 and ROC for U2R and R2L classes using random forest and J48 classifiers by comparing with original data set
Fig. 4
figure 4

Performance of SMMO-CoFS with random forest, J48, BayesNet and AdaBoostM1 in terms of TPR for U2R and R2L classes

Fig. 5
figure 5

Performance of SMMO-CoFS with random forest, J48, BayesNet and AdaBoostM1 in terms of precision for U2R and R2L classes

Fig. 6
figure 6

The ROC curve of random forest classifier with the SMMO-CoFS method for the extreme minority classes U2R and R2L

Fig. 7
figure 7

The ROC curve of random forest classifier with the original dataset for the extreme minority classes U2R and R2L

Table 4 Performance evaluation of SMMO-CoFS method in terms of TPR, Pr, recall, F1 and ROC for U2R and R2L classes using BayesNet and AdaBoostM1 classifiers by comparing with original data set
Table 5 Performance evaluation of SMMO-CoFS method in terms of TPR, Pr, recall, F1 and ROC for U2R and R2L classes using random forest and J48 classifiers by comparing with collaborative feature selection (no data balancing)
Table 6 Performance evaluation of SMMO-CoFS method in terms of TPR, Pr, recall, F1 and ROC for U2R and R2L classes using BayesNet and AdaBoostM1 classifiers by comparing with collaborative feature selection (no data balancing)

In Tables 3 and 4, we report the detailed performance results of the proposed SMMO-CoFS compared to the original dataset and the SMOTE method in terms of True Positive Rate (TPR), Precision (Pr), Recall (Rc), F measure (F1), and ROC curve. These tables describe SMMO-CoFS's evaluation results for predicting the two extreme minority classes, U2R and R2L, using the random forest, J48, BayesNet, and AdaBoostM1 classifiers. As shown in Tables 3 and 4, the SMMO-CoFS method outperforms the original dataset and SMOTE on all evaluation metrics (i.e., TPR, Pr, Rc, F1, and ROC curve) for the U2R class. SMMO-CoFS also achieves improved performance for R2L in terms of ROC with J48, and in terms of Pr, F1 and ROC with BayesNet. The results in these tables also show that the proposed SMMO-CoFS method achieves the best performance with the random forest classifier on most metrics, and that the performance improvement for the extreme minority classes, especially U2R, is significant on the indicated metrics. In particular, SMMO-CoFS outperforms the original dataset and SMOTE in terms of Rc and F1 for the U2R class with all classifiers, and achieves improved F1 for the R2L class with the BayesNet classifier.

In Tables 5 and 6, we report the detailed performance results of the proposed SMMO-CoFS method compared to CoFS (i.e., without data balancing) in terms of TPR, Pr, Rc, F1, and ROC curve for predicting U2R and R2L. We use the random forest, J48, BayesNet, and AdaBoostM1 classifiers for evaluating the performances. As shown in Tables 5 and 6, the SMMO-CoFS method achieves the best performance compared with the CoFS in terms of most evaluation metrics (i.e., TPR, Pr, Rc, F1, and ROC curve). The results in these tables also show that the proposed SMMO-CoFS method achieves the best performance with a random forest classifier in terms of most metrics. Furthermore, the results show a performance improvement for both extreme minority classes U2R and R2L in terms of most evaluation metrics.

4.4.3 Analysis of True-Positive Rate (TPR)

Figure 4 shows the performance of the proposed SMMO-CoFS method in terms of TPR with the four classifiers (i.e., random forest, J48, BayesNet and AdaBoostM1) for predicting U2R and R2L classes. The experimental results are taken from Tables 3, 4, 5 and 6. The results in this figure also show the performance of the four classifiers with the original dataset, CoFS and SMOTE. As shown in Fig. 4, the performance of the proposed SMMO-CoFS method outperforms the other methods using the four classifiers in terms of TPR. The proposed method also significantly improves the prediction performance of the indicated classifiers for U2R in terms of TPR, which is the extreme minority class.

4.4.4 Analysis of Precision

Figure 5 shows the performance of the proposed SMMO-CoFS method in terms of precision with the four classifiers (i.e., random forest, J48, BayesNet and AdaBoostM1). The experimental results are taken from Tables 3, 4, 5 and 6. The results in this figure also show the performance of the four classifiers with the original dataset, CoFS and SMOTE. As shown in Fig. 5, the proposed SMMO-CoFS method outperforms the other methods with the four classifiers for U2R and R2L classes in terms of Pr. The proposed method also significantly improves the prediction performance of the indicated classifiers for U2R in terms of Pr, which is the extreme minority class.

4.4.5 Analysis of ROC Curve

Figures 6 and 7 show the performance of the SMMO-CoFS method and the original dataset in terms of ROC curves. The ROC curves are constructed by plotting the True Positive Rate (TPR) against the False Positive Rate (FPR) of the random forest classifier, which achieves the best performance as shown in Tables 3 and 4. Figure 6 shows the ROC curves of the two extreme minority classes (i.e., U2R and R2L) using the SMMO-CoFS method. As shown in this figure, the classes' ROC curves are aligned with, or very close to, the upper-left corner of the graph, indicating results very close to perfect classification. As shown in Fig. 7, the ROC curves for the original dataset are relatively far from the upper-left corner, showing that the random forest classifier achieves lower classification accuracy with the original dataset. Therefore, the proposed SMMO-CoFS method helps the classifier achieve efficient and accurate prediction for the extreme minority classes (i.e., U2R and R2L) in terms of the ROC curve.

Table 7 Performance evaluation of SMMO-CoFS method in terms of time (s) using random forest, J48, BayesNet and AdaBoostM1 classifiers
Fig. 8
figure 8

Performance of SMMO-CoFS with random forest, J48, BayesNet and AdaBoostM1 in terms of processing time

4.4.6 Analysis of Time Complexity

Table 7 shows the performance evaluation of the SMMO-CoFS method in terms of time with the four classifiers (i.e., random forest, J48, BayesNet and AdaBoostM1). The table also includes the performance of the classifiers with the original dataset, CoFS and SMOTE. As shown in the table, the proposed SMMO-CoFS method achieves the fastest processing with the four classifiers. The results also show that all classifiers take the longest processing time with SMOTE. Furthermore, the four classifiers with CoFS (without data balancing) achieve intermediate processing times, faster than with the original dataset and SMOTE but slower than with SMMO-CoFS.

Figure 8 shows the performance of the proposed SMMO-CoFS, the original dataset, CoFS and SMOTE using the four classifiers (random forest, J48, BayesNet and AdaBoostM1) in terms of processing time. The experimental results are taken from Table 7. As shown in this figure, the processing time of the proposed SMMO-CoFS method is lower (faster) than that of CoFS, SMOTE and the original dataset. Furthermore, Fig. 8 shows that SMMO-CoFS is fastest with the BayesNet classifier, whereas it is slowest with the random forest classifier, although still faster than the other methods.

4.4.7 Analysis of Statistical Significance Test

The results of the Wilcoxon signed-rank statistical significance test are shown in Table 8, which compares the performance of the proposed SMMO-CoFS method using the four classifiers with the original data, CoFS and SMOTE methods. The test is conducted to confirm that our proposed method's performance difference is statistically significant in terms of TPR for the extreme minority class U2R, and includes the random forest, J48, BayesNet, and AdaBoostM1 classifiers with the corresponding methods. Table 9 shows the results of the Wilcoxon signed-rank test comparing the proposed SMMO-CoFS method with current research results in terms of accuracy and F measure.

In Tables 8 and 9, the P-values for all compared methods are significant, i.e., P < 0.05 at a significance level of \(\alpha\) = 0.05: the test statistic \(\min (R^{+},R^{-})\) is small and the corresponding P-values fall below the threshold \(\alpha\) = 0.05. This statistical significance test shows that our proposed SMMO-CoFS method outperforms the other techniques and the performance on the original data. Therefore, the performance improvement achieved by our proposed method is statistically significant.

Table 10 shows the result of a statistical t-test using SAS (Statistical Analysis Software) to prove that there is a significant difference between the mean processing times of the proposed SMMO-CoFS method and the original dataset, CoFS and SMOTE methods. The test controls the Type I comparison-wise error rate with (\(\alpha\) = 0.05, Least Significant Difference (LSD) = 14.072, Error Degrees of Freedom (EDF) = 9, Error Mean Square (EMS) = 77.394, Critical Value of t (CV-t) = 2.262). As shown in Table 10, the t-grouping letters (i.e., A, B and C) indicate the difference and similarity of the mean values of the compared methods; mean values with the same letter are not significantly different. Table 10 shows that the proposed SMMO-CoFS method falls into a different t-grouping from the original data and the SMOTE method. Therefore, even though the processing time of the proposed method is not significantly different from that of the CoFS technique, it achieves a statistically significant improvement compared with the original data and the SMOTE method.
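
To make the comparison concrete, the least significant difference implied by the reported statistics can be reproduced as below. The error mean square and error degrees of freedom are the values reported above; r = 4 replicates is our own inference (consistent with the four classifiers and the reported LSD), and the mean processing times are placeholders, not the measured values.

from math import sqrt
from scipy.stats import t

ems, edf, r, alpha = 77.394, 9, 4, 0.05
lsd = t.ppf(1 - alpha / 2, edf) * sqrt(2 * ems / r)            # ~14.07, matching the reported LSD
mean_time = {"SMMO-CoFS": 10.0, "CoFS": 15.0, "SMOTE": 40.0}   # illustrative mean times (s)
for method in ("CoFS", "SMOTE"):
    diff = abs(mean_time[method] - mean_time["SMMO-CoFS"])
    verdict = "significant" if diff > lsd else "not significant"
    print(f"SMMO-CoFS vs {method}: |diff| = {diff:.1f} s -> {verdict}")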

Table 8 The result of Wilcoxon statistical significance test in terms of TPR with a significance threshold of \(\alpha\) = 0.05, sum of signed ranks \(R^{+}\) and \(R^{-}\), and P-values
Table 9 The result of Wilcoxon statistical significance test in terms of accuracy and F measure with a significance threshold of \(\alpha\) = 0.05, sum of signed ranks \(R^{+}\) and \(R^{-}\), and P-values
Table 10 The result of a statistical significance t test in terms of processing time with (\(\alpha\) = 0.05, LSD = 14.072, EDF = 9, EMS = 77.394, CV-t = 2.262)

4.4.8 Performance Comparison

The performance of our proposed method is compared with existing research works in Tables 11 and 12. Table 11 lists NIDS frameworks suggested by different authors in previous research and compares the proposed SMMO-CoFS-based NIDS framework with them in terms of detection rate for the extreme minority classes (U2R and R2L). These frameworks are chosen because their problem domain is machine learning-based NIDS, and all of them have been evaluated on the NSL-KDD dataset, as in our work.

As shown in Table 11, our proposed framework outperforms all other frameworks on the indicated evaluation metric. The detection rate is chosen as the evaluation metric because it is used by most researchers. The comparison results therefore show that the proposed SMMO-CoFS method helps the NIDS achieve improved classification performance for the extreme minority classes compared with the other methods.

Table 12 shows the performance comparison of the proposed SMMO-CoFS method with the results of existing related research works in terms of general accuracy and F-measure. The results in Table 12 show that our proposed SMMO-CoFS method outperforms the other methods in terms of accuracy and F-measure.

Table 11 Performance comparison of the proposed framework with other NIDS frameworks in terms of detection rate for the extreme minority classes (U2R and R2L)
Table 12 Performance comparison of the proposed framework with other NIDS frameworks in terms of accuracy and F-measure

5 Discussion and Implications

5.1 Discussion

5.1.1 SMMO-CoFS Method in Improving Classes’ Degree of Imbalance

The experimental results in Table 2 and Fig. 3 show that SMMO-CoFS helps improve the class proportions of the multi-class NIDS dataset and the minority classes' degree of imbalance. These results confirm that SMMO-CoFS retains the NIDS's natural class distribution while improving the distribution of its classes. In addition, Figs. 6 and 7 show that data balancing helps achieve the best prediction performance when measured in terms of ROC curves, as explained in [6, 54].

5.1.2 SMMO-CoFS Method in Improving Detection Accuracy of Multi-minority Classes

The experimental results shown in Tables 3, 4, 5, 6, 8, 9, 11 and 12, as well as in Figs. 4, 5, 6 and 7, confirm that integrating multi-class data balancing with a feature selection method helps the NIDS improve the detection performance of the minority classes. Furthermore, the results show that the integration of synthetic multi-minority oversampling and collaborative feature selection improves the NIDS's performance in detecting the extreme minority classes (i.e., U2R and R2L attacks). As a result, the proposed SMMO-CoFS method significantly improves the detection accuracy of these extreme minority classes as well as the general prediction performance of the NIDS on a multi-class dataset.

5.1.3 SMMO-CoFS Method in Selecting Useful Features

As shown in Table 7, the proposed SMMO-CoFS method helps the NIDS select the essential features, which contribute to improving the prediction performance. The experimental results in Tables 3, 4, 5 and 6 confirm that SMMO-CoFS outperforms CoFS on most of the evaluation metrics. These results show that SMMO-CoFS helps select relevant and non-redundant features rather than learning from the dataset with a skewed class distribution. As a result, the selected features further improve the classifiers' ability to detect the extreme minority classes in NIDS.

5.1.4 SMMO-CoFS Method in Improving Processing Time

The experimental results presented in Tables 7 and 10 and Fig. 8 affirm that the proposed SMMO-CoFS achieves faster processing time. They show that multi-class data balancing significantly helps the NIDS achieve faster processing time while improving prediction performance. The results also confirm that achieving both the fastest processing time and the best prediction performance simultaneously is difficult. As shown in Table 7 and Fig. 8, the random forest classifier is slower than the other classifiers with the SMMO-CoFS method, but it achieves the best prediction performance, as shown in Tables 3, 4, 5 and 6. On the other hand, the BayesNet classifier achieves faster processing time, but its prediction performance is lower than that of random forest with the SMMO-CoFS method.

5.2 Implications

The integration of multi-class data balancing and collaborative feature selection methods helps data scientists during data cleaning and pre-processing in NIDS. The proposed SMMO-CoFS method therefore helps experts select the most appropriate and optimal number of features, which contributes to improving the prediction performance of machine learning models during data analysis tasks.

The SMMO-CoFS method likely helps machine learning engineers develop and implement robust NIDS to detect the extreme minority classes (i.e., low-frequency attacks). For developers of machine learning-based NIDS models, applying the SMMO-CoFS method helps in building an efficient NIDS, which in turn helps network administrators effectively identify, monitor, and analyze network traffic to protect a system from possible attacks.

In general, an NIDS based on the SMMO-CoFS method supports organizations in building high-performance, secure, reliable and scalable networks. It also helps companies identify unwarranted networking threats before an intrusion escalates into a full-scale security attack, data leak, or service outage. Furthermore, the SMMO-CoFS method likely improves organizations' business models and services by transforming large datasets into knowledge and actionable intelligence.

5.3 Limitations of the Study

Even though evolutionary algorithms have been demonstrated to be an effective approach for solving real-world optimization problems, they cannot guarantee the optimal result when the solution space is huge [10]. In addition, a limitation of CFS is its failure to select features that have locally predictive value when they are overshadowed by strong, globally predictive features [11].

It would be interesting to apply feature selection and instance selection concurrently with a combination of Reinforcement Learning (RL) and Deep Learning (DL) algorithms. The agent in RL may find an optimal classification policy for imbalanced data under the guidance of a specific reward function and a beneficial learning environment. This approach may help select important features while achieving dimensionality reduction and improving efficiency for NIDS [3, 51].

6 Conclusion

This paper has demonstrated the positive impact of integrating Synthetic Multi-minority Oversampling (SMMO) with collaborative feature selection (CoFS) to increase the detection accuracy of the extreme minority classes in network intrusion detection systems. SMMO-CoFS improves the distribution of classes in a multi-class dataset by minimizing the degree of imbalance for the multi-minority classes. As a result, the proportion of the extreme minority classes is improved. The ratio of instances for R2L is improved to moderate, and the ratio of U2R is increased in the multi-class NSL-KDD dataset. The study also indicates that applying a collaborative feature selection approach helps select relevant features while reducing dimensionality. Consequently, the proposed SMMO-CoFS method significantly improves the detection accuracy of these extreme minority classes in multi-class NIDS dataset.

In this work, we have compared our framework's performance with the original dataset, the collaborative feature selection method alone (i.e., without data balancing) and SMOTE. The proposed framework achieves the best classification performance using random forest, J48, BayesNet, and AdaBoostM1. The statistical significance of the detection accuracy improvement achieved by the proposed framework has been tested, and the improvement in detection rate is shown to be statistically significant. Our proposed NIDS framework is also compared with approaches proposed by other researchers, and it achieves the best performance in detecting the extreme minority classes (i.e., U2R and R2L attacks) as well as the best general prediction performance.

The experiment results provide compelling evidence and suggest that the integration of SMMO with a collaborative feature selection helps achieve improved class distribution while selecting essential features. As a result, the framework contributes to improving the detection accuracy of the extreme minority classes in network intrusion detection systems. Therefore, the proposed framework is reliable for NIDSs, and it is a valuable method in achieving the improved detection performance of multi-minority classes.

In the future, deep learning approaches will be explored to build more effective and robust IDSs for computer network systems. Furthermore, we will explore deep learning to build a stronger NIDS for Internet of Things (IoT) systems, which may help detect Distributed Denial of Service (DDoS) attacks across IoT systems.