1 Introduction

The COVID-19 outbreak and the rapid spread of the coronavirus have driven organizations to shift their services online. In the wake of the epidemic, sectors such as education, health care, entertainment, food services, and retail have moved online to contain the spread of the virus. Most organizations have implemented work from home, either fully or partially, for the same reason. The digital shift underwent a quantum leap within a span of a few months. With growing dependence on online services and activities due to emergency scenarios, it has become crucial for organizations to provide uninterrupted services and resources to end users. Unavailability of services or resources can incur a massive loss for organizations. DDoS attacks are among the most effective attacks used by cybercriminals to prevent legitimate users from using a service by exhausting server resources. DDoS attacks have become more sophisticated, with tremendous increases in volume [1]. In recent years, Internet of Things (IoT)-enabled DDoS attacks have increased at an alarming rate. Cybercriminals launch DDoS attacks using botnets (robot networks), armies of infected IoT devices, which push attack intensities into the Tbps range. A botnet works like an army, launching a DDoS attack on a victim server through a centralized command and control system [2]. The modern DDoS attack has become a mainstream commodity in the cyber world [3]. There are underground communities of cybercriminals [2], where each cybercriminal has an army of bots; they team up to launch DDoS attacks of much higher volume and velocity on the victim server. DDoS attacks can be direct attacks or amplified reflection-based attacks on the victim server [4]. Direct attacks are carried out by an army of infected devices, while reflective servers are used for amplified reflection-based attacks. The authors of [5] conducted a systematic mapping study to evaluate the most common cybersecurity threats by analyzing 78 primary studies. They reported denial of service as the most addressed vulnerability, with a frequency of 37%, in their systematic mapping study. F5 Labs reported that 73% of DDoS attacks are volumetric, out of which 53% are reflection-based attacks launched using vulnerable servers [6] or infected devices. Volumetric DDoS attacks exploit network protocol fragility to target servers, overwhelming a server's resources with massive traffic or service requests. Attacks abusing network protocol servers, such as SSDP (simple service discovery protocol) and LDAP (lightweight directory access protocol) servers, are well-known examples of volumetric DDoS attacks. The amplified reflection-based SSDP attack generates an enormous amount of traffic by exploiting the UPnP (universal plug and play) protocol, while LDAP servers supporting UDP (user datagram protocol) services are used to launch volumetric LDAP DDoS attacks.

SYN flood attacks are another example of a volume-based DDoS attack. Attackers abuse the 3-way handshake of stateful TCP (transmission control protocol) connections to launch SYN flood attacks by repeatedly sending many SYN packets to the victim server. The server must reply to each SYN packet with the SYN-ACK (synchronize–acknowledge) flag and wait for each SYN request's acknowledgment packet, a process that consumes some of the server's memory and processing power. The flood of SYN requests overwhelms the server, resulting in service-unavailable errors for any new request, including those from legitimate users. The connectionless nature of the UDP protocol enables UDP-based amplified reflection DDoS (AR-DDoS) attacks, which require minimal effort [7] to launch an efficient volumetric DDoS attack. Connectionless operation is of enormous benefit to the network, but attackers misuse it, abusing UDP services such as LDAP, Memcached, NTP, and DNS to execute attacks.

With ever-growing dependence on the digital world, it has become vital to provide uninterrupted services to users. Unavailability of a service for even a short time can cause considerable revenue loss for any business. Most organizations do not reveal whether they have been attacked by DDoS, making it difficult to estimate the financial loss caused by DDoS attacks. A 55% increase in DDoS attacks was reported from January 2020 to March 2021 [6], and in 54% of incidents, attackers launched DDoS attacks using multiple modern attack vectors. Figure 1 depicts an overview of modern attack vectors. Despite many studies carried out by researchers [8,9,10,11,12,13,14,15,16,17,18,19] and leading organizations in the cybersecurity sector, DDoS attacks are growing rapidly and pose a tremendous threat to cyberspace.

Fig. 1 A modern cyberattack overview

The significant contributions of this work are summarized as follows:

  • This paper proposes a weighted voting-based multimode machine learning framework, VMFCVD, to detect and mitigate volumetric DDoS attacks.

  • VMFCVD has three modes, namely fast detection mode (FDM), defensive fast detection mode (DFDM), and high accuracy mode (HAM), to classify network packets. Initially, HAM is activated; the mode switches based on the likelihood of a DDoS attack.

  • FDM has low computational and memory overhead, as it takes only two features to classify any network packet. It activates when the framework observes a high volume of incoming traffic.

  • DFDM has the same low computational and memory overhead as FDM. DFDM allows a packet only if all votes are in its favor.

  • HAM has more computational and memory overhead than FDM and DFDM, but gives the highest accuracy.

  • We have compared the performance of VMFCVD with traditional ML algorithms and with state-of-the-art baselines.

2 Related Work

DDoS attacks have a high potential to bring down unprotected servers within a very short time, making them a growing concern for all organizations committed to providing uninterrupted services to their users. Various researchers have proposed techniques to defend against DDoS attacks. In this section, we discuss ML-based classification techniques to combat them.

Aamir and Zaid [8] proposed an ML framework to detect DDoS attacks in which they initially applied feature engineering techniques such as backward elimination, the Chi-square test, and information gain. Feature engineering helps the dataset overcome issues related to missing values, skewness, and collinearity/multicollinearity. They applied five ML models, namely K-nearest neighbors (KNN), naive Bayes (NB), support vector machines (SVM), random forests (RF), and artificial neural networks (ANN), to evaluate their framework's performance. To obtain optimal results, their experimental setup differed for each dataset. Doriguzzi-Corin et al. [9] proposed LUCID, a lightweight CNN-based DDoS attack detection technique intended to speed up network traffic classification on resource-constrained devices. Their approach produces consistent detection results on the ISCX2012, CIC2017, UNB201X, and CSECIC2018 datasets, reporting accuracies of 0.988, 0.9967, and 0.9946 on ISCX2012, CIC2017, and CSECIC2018, respectively. They created a tool to extract network traffic into the input format required by LUCID for live detection. Jia et al. [10] presented two ML models, long short-term memory (LSTM) and convolutional neural network (CNN), to identify and classify malicious traffic, calling the system FlowGuard. FlowGuard was validated on the CICDDoS2019 dataset and on their own dataset generated using the BoNeSi and SlowHTTPTest DDoS simulators. Their proposed model recorded an accuracy of 98.9% and outperformed the other ML models implemented in their work, namely the ID3, random forest (RF), naive Bayes (NB), and logistic regression (LR) models.

Injadat et al. [11] proposed a multistage ML framework for network intrusion detection. The stages of their framework are data preprocessing, feature selection, hyperparameter optimization, and model combination to produce an optimized result. The main techniques used in the various stages were Z-score normalization and the synthetic minority oversampling technique in the first stage; mutual information gain and feature correlation in the second stage; and random search, meta-heuristic optimization algorithms, and Bayesian optimization for parameter tuning in the third stage. The researchers evaluated the framework on the CICIDS 2017 and UNSW-NB 2015 datasets and were able to raise the detection accuracy above 99%. Priyadarshini and Barik [12] proposed a deep learning-based DDoS attack mitigation technique to protect fog and cloud computing environments from DDoS attacks. They deployed the proposed mechanism on the SDN controller of a software-defined network. The model was evaluated on the Hogzilla dataset as well as on live DDoS network traffic captured using TCPDump, obtaining a maximum accuracy of 99.12% on the training sample and 98.88% on the test sample.

Aamir and Zaid [13] developed a clustering-based semisupervised ML scheme to improve DDoS detection. They applied agglomerative clustering and principal component analysis (PCA) with K-means clustering to reduce the dimensionality of the dataset. In the next step, they developed a voting technique to classify the label of the network traffic. They evaluated the proposed framework on the CICIDS2017 DDoS dataset, a subset of the CICIDS2017 dataset. The KNN, SVM, and RF algorithms determined the voting that classified the network traffic as benign or DDoS. The KNN, SVM, and RF accuracies were 95%, 92%, and 96.66%, respectively, and the voting model obtained 82.10% accuracy. They intended to include more ML models to enhance the accuracy. Rehman et al. [14] proposed DIDDOS, a DL- and ML-based framework to detect DDoS attacks. DIDDOS was based on the gated recurrent unit (GRU) and recurrent neural network (RNN) from deep learning, and on naive Bayes (NB) and sequential minimal optimization (SMO) from the machine learning paradigm. The experiment was performed on the CICDDoS2019 dataset, and they recorded the highest accuracy of 99.91% using the GRU classifier on the CICDDoS2019 SSDP subset. CICDDoS2019 SSDP is a subset of the CICDDoS2019 dataset with a massive number of records, where only 0.03% of records are benign and the remaining 99.97% are SSDP DDoS attack records (763 benign and 2,610,611 SSDP DDoS records). Popoola et al. [15] developed a deep learning-based botnet attack detection framework called LAE-BLSTM, implementing a long short-term memory autoencoder (LAE) for dimensionality reduction and bidirectional long short-term memory (BLSTM) for classifying network traffic as benign or malicious. LAE-BLSTM was evaluated on the Bot-IoT dataset. LAE reduced the number of features from 37 to 6, and the deep BLSTM performed best on the reduced dataset when the Nadam optimizer was applied. The framework is preferable for memory-constrained IoT devices.

Ravi and Shalinie [16] developed a learning-driven detection mitigation (LEDEM) mechanism to detect and mitigate DDoS attacks triggered by malicious IoT devices against IoT servers. Supervised machine learning techniques were used for attack detection, and different mitigation techniques were proposed for fixed IoT (fIoT) and mobile IoT (mIoT) devices. An approximation algorithm, fMS, was used for fIoT, where malicious IoT devices were grouped by their VLAN id and packets were dropped if they belonged to that VLAN, or the VLAN was even disconnected. A greedy drop rule was used for mIoT, dropping all incoming packets from the malicious IoT devices. A testbed was created to verify LEDEM in a real network, and LEDEM was also demonstrated on the UNB-ISCX dataset, achieving an accuracy of 96.28%. Gu et al. [17] proposed SKM-HFS, a semisupervised weighted k-means framework using hybrid feature selection for DDoS attack detection. The Hadoop-based hybrid feature selection technique was used to determine the vital features for higher detection accuracy. SKM-HFS was evaluated on the DARPA, CAIDA2007, and CICIDS2017 datasets with accuracies of 99.68%, 99%, and 98.86%, respectively; an experiment on a real-world dataset achieved an accuracy of 99.75%. Bawany et al. [18] discussed the major challenges and requirements of an effective DDoS mechanism. They proposed a customizable framework called ProDefense (SDN-based proactive DDoS defense framework) to defend against DDoS attacks. ProDefense is a scalable and lightweight application that provides rapid detection of DDoS attacks without utilizing high computational power, and it supports port blocking, flow diversion, and bandwidth control techniques to mitigate DDoS attacks. Idhammad et al. [19] proposed semi-supervised DDoS detection combining an unsupervised ML technique with a supervised ensemble ML technique. The unsupervised approach is based on co-clustering, entropy estimation, and the information gain ratio, which helps it remove noisy data from network traffic; the processed network traffic improves performance and reduces the false-positive rate. The supervised ensemble ML classifiers classify the malicious traffic accurately and further reduce false positives. They evaluated their approach on the NSL-KDD, UNB ISCX IDS 2012, and UNSW-NB15 datasets, achieving accuracies of 98.23%, 99.88%, and 93.71%, respectively.

3 Proposed Framework

In this work, a combination of ML models is used to detect DDoS attacks. The systematic flow of VMFCVD, depicted in Fig. 2, is as follows:

  • The data cleaning module handles dataset noises such as single value columns, infinity values, and missing values well in advance.

  • The data transformation module normalizes the data so that all features in the dataset share a common value range.

  • Extensive work on dimensionality reduction techniques ensures that all the features are ranked correctly based on various factors, such as feature importance, Pearson’s correlation, and mutual information gain.

  • The cluster formation module emphasizes the highly ranked features while creating clusters.

  • The module to find the best cluster is designed precisely to rank all the clusters quickly.

  • The ML models involved in voting are selected carefully. Each algorithm outperforms the others one or more times during our testing on various datasets; no single algorithm dominates the rest.

  • The voting calculation is dynamic. The weight calculation module ensures that a higher-performing algorithm obtains a higher weight during voting.

  • The prediction of the voting module for any network traffic lies between 0 and 1; it represents the probability that a flow is benign or malicious. This value helps fine-tune the final result of our voting classifier.

  • All three modes are optimized for their intended detection strategy. FDM is optimized for detection speed, DFDM is optimized for detecting malicious flows at high speed, and HAM is optimized for accuracy without regard to detection speed.

  • VMFCVD discards all the packets predicted as malicious to defend against a DDoS attack.

  • VMFCVD is validated on most of the benchmark DDoS and botnet attack datasets.

Fig. 2 Flow diagram of VMFCVD

3.1 Data Preprocessing

Datasets may contain noise that can hamper the performance of an ML model. Table 7 shows that most of the datasets have NA values, which can reduce the performance of the model if not handled correctly. Many datasets also have one or more features with a single constant value; because such values are identical for all records, they do not affect the model's accuracy, but they can increase the complexity of the model. Data transformation is equally essential, whether to transform data from one dimension to another, convert nonnumeric data to numeric data, or scale the dataset down to a standard range of values. Data preprocessing includes all these steps and more to produce a high-quality dataset so that an ML model can obtain high accuracy while using low computational power. In VMFCVD, we worked extensively on data preprocessing, and the output of this module provides exceptionally high-quality, dimensionally reduced, noise-free datasets. Figure 3 exhibits an overview of our proposed data preprocessing module.

Fig. 3 Data preprocessing module

3.1.1 Data Cleaning

Data cleaning is one of the crucial steps of data preprocessing. We started by deleting features that had identical values for every record, as such features do not affect the accuracy of the ML model. In the second step, we replaced all \(-\mathrm{inf}\) and inf values with NA so that they could be handled by our ImputeMissing module; this is required because \(-\mathrm{inf}\) and inf values are not supported by many ML models. At the end of data cleaning, we called the ImputeMissing module to determine the most appropriate value for each missing value in the dataset. Deleting these values is another option for handling missing data, but deleting a single missing value deletes the entire row, which can cause the loss of a considerable amount of essential data. For example, the original UNSW-NB15 dataset has 2.27% of its data missing, but deleting rows with missing values removes 47.8% of the dataset. During data preprocessing, if a dataset has more than 40% of the data missing for a particular feature, we delete that feature entirely. Including such features slows down detection, as the ImputeMissing module must be called for each missing value, increasing the processing overhead for VMFCVD. The impact is highest when VMFCVD switches to the low-feature detection mode, where we consider a minimum number of features to speed up attack detection.
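The following is a minimal sketch of these cleaning steps, assuming the dataset is held in a pandas DataFrame; the function name and the 40% cutoff parameterization are ours, not the paper's:

```python
import numpy as np
import pandas as pd

def clean(df: pd.DataFrame, max_missing: float = 0.40) -> pd.DataFrame:
    # 1. Drop features whose value is identical for every record.
    df = df.loc[:, df.nunique(dropna=False) > 1]
    # 2. Replace -inf/inf with NA so ImputeMissing can handle them later.
    df = df.replace([np.inf, -np.inf], np.nan)
    # 3. Drop features with more than 40% of their values missing.
    return df.loc[:, df.isna().mean() <= max_missing]
```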

The missing value imputation module uses a linear regression algorithm to predict each missing value. We first split the features into two groups: COLS_WONA, consisting of features without missing values, and COLS_WNA, consisting of features with missing values. We predicted the missing values for each feature in COLS_WNA. A train–test split was performed such that Y_Train contained all the nonmissing values and Y_Test contained all the missing values. The linear regression model was trained using X_Train and Y_Train, and Y_Test was then predicted from X_Test. The pseudo-code is given by Algorithm 1.

Algorithm 1 Missing value imputation (pseudo-code)
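A minimal sketch of this imputation strategy, assuming numeric features in a pandas DataFrame and scikit-learn's LinearRegression; impute_missing and the variable names are ours, mirroring the COLS_WONA/COLS_WNA split described above:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

def impute_missing(df: pd.DataFrame) -> pd.DataFrame:
    # COLS_WONA: features without missing values (used as predictors).
    cols_wona = list(df.columns[df.notna().all()])
    # COLS_WNA: features whose missing values must be predicted.
    cols_wna = list(df.columns[df.isna().any()])
    for col in cols_wna:
        known = df[col].notna()
        model = LinearRegression()
        # Train on rows where the feature is present (X_Train, Y_Train) ...
        model.fit(df.loc[known, cols_wona], df.loc[known, col])
        # ... then predict the missing entries (X_Test -> Y_Test).
        df.loc[~known, col] = model.predict(df.loc[~known, cols_wona])
    return df
```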

3.1.2 Data Transformation

This is a process of restructuring the dataset from one form to another. It enhances the dataset's quality and organizes it in a more precise format to improve the performance of ML models. Data normalization and data encoding are among the techniques involved in data transformation. Normalization rescales and transforms the data so that each feature has an identical range of values [20]. It ensures that the ML model is not biased toward a feature with a wide range of values compared to features whose values lie in the single digits or an even smaller range, producing an improved version of the dataset in which every feature carries equal importance as input to the learning model. We used the StandardScaler transformation technique to normalize the dataset, rescaling each numeric feature to zero mean and unit variance. Data encoding is another imperative process, used to transform categorical features into numerical ones. Most ML models perform excellently on numerical features, while categorical features are not accepted by many; encoding categorical features as numeric features therefore improves the performance of the model. We applied OneHotEncoder, the most widely used data encoding technique, to transform categorical features into numerical features.
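The following sketch shows how these two transformations could be wired together with scikit-learn (assuming scikit-learn >= 1.0 for get_feature_names_out; the transform wrapper itself is ours):

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler, OneHotEncoder

def transform(df: pd.DataFrame, categorical_cols: list) -> pd.DataFrame:
    numeric_cols = [c for c in df.columns if c not in categorical_cols]
    # Normalization: rescale numeric features to zero mean, unit variance.
    scaled = pd.DataFrame(StandardScaler().fit_transform(df[numeric_cols]),
                          columns=numeric_cols, index=df.index)
    # Encoding: expand each categorical feature into numeric 0/1 columns.
    enc = OneHotEncoder(handle_unknown="ignore")
    encoded = pd.DataFrame(enc.fit_transform(df[categorical_cols]).toarray(),
                           columns=enc.get_feature_names_out(categorical_cols),
                           index=df.index)
    return pd.concat([scaled, encoded], axis=1)
```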

3.1.3 Dimensionality Reduction

Dimensionality reduction is a technique for identifying extraneous features of a dataset that either decrease the ML model's performance or do not improve it. Datasets may contain redundant features and irrelevant information that negatively affect the ML model's performance [21]. Figures 4 and 5 show that increasing the number of features increases the time complexity of the model, while increasing the feature count does not necessarily increase the model's accuracy. It is essential to obtain a reduced set of features that gives high accuracy and reduces the difficulty of the learning process [22]. Our investigation shows that lowering the dimension improves the detection speed. We considered feature importance (FI), mutual information gain, and feature correlation to obtain high-ranked features for training the voting model.

Fig. 4 Increase in complexity

Fig. 5 Unpredictable accuracy

Feature Importance Feature importance describes how important a feature is to the model's classification performance. Different ML models can be used to measure feature importance, and during the experiment, we observed that different ML models give diverse feature importance (FI) values on the same dataset. For the same feature, one algorithm showed zero importance while another showed a considerable positive value. To ensure that we did not miss any potential feature, we consolidated three ML models, namely LightGBM, XGBoost, and DecisionTree, to measure feature importance.

We observed that the data range of each ML model’s FI varied widely. After calculating FI, we rescaled the FI range for each algorithm to the range of [0, 1]. The min–max normalization function is given as:

$$\begin{aligned} {\bar{X}} = \frac{X - {\textit{min}}(X)}{{\textit{max}}(X) - {\textit{min}}(X)} \end{aligned}$$
(1)

where X is the list of features with their calculated FI values and \({\bar{X}}\) is the normalized value calculated from X.

Once the FI values were measured and normalized, we combined the FI values of all three models. Figure 6 plots the combined FI values; features with higher FI values are the essential features for the classification model.

Fig. 6 Combined feature importance
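A sketch of the consolidation described above, combining the min–max-rescaled importances of LightGBM, XGBoost, and DecisionTree; the helper names are ours, and default model hyperparameters are assumed:

```python
import numpy as np
from lightgbm import LGBMClassifier
from xgboost import XGBClassifier
from sklearn.tree import DecisionTreeClassifier

def minmax(x):
    # Rescale a model's FI values to [0, 1], as in Eq. (1).
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

def combined_importance(X, y):
    models = [LGBMClassifier(), XGBClassifier(), DecisionTreeClassifier()]
    total = np.zeros(X.shape[1])
    for m in models:
        m.fit(X, y)
        # Normalize each model's FI before combining so no model dominates.
        total += minmax(m.feature_importances_)
    return total  # higher value => more important feature
```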

Feature Correlation This is the crucial stage in which we determined the correlation between all the features in the dataset. If two features are highly correlated, we can keep one and drop the other. Features that are highly correlated with the classification output are preferred, whereas features highly correlated with other features are not suitable for ML algorithms [23]. We calculated the correlation between features using Pearson's correlation coefficient:

$$\begin{aligned} P_{XY} = \frac{ \sum _{i=1}^{n} (X_{i} - {\bar{X}}) (Y_{i} - {\bar{Y}})}{\sqrt{\sum \limits _{i=1}^{n} (X_{i} - {\bar{X}})^{2}} \sqrt{\sum \limits _{i=1}^{n} (Y_{i} - {\bar{Y}})^{2}}} \end{aligned}$$
(2)

where X and Y are two features of the dataset, n is the number of rows, \({\bar{X}}\) and \({\bar{Y}}\) are the means of X and Y, respectively, and \(P_{XY}\) is Pearson's correlation coefficient between X and Y.

A heatmap of the calculated correlation values between features is plotted in Fig. 7. The correlation map shows how strongly two features are linearly correlated. The correlation coefficient ranges from \(-1\) to 1: a value equal or close to zero indicates no relation between two features, while a value equal or close to 1 or \(-1\) indicates that the two features are highly correlated.

Fig. 7 Heatmap based on feature correlation
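The correlation-based pruning could be sketched as follows; the 0.9 threshold is an illustrative assumption, as the paper does not state a cutoff:

```python
import numpy as np
import pandas as pd

def drop_correlated(df: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    # Pearson correlation between every pair of features, as in Eq. (2).
    corr = df.corr(method="pearson").abs()
    # Keep only the upper triangle so each pair is examined once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    # Drop one feature from every highly correlated pair.
    to_drop = [c for c in upper.columns if (upper[c] > threshold).any()]
    return df.drop(columns=to_drop)
```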

Mutual Information Mutual information (MI) is calculated between an independent feature and the dependent feature to determine the information gain value, which measures how strongly the dependent feature depends on the independent feature. Features with higher information gain values are candidates for the minimal feature set of the ML algorithm. We used mutual_info_classif to determine the MI value of each feature with respect to the classification output. Figure 8 plots the calculated mutual information gain. MI values range between 0 and 1; a high MI score for a feature means it has a closer connection with the target feature, and including it in the training set can help the classification model.

Fig. 8 Graph plotted from mutual information gain
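A minimal sketch of the MI ranking step, assuming scikit-learn's mutual_info_classif; the mi_ranking wrapper is ours:

```python
from sklearn.feature_selection import mutual_info_classif

def mi_ranking(X, y, feature_names):
    # MI between each independent feature and the class label.
    mi = mutual_info_classif(X, y)
    # Higher MI => stronger dependency of the label on the feature.
    return sorted(zip(feature_names, mi), key=lambda t: t[1], reverse=True)
```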

Algorithm 2 Cluster formation (pseudo-code)

3.2 Cluster Formation, Ranking, and Best Cluster Identification

Once the features were ranked using the techniques explained above, we needed to determine the minimal feature set from these selected features. The output of the data preprocessing and feature selection phase is a list of features ranked from highest to lowest; the highest-ranked features are strong candidates for the minimal feature set of our detection framework. It is essential to identify high-ranked features that work well together to improve the model's accuracy. In the next step, we clustered the datasets so that highly ranked features obtained more weight in the clustering process. The pseudo-code is given by Algorithm 2, and the output of ClusterFormation when n features are fed in one by one is represented in Fig. 9. We call the ClusterFormation module \(n-2\) times, where n is the total number of features: first for three features, then for four, and so on until the last feature is reached. Each call to ClusterFormation generates three subclusters. The accuracy of all these subclusters is calculated in the next stage, and only the best-performing cluster is selected.

Fig. 9 Data preprocessing module

We calculated the accuracy of each subcluster and selected the best-performing cluster for the next step. Table 1 shows details of the best cluster from each set of subclusters.

Table 1 Best clusters from each subcluster
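As a simplified sketch of this search, evaluating one candidate cluster per feature-count k rather than the three subclusters per ClusterFormation call described above (the classifier choice, split ratio, and names are ours):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def best_cluster(df, y, ranked_features):
    """Evaluate feature clusters of size 3..n built from the ranked
    feature list and return the best-performing one with its accuracy."""
    results = []
    for k in range(3, len(ranked_features) + 1):   # n-2 evaluations in total
        cols = ranked_features[:k]
        X_tr, X_te, y_tr, y_te = train_test_split(
            df[cols], y, test_size=0.3, random_state=42)
        clf = RandomForestClassifier().fit(X_tr, y_tr)
        results.append((cols, accuracy_score(y_te, clf.predict(X_te))))
    return max(results, key=lambda t: t[1])
```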

The best-performing two-feature cluster was selected for one of the modes of VMFCVD. For another mode, we chose the best-performing cluster from the remaining cluster sets. Table 2 lists the fastest cluster and the cluster that gives the highest accuracy.

Table 2 Fastest and highest performing clusters

3.3 Voting-based Volumetric DDoS Detection Framework

Factors such as dataset size and deployment environment extensively affect ML model performance, so it is not advisable to depend on a single ML algorithm when developing a framework. Although an ML algorithm can perform exceptionally well on a particular dataset with high accuracy and precision, there is no guarantee that it will perform well on all datasets [24]. VMFCVD relies on the triad of fast detection mode (FDM), defensive fast detection mode (DFDM), and high accuracy mode (HAM), which together ensure that the system defends against DDoS attacks more efficiently. We considered five well-known ML models to classify network traffic: the AdaBoost, bagging, gradient boosting, K-neighbors, and random forest classifiers. The result is calculated from their voting, and each ML algorithm has a different voting right based on the performance measured during its training. VMFCVD is highly adaptive: it has three different approaches to classifying cyber-attacks and switches between them based on the network traffic.
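As an illustrative sketch of this mode switching (the packet-rate thresholds below are our invented placeholders; the paper switches on observed traffic volume without publishing fixed numbers):

```python
def select_mode(pkts_per_sec: float,
                high: float = 50_000, extreme: float = 200_000) -> str:
    # Illustrative thresholds only: VMFCVD switches modes on observed
    # traffic volume, but no fixed packet-rate cutoffs are published.
    if pkts_per_sec >= extreme:
        return "DFDM"  # very high attack intensity: strictest voting
    if pkts_per_sec >= high:
        return "FDM"   # considerably high traffic: two-feature fast mode
    return "HAM"       # stable network: highest-accuracy mode
```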

Algorithm 3 Weighted voting (pseudo-code)
Table 3 Data reduction in FDM
Table 4 Improvement in DFDM over FDM

Calculation of Vote All the ML models were trained on the training dataset. Once trained, their accuracies and predictions were calculated on the test dataset. We calculated the weight of each ML algorithm based on its accuracy, and then multiplied each model's prediction for every record by that model's weight. The resultant weighted predictions were used to compute the vote that determines whether a packet is benign or malicious. Algorithm 3 is the master algorithm that performs the vote; all three modes of VMFCVD call it and apply their own voting criteria.
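A minimal sketch of this weighted voting, assuming already-trained scikit-learn-style classifiers with 0/1 predictions; deriving weights by normalizing the test accuracies is our simplification of the paper's weight calculation:

```python
import numpy as np

def weighted_vote(models, accuracies, X):
    acc = np.asarray(accuracies, dtype=float)
    weights = acc / acc.sum()       # higher test accuracy => higher weight
    # Multiply every model's 0/1 prediction for each record by its weight
    # and sum, yielding a probability-like score in [0, 1] per record.
    return sum(w * m.predict(X) for w, m in zip(weights, models))
```

Each mode can then threshold this score according to its own voting criterion; the exact cut-offs per mode are not spelled out in the text.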

3.3.1 Fast Detection Mode (FDM)

As shown in Fig. 4, increasing the feature size increases the training time and the computational power used by the framework. Considering this, we trained FDM with the minimum possible feature set: taking only two features from the network traffic enables FDM to classify more network traffic. FDM uses the highly dimensionally reduced dataset provided by our dimensionality reduction module; Table 3 shows the percentage of data reduction for all the datasets used in the experiment. Keeping the minimum number of features in the training and test datasets helps VMFCVD process the maximum number of requests compared to modes that use more features for model training and testing. FDM remains active at all times, even when HAM is running, and trains itself on all incoming flows during that time. VMFCVD automatically switches to FDM when it observes considerably high network traffic; HAM is suspended during this time, which lets FDM obtain the maximum amount of resources for attack detection.

Table 5 Accuracy improvement in HAM over FDM
Table 6 Performance of ML models on various datasets

3.3.2 Defensive Fast Detection Mode (DFDM)

This is the extended version of FDM. DFDM uses FDM's voting module but applies a stricter voting calculation: even if a single vote is against a packet, that packet is classified as malicious. Only packets with zero probability of being malicious are classified as benign. This mode activates when the network experiences an attack with a very high volume of incoming data. DFDM increases the accuracy of detecting malicious packets but decreases the accuracy of detecting benign packets. VMFCVD switches to DFDM when the DDoS attack intensity is very high. The increase in malicious packet detection accuracy of DFDM over FDM is given in Table 4.
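DFDM's stricter rule could be sketched as follows, reusing the trained models from the voting sketch above:

```python
import numpy as np

def dfdm_classify(models, X):
    # A packet is benign only if every model votes benign (0);
    # a single vote against it marks the packet as malicious (1).
    preds = [m.predict(X) for m in models]
    return np.any(preds, axis=0).astype(int)
```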

Table 7 Detailed information about the datasets used for the experiment
Fig. 10 Distribution of benign and malicious records

3.3.3 High Accuracy Mode (HAM)

This mode gives the highest accuracy. HAM may require a larger number of input features for training, which increases the computational overhead. It activates when the network is stable and the number of incoming packets is at an average level. This mode is intended to give the highest accuracy on any dataset; the cluster that outperformed all others with the highest accuracy during feature selection is used as its input. Table 5 shows the improvement of HAM's accuracy over FDM.

Table 8 Comparison between FDM, DFDM, and HAM

4 Experimental Results and Discussion

All experiments were carried out on a 64-bit Windows 10 Pro operating system with a 2.70 GHz Core i7 processor and 16 GB of RAM. VMFCVD is implemented in Python 3.8.8 using Jupyter Notebook. The machine learning classification models used are the AdaBoost, bagging, gradient boosting, K-neighbors, and random forest classifiers, selected for their diverse performance across the various datasets. Table 6 shows this diversity: an ML model that gives the highest accuracy on one dataset gives the lowest on another. The highlighted cells show the maximum performance of each ML model. AdaBoost gives the highest average performance, while bagging gives the best performance on five occasions; the lowest performance of bagging and GradientBoost is recorded three times.

4.1 Benchmark Datasets

Our framework was evaluated on twelve labeled datasets containing benign and malicious network traffic. Most of the datasets we considered were generated in recent years and include new kinds of attacks. Table 7 gives detailed information on all the datasets used in this work. For DDoS attack detection, we considered CICIDS2017 DDoS, CSE-CIC-IDS2018 DDoS, and CICDDoS2019 DDoS. The CICIDS2017 BoT, CSE-CIC-IDS2018 BoT, NBaIoT2018 Mirai, UNSW2018 BoTIoT, and UNSW NB15 datasets were considered for botnet attacks. The CICIDS2017 BoT and CICIDS2017 DDoS datasets are subsets of the CICIDS2017 [25] dataset created by the Canadian Institute for Cybersecurity; CICIDS2017 BoT has 99% benign and 1% malicious flows, while CICIDS2017 DDoS has 43% benign and 57% malicious flows. The CSE-CIC-IDS2018 BoT and DDoS datasets are subsets of the CSE-CIC-IDS2018 [25] dataset, an updated version of CICIDS2017. We considered the DNS, LDAP, SSDP, and SYN datasets, subsets of the CICDDoS2019 [26] dataset. CICDDoS2019 has a large number of malicious records with only a few benign records: CICDDoS2019 DNS has 0.067% benign records, CICDDoS2019 LDAP has 0.074%, CICDDoS2019 SSDP has 0.029%, and CICDDoS2019 SYN has only 0.025% benign records.

Fig. 11 Dimensionality reduction in FDM, DFDM, and HAM

Table 9 Accuracy of FDM over ML models

DoHBrw-2020 is the most recent dataset generated by the Canadian Institute for Cybersecurity; it contains a hybrid of modern benign and malicious network traffic [27]. NBaIoT2018 Mirai is a subset of the NBaIoT2018 dataset collected from nine infected IoT devices [28]. UNSW2018 BoTIoT is extracted from the UNSW BoTIoT dataset, which incorporates legitimate IoT network traffic [29]. The UNSW-NB15 dataset is a well-structured dataset for evaluating cyber-attack detection systems, created at the University of New South Wales in 2015 [30]. Figure 10 depicts the distribution of benign vs. malicious records.

4.2 Evaluation Metrics

All three detection modes of VMFCVD are evaluated using four metrics: accuracy, precision, sensitivity, and F1 score. The equations are given below:

$$\begin{aligned} {\textit{Accuracy}}= & {} \frac{{\textit{TP}}+{\textit{TN}}}{{\textit{TP}}+{\textit{TN}}+{\textit{FP}}+{\textit{FN}}} \end{aligned}$$
(3)
$$\begin{aligned} {\textit{Precision}}= & {} \frac{{\textit{TP}}}{{\textit{TP}}+{\textit{FP}}} \end{aligned}$$
(4)
$$\begin{aligned} {\textit{Sensitivity}} ({\textit{Recall}})= & {} \frac{{\textit{TP}}}{{\textit{TP}}+{\textit{FN}}} \end{aligned}$$
(5)
$$\begin{aligned} F1\, {\textit{Score}}= & {} 2 * \frac{{\textit{Precision}} * {\textit{Recall}}}{{\textit{Precision}} + {\textit{Recall}}} \end{aligned}$$
(6)
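A direct transcription of Eqs. (3)–(6) into code, assuming the confusion-matrix counts are already available:

```python
def metrics(tp: int, tn: int, fp: int, fn: int):
    accuracy = (tp + tn) / (tp + tn + fp + fn)            # Eq. (3)
    precision = tp / (tp + fp)                            # Eq. (4)
    recall = tp / (tp + fn)                               # Eq. (5): sensitivity
    f1 = 2 * precision * recall / (precision + recall)    # Eq. (6)
    return accuracy, precision, recall, f1
```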

4.3 Experimental Result

The analysis of the experimental results is divided into three sections. In the first, we compare all three modes of VMFCVD; in the second, VMFCVD is compared with the ML models' performance; and in the last, we compare the performance of VMFCVD with state-of-the-art baselines.

4.3.1 Performance Analysis Between FDM, DFDM, and HAM

These three modes are designed to detect attacks in different scenarios, which primarily affects their performance. HAM outperformed FDM and DFDM in accuracy, sensitivity, and F1 score. Considering the extensive dimensionality reduction in FDM, the accuracy of FDM was quite competitive with HAM. The precision of DFDM outperformed FDM on all datasets and topped HAM in many cases. The highlighted cells in Table 8 mark the best values.

Table 10 Accuracy of DFDM over ML models for detecting malicious packets
Table 11 Accuracy of HAM over ML models
Fig. 12 Accuracy (average) of VMFCVD over ML models

Table 12 Accuracy of VMFCVD over state-of-the-art baselines

Figure 11 shows the dimensionality reduction on the various datasets. The minimum dimensionality reduction is 85.4% on the UNSW NB15 dataset for HAM, and the maximum is 98.2% on the NBaIoT2018 Mirai dataset for FDM. The average reduction is 97.03% for both FDM and DFDM and 91.8% for HAM. The overall dimensionality reduction achieved by VMFCVD is 95.28%.

4.3.2 Performance Comparison of VMFCVD with ML Models

The accuracy of VMFCVD's FDM outperformed the ML algorithms used in this work. The results in Table 9 show that VMFCVD outperforms the ML algorithms on all the datasets used in this work; on eight occasions, however, an ML model matched the accuracy of VMFCVD.

Table 10 shows the accuracy of detecting malicious packets when experimenting with the FDM cluster. In this comparison, DFDM dominates the ML models most of the time.

The accuracy of VMFCVD's HAM also outperformed the ML algorithms used in this work. The results in Table 11 show that it outperformed the ML algorithms on all the datasets used in this work; however, in seven out of 110 comparisons, an ML model matched the performance of VMFCVD.

4.3.3 Performance Comparison of VMFCVD with State-of-the-art Baselines

In this section, we compare the accuracy of all three modes of VMFCVD with state-of-the-art baselines. Table 12 shows that VMFCVD outperformed other studies in terms of classification accuracy, with HAM topping all other modes of VMFCVD and the recent ML models. Despite the significant dimensionality reduction in FDM, FDM performs better than most recent studies. Although DFDM is dedicated to detecting malicious packets efficiently, its overall accuracy is comparable to, and in most cases better than, that of recent studies, as shown in Table 12.

Figure 12 depicts the performance comparison of VMFCVD with the ML models. The accuracy of the ML models ranged from 43% to 100%, whereas the accuracy of VMFCVD was always above 98.7%, with an overall average of 99.82%.

5 Conclusion and Future Work

DDoS attacks continue to undermine the availability of online services despite the enormous effort made by researchers and industry to defend against them. Existing systems suffer from high processing overhead and are validated on a limited number of datasets. In this work, extensive feature selection followed by a systematic approach to forming and selecting the best clusters yields dimensionally reduced, high-quality datasets from noisy ones for the various VMFCVD modes. VMFCVD incurs low processing overhead when the server is under attack. Extensive experiments were performed to evaluate its effectiveness on the CICIDS2017 (BoT & DDoS), CSE-CIC-IDS2018 (BoT & DDoS), CICDDoS2019 (DNS, LDAP, SSDP & SYN), DoHBrw2020, NBaIoT2018 (Mirai), UNSW2018 BoTIoT, and UNSW NB15 datasets. The experimental results show that VMFCVD outperformed other studies in terms of classification accuracy, and the extent to which we reduced the datasets is maximal compared with previous studies: in some cases, VMFCVD reduced the dataset by 98.2% while maintaining an accuracy of 99.99%. As future work, we plan to create a generic DDoS and botnet dataset that can be used to train the model when implemented on a live server, and to include a module that identifies and blocks devices that repeatedly generate malicious network traffic.