Abstract
Data science is now widely used to support and improve decision-making. With the growing range of data streaming applications, class imbalance and concept drift have become crucial learning problems. Deep learning (DL) models have proved useful for classifying data streams that exhibit concept drift. This paper presents an effective class imbalance with concept drift detection (CIDD) technique using Adadelta optimizer-based deep neural networks (ADODNN), named the CIDD-ADODNN model, for the classification of highly imbalanced streaming data. The presented model involves four processes: preprocessing, class imbalance handling, concept drift detection, and classification. The model uses the adaptive synthetic (ADASYN) technique to handle class imbalance, which applies a weighted distribution over minority class examples according to their level of difficulty in learning. Next, a drift detection technique called adaptive sliding window (ADWIN) is employed to detect the existence of concept drift. The ADODNN model is then used for classification. To improve the performance of the DNN classifier, ADO-based hyperparameter tuning determines the optimal parameters of the DNN model. The performance of the presented model is evaluated on three streaming datasets: the intrusion detection (NSL KDDCup) dataset, the Spam dataset, and the Chess dataset. A detailed comparative analysis shows that the presented model achieves maximum accuracies of 0.9592, 0.9320, and 0.7646 on the KDDCup, Spam, and Chess datasets, respectively.
Introduction
With ongoing technical advances, large numbers of data streams are now being produced. Typical sources include sensor networks, spam filtering, traffic management, and intrusion detection [1]. A data stream \(S\) is a potentially unbounded sequence of instances that arrive at high speed. A major challenge in data stream learning is concept drift, in which the concept underlying the data changes dynamically over time. Concept drift is common in real-time applications. For instance, in recommender systems (RS), user preferences may change with trends, finances, and other external factors. Likewise, climate prediction models must change with the seasons. Such changes tend to degrade classification performance. Hence, a classifier applied in this setting must be able to detect and adapt to these changes. The main aim of this work is to develop a classifier learning module that mines streaming data effectively in dynamic environments.
Concept drift can be classified by speed into sudden and gradual drift, as shown in Fig. 1. Sudden concept drift is characterized by a large change between the underlying class distribution and that of the incoming samples within a short time. Gradual concept drift takes longer and reflects a slow change in the underlying class distribution between previous and new instances. Regardless of the type of change, a learner has to be able to observe and track it. In real-time data streams, earlier concepts can also reappear in the future, which gives rise to a distinct type of drift called recurring concepts. For instance, the news-reading interests of a user may change quickly: people may read different material on weekends, in the mornings, and in the evenings. Similarly, a user may read astrology articles at the new year and economic articles every quarter. In such cases, classifiers employed in the past may become applicable again in the future. However, traditional work on drift detection ignores this phenomenon and treats each recurring concept as a new one. The purpose of drift detection is to capture the changes in data streams and update the prediction model so that high accuracy is maintained.
Overview of concept drift
Concept drift occurs when the target concept changes within a limited time period. Assume two target concepts A and B and a sequence of samples \(I = \left\{ i_{1}, i_{2}, \ldots, i_{n} \right\}\). Before instance \(i_{d}\), the target concept does not change and remains A. After a further \(\Delta x\) instances, the concept is stable again under a different concept B. Hence, the concept drifts between the samples \(i_{d+1}\) and \(i_{d+\Delta x}\), replacing target concept A with B. Depending on the duration of the drift (\(\Delta x\)), the change is classified as gradual, where the transition between the two concepts is slow, or abrupt, where the change occurs suddenly.
Concept drift methods fall into three categories: window-based, weight-based, and ensemble-based. Window-based methods select samples from a dynamic sliding window, whereas weight-based methods weight the samples and discard older ones according to their weights. Ensemble methods train several classification models and combine them into a final, more effective classifier. Concept drift handling methods also differ in how many samples they consume per training step: an online approach updates the classifier after each instance, while a batch approach waits to receive a large number of instances before learning starts. A batch learner receives the data stream and divides it into batches. Several schemes exist for dealing with a stream of batches: full memory, in which the learner trains on all previous batches; no memory, in which it trains only on the current batch; and window of fixed size n, in which it trains on the n most recent batches. Here, a window-based model with a fixed window size (n = 10) is applied.
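As an illustration of the three batch-handling schemes, the following minimal Python sketch yields the set of batches a learner would train on after each new batch arrives. The function name `train_sets` and the `mode` strings are hypothetical and serve only this illustration; the default window size n = 10 matches the setting stated above.

```python
from collections import deque


def train_sets(batches, mode="window", n=10):
    """Yield the list of batches available for training after each arrival."""
    history = []              # full memory: every batch seen so far
    window = deque(maxlen=n)  # fixed-size window: the n most recent batches
    for batch in batches:
        history.append(batch)
        window.append(batch)  # the oldest batch is dropped automatically
        if mode == "full":
            yield list(history)
        elif mode == "none":
            yield [batch]     # no memory: only the current batch
        else:
            yield list(window)


# Twelve toy batches; with n = 10 the final training set holds batches 2..11.
stream = [[i] for i in range(12)]
final_window = list(train_sets(stream, mode="window", n=10))[-1]
```

A bounded `deque` keeps memory constant regardless of stream length, which is the practical appeal of the fixed-window scheme over full memory.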
Problem formulation
Assume the input data stream is gathered from \(n\) sources \(\left( \text{So}_{i} \right)\), referred to as \(\text{So}_{1}, \text{So}_{2}, \text{So}_{3}, \ldots, \text{So}_{n}\). A source \(i\) produces \(k\) streams \(\left( \text{So}_{ik} \right)\), i.e., \(\text{So}_{i1}, \text{So}_{i2}, \ldots, \text{So}_{ik}\). The samples from these sources together make up the complete streaming data, \(\bigcup_{i} \text{So}_{i} = \text{So}\). The central premise of the data preprocessing method is to allocate a reservoir \(S_{\text{R}}\) for the stream data \(\text{So}\) from the \(n\) sources. Two factors are significant when determining a statistically adequate reservoir size for the complete stream data. The degree of disparity in the stream data reflects the difference in the number of samples contributed by each source. A high degree of disparity leads to the smallest confidence interval with which the correct value can be estimated [2].
where, in Eq. (1), \(\left| S_{\text{R}} \right|\) denotes the overall sample size, \(N\) the overall population, and \(e\) the confidence interval. For small confidence intervals, the data sampling method selects a larger number of instances; otherwise, a smaller number of samples suffices to represent the complete stream data. Once sampling has been applied, stream data classification is treated as a two-class problem. Assume an online ensemble classifier \(\Theta\) that receives a new instance \(x_{t}\) at time \(t\) and predicts the class label \(y_{t}^{\prime}\). After the prediction is made, the classifier receives the desired label \(y_{t}\) of \(x_{t}\). Both the predicted and the desired label take values in \(\left\{ 1, -1 \right\}\). The outcome of the ensemble classifier \(\Theta\) falls into one of four classes:

1. True positive if \(y_{t} = y_{t}^{\prime} = 1\)
2. True negative if \(y_{t} = y_{t}^{\prime} = -1\)
3. False positive if \(y_{t} = -1;\ y_{t}^{\prime} = 1\)
4. False negative if \(y_{t} = 1;\ y_{t}^{\prime} = -1\)
Based on these outcomes, the ensemble classification accuracy is evaluated separately for minority and majority class instances. The imbalance factor is quantified by the probability of occurrence of the minority class. Because samples are unevenly distributed between the majority and minority classes, classifier performance degrades.
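The four outcomes can be tallied directly from the stream of desired and predicted labels. The sketch below is illustrative only: it assumes labels in {1, -1} with the positive class taken as the minority class (an assumption, since the text does not fix which label is the minority), and the helper names are hypothetical.

```python
from collections import Counter


def outcome(y_true, y_pred):
    """Map a (desired, predicted) label pair in {1, -1} to TP, TN, FP or FN."""
    if y_true == 1:
        return "TP" if y_pred == 1 else "FN"
    return "FP" if y_pred == 1 else "TN"


def per_class_summary(y_true, y_pred):
    """Accuracy on each class plus the observed fraction of positive labels."""
    counts = Counter(outcome(t, p) for t, p in zip(y_true, y_pred))
    pos = counts["TP"] + counts["FN"]  # desired label  1 (assumed minority)
    neg = counts["TN"] + counts["FP"]  # desired label -1 (assumed majority)
    return {
        "minority_accuracy": counts["TP"] / pos if pos else 0.0,
        "majority_accuracy": counts["TN"] / neg if neg else 0.0,
        "minority_fraction": pos / (pos + neg) if pos + neg else 0.0,
    }


summary = per_class_summary([1, -1, -1, -1, 1, -1], [1, -1, 1, -1, -1, -1])
```

Reporting accuracy per class, rather than overall, keeps a classifier that always predicts the majority label from looking deceptively good.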
Paper contributions
Learning from data streams (incremental learning) has attracted significant attention from the research community owing to its open issues and real-time applications. Concept drift detection identifies changes in the data distribution that make the current prediction model inaccurate. A stream data classifier without concept drift adaptation is not suited to classifying data with an imbalanced class distribution. Therefore, this paper designs a novel class imbalance with concept drift detection (CIDD) technique using Adadelta optimizer-based deep neural networks (ADODNN), named the CIDD-ADODNN model, to classify highly imbalanced data streams. The proposed model uses the adaptive synthetic (ADASYN) technique to handle class imbalance. In addition, the adaptive sliding window (ADWIN) technique is applied to recognize concept drift in the streaming data. Finally, the ADODNN model is used for classification. To validate the classification results of the CIDD-ADODNN model, three streaming datasets are used: the intrusion detection (NSL KDDCup) dataset, the Spam dataset, and the Chess dataset.
In short, the contributions of the paper are as follows:

Develop a new CIDD-ADODNN model to classify highly imbalanced data streams.

Employ the ADASYN technique to handle class imbalance and the ADWIN technique to recognize concept drift in the streaming data.

Use the ADODNN model for classification.

Validate the performance of the CIDD-ADODNN model on three streaming datasets.
Literature survey
Big data streaming domains commonly suffer from both class imbalance and concept drift. Classical sampling models address these problems through two families of methods, resampling and similarity-based methods. Resampling is an effective scheme at the data level. Some resampling approaches manage the data distribution by applying deterministic frameworks [3]. Well-known approaches select instances from the continuously incoming data stream by sampling with or without replacement. Sampling with replacement is applied when a fixed sample size is required, while sampling without replacement is used in other applications. The traditional approach is not suitable for achieving sample adequacy without redundancy, and the second technique is not applicable to substreams that exhibit diverse patterns.
In Wu et al. [4], the Dynamic Feature Group Weighting framework with Importance Sampling (DFGW-IS) aims to resolve the issues of concept drift and class imbalance. The weighted ensemble is trained on feature groups that are extracted randomly. It assumes that the minority class remains the same, although the minority class instances in the previous window differ from those in the present window. Additionally, solutions that address irregular class distributions using classical sampling do not transfer well to concept drift. The sampling approach of Cervellera and Macciò [5] applies a recursive binary partition across the input samples and selects instances that represent the entire stream. Its greedy optimality and explicit error bound make it applicable to problems related to concept drift.
The adaptive sampling approach [6] for irregular data streams relies on repeatable and scalable prediction. A predictive model is built directly if the data are only slightly imbalanced. If the data are heavily imbalanced, a data scan is triggered to collect enough minority instances. The major constraints of this model are that it requires an accurate reservoir and does not consider worst-case optimality. To overcome these problems, stream sampling and continuous random sampling make use of overlap independence. By integrating density and distance metrics, the DENDIS method of [7] retains semantic coherence.
The Gmeans Update Ensemble (GUE) in [8] tries to resolve these issues. To manage the imbalanced class distribution, it employs oversampling, and it applies weighting frameworks to handle concept drift. However, a static threshold is not adequate for resolving the imbalanced class distribution. The Gradual Resampling Ensemble (GRE) method was developed by Ren et al. [9] to overcome these problems. It uses a resampling scheme on previously received minority class instances to amplify the present minority class. Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is used to identify disjuncts and eliminate their influence on similarity estimation, which helps GRE adapt to new samples. Similarly, efficient learning from nonstationary imbalanced data streams was proposed in Meenakshi et al. [10]. It tries to limit misclassified samples in two-class problems. It builds several blocks of chunks, and training and testing on each chunk are performed by the classification model. However, it experiences severe problems in multiclass classification.
In Iwashita et al. [11], spiking neural networks are introduced to learn data streams online. The major objective of that work is to reduce the size of the neuron repository and to exploit the benefits of data reduction models and compressed neuron learning. The Knowledge-Maximized Ensemble (KME) in [12] unifies online and chunk-based ensemble classification models to resolve different concept drift problems. The use of unsupervised learning techniques and stored recurrent models increases the knowledge available for stream data mining (DM) and, as a result, improves classification accuracy. Although several works exist in the literature, classification under concept drift is considerably affected by class imbalance. Sampling approaches are commonly employed to process the continuously incoming data stream with an adequate sample count, and the chosen samples support statistical inference on the imbalanced class distribution. A stream data classifier without concept drift adaptation is not suited to classifying data with an imbalanced class distribution.
The proposed CIDD-ADODNN model
The working principle of the presented CIDD-ADODNN model is depicted in Fig. 2. First, data preprocessing transforms the raw streaming data into a compatible format for further processing. Next, the ADASYN technique is applied to handle class imbalance. Then, a drift detection technique called ADWIN is employed to detect the existence of concept drift. Finally, the ADODNN model, which incorporates ADO to tune the hyperparameters of the DNN, is applied to determine the class labels of the streaming data.
Data preprocessing
At the initial stage, the raw streaming data are preprocessed in three steps: format conversion, data transformation, and chunk generation. First, the online streaming data in any raw format are converted into the required .csv format. Second, data transformation converts the categorical values in the data to numerical values. Third, the streaming dataset, whatever its size, is divided into a number of chunks for further processing.
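The transformation and chunking steps can be sketched as follows. This is a minimal illustration under stated assumptions: the paper does not specify the encoding scheme, so the per-column integer coding used here is an assumption, and the function names and the three toy records are hypothetical.

```python
import csv
import io


def encode_categoricals(rows):
    """Convert every field to float; non-numeric values get per-column integer codes."""
    codes = {}  # column index -> {category: integer code}
    encoded = []
    for row in rows:
        new_row = []
        for j, value in enumerate(row):
            try:
                new_row.append(float(value))
            except ValueError:
                column_codes = codes.setdefault(j, {})
                # A category seen for the first time gets the next free code.
                code = column_codes.setdefault(value, len(column_codes))
                new_row.append(float(code))
        encoded.append(new_row)
    return encoded


def chunks(rows, size):
    """Split the encoded records into consecutive chunks of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


raw_csv = "0,tcp,181\n0,udp,239\n2,tcp,235\n"  # toy records in .csv form
records = encode_categoricals(list(csv.reader(io.StringIO(raw_csv))))
batches = list(chunks(records, 2))
```

Integer coding imposes an arbitrary order on categories; one-hot encoding would avoid that at the cost of more input dimensions.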
ADASYN-based class imbalance data handling
The ADASYN model receives the preprocessed data as input and handles the class imbalance. It uses a weighted distribution over the minority class instances according to their level of difficulty in learning [13] and generates synthetic minority class instances according to that distribution. ADASYN was motivated by the success of synthetic approaches such as the synthetic minority oversampling technique (SMOTE), SMOTEBoost, and DataBoost-IM for learning from imbalanced data sets. Its objective is twofold: reducing the bias introduced by the class imbalance and adaptively learning. The procedure for a two-class classification problem is defined below:
Input: Training data set \(D_{\text{tr}}\) with \(m\) instances \(\left\{ x_{i}, y_{i} \right\}\), \(i = 1, \ldots, m\), in which \(x_{i}\) is a sample in the \(n\)-dimensional feature space \(X\) and \(y_{i} \in Y = \left\{ 1, -1 \right\}\) is the class label associated with \(x_{i}\). Let \(m_{\text{s}}\) and \(m_{\text{l}}\) denote the number of minority class samples and the number of majority class samples, respectively. Hence, \(m_{\text{s}} \le m_{\text{l}}\) and \(m_{\text{s}} + m_{\text{l}} = m\).
Procedure
(1) Estimate the degree of class imbalance:
$$ d = \frac{m_{\text{s}}}{m_{\text{l}}}, $$(2)
where \(d \in (0, 1]\).
(2) If \(d < d_{\text{th}}\) (where \(d_{\text{th}}\) is a preset threshold for the maximum tolerated degree of class imbalance ratio):
(a) Estimate the number of synthetic data samples that have to be generated for the minority class:
$$ G = \left( m_{\text{l}} - m_{\text{s}} \right) \times \beta, $$(3)
where \(\beta \in [0, 1]\) is a parameter that specifies the desired balance level after the synthetic data are generated. \(\beta = 1\) means that a completely balanced data set is obtained after the generation process.
(b) For each example \(x_{i}\) in the minority class, find the \(K\) nearest neighbors (kNN) based on Euclidean distance in the \(n\)-dimensional space and compute the ratio \(r_{i}\) defined as:
$$ r_{i} = \frac{\Delta_{i}}{K}, \quad i = 1, \ldots, m_{\text{s}}, $$(4)
where \(\Delta_{i}\) is the number of samples among the kNN of \(x_{i}\) that belong to the majority class; hence \(r_{i} \in [0, 1]\).
(c) Normalize \(r_{i}\) as \(\hat{r}_{i} = r_{i} / \sum_{i=1}^{m_{\text{s}}} r_{i}\), so that \(\hat{r}_{i}\) is a density distribution \(\left( \sum_{i} \hat{r}_{i} = 1 \right)\).
(d) Estimate the number of synthetic data samples to be generated for each minority example \(x_{i}\):
$$ g_{i} = \hat{r}_{i} \times G, $$(5)
where \(G\) is the total number of synthetic data samples to be generated for the minority class, as defined in Eq. (3).
(e) For each minority class data sample \(x_{i}\), generate \(g_{i}\) synthetic data samples according to the following steps.
Loop from 1 to \(g_{i}\):
(i) Randomly select one minority data sample, \(x_{zi}\), from the kNN of \(x_{i}\).
(ii) Generate the synthetic data instance:
$$ s_{i} = x_{i} + \left( x_{zi} - x_{i} \right) \times \lambda, $$(6)
where \(\left( x_{zi} - x_{i} \right)\) is the difference vector in the \(n\)-dimensional space and \(\lambda\) is a random number, \(\lambda \in [0, 1]\).
End Loop.
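The procedure above can be sketched in plain Python as follows. This is a simplified illustration, not the authors' implementation: the function name, the default values of `K` and `d_th`, and the fixed random seed are assumptions, the neighbour \(x_{zi}\) is drawn from the K nearest minority samples so that a minority neighbour always exists, and \(g_{i}\) is rounded to an integer, so the number of generated samples may differ slightly from \(G\).

```python
import math
import random


def adasyn(X, y, beta=1.0, K=5, d_th=0.75, seed=0):
    """Return synthetic minority samples for a two-class set with labels in {1, -1}."""
    rng = random.Random(seed)
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == -1]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    m_s, m_l = len(minority), len(majority)
    # Step (2): oversample only if the imbalance degree d is below the threshold.
    if m_s < 2 or m_s / m_l >= d_th:
        return []
    G = (m_l - m_s) * beta  # step (a): total number of synthetic samples
    labelled = [(x, True) for x in minority] + [(x, False) for x in majority]

    ratios = []  # step (b): share of majority samples among the K nearest neighbours
    for x in minority:
        others = [(math.dist(x, z), is_min) for z, is_min in labelled if z is not x]
        others.sort(key=lambda pair: pair[0])
        ratios.append(sum(1 for _, is_min in others[:K] if not is_min) / K)

    total = sum(ratios)
    if total == 0:  # no minority sample borders the majority: uniform fallback (assumption)
        r_hat = [1.0 / m_s] * m_s
    else:
        r_hat = [r / total for r in ratios]  # step (c): normalise to a distribution

    synthetic = []
    for x, weight in zip(minority, r_hat):
        g_i = round(weight * G)  # step (d), rounded to a whole number of samples
        neighbours = sorted((z for z in minority if z is not x),
                            key=lambda z: math.dist(x, z))[:K]
        for _ in range(g_i):  # step (e): interpolate towards a random neighbour
            x_z = rng.choice(neighbours)
            lam = rng.random()
            synthetic.append([a + (b - a) * lam for a, b in zip(x, x_z)])
    return synthetic
```

Minority samples surrounded by majority neighbours receive larger \(\hat{r}_{i}\) and therefore more synthetic samples, which is how the method concentrates on examples that are harder to learn.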
ADWIN-based drift detection
Once the ADASYN model has balanced the dataset, drift detection is carried out using the ADWIN technique [14]. In this study, a window-based approach is employed for drift detection with a window of fixed size (n = 10).
Bifet [15] presented the ADWIN technique, which is suited to data streams with sudden drift. It maintains a sliding window \(W\) of the most recently read samples. The main principle of ADWIN is as follows: whenever two sufficiently large subwindows of \(W\) exhibit sufficiently distinct averages, it is concluded that the expected values differ, and the older portion of the window is dropped. The statistical null hypothesis is: “the average \(\mu_{t}\) remains constant in \(W\) with confidence \(\delta\)”. The pseudo-code of ADWIN is shown in Algorithm 1. The key part of the algorithm is the definition of \(\varepsilon_{\text{cut}}\) and the test it is used in. Let \(n\) be the size of \(W\), and let \(n_{0}\) and \(n_{1}\) be the sizes of \(W_{0}\) and \(W_{1}\), so that \(n = n_{0} + n_{1}\). Let \(\hat{\mu}_{W_{0}}\) and \(\hat{\mu}_{W_{1}}\) be the averages of the values in \(W_{0}\) and \(W_{1}\), and let \(\mu_{W_{0}}\) and \(\mu_{W_{1}}\) be their expected values. The value of \(\varepsilon_{\text{cut}}\) is then given by:
$$ \varepsilon_{\text{cut}} = \sqrt{\frac{1}{2m} \cdot \ln \frac{4}{\delta^{\prime}}}, $$(7)
where \(m = \frac{1}{1/n_{0} + 1/n_{1}}\) and \(\delta^{\prime} = \frac{\delta}{n}\).
The statistical test in the pseudo-code checks whether the averages of the two subwindows differ by more than the threshold \(\varepsilon_{\text{cut}}\). The threshold is computed using the Hoeffding bound and therefore provides formal guarantees on the performance of the underlying classifier. The phrase “holds for every split of \(W\) into \(W = W_{0} \cdot W_{1}\)” means that every pair \(W_{0}\), \(W_{1}\) obtained by dividing \(W\) into two parts has to be checked. Researchers have proposed improvements that find the optimal cut point more efficiently. As originally presented, ADWIN is a lossless learner, so the window size \(W\) grows without bound when there is no drift. This can be remedied simply by adding a parameter that limits the maximal window size.
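The split test can be illustrated with a naive Python sketch. It assumes inputs in [0, 1] (as the Hoeffding bound requires), checks every split explicitly, and drops the oldest element while some split violates the threshold. The function names, the default \(\delta = 0.002\), and the optional `max_size` cap are assumptions made for this illustration; the published algorithm uses a compressed bucket structure rather than this direct scan.

```python
import math


def epsilon_cut(n0, n1, delta):
    """Threshold for a split of a window of size n0 + n1 into sizes n0 and n1."""
    m = 1.0 / (1.0 / n0 + 1.0 / n1)  # harmonic-mean term
    delta_prime = delta / (n0 + n1)
    return math.sqrt(math.log(4.0 / delta_prime) / (2.0 * m))


def adwin_step(window, value, delta=0.002, max_size=None):
    """Append `value` to `window` (a list), shrink on drift, return True if drift."""
    window.append(value)
    if max_size is not None and len(window) > max_size:
        del window[0]  # optional cap on the window length
    drift = False
    shrunk = True
    while shrunk and len(window) > 1:
        shrunk = False
        total = sum(window)
        head = 0.0
        for i in range(1, len(window)):  # every split W = W0 . W1
            head += window[i - 1]
            n0, n1 = i, len(window) - i
            gap = abs(head / n0 - (total - head) / n1)
            if gap >= epsilon_cut(n0, n1, delta):
                del window[0]  # drop the oldest element, then re-test all splits
                drift = shrunk = True
                break
    return drift


# Usage: feed per-instance error indicators (0.0 or 1.0) one at a time.
w = []
flags = [adwin_step(w, v) for v in [0.0] * 200 + [1.0] * 50]
```

Each call costs time linear in the window length here, which is why a size cap or the bucket-based variant matters for long drift-free streams.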
ADODNN-based classification
Once the ADWIN technique has identified concept drift, the trained model is updated and classification is then carried out, which improves the classification results. When no concept drift exists, classification using ADODNN is performed directly, without the model update step. The ADODNN determines the class labels of the applied data, and the use of ADO helps to attain improved classification performance.
Here, a DNN-based model built from stacked autoencoders (SAE) is presented for classifying data with concept drift. The DNN classifier is constructed from an SAE and a softmax layer [16]. The dataset comprises attributes and a class variable. Figure 3 illustrates the structure of the DNN model. The attributes are fed to the input layer. The DNN is built from two SAE layers, so the network is composed of two hidden layers of neurons. A softmax layer is attached to the final hidden layer to perform the classification task. Hence, the output layer provides the probabilities of the class labels for a given record.
Suppose \(N\) input vectors \(\left\{ x_{(1)}, x_{(2)}, \ldots, x_{(N)} \right\}\) are used to train the AE. The AE is trained to reconstruct its input as:
which is represented as:
where \(f_{\text{AE}}\) denotes the function by which the AE maps its input to its output.
The AE is then trained by minimizing an objective function given by the total error:
where \(E_{\text{MSE}}\), \(E_{\text{Reg}}\), and \(E_{\text{sparsity}}\) denote the mean square error (MSE), the regularization factor, and the sparsity factor, respectively. The MSE, \(E_{\text{MSE}}\), is determined by:
where \(e_{i}\) is the error, that is, the difference between the target output \(x_{(i)}\) and the observed output \(x^{\prime}_{(i)}\). Hence, the error \(e_{i}\) is determined as:
Deep networks can learn every point in the training data, which results in overfitting. To resolve this problem, a regularization factor \(E_{\text{Reg}}\) is included in the objective function and is estimated using the following expression:
where \(\lambda\) is the regularization coefficient. The sparsity constraint enables the model to learn the essential features from the data. The sparsity factor \(E_{\text{sparsity}}\) is evaluated by:
where \(\beta\) denotes the sparsity weight and \(\text{KL}(\rho \| \rho_{j})\) denotes the Kullback–Leibler divergence, given by:
where \(\rho\) is the sparsity constant and \(\rho_{j}\) is the average activation value of the \(j\)th neuron, measured by:
where \(f^{j}\left( x_{(i)} \right)\) denotes the activation of the \(j\)th neuron in the hidden layer of the AE. The SAE is obtained by cascading the encoder layers of several AEs. Recalling the AE mapping \(f_{\text{AE}}\) introduced above, the SAE is described as:
where \(f_{\text{SAE}}\) denotes the SAE function. Only the encoder function is employed in each layer of the SAE; the decoder function is absent from every layer.
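For orientation, the quantities named above can be written in the form commonly used for sparse autoencoders. The block below is a sketch under that assumption and uses only the symbols defined in the text where possible; the L2 form of \(E_{\text{Reg}}\), the weight symbol \(w^{(l)}\), the squared norm in \(E_{\text{MSE}}\), and the encoder symbols \(f_{\text{enc}}^{(1)}, f_{\text{enc}}^{(2)}\) are assumptions.

```latex
\begin{aligned}
x'_{(i)} &= f_{\text{AE}}\left(x_{(i)}\right), \qquad e_i = x_{(i)} - x'_{(i)}, \\
E &= E_{\text{MSE}} + E_{\text{Reg}} + E_{\text{sparsity}}, \\
E_{\text{MSE}} &= \frac{1}{N}\sum_{i=1}^{N} \lVert e_i \rVert^{2}, \\
E_{\text{Reg}} &= \frac{\lambda}{2}\sum_{l} \lVert w^{(l)} \rVert^{2}
  \quad \text{(assumed L2 weight penalty)}, \\
E_{\text{sparsity}} &= \beta \sum_{j} \mathrm{KL}\left(\rho \,\|\, \rho_j\right), \\
\mathrm{KL}\left(\rho \,\|\, \rho_j\right) &= \rho \log\frac{\rho}{\rho_j}
  + (1-\rho)\log\frac{1-\rho}{1-\rho_j}, \\
\rho_j &= \frac{1}{N}\sum_{i=1}^{N} f^{j}\left(x_{(i)}\right), \\
f_{\text{SAE}} &= f_{\text{enc}}^{(2)} \circ f_{\text{enc}}^{(1)}
  \quad \text{(two cascaded encoders, assumed)}.
\end{aligned}
```

The KL term is zero when \(\rho_{j} = \rho\) and grows as the average activation of a hidden neuron departs from the sparsity constant, which is what pushes most hidden units toward low activity.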
The softmax classifier is a multiclass classifier that generalizes logistic regression (LR) to data with several classes. It is trained by supervised learning. In multiclass problems, the softmax classifier estimates the probability of each class for the data being classified, so the probabilities over all classes sum to 1. It applies exponentiation followed by normalization to obtain the class probabilities. The softmax function \(f_{\text{SC}}\) is connected to the SAE. Once the layers have been trained, the subsequent training of the whole model is called fine-tuning. It is the last step in building the classifier and is applied to enhance model performance. To reduce the classification error, the model is fine-tuned with supervised learning: using the training data set, the complete network is trained in the same way as a multilayer perceptron (MLP). Here, only the encoder portion of each AE is used.
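The exponentiation and normalization performed by the softmax layer can be shown in a few lines of Python. The input scores below are arbitrary illustrative values, not outputs of the trained network.

```python
import math


def softmax(scores):
    """Exponentiate and normalize so the outputs are positive and sum to 1."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probabilities = softmax([2.0, 1.0, 0.1])
# The predicted class is the index with the largest probability.
predicted_class = max(range(len(probabilities)), key=probabilities.__getitem__)
```

Subtracting the maximum score leaves the result unchanged, because the common factor cancels in the normalization.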
ADO-based parameter tuning
Deep learning (DL) optimizers use a predefined learning rate by default [17]. In practice, however, training a DL model is a nonconvex problem. To determine an effective learning rate for the DNN model, ADO is applied, which computes the learning rate so as to attain maximum classification performance. Adadelta was developed by Zeiler [18]. Its main aim is to circumvent Adagrad's weakness, the drastic reduction in learning rate caused by accumulating all previous squared gradients in the denominator. Adadelta computes the learning rate using only the recent gradients within a limited time window. Adadelta also applies an acceleration term that takes previous updates into account. The Adadelta update rule is given below:

The gradient \(E^{(t)}\) is computed:
$$\begin{aligned} E^{(t)} & = \frac{\partial l\left( \hat{X}^{(t)} \right)}{\partial \hat{X}^{(t)}} \\ & = \left( 1 - H \right) \odot \left( \hat{X}^{(t)} \cdot \left( \left( \hat{X}^{(t)} \right)^{T} \cdot \hat{X}^{(t)} + \epsilon \times I \right)^{-0.5} \right), \end{aligned} $$(18)
The local average \(\tilde{G}^{(t)}\) of the squared gradient \(\left( E^{(t)} \right)^{2}\) is determined:
$$ \tilde{G}^{(t)} = \rho \times \tilde{G}^{(t-1)} + \left( 1 - \rho \right) \times \left( E^{(t)} \right)^{2}, $$
A new term accumulating the previous updates is estimated (momentum, the acceleration term):
$$ Z^{(t)} = \rho \times Z^{(t-1)} + \left( 1 - \rho \right) \times \left( W^{(t-1)} \right)^{2}, $$(19)
Finally, the update is computed as:
$$ W^{(t)} = \frac{\sqrt{Z^{(t)} + \varepsilon \times I}}{\alpha \sqrt{\tilde{G}^{(t)} + \varepsilon \times I}} \odot E^{(t)}, $$(20)
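For comparison, the standard Adadelta rule as formulated by Zeiler can be sketched in plain Python. Note the differences from Eq. (20), which are deliberate and labeled here as assumptions of the sketch: it has no \(\alpha\) factor, it applies the update with a negative sign so that the parameters descend the gradient, and it uses the previous accumulated update when computing the current step. The function name, the defaults \(\rho = 0.95\) and \(\varepsilon = 10^{-6}\), and the quadratic toy objective are illustrative.

```python
import math


def adadelta_minimize(grad, x0, rho=0.95, eps=1e-6, steps=500):
    """Run standard Adadelta on a list of parameters; `grad` returns a gradient list."""
    x = list(x0)
    avg_sq_grad = [0.0] * len(x)  # running average of squared gradients
    avg_sq_step = [0.0] * len(x)  # running average of squared updates
    for _ in range(steps):
        g = grad(x)
        for i, g_i in enumerate(g):
            avg_sq_grad[i] = rho * avg_sq_grad[i] + (1 - rho) * g_i ** 2
            # Per-parameter step: ratio of the two RMS terms times the gradient.
            step = -math.sqrt(avg_sq_step[i] + eps) / math.sqrt(avg_sq_grad[i] + eps) * g_i
            avg_sq_step[i] = rho * avg_sq_step[i] + (1 - rho) * step ** 2
            x[i] += step
    return x


# Toy objective f(x) = (x - 3)^2 with gradient 2 (x - 3), starting from x = 0.
x_final = adadelta_minimize(lambda x: [2 * (x[0] - 3)], [0.0])
```

Because the step size is derived from the two running averages, no global learning rate has to be chosen, which is the property the tuning stage relies on.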
Performance validation
To examine the detection performance of the CIDD-ADODNN model, a series of simulations was carried out using three benchmark datasets: KDDCup99 [19], Spam [20], and Chess [21]. The details of the datasets are given in Table 1. The KDDCup99 dataset includes 42 features and a total of 125,973 instances. The Spam dataset contains 58 features and a total of 4601 instances. The Chess dataset comprises nine features and a total of 503 instances. For experimentation, tenfold cross-validation is used to split each dataset into training and testing sets. Figures 4, 5 and 6 visualize the frequency distribution of the instances under distinct attributes for the three datasets. In addition, the snapshots generated during the simulations are provided in the “Appendix”.
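The tenfold split can be illustrated with a short index generator. The paper does not state whether the folds are shuffled or stratified, so the contiguous, unshuffled folds below are an assumption, and the function name is hypothetical.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) for k contiguous, near-equal folds."""
    # The first (n_samples % k) folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# For the 503-instance Chess dataset, each fold tests on 50 or 51 instances.
folds = list(k_fold_indices(503, k=10))
```

Every instance is used for testing exactly once and for training in the other nine folds, so the reported scores are averages over ten train/test partitions.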
Table 2 provides the outcome of the ADASYN technique for class balancing. The table shows that the initial 125,973 instances in the KDDCup99 dataset are balanced into a set of 134,689 instances. Similarly, on the Spam dataset, the original 4601 instances are balanced into a set of 5457 instances. On the Chess dataset, the available 503 instances are increased to 616 instances by balancing.
Figure 7 shows the ROC curves generated by the ADODNN and CIDD-ADODNN models on the KDDCup99 dataset. Figure 7a shows that the ADODNN model attains a maximum ROC of 0.95, while Fig. 7b shows that the CIDD-ADODNN model attains a higher ROC of 0.98.
Figure 8 depicts the ROC curves generated by the ADODNN and CIDD-ADODNN models on the Spam dataset. Figure 8a shows that the ADODNN model attains a maximum ROC of 0.95, while Fig. 8b shows that the CIDD-ADODNN model attains a higher ROC of 0.98.
Figure 9 shows the ROC curves generated by the ADODNN and CIDD-ADODNN models on the Chess dataset. Figure 9a shows that the ADODNN model attains a ROC of 0.67, while Fig. 9b shows that the CIDD-ADODNN model attains a higher ROC of 0.85.
Table 3 tabulates the classification results attained by the ADODNN and CIDD-ADODNN models on the three datasets. Figure 10 portrays the results obtained by the two models on the KDDCup99 test dataset. The ADODNN model achieves a precision of 0.9311, recall of 0.9330, specificity of 0.9207, accuracy of 0.9273, and F score of 0.9320. The CIDD-ADODNN model performs considerably better, with a precision of 0.9628, recall of 0.9552, specificity of 0.9631, accuracy of 0.9592, and F score of 0.9590.
Figure 11 shows the results attained by the ADODNN and CIDD-ADODNN models on the Spam test dataset. The ADODNN model achieves a precision of 0.9346, recall of 0.8917, specificity of 0.9040, accuracy of 0.8965, and F score of 0.9126. The CIDD-ADODNN model improves on it in every measure except precision, with a precision of 0.9272, recall of 0.9408, specificity of 0.9228, accuracy of 0.9320, and F score of 0.9340. Figure 12 displays the results of the two models on the Chess test dataset. The ADODNN model shows a precision of 0.6296, recall of 0.6010, specificity of 0.7705, accuracy of 0.7038, and F score of 0.6150. The CIDD-ADODNN model improves on it in every measure except specificity, with a precision of 0.7515, recall of 0.7974, specificity of 0.7311, accuracy of 0.7646, and F score of 0.7738.
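The reported F scores can be cross-checked against the reported precision and recall, since the F score is their harmonic mean. The short sketch below does this for the CIDD-ADODNN figures quoted above; the dictionary layout and function name are illustrative.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (the F1 score)."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F score) for CIDD-ADODNN, as quoted in the text.
reported = {
    "KDDCup99": (0.9628, 0.9552, 0.9590),
    "Spam": (0.9272, 0.9408, 0.9340),
    "Chess": (0.7515, 0.7974, 0.7738),
}

# Absolute gap between the recomputed and the reported F score per dataset.
gaps = {name: abs(f_score(p, r) - f) for name, (p, r, f) in reported.items()}
```

In all three cases the recomputed value agrees with the reported F score to within rounding at the fourth decimal place.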
Table 4 and Fig. 13 performs a detailed comparative results analysis of the CIDDADODNN model on the test KDDCup99 dataset [22]. The resultant values reported that Gradient Boosting and Naïve Bayesian models have depicted inferior performance by obtaining minimum accuracy values of 0.843 and 0.896, respectively. Besides, the Gaussian process and OCSVM models have depicted slightly higher accuracy values of 0.911 and 0.918, respectively. Followed by, the DNNSVM model has accomplished a manageable accuracy of 0.92. However, the presented ADODNN and CIDDADODNN models have exhibited superior performance by obtaining a higher accuracy of 0.927 and 0.959, respectively.
Table 5 and Fig. 14 present a detailed comparative analysis of the CIDDADODNN model on the Spam test dataset [23,24,25]. The HELF and KNN models performed worst, with accuracy values of 0.750 and 0.818, respectively. The GA and Adaboost models achieved slightly higher accuracy values of 0.840 and 0.870, respectively. The NB approach reached an accuracy of 0.881, and the Flexible Bayes model followed with an accuracy of 0.888. The proposed ADODNN and CIDDADODNN models outperformed all of these methods, with accuracy values of 0.896 and 0.932, respectively.
Table 6 and Fig. 15 present a detailed comparative analysis of the CIDDADODNN model on the Chess test dataset [26]. The ZeroR and SVM models performed poorly, with accuracy values of 0.390 and 0.420, respectively. The LR and OneR methods achieved moderate accuracy values of 0.549 and 0.598, respectively, and the MLP model attained an accuracy of 0.647. The proposed ADODNN and CIDDADODNN models achieved the highest accuracy values of 0.703 and 0.764, respectively.
The experimental analysis shows that the CIDDADODNN model performed well on all three datasets. In particular, it obtained a maximum accuracy of 0.9592, 0.9320, and 0.7646 on the KDDCup, Spam, and Chess datasets, respectively. This is attributed to three factors: effective handling of class imbalance, accurate drift detection, and a proficient hyperparameter tuning process. The CIDDADODNN model is therefore an effective tool for classifying highly imbalanced streaming data.
Conclusion
This paper has presented a novel CIDDADODNN model for the classification of highly imbalanced streaming data. First, the raw streaming data are preprocessed in three stages: format conversion, data transformation, and chunk generation. The ADASYN model then receives the preprocessed data as input and applies a weighted distribution to the minority class instances according to how difficult they are to learn. Once ADASYN has balanced the dataset, drift detection is carried out using the ADWIN technique. To determine an effective learning rate for the DNN model, ADO is applied, which computes the learning rate so as to maximize classification performance. A comprehensive set of experiments was carried out to validate the classification performance of the CIDDADODNN model. The simulation results verified the superior performance of the presented model, which obtained a maximum accuracy of 0.9592, 0.9320, and 0.7646 on the KDDCup, Spam, and Chess datasets, respectively. In future, the performance of the CIDDADODNN model can be improved using feature selection and clustering techniques.
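To illustrate how an Adadelta-style optimizer adapts the step size without a hand-set learning rate, the following is a minimal NumPy sketch of the update rule described by Zeiler [18]. It is not the authors' implementation: the function name `adadelta_minimize`, the toy quadratic objective, and the values `rho=0.95` and `eps=1e-6` are assumptions chosen for illustration only.

```python
import numpy as np


def adadelta_minimize(grad_fn, x0, rho: float = 0.95, eps: float = 1e-6,
                      steps: int = 100) -> np.ndarray:
    """Run a fixed number of Adadelta updates on parameters x0.

    grad_fn: callable returning the gradient at x (hypothetical interface).
    rho, eps: decay rate and stability constant (assumed values).
    """
    x = np.asarray(x0, dtype=float).copy()   # do not modify the caller's array
    avg_sq_grad = np.zeros_like(x)           # running average of squared gradients
    avg_sq_step = np.zeros_like(x)           # running average of squared updates

    for _ in range(steps):
        g = grad_fn(x)
        avg_sq_grad = rho * avg_sq_grad + (1.0 - rho) * g ** 2
        # Per-parameter step: ratio of the two RMS terms replaces a global learning rate.
        step = -np.sqrt(avg_sq_step + eps) / np.sqrt(avg_sq_grad + eps) * g
        avg_sq_step = rho * avg_sq_step + (1.0 - rho) * step ** 2
        x = x + step
    return x


if __name__ == "__main__":
    # Toy objective f(x) = sum(x^2), whose gradient is 2x.
    start = np.array([3.0, -2.0])
    end = adadelta_minimize(lambda x: 2.0 * x, start)
    print("loss before:", float(np.sum(start ** 2)))
    print("loss after: ", float(np.sum(end ** 2)))
```

In a DNN setting the same update would be applied to every weight, with `grad_fn` replaced by backpropagated gradients of the classification loss.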
References
1. Aggarwal CC (2007) Data streams: models and algorithms. Springer, Berlin
2. Al-Kateb M, Lee BS, Wang XS (2007) Adaptive-size reservoir sampling over data streams. In: IEEE in Null, p 22
3. Wu K et al (2017) Statistical data reduction for streaming data. In: IEEE scientific data summit (NYSDS)
4. Wu K et al (2014) Classifying imbalanced data streams via dynamic feature group weighting with importance sampling. In: Proceedings of the SIAM international conference on data mining. Society for Industrial and Applied Mathematics
5. Cervellera C, Macciò D (2017) Distribution-preserving stratified sampling for learning problems. IEEE Trans Neural Netw Learn Syst 27:2886–2895
6. Zhang W et al (2017) Adaptive sampling scheme for learning in severely imbalanced large scale data. In: Asian conference on machine learning
7. Ros F, Guillaume S (2016) DENDIS: a new density-based sampling for clustering algorithm. Expert Syst Appl 56:349–359
8. Wang SK, Dai BR (2016) A G-means update ensemble learning approach for the imbalanced data stream with concept drifts. In: International conference on big data analytics and knowledge discovery. Springer, Cham, pp 255–266
9. Ren S, Liao B, Zhu W, Li Z, Liu W, Li K (2018) The gradual resampling ensemble for mining imbalanced data streams with concept drift. Neurocomputing 286:150–166
10. Thalor MA, Patil ST (2018) Propagation of misclassified instances to handle nonstationary imbalanced data stream. J Eng Sci Technol 13(4):1134–1142
11. Iwashita AS, Albuquerque VHC, Papa JP (2019) Learning concept drift with ensembles of optimum-path forest-based classifiers. Future Gener Comput Syst 95:198–211
12. Ren S, Liao B, Zhu W, Li K (2018) Knowledge-maximized ensemble algorithm for different types of concept drift. Inform Sci 430:261–281
13. He H, Bai Y, Garcia EA, Li S (2008) ADASYN: adaptive synthetic sampling approach for imbalanced learning. In: 2008 IEEE international joint conference on neural networks (IEEE world congress on computational intelligence). IEEE, pp 1322–1328
14. Brzeziński D (2010) Mining data streams with concept drift. In: Cs Put Pozna, p 89
15. Bifet A, Gavalda R (2007) Learning from time-changing data with adaptive windowing. In: Proceedings of the 2007 SIAM international conference on data mining. Society for Industrial and Applied Mathematics, pp 443–448
16. Kannadasan K, Edla DR, Kuppili V (2019) Type 2 diabetes data classification using stacked autoencoders in deep neural networks. Clin Epidemiol Glob Health 7(4):530–535
17. Kandel I, Castelli M, Popovič A (2020) Comparative study of first order optimizers for image classification using convolutional neural networks on histopathology images. J Imaging 6(9):92
18. Zeiler MD (2012) Adadelta: an adaptive learning rate method. http://arxiv.org/abs/1212.5701
19. (2019) http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html. Accessed 27 Jan 2021
20. (2019) https://www.csee.usf.edu/lohall/dm/UCIarff/spambase.arff. Accessed 27 Jan 2021
21. (2019) https://sites.google.com/site/zliobaite/resources1. Accessed 27 Jan 2021
22. Hindy H, Atkinson R, Tachtatzis C, Colin JN, Bayne E, Bellekens X (2020) Towards an effective zero-day attack detection using outlier-based deep learning techniques. http://arxiv.org/abs/2006.15344
23. Pérez-Díaz N, Ruano-Ordás D, Fdez-Riverola F, Méndez JR (2016) Boosting accuracy of classical machine learning antispam classifiers in real scenarios by applying rough set theory. Sci Program 2016:5
24. Zhao C, Xin Y, Li X, Yang Y, Chen Y (2020) A heterogeneous ensemble learning framework for spam detection in social networks with imbalanced data. Appl Sci 10(3):936
25. Saidani N, Adi K, Allili MS (2020) A semantic-based classification approach for an enhanced spam detection. Comput Secur 2020:101716
26. Aubaid AM, Mishra A (2020) A rule-based approach to embedding techniques for text document classification. Appl Sci 10(11):4009
Ethics declarations
Conflict of interest
The authors have expressed no conflict of interest.
Appendix
Implementation Results of ADODNN on KDDCup99 Dataset
Implementation Results of CIDDADODNN on Spam Dataset
Implementation Results of Drift Detection on Chess Dataset
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Priya, S., Uthra, R.A. Deep learning framework for handling concept drift and class imbalanced complex decisionmaking on streaming data. Complex Intell. Syst. 9, 3499–3515 (2023). https://doi.org/10.1007/s40747021004560
DOI: https://doi.org/10.1007/s40747021004560