1 Introduction

The ability of a classifier to learn from incremental and dynamic data extracted from a nonstationary environment (one whose data distribution changes over time) poses a challenge to the field of computational intelligence. In the context of neural networks, the problem becomes even more complicated, since most existing models must be retrained whenever a new data block becomes available, using the whole set of patterns learned until then. To cope with this kind of problem, a classifier should, ideally, be able to [43]:

  • Track and detect any changes in the underlying data distribution;

  • Learn from new data without the need to present the whole dataset to the classifier again;

  • Adjust its own parameters in order to address the detected changes in the data;

  • Forget what has been learned when that knowledge is no longer useful for classifying new instances.

All these abilities seek, in one way or another, to deal with a phenomenon called concept drift [51, 22]. This phenomenon characterizes datasets that change over time, for example when the relevance of the variables changes or when the mean and variance of the variables change.

Many approaches have been devised to provide some or all of the abilities mentioned above. One of the older and simpler ones is to slide a window (not necessarily contiguous) over the input data and train the classifier only with the data delimited by this window [21]. Another method is to detect deviations and, when they occur, adjust the classifier [7]. Some models, in turn, use rule-based classifiers [43, 59,60,61]. A more successful and widely used approach, though, is to employ a group of different classifiers (an ensemble) to cope with changes in the environment. Several ensemble models have been proposed in the literature, including recent approaches such as [56,57,58], and they may or may not weight each of their members. Most models using weighted classifier ensembles determine the weight of each classifier with heuristics related to its performance on the most recent data received [22].

Although several algorithms have already been proposed in the literature for classification in concept drift scenarios - many of them based on ensembles - neuroevolution has still been little explored for this type of problem. Neuroevolution uses evolutionary algorithms to adjust parameters that affect the performance of artificial neural networks, such as topology, learning rate and weights, among others. In this case, each solution of the evolutionary algorithm stores a representation of these parameters, which are evolved to find the best network for the problem. Applied to neural network ensembles, evolutionary algorithms can also dynamically adjust the entire model, a task that would be very arduous if performed manually, given the complexity of the model.

Because of the architecture complexity, neuroevolutionary models based on classifier ensembles must have good computational performance and fast convergence in order to be applicable in real scenarios. This requirement becomes even more relevant in nonstationary environments, since the ensemble must be updated each time new data become available or a change is detected in the data; this step must therefore be fast so as not to compromise the overall performance of the model. To deal with this issue, an interesting and still little-explored strategy in the neuroevolution literature is that of quantum-inspired evolutionary algorithms, a class of evolutionary algorithms developed to achieve better performance in computationally intensive problems, inspired by principles of quantum computing [17, 18, 2, 39, 52, 8]. One of the main advantages of quantum-inspired evolutionary models is that good solutions are obtained with the smallest possible number of evaluations. This class of algorithms has previously been used to solve combinatorial and numerical optimization problems, based on binary [18, 39] and real representations [2, 39, 52], providing better results with less computational effort than classical genetic algorithms [47]. Applied to neural network ensembles, quantum-inspired evolutionary algorithms can be used both to model the neural networks and to determine the voting weights of each ensemble member. Thus, each time a new block of data arrives, the ensemble can be optimized, improving its classification performance on the new data.

Models for learning in nonstationary environments may or may not contain drift detection mechanisms. Most of the models found in the literature assume that changes occur in a hidden context, external to the model itself, and therefore cannot be predicted [15]. For this reason, these models use passive, reactive approaches: the label predicted by the model is compared with the real label received, the drift occurrence is verified, and the model reacts to it only after the error is observed. However, detecting drift in the input data before it is submitted for prediction (i.e., before the true labels are received) seems to be a more satisfactory approach, since it allows the model to be adjusted beforehand to better deal with the new scenario and avoid classification errors. For this reason, the model proposed in this work uses this active approach, which is an important differential compared to the existing approaches in the literature.

Given the above, the main objective of this work is to propose and develop a self-adaptive and flexible model, with good accuracy and suitable for learning in nonstationary environments. A new quantum-inspired neuroevolutionary model, based on a Multi-Layer Perceptron (MLP) neural network ensemble, will be presented for learning in nonstationary environments. The proposed model, called NEVE (Neuroevolutionary Ensemble), has the following characteristics:

  • Contains a concept drift detection mechanism, with the ability to detect changes proactively or reactively. This mechanism, already detailed in [10], allows the model to react and adjust itself whenever necessary;

  • Performs the automatic generation of new classifiers for the ensemble, best suited to the new input data, using the quantum-inspired evolutionary algorithm for numerical and binary optimization (QIEA-BR) [39];

  • Automatically determines the voting weights of each ensemble member, using the quantum-inspired evolutionary algorithm for numerical optimization (QIEA-R) [2, 52], a simplified version of QIEA-BR.

Several experiments were performed with artificial and real datasets to validate and compare the performance of the proposed model with other existing models for learning in nonstationary environments, verifying how the detection model affects the performance and accuracy of NEVE.

This work is structured in five additional sections. Section 2 presents a brief review of the literature related to the fundamentals of concept drift. It also describes the quantum-inspired evolutionary models used in this work: QIEA-R and QIEA-BR. Section 3 presents the proposed neuroevolutionary model (NEVE), Section 4 describes the experiments and Section 5 discusses the experimental results. Finally, Section 6 presents the conclusions of this work and possibilities of future work.

2 Literature review

2.1 Concept drift

The term concept drift can be defined informally as a change in the definition of a concept over time and, hence, in its distribution. Concept drift refers to a supervised learning scenario in which the relationship between the input data and the target variable changes over time [15]. An environment from which this kind of data is obtained is considered a nonstationary environment. Formally, considering the posterior probability of a sample x belonging to a class y, according to [9] concept drift is any scenario in which this probability changes over time, that is, \( P_{t+1}(y\mid x)\neq P_t(y\mid x) \).

A practical example of concept drift, mentioned in [29], is detecting and filtering out spam e-mails. The description of the two classes “spam” and “non-spam” may vary over time: they are user specific, and user preferences also change over time. Moreover, the variables used at time t to classify spam may be irrelevant at time t + k. The classifier must therefore deal with “spammers”, who keep creating new ways to trick it into labeling a spam message as a legitimate e-mail.

Concept drift is usually classified as abrupt or gradual [15, 51, 54]. Abrupt drift occurs when a concept A is suddenly replaced by another concept B, that is, at time t the source S1 is suddenly replaced by S2. Gradual drift, on the other hand, happens when concept A is gradually exchanged for concept B. In this case, while there is no definitive change from A to B, more and more occurrences of B and fewer occurrences of A are observed. Both sources S1 and S2 are active but, as time passes, the probability of sampling from S1 decreases while the probability of sampling from S2 increases. At the beginning of this drift, before more instances are observed, an instance from source S2 can easily be mistaken for random noise. It is important to note that noise (or an outlier) is not considered a type of drift, because it refers to an anomaly or isolated random occurrence. In this case, there is no need to adapt the model, which should be robust to noise.
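
To make the two drift types concrete, the sketch below (a hypothetical generator written for this illustration, unrelated to the SEA data used later) produces a stream in which source S1 is either replaced by S2 abruptly at time t0 or sampled with decreasing probability over a transition window.

```python
import numpy as np

def sample_source(rng, source):
    """Draw one pattern from one of two hypothetical sources (concepts)."""
    if source == "S1":
        x = rng.normal(loc=0.0, scale=1.0, size=2)
        y = int(x[0] + x[1] > 0)        # concept A: one labeling rule
    else:
        x = rng.normal(loc=2.0, scale=1.0, size=2)
        y = int(x[0] - x[1] > 0)        # concept B: a different labeling rule
    return x, y

def drift_stream(n_steps=1000, t0=500, width=0, seed=0):
    """width=0 -> abrupt switch at t0; width>0 -> gradual switch over [t0, t0+width]."""
    rng = np.random.default_rng(seed)
    for t in range(n_steps):
        if width == 0:
            source = "S1" if t < t0 else "S2"
        else:
            p_s2 = np.clip((t - t0) / width, 0.0, 1.0)   # probability of sampling S2 grows
            source = "S2" if rng.random() < p_s2 else "S1"
        x, y = sample_source(rng, source)
        yield t, x, y
```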

The term “Drift Detection” refers to techniques and mechanisms for detecting drift by identifying points of change or small intervals during which variations occur. In this case, the environment has changed enough that the existing models can no longer effectively predict the behavior of the current data [15]. Several drift detection mechanisms have already been proposed in the literature, but most of them work reactively: they compare the class predicted by the classifier to the correct class label received later, noticing the drift only after it has occurred and caused misclassification. Only then does the reactive detector apply a sequence of procedures to identify a change in the conditional class distribution - a concept drift. Examples of reactive detectors can be found in [14, 5, 36, 4, 42, 3, 31, 23, 13, 46].

Few papers use a proactive approach. [28] applies principal component analysis (PCA) for feature extraction before drift detection. The authors discuss and show evidence that the components with lower variance should be stored as the extracted features, since they are more likely to be affected by a change. The authors then choose a change detection criterion based on a semiparametric log-likelihood function that is sensitive to changes in the mean and variance of multidimensional distributions.

In [10], we proposed a new drift detection mechanism, called DetectA (Detect Abrupt Drift), which uses a proactive detection approach. This model is used in the experiments of this work and comprises three basic steps: (i) label the patterns from the test set (an unlabelled data block), using an unsupervised method; (ii) compute some statistics from the training and test sets, conditioned to the given class labels provided in the training set; and (iii) compare the training and testing statistics using a multivariate hypothesis test. Based on the results of the hypothesis tests, we attempt to detect the drift on the test set, before the real labels are obtained.
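
A minimal sketch of the three DetectA steps is shown below, under simplifying assumptions of ours: a single nearest-centroid assignment stands in for the unsupervised labelling, class-conditional means are the only statistics compared, and a fixed distance threshold caricatures the multivariate hypothesis test actually used in [10].

```python
import numpy as np

def detect_drift_proactive(X_train, y_train, X_test, threshold=1.0):
    """Sketch of DetectA's three steps on an unlabelled test block."""
    classes = np.unique(y_train)

    # (i) label the test block: assign each pattern to the nearest training class centroid
    train_centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    d = np.linalg.norm(X_test[:, None, :] - train_centroids[None, :, :], axis=2)
    y_test_hat = classes[d.argmin(axis=1)]

    # (ii) class-conditional statistics of the test block (here, only the means)
    test_centroids = np.stack([
        X_test[y_test_hat == c].mean(axis=0) if np.any(y_test_hat == c)
        else train_centroids[i]
        for i, c in enumerate(classes)])

    # (iii) compare training and test statistics; a real implementation would
    # apply a multivariate hypothesis test instead of this crude threshold
    shift = np.linalg.norm(test_centroids - train_centroids, axis=1).max()
    return shift > threshold, y_test_hat
```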

Algorithms for handling concept drift problems can be categorized in several ways. Table 1, based on [9, 27, 29, 30], summarizes the most commonly used classifications in the literature, with their respective definitions.

Table 1 – Types of Algorithms

Algorithms that use the passive approach (without drift detection) regularly update the model as new data arrive, applying a forgetting heuristic regardless of whether a change has occurred. For example, in a classifier ensemble the weights of the members are updated after each new piece of data received (individually or in blocks), based on the recent accuracy of the ensemble members. Without concept drift, the classification accuracy will be stable and the weights will converge. If any change occurs, the weights will change to reflect it, without the need for explicit detection [29].
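
The weighting heuristic described above can be sketched as follows (a generic illustration of the passive scheme, not the update rule of any specific model cited here).

```python
import numpy as np

def passive_weight_update(weights, member_predictions, y_true, decay=0.5):
    """Passive scheme: no drift detection, weights simply track recent accuracy.

    member_predictions: array (n_members, n_samples) with each member's
    predicted classes for the latest block; weights: current voting weights."""
    block_accuracy = (member_predictions == y_true).mean(axis=1)
    new_weights = decay * weights + (1.0 - decay) * block_accuracy
    return new_weights / new_weights.sum()     # keep the weights normalized
```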

However, this can be very costly if the amount of incoming data is excessively large or if the application requires user feedback to label the data, which can be time-consuming. One way to reduce this problem is to use special techniques to detect changes and adapt the model only when unavoidable, using the active approach [51], also called the trigger approach. In general, when active approaches detect a drift, some action is taken, for example configuring a window with the latest data and retraining the classifier, or adding a new classifier to the ensemble.

Thus, the active method seeks to point out when the drift occurred and allows the model either to modify itself or to continue learning in the same way. A disadvantage of this method is the risk of an imperfect mechanism producing false alarms, which is very common particularly in noisy datasets. In the passive mechanism, the learner assumes that the environment can change at any time or may be continuously changing. The algorithm then continues to learn from the environment, building and organizing its knowledge base. If a change has occurred, it is learned. If nothing happened, the existing knowledge is reinforced [9]. Most ensembles in the literature follow a passive adaptation scheme, whereas active approaches are usually used with single online classifiers [27]. The models in [24, 26, 48] are examples of passive approaches, and the models in [14, 5, 36,37,38, 32] are examples of active approaches.

Regarding data input, it is worth emphasizing that individual patterns can be grouped into batches or blocks of data. The opposite is also possible, but blocks can contain large amounts of data, making instance-based processing very time-consuming [29].

Comparing single-classifier versus ensemble approaches, ensemble-based approaches are newer and tend to offer better accuracy, flexibility and efficiency than those using a single classifier [29]. It is worth remembering that for massive datasets simple models - such as single classifiers - are often preferred, since there may not be time to execute and update an ensemble. On the other hand, some authors argue that a simple ensemble may be easier to use than certain simple adaptive classifiers, such as decision trees. When time is not the main concern but high accuracy is required, an ensemble becomes the natural solution; for example, in mammography screening for tumors it is acceptable to take a few minutes per image [30]. Ensemble approaches can use different methods to adapt to concept drift.

As mentioned earlier, responding to several types of concept drift is a difficult task for a simple classifier. For this reason, several systems based on classifier ensembles have recently been proposed to deal with concept drift learning, such as [49, 48, 11, 12, 44, 24,25,26, 45, 9, 33, 53, 6, 50]. The main novelty proposed in this work is the possibility of using an active drift detection mechanism (DetectA) together with an ensemble of neural networks, trained and combined through quantum-inspired evolutionary algorithms, allowing automatic and dynamic adjustment of the classifiers and their weights in the ensemble, using less computational time.

2.2 Quantum-inspired evolutionary algorithms

Classical evolutionary algorithms have been used successfully to solve complex optimization problems in a wide range of fields, such as automatic design of circuits and equipment, task planning, software engineering and data mining, among many others [1, 2]. This class of algorithms does not require rigorous mathematical formulations of the problem to be optimized and offers a high degree of parallelism in the search process, which are some of its main advantages.

However, some problems are computationally costly in the evaluation of the fitness function during the search process, making optimization by evolutionary algorithms slow in situations where a fast response is desired (as in online optimization problems). To address these issues, quantum-inspired evolutionary algorithms have been developed: a class of estimation of distribution algorithms that perform better in combinatorial and numerical optimization than their canonical genetic counterparts [1, 2, 8, 17, 18, 39, 52].

These algorithms are inspired by concepts of quantum physics, in particular the concept of superposition of states, and were initially developed for optimization problems using binary representation, such as the Quantum-Inspired Evolutionary Algorithm (QIEA-B) [17,18,19,20], which uses a chromosome formed by q-bits. Each q-bit consists of a pair of numbers (α, β), where |α|² + |β|² = 1. The value |α|² indicates the probability that the q-bit has value 0 when observed, while |β|² indicates the probability that the q-bit has value 1 when observed. Thus, in QIEA-B, a quantum individual is formed by M q-bits, according to (1):

$$ {q}_i=\left[\left.\begin{array}{l}{\alpha}_{i1}\\ {}{\beta}_{i1}\end{array}\right|\left.\begin{array}{l}{\alpha}_{i2}\\ {}{\beta}_{i2}\end{array}\right|\dots \left|\begin{array}{l}{\alpha}_{iM}\\ {}{\beta}_{iM}\end{array}\right.\right] $$
(1)

where i indexes the quantum individual and the second subscript runs from 1 to M over its q-bits.
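
Observing such a chromosome amounts to collapsing each q-bit independently: bit j of the classical individual is 1 with probability |β_ij|². A minimal sketch, with q-bits initialized in equal superposition:

```python
import numpy as np

def observe_qbits(alpha, beta, rng):
    """Collapse a chromosome of M q-bits into a classical binary chromosome."""
    return (rng.random(alpha.shape) < np.abs(beta) ** 2).astype(int)

rng = np.random.default_rng(0)
M = 8
alpha = np.full(M, 1 / np.sqrt(2))          # |alpha|^2 = |beta|^2 = 0.5 for every q-bit
beta = np.full(M, 1 / np.sqrt(2))
print(observe_qbits(alpha, beta, rng))      # e.g. [1 0 1 0 0 0 1 0]
```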

Quantum-inspired evolutionary algorithms were later extended to real representation, to better deal with numerical optimization problems. In these problems, a direct representation is more appropriate, in which real numbers are encoded directly in the chromosome rather than converted from binary strings. With real numerical representation, the memory demand is reduced while the precision is increased [1]. Thus, the Quantum-Inspired Evolutionary Algorithm with Real Representation (QIEA-R) was developed [1, 2], inspired by the concept of multiple universes in quantum physics. In this scenario, the algorithm performs the optimization process with a smaller number of evaluations, substantially reducing the computational cost. The next sections describe the QIEA-R and QIEA-BR models, which are better suited to neuroevolution.

2.2.1 Quantum-inspired evolutionary algorithm with real representation (QIEA-R)

Originally proposed in [1], this algorithm was used to solve numerical optimization benchmark problems and for the neuroevolution of recurrent neural networks. The results obtained demonstrated the efficiency of this algorithm in solving these types of problems.

In QIEA-R, the quantum population Q(t) consists of N quantum individuals qi (i = 1, 2, 3, ..., N), each composed of G quantum genes. Each quantum gene is formed by a probability density function (PDF), which represents the superposition of states and is used to observe the classical gene. Quantum individuals can be represented by:

$$ {q}_i=\left[{g}_{i1}={p}_{i1}(x),\ {g}_{i2}={p}_{i2}(x),\ \dots,\ {g}_{iG}={p}_{iG}(x)\right] $$
(2)

where i = 1, 2, 3, ..., N, j = 1, 2, 3, ..., G, and the functions pij are the probability density functions used by the QIEA-R to generate the values of the genes of the classical individuals. In other words, pij(x) is the probability density of observing a given value for the quantum gene when its superposition collapses. The probability density function used in [1] is the square pulse, a uniform function of simple geometry, defined by eq. (3):

$$ {p}_{ij}(x)=\begin{cases}\dfrac{1}{{U}_{ij}-{L}_{ij}}, & {L}_{ij}\le x\le {U}_{ij}\\ 0, & \text{otherwise}\end{cases} $$
(3)

where Lij is the lower limit and Uij is the upper limit of the interval in which the gene j of the i-th quantum individual can collapse, i.e., assume values when observed.

When pij(x) is a square pulse, the quantum gene can be represented by storing the position of the center of the pulse and its width, μij and σij, respectively. The QIEA-R also uses a population of quantum individuals, which are observed to generate the classical individuals. The update of the quantum individuals is carried out based on the evaluation of the classical individuals: μij and σij are altered so as to move the pulse toward the most promising region of the search space, increasing the probability of observing values of the classical gene in the vicinity of the most successful individuals in the classical population. The pseudocode of the QIEA-R algorithm is shown in Appendix 1.
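
A minimal sketch of these two operations, observation and update of a square-pulse gene, is given below (the exact update rule of QIEA-R is the one in Appendix 1; the rate and shrink factors here are illustrative assumptions).

```python
import numpy as np

def observe_square_pulse(mu, sigma, rng):
    """Collapse each gene by sampling uniformly from [mu - sigma/2, mu + sigma/2]."""
    return rng.uniform(mu - sigma / 2.0, mu + sigma / 2.0)

def update_square_pulse(mu, sigma, best_classical, rate=0.1, shrink=0.99):
    """Pull the pulse centre toward the best classical individual and narrow its
    width, concentrating the search around the most promising region."""
    return mu + rate * (best_classical - mu), sigma * shrink
```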

In this work, the QIEA-R is used to evolve voting weights for each classifier member of the ensemble and thus determine the final decision of the ensemble. In this way, the chromosome will have size n, where n represents the number of ensemble members. Each gene, in turn, will represent the voting weight associated with each classifier. Further details on QIEA-R can be found in [1, 2, 52].

2.2.2 Quantum-inspired evolutionary algorithm with binary-real representation (QIEA-BR)

The main motivation for creating an algorithm with mixed representation is that many real problems cannot be solved by numerical decisions or combinatorial decisions alone. More specifically, in the field of neural networks, the modeling process may involve combinatorial decisions (selection of the most relevant input variables, how many neurons should be used in the hidden layer, etc.) and, simultaneously, numerical decisions (optimal values for the synaptic weights).

With this motivation, [40] proposed an algorithm with quantum inspiration and binary-real representation, called QIEA-BR, for the simultaneous optimization of combinatorial and numerical problems, that is, problems of mixed nature. QIEA-BR was the first quantum-inspired evolutionary algorithm with mixed representation proposed in the literature and inherits the main characteristics of its precursors, such as global problem-solving ability and a probabilistic representation of the search space. This mixed representation results in high population diversity within each quantum individual and requires fewer individuals in the population to explore the search space.

The QIEA-BR algorithm also requires a population of quantum individuals that represents the superposition of the possible states the classical individuals can assume when observed. The quantum population Q(t), at any instant t of the evolutionary process, is formed by a set of N quantum individuals qi (i = 1, 2, 3, ..., N). Each quantum individual qi of this population is formed by L genes gij (j = 1, 2, 3, ..., L). The main difference between QIEA-BR and its predecessors is that part of the L genes is represented by q-bits, as in QIEA-B, and the other part by real quantum genes (q-real), as in QIEA-R. Thus, the representation of a quantum individual i at any time instant t is given by:

$$ {q}_i=\left[{\left({q}_i\right)}_b\ {\left({q}_i\right)}_r\right] $$
(4)

where the index b represents the binary part (q-bit) and the index r represents the real part (q-real). Thus a quantum individual can be described by:

$$ {q}_i=\left[{\left({q}_i\right)}_b\ {\left({q}_i\right)}_r\right]={\left(\left.\begin{array}{l}{\alpha}_{i1}\\ {}{\beta}_{i1}\end{array}\right|\left.\begin{array}{l}{\alpha}_{i2}\\ {}{\beta}_{i2}\end{array}\right|\dots \left|\begin{array}{l}{\alpha}_{iM}\\ {}{\beta}_{iM}\end{array}\right.\right)}_b{\left(\left.\begin{array}{l}{\mu}_{i1}\\ {}{\sigma}_{i1}\end{array}\right|\left.\begin{array}{l}{\mu}_{i2}\\ {}{\sigma}_{i2}\end{array}\right|\dots \left|\begin{array}{l}{\mu}_{iG}\\ {}{\sigma}_{iG}\end{array}\right.\right)}_r $$
(5)

In this work, the QIEA-BR is used to perform the complete modeling of an MLP artificial neural network. The binary part selects the most appropriate input variables; defines which neurons (out of a maximum number of neurons) are active in the hidden layer (1 = active, 0 = inactive); and specifies the activation function of each neuron in the network (1 = hyperbolic tangent, 0 = sigmoid). The real part determines the values of all weights. Figure 1 illustrates the information encoded in each of the quantum genes, binary or real, of a QIEA-BR chromosome. This chromosome is used in the neuroevolutionary model presented in Section 3.

Fig. 1 – The QIEA-BR individual structure [40]

In QIEA-BR, the evolution of the weights and activation function of a given neuron in the quantum and classical chromosomes is conditioned on that neuron being active in the corresponding binary part. That is, the genes representing the weights and activation functions remain unchanged by the quantum and classical evolutionary process if the neuron is inactive.
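
The decoding of an observed classical chromosome into a network configuration can be sketched as below; this is a simplified reading of Fig. 1, with variable names and weight ordering assumed for the sake of the example.

```python
import numpy as np

def decode_chromosome(binary, real, k, nh, nc):
    """Split a classical QIEA-BR individual into an MLP configuration.

    binary: k + 2*nh + nc bits (input selection, active neurons, activation fns)
    real:   (k+1)*nh + (nh+1)*nc synaptic weights (biases included)"""
    sel_inputs = binary[:k].astype(bool)               # which inputs are used
    active = binary[k:k + nh].astype(bool)             # which hidden neurons exist
    hidden_act = binary[k + nh:k + 2 * nh]             # 1 = tanh, 0 = sigmoid
    output_act = binary[k + 2 * nh:k + 2 * nh + nc]    # activation of output neurons

    w_hidden = real[:(k + 1) * nh].reshape(nh, k + 1)  # hidden weights + bias
    w_output = real[(k + 1) * nh:].reshape(nc, nh + 1) # output weights + bias

    # genes tied to pruned inputs or inactive neurons are simply ignored here,
    # mirroring the fact that they are not touched by the evolutionary process
    in_cols = np.concatenate([sel_inputs, [True]])     # keep the bias column
    hid_cols = np.concatenate([active, [True]])
    return dict(inputs=sel_inputs, active=active,
                hidden_act=hidden_act[active], output_act=output_act,
                w_hidden=w_hidden[active][:, in_cols],
                w_output=w_output[:, hid_cols])
```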

The neural network created by QIEA-BR is similar to that shown in Fig. 2: the effective number of attributes in the input layer and of neurons in the hidden layer is evolved by the QIEA-BR, with the maximum number of inputs equal to the number of available attributes in the dataset (k) and the maximum number of neurons in the hidden layer (nh) configured by the user.

Fig. 2 – Neural network created by QIEA-BR

Thus, the number of genes is given by:

$$ nu{m}_{genes}={\left(k+2\, nh+ nc\right)}_b+{\left(\left(k+1\right)\times nh+\left( nh+1\right)\times nc\right)}_r $$
(6)

where nc is the number of classes in the classification problem. In this case, the evaluation function used is the classification accuracy given by:

$$ Accuracy=1-\frac{1}{n}\sum \limits_{i=1}^{n}\mid {C}_i-{\hat{C}}_i\mid $$
(7)

where Ci is the class of the i-th pattern and \( {\hat{C}}_i \) is the class predicted by the individual (MLP). The term \( \mid {C}_i-{\hat{C}}_i\mid \) is zero when \( {C}_i={\hat{C}}_i \) and one otherwise. Each individual is submitted to this evaluation function, so that the best individuals are those with the highest accuracy. Further details on QIEA-BR can be found in [40, 52].
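
For concreteness, eq. (6) and eq. (7) can be computed with the small helpers below (written for this illustration).

```python
import numpy as np

def num_genes(k, nh, nc):
    """Chromosome length of eq. (6): binary part plus real part."""
    binary_part = k + 2 * nh + nc
    real_part = (k + 1) * nh + (nh + 1) * nc
    return binary_part + real_part

def accuracy(y_true, y_pred):
    """Fitness of eq. (7): 1 minus the fraction of misclassified patterns."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return 1.0 - np.mean(y_true != y_pred)

# example: 3 input attributes, at most 5 hidden neurons, 2 classes
print(num_genes(k=3, nh=5, nc=2))   # (3 + 10 + 2) + (4*5 + 6*2) = 15 + 32 = 47
```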

3 NEVE: Neuroevolutionary Model for Learning in Nonstationary Environments

This section presents the proposed quantum-inspired neuroevolutionary model, a self-adaptive and flexible model with good accuracy, suitable for learning in nonstationary environments. The model is based on an ensemble of Multi-Layer Perceptron (MLP) neural networks, where each member network is trained and has its parameters (topology, weights, among others) optimized by the QIEA-BR algorithm (see Section 2). This neuroevolutionary model is called NEVE (Neuroevolutionary Ensemble) and is composed of three main modules, detailed below and illustrated in Fig. 3:

  • Drift Detection;

  • Classifier Creation;

  • Evaluation and combination weights.

Fig. 3 – Modular structure of NEVE

The Drift Detection module is optional. If activated, for each new input data block received, the detection module checks whether any drift has occurred. The model works with data blocks of configurable size. If it is necessary (or desired) to work with individual data inputs, the block size can be set to 1. However, it is important to mention that the strategy of working with one instance at a time is not the most suitable for this model, as it may compromise its computational performance. Two detection methods were proposed, proactive and reactive, resulting in four different approaches implemented for this drift detection module [10]:

  • No detection;

  • Reactive detection: waits until the real data block labels are available to check if a drift has occurred in relation to the previous data block;

  • Proactive detection (Group Label approach): for each new data block received, a clustering algorithm is performed using the centroids of the previous labeled data block as initial centroids. Based on the results of the clustering algorithm, the detection mechanism checks if a drift has occurred in relation to the previous block and, if so, a new MLP is created and trained with the new block and the class labels suggested in the clustering;

  • Proactive detection (Pattern Mean Shift approach): similar to the Group Label approach, with the difference that when a drift is detected, instead of creating a new MLP with the new data block, the old data block is used to train the MLP and the drift is “removed” from the new data block. While in the Group Label approach the new MLP is adjusted to the new data, in Pattern Mean Shift approach the new data is adjusted to the old MLP.

The Classifier Creation Module is responsible for creating a new classifier, which may or may not be added to the ensemble, depending on the maximum ensemble size defined by the user. It is worth mentioning that the decision to create a new neural network is linked to the drift detection mechanism used, which is detailed in the following subsections. If created, the new classifier is added to the ensemble when space is available or by replacing an older classifier with worse accuracy. This approach gives the ensemble the ability to learn the new data without having to analyze the old data, as well as allowing it to forget data that is no longer needed. In short, the classifier creation module determines the complete configuration of the new MLP ensemble member using the QIEA-BR algorithm (presented in Section 2). The algorithm selects the most relevant input variables, specifies the number of neurons in the hidden layer (respecting the maximum limit configured by the user), and determines the weights and activation functions of each neuron. The number of output neurons is equal to the number of classes in the application.
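
The permanence decision described above can be sketched as follows (a minimal illustration; the accuracies are assumed to be measured on the most recent labelled block).

```python
def maybe_add_member(ensemble, member_accs, new_member, new_acc, max_size=None):
    """Add the new classifier if there is room; otherwise replace the worst old
    member, but only when the new one is more accurate. Returns True if added."""
    if max_size is None or len(ensemble) < max_size:      # unlimited or room left
        ensemble.append(new_member)
        member_accs.append(new_acc)
        return True
    worst = min(range(len(ensemble)), key=lambda i: member_accs[i])
    if new_acc > member_accs[worst]:
        ensemble[worst] = new_member
        member_accs[worst] = new_acc
        return True
    return False
```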

Finally, the Evaluation Module is responsible for determining the final response of the classifier ensemble by combining the results presented by the member classifiers. The QIEA-R algorithm is used to dynamically determine the most suitable voting weight for each classifier. The optimization of the weights allows the model to adapt easily to sudden data changes, by assigning higher weights to the classifiers best suited to the current concepts that govern the data. Three possible voting methods were implemented (a code sketch follows the list below):

  • Linear Combination: It uses the QIEA-R algorithm to generate a voting weight for each classifier, which is multiplied by the output of each ensemble member (between 0 and 1) in a weighted average. The result of this weighted average is used to determine the ensemble response. If the problem has only two classes, the output is assigned to class 0 if the result is less than 0.5 and to class 1 otherwise; in problems with multiple classes, the chosen class is the one with the highest output value;

  • Weighted Majority Voting: As in the previous case, it uses the QIEA-R algorithm to generate a voting weight for each classifier. However, the outputs of the neurons of each ensemble network are first rounded (to 0 or 1) and then multiplied by the corresponding classifier weight, forming a weighted average. As in the linear combination, in problems with only two classes the output is defined as class 0 if the result of the weighted average is less than 0.5 and as class 1 otherwise; in problems with multiple classes, the class with the highest output value is chosen;

  • Simple Majority Voting: The output of each ensemble member is rounded to one of the possible classes, and the ensemble final output is the most chosen class among all classifiers. In this case, there is no need to determine voting weights.
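
The three voting rules can be sketched as below, written for the multi-class case in which each member produces one output per class in [0, 1] (for two-class problems with a single output, the 0.5 threshold described above applies instead).

```python
import numpy as np

def linear_combination(outputs, weights):
    """outputs: (n_members, n_classes) raw network outputs in [0, 1];
    weights: voting weights evolved by QIEA-R."""
    combined = np.average(outputs, axis=0, weights=weights)
    return int(combined.argmax())              # class with the highest weighted output

def weighted_majority(outputs, weights):
    votes = np.round(outputs)                  # round member outputs to 0/1 first
    combined = np.average(votes, axis=0, weights=weights)
    return int(combined.argmax())

def simple_majority(outputs):
    votes = outputs.argmax(axis=1)             # each member votes for one class
    return int(np.bincount(votes).argmax())    # most voted class, no weights needed
```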

In summary, considering the detection mechanism used, there are four possible variations of the NEVE model proposed and detailed in the following subsections:

  • ND-NEVE, without detection

  • RD-NEVE, with reactive detection

  • PDGL-NEVE, with proactive detection and the Group Label approach

  • PDPMS-NEVE, with proactive detection and the Pattern Mean Shift approach

The following subsections detail each of the four proposed NEVE variations. For each variation, an explanatory text and the pseudocode of the algorithm are presented.

3.1 ND-NEVE (without detection)

The first variation of NEVE, “NEVE without Detection” (ND-NEVE), as the name implies, does not use any detection mechanism. It consists of an ensemble of MLP neural networks that, for each new data block received, trains a new MLP, which can be added to the ensemble if space is available.

The operation of ND-NEVE can be generalized as follows: when a data block t arrives (without the class labels), a new MLP network is trained using the QIEA-BR algorithm and block t-1 with the real class labels. The new network is provisionally added to the ensemble and the ensemble is tested with block t. The voting weights of all networks are determined using the QIEA-R algorithm and block t-1. The final ensemble classification is calculated using the test results on block t, the voting weights and the chosen voting method. Finally, we assume that the actual labels of block t become available and the permanence of the new network in the ensemble is then evaluated. The pseudocode of ND-NEVE is presented in Appendix 2.
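
The block-by-block protocol can be illustrated with the self-contained sketch below, in which a nearest-centroid classifier and an accuracy-based weighting stand in for the QIEA-BR training and the QIEA-R weight evolution (class labels are assumed to be consecutive non-negative integers).

```python
import numpy as np

class CentroidNet:
    """Stand-in for the MLP evolved by QIEA-BR: a nearest-centroid classifier."""
    def fit(self, X, y):
        self.classes = np.unique(y)
        self.centroids = np.stack([X[y == c].mean(axis=0) for c in self.classes])
        return self

    def predict(self, X):
        d = np.linalg.norm(X[:, None, :] - self.centroids[None, :, :], axis=2)
        return self.classes[d.argmin(axis=1)]

def nd_neve(blocks, max_size=5):
    """Structural sketch of the ND-NEVE loop; simple stand-ins replace the
    QIEA-BR training and QIEA-R weight evolution described in the text."""
    ensemble, prev = [], None
    for X_t, y_t in blocks:
        if prev is not None:
            X_p, y_p = prev
            ensemble.append(CentroidNet().fit(X_p, y_p))   # new net trained on block t-1
            # voting weights: here, simply each member's accuracy on block t-1
            w = np.array([(m.predict(X_p) == y_p).mean() for m in ensemble])
            w = w / w.sum()
            votes = np.stack([m.predict(X_t) for m in ensemble])   # test block t
            y_hat = np.array([np.bincount(votes[:, i], weights=w).argmax()
                              for i in range(votes.shape[1])])
            print("block accuracy:", (y_hat == y_t).mean())
            # real labels of block t arrive: evaluate the permanence of the members
            accs = [(m.predict(X_t) == y_t).mean() for m in ensemble]
            if len(ensemble) > max_size:
                ensemble.pop(int(np.argmin(accs)))
        prev = (X_t, y_t)
    return ensemble
```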

3.2 RD-NEVE (with reactive detection)

The second variation of NEVE is “RD-NEVE (with reactive detection)”. This variation uses the reactive detection mechanism, detailed in [10]. For each new data block received, the ensemble classifies it and, as soon as the real class labels are obtained, the detection mechanism checks if a drift has occurred from the previous data block. If so, a new MLP is created, which is added to the ensemble if space is available.

The operation of RD-NEVE can be generalized as follows: when a data block t arrives, the voting weights of all ensemble members are determined using the QIEA-R algorithm and block t-1. The ensemble is tested with block t and the classification results are combined with the weights calculated by QIEA-R, using the chosen voting method to determine the final ensemble classification. It is assumed that the real labels of block t become available later, so that the reactive detection can be applied [10]. If a drift has occurred in block t, a new MLP network is created using the QIEA-BR algorithm and trained with block t. The new network is added to the ensemble if space is available or if it is better than at least one of the old networks, replacing it in the ensemble. The pseudocode of RD-NEVE is presented in Appendix 2.

3.3 PDGL-NEVE (with proactive detection and Group Label approach)

The third variation of NEVE is “PDGL-NEVE (with proactive detection and Group Label approach)”. This variation uses the proactive mechanism of detection [10], where each new data block is clustered, using the centroids of the previous data block as the initial centroids of the algorithm. Based on the clustering results, the detection mechanism checks if a drift has occurred from the previous data block; if so, the model trains a new MLP with the new data block and the class labels suggested by the clustering algorithm.

The operation of PDGL-NEVE can be summarized as follows: when block t arrives, its instances are clustered using the real classes of block t-1 as the initial suggestion of centroids, since the real class labels of block t are still unknown. It is then verified whether a drift has occurred in block t in relation to block t-1. If so, a new MLP network is created using the QIEA-BR algorithm and trained on block t with the class labels provided by the clustering algorithm. The new network is provisionally added to the ensemble, which is tested with block t. The voting weights of all networks are determined using the QIEA-R algorithm and block t, also with the labels provided by the clustering algorithm. The classification results and weights are combined using the chosen voting method to determine the final ensemble classification. It is assumed that the real labels of block t become available later and the initial centroids for the next clustering are updated, now considering the real class labels of the data block. The permanence of the new network in the ensemble is then evaluated: it remains if space is available or if it is better than at least one of the old networks, replacing it in the ensemble. The pseudocode of PDGL-NEVE is presented in Appendix 2.

3.4 PDPMS-NEVE (with proactive detection and Pattern Mean Shift approach)

The fourth variation of NEVE is “PDPMS-NEVE (with proactive detection and Pattern Mean Shift approach)”. This variation also uses proactive detection [10]. As in the previous variation, each new data block is clustered to verify whether a drift has occurred in relation to the previous data block. If so, a new MLP is trained with the previous labeled data block, and the new data block is “adjusted” towards the previous data block. In other words, when a drift is detected, instead of creating a new MLP using the new data block (as in the Group Label approach), the old data block is used to train the network and the drift is “removed” from the new data block. While in the Group Label approach the new network is fitted to the new data, in the Pattern Mean Shift approach the new data is adjusted to the old network (trained with the old data). The pseudocode of PDPMS-NEVE is presented in Appendix 2.

Briefly, the main difference between PDGL-NEVE and PDPMS-NEVE is that in PDPMS-NEVE, when a drift is detected, a new MLP is created using the previous labeled data block (and not the new data block with the labels provided by the clustering, as in PDGL-NEVE). The new data block is then “adjusted” in the direction of the previous data block and submitted to the ensemble for classification. In PDGL-NEVE, on the other hand, the new data block is tested by the ensemble without any data adjustment. Additionally, in PDPMS-NEVE the data block used to determine the weights of each MLP is the old data block with the real labels, while in PDGL-NEVE the new data block is used, with the labels provided by the clustering.
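
The "removal" of the drift from the new block can be illustrated by a simple class-wise mean shift; this is an assumption made for the example, and the actual adjustment procedure is the one described in [10].

```python
import numpy as np

def pattern_mean_shift(X_new, y_new_hat, X_old, y_old):
    """Shift each (clustered) class of the new block so that its mean matches the
    corresponding class mean of the old block, 'removing' the detected drift."""
    X_adj = X_new.copy()
    for c in np.unique(y_old):
        mask = (y_new_hat == c)
        if mask.any():
            shift = X_old[y_old == c].mean(axis=0) - X_new[mask].mean(axis=0)
            X_adj[mask] += shift
    return X_adj
```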

This section presented the neuroevolutionary model for learning in nonstationary environments proposed in this paper and detailed its four variations. The next section describes the experiments performed with the proposed detection methods.

4 Experiments

To assess the ability of the proposed model to learn in nonstationary environments, and also to verify the best variations and configurations of the model regarding accuracy and computational performance, five different datasets were used in different simulations and scenarios. The four variations of the proposed model (described in Section 3) were used in the experiments: ND-NEVE, RD-NEVE, PDGL-NEVE and PDPMS-NEVE. All experiments were run using standard MATLAB libraries, as well as its Neural Networks package to train the baseline networks.

4.1 Datasets description

The datasets used in the experiments are the SEA Concepts dataset (an artificial dataset providing a more controlled environment regarding the drifts) and four real datasets (Nebraska, Electricity, Cover Type and Poker Hand), for which the exact moment the drift occurs is unknown.

The SEA Concepts dataset was artificially created by [49]. It is characterized by long periods without major changes in the environment, but with occasional abrupt drifts. The Nebraska dataset presents a compilation of climate measurements from the Offutt Air Force Base substation in Bellevue, Nebraska. Its objective is to predict whether rainfall will appear, using data from the last 30 days. Both datasets are available in [41]. The Electricity dataset is extracted from the Australian New South Wales Electricity Market, and the class label defines the price change relative to a moving average of the last 24 h. The purpose of the problem is to predict whether the price will go up or down. The Cover Type dataset contains information on cells of 30 × 30 meters of forest cover, extracted from the US Forest Service (USFS). Its goal is to predict the type of forest cover among seven possible values (therefore, a multi-class problem). The Poker Hand dataset has ten possible output categories, representing the poker hand formed by 5 cards. The purpose is to identify the type of a poker hand among the ten possibilities. These datasets are available in [34]. Table 2 presents the main features of each dataset, as well as the block size and number of blocks used in the experiments.

Table 2 – Datasets Details

4.2 Execution details

All executions begin at t = 0 and end when T consecutive data blocks have been presented for training and testing, with each block possibly suffering different concept drift scenarios of unknown rates and natures.

As detailed in Section 3, the QIEA-BR algorithm evolves the topology of each new neural network, which is created following the criteria of each variation of the proposed model. The number of input variables is selected by QIEA-BR among the available variables in each dataset. For all datasets, a single hidden layer was used, whose number of neurons is evolved by QIEA-BR, having a maximum value specified by the user. The number of neurons in the output layer is equal to the number of classes in each dataset. The synaptic weights and activation functions of the hidden layer and the output layer are also determined by QIEA-BR.

The parameters of the quantum evolutionary algorithms are the same as those used by [1, 40] and they are detailed in Table 3. The three voting methods detailed in Section 3 were evaluated: linear combination, weighted majority voting and simple majority voting. The maximum ensemble size is also a parameter defined by the user. Table 3 presents the configuration of the parameters used in all the experiments.

Table 3 – Experiments Settings

Thus, for each dataset, 72 different configurations of the model (4 × 3 × 3 × 2) were used, representing each possible combination of the parameters to be evaluated, as shown in Table 3. For each configuration, 30 simulations were performed and the average accuracy and computational time of these runs were calculated.
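
The full factorial of configurations can be enumerated as follows (parameter values taken from Table 3 and the results discussion; the names are ours).

```python
from itertools import product

variations = ["ND-NEVE", "RD-NEVE", "PDGL-NEVE", "PDPMS-NEVE"]
voting_methods = ["linear combination", "weighted majority", "simple majority"]
ensemble_sizes = [5, 10, None]          # None = unlimited ensemble
max_hidden_neurons = [5, 10]

configs = list(product(variations, voting_methods, ensemble_sizes, max_hidden_neurons))
assert len(configs) == 72               # 4 x 3 x 3 x 2 combinations, 30 runs each
```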

5 Results

The experiments presented below aimed at investigating the differences in accuracy (the ratio of the number of correct predictions to the total number of input samples) and computational performance (execution time in seconds) among the four variations of the NEVE model, as well as the impact of the voting method, ensemble size and number of neurons in the hidden layer. The objective of the experiment is therefore to analyze how these choices affect the results of the models for each dataset.

Tables 4, 5, 6 and 7 show the results of the experiments, considering accuracy and computational performance measured in seconds. It should be noted that execution time is provided only for the SEA Concepts, Nebraska and Electricity datasets. Due to the considerable size of the Poker Hand and Cover Type datasets, their execution required parallelization across several computers, making runtime comparisons between simulations impracticable. In all cases, the observed standard deviation was less than 2%. We highlighted the best 20% of the results in bold and gray and the worst 20% in italics and underlined.

Table 4 – Results for Dataset SEA Concepts
Table 5 – Results for Dataset Nebraska
Table 6 – Results for Dataset Electricity
Table 7 – Results for Datasets Poker Hand and Cover Type

The analysis of Tables 4 to 7 shows that:

  • In general, the ND-NEVE, RD-NEVE and PDPMS-NEVE approaches provided the best accuracy, while the PDGL-NEVE had the worst accuracy;

  • Considering computational performance, the ND-NEVE, RD-NEVE and PDPMS-NEVE approaches presented the best computational times, and the PDGL-NEVE approach the worst. It was observed, however, that the dataset also has a great influence on this criterion: the slowest was Electricity, which has the highest number of attributes and also the greatest number of blocks among the datasets evaluated;

  • The best voting methods in terms of accuracy are, in this order: linear combination, weighted majority and simple majority. This shows that the quantum algorithm contributes positively to the accuracy of the model by determining the voting weights of the networks. Possibly, the early rounding performed in the weighted majority led to a lower average accuracy than the linear combination;

  • As for computational performance, the best voting method was the simple majority, which was expected since this method does not determine weights via the quantum algorithm;

  • It is observed that, in general, the unlimited ensemble strategy has lower accuracy than the limited ensembles. There was no significant difference in accuracy between ensemble sizes of 5 and 10, which is a positive point, because the unlimited ensemble strategies also presented the worst computational performance, as expected. The unlimited ensemble tends to provide worse accuracy probably due to the increase in the search space of the QIEA-R when determining the voting weights of too many networks: it suffices to observe that, in all the datasets used, there are at least 400 data blocks, which allows ensembles of 400 networks in the unlimited case;

  • No substantial differences were observed either in the average accuracy or in the average computational performance considering the strategies of 5 and 10 neurons maximum in the hidden layer.

Figure 4 presents a comparative graph of the computational time for the three binary datasets: SEA, Nebraska and Electricity. It can be observed that the computational time of the ND-NEVE approach is higher than that of the others, whereas the approaches with some type of detection present similar mean computational times. This confirms that the proposed detection mechanism contributes to reducing the average execution time of the models.

Fig. 4 – Comparative runtime analysis

The accuracy of the proposed NEVE approaches was also compared with the DWM [26], Learn++.NSE [9], RCD [16], EFPT [55] and AMANDA [56] models. We used 3 different drift detectors for the RCD algorithm: DDM [14], EDDM [5] and ECDD [42]. These simulations were carried out using MOA [35], an open source data mining framework that includes several learning algorithms for classification, regression, clustering and concept drift detection, among others. For this comparison, we used, for all datasets, the same block size chosen for the NEVE simulations. In order to make a more coherent comparison with NEVE and to remove the influence of the base classifier on the accuracy of the models, MLP neural networks were used as base classifiers in all the other models. All models were parameterized with the values indicated by their respective authors.

Table 8 presents the results of the best configuration reached (in terms of accuracy) by each NEVE variation, compared to the results of the other models. We highlighted the best results, by dataset, in bold and underlined, the second best in bold, and the worst in italics and underlined. When more than one value is highlighted, there is no statistically significant difference between the classifiers at p ≤ 0.05, according to the Wilcoxon test. We performed 30 runs for each possible configuration and each dataset. In all cases, the observed standard deviation was less than 2%.

Table 8 – Comparison of results: Best case of NEVE x other models

We can see from Table 8 that the NEVE approaches obtained the best result in 2 datasets and the second best in the other 3. Apparently, the ND-NEVE and RD-NEVE approaches provide uniformly superior results in terms of accuracy. What is noticeable in this experiment, in general, is that the EFPT model is the main competitor of NEVE in terms of accuracy on the SEA, Nebraska and Electricity datasets (as its authors did not perform tests with the Poker and Covtype datasets, we could not compare the models on them), while the DWM model seems to be the main competitor of NEVE on the Poker and Covtype datasets.

From the results presented, we can highlight that NEVE provides good results without the need for a detection method; however, by adding one, substantial gains in accuracy and computational performance can be obtained. This fact reinforces that the neuroevolutionary ensemble approach is a robust choice for situations in which datasets are subject to sudden behavioral changes.

6 Conclusion

This work presented a new quantum-inspired neuroevolutionary model, based on a multi-layer perceptron (MLP) neural network ensemble for learning in nonstationary environments, called NEVE (Neuroevolutionary Ensemble). This model can be used in conjunction with the DetectA concept drift detection method [10], which is able to detect changes both proactively and reactively. The use of quantum-inspired evolutionary algorithms in NEVE allows the automatic generation of new classifiers for the ensemble (including the definition of their topology, the most appropriate input variables and their weights) and the determination of the voting weights of each neural network member of the ensemble.

Four different variations of NEVE were implemented: ND-NEVE (without detection), RD-NEVE (with reactive detection), PDGL-NEVE (with proactive detection and the Group Label approach) and PDPMS-NEVE (with proactive detection and the Pattern Mean Shift approach). These variations differ from each other in the way they detect and treat drifts, and were used in experiments with real and artificial datasets in order to evaluate which variation and which configurations achieve the best results. We varied the voting method, the maximum number of neurons in the hidden layer and the maximum size of the ensemble. It was found that the ND-NEVE, RD-NEVE and PDPMS-NEVE approaches produce the best results in terms of accuracy and computational performance. It was also observed that the linear combination is the best voting method in terms of accuracy, and simple majority voting the best in terms of computational performance. The unlimited ensemble strategy has worse accuracy and computational performance than limited ensembles, with no significant difference between ensembles of 5 and 10 networks.

Compared with other consolidated models from the literature, the accuracy of NEVE was found to be superior in most cases. The ND-NEVE and RD-NEVE approaches provided uniformly superior results in terms of accuracy, but the addition of the detection method resulted in substantial gains in some cases. This reinforces that the neuroevolutionary ensemble approach is a robust choice for situations in which datasets are subject to sudden behavioral changes.

As future work, we intend to integrate the creation of the neural network and the determination of the voting weights into a single evolutionary model, so that the evolution is carried out in one integrated process. We also intend to apply NEVE to real applications in order to validate its practical use, although it is very hard to know for sure whether a dataset contains concept drift or not.