Deep multi-layer perceptron-based evolutionary algorithm for dynamic multiobjective optimization

Dynamic multiobjective optimization problems (DMOPs) challenge multiobjective evolutionary algorithms (MOEAs) because the Pareto-optimal set (POS) varies over time. Research on DMOPs has attracted great interest from academia due to their widespread applications. Recently, a few learning-based approaches have been proposed that predict new solutions in upcoming environments to serve as the initial population for a multiobjective evolutionary algorithm. In this paper, we propose an alternative learning-based method for DMOPs: a deep multi-layer perceptron-based predictor that generates an initial population for the MOEA in each new environment. The historical optimal solutions are used to train a deep multi-layer perceptron, which then predicts a new set of solutions as the initial population in the new environment. The deep multi-layer perceptron is incorporated into the multiobjective evolutionary algorithm based on decomposition to solve DMOPs. Empirical results demonstrate that the proposed algorithm is effective in tracking the varying solutions over time and clearly outperforms state-of-the-art methods.


Introduction
Evolutionary dynamic multiobjective optimization (EDMO) has been studied for a number of years [1][2][3]. Unlike evolutionary multiobjective optimization [4], an EDMO algorithm needs to track the time-varying Pareto-optimal set (POS) and Pareto-optimal front (POF) as they change over time [5,6]. Many real-world multiobjective optimization problems are dynamic in nature, e.g., job shop scheduling [7], online optimization of the shift strategy for hydro-mechanical continuously variable transmissions [8], and control of time-varying unstable plants [9]. Evolutionary algorithms (EAs) are still challenged by EDMO because of their convergent nature and the problems' changes [10,11]: a converged population loses its exploration ability, which makes it hard to track the time-varying POS or POF [12,13]. A typical strategy is to treat the problem as a new optimization problem whenever the POS or POF changes, since reinitializing the converged population recovers the EA's exploration ability [5]. However, reinitialization does not make full use of the historical information gained through exploration, so the algorithm may fail to locate the new POS and POF before the next change. In many real-world dynamic multiobjective optimization problems (DMOPs), the objective functions change over time regularly rather than randomly [14,15], so there may be correlations between the POS or POF at several consecutive time steps. Therefore, predicting new solutions from the historical optimal solutions has been widely used for solving DMOPs [16][17][18].
A few prediction models have been proposed to assist EAs in solving DMOPs whose environments change regularly. Considering how the parameters of the prediction models are determined, these methods can be loosely categorized into two types: non-learning-based and learning-based. The non-learning-based methods design a mathematical model whose free parameters are determined experimentally, theoretically, or by experience. Linear models, the Kalman filter [16], and differential prediction [19] belong to this category. Zhou et al. [5] proposed a simple linear model to predict a new population, which was then perturbed by Gaussian noise whose variance was estimated from previous changes.
The linear model only used two previous solutions to compute the moving distance between two adjacent time steps, and assumed that every solution moves by the same distance. Muruganantham et al. [16] proposed using a Kalman filter (KF) to track the movement of each solution. In the KF model, the state of a process represented the solutions at a time step, and a state-transition matrix represented the motion of the solutions. Cao et al. [19,30] proposed a differential model that predicts the movement of the centroid from its locations in three previous environments. In contrast, the learning-based methods determine the free parameters by learning from historical data, implemented through machine learning methods such as linear regression [17], support vector machines [21,22], and transfer learning [23][24][25]. Hatzakis and Wallace [6] proposed a forward-looking approach that predicts new solutions in new environments using an autoregressive (AR) model for forecasting. Cao et al. [22] proposed using support vector regression (SVR) as the predictor to generate the initial population in each new environment. Jiang et al. [23] used a transfer learning technique to create the initial population by reusing past knowledge. They assumed that the distributions of solutions of a DMOP in different environments are different but correlated, so the different distributions could be mapped into a latent space in which they are similar.
Most prediction methods proposed in the literature are designed according to some prior knowledge or assumption, e.g., that solutions at several successive time steps are linearly correlated. However, the correlations between solutions at different time steps are unknown before solving the problem. A linear prediction model can only cover a limited range of DMOPs, and it may perform badly if solutions at different time steps are nonlinearly correlated or exhibit more complex correlations.
Instead of assuming specific correlations among the historical solutions of DMOPs, we regard the prediction model as a black box, regardless of what those correlations are. Recently, artificial neural networks (ANNs) [26] have regained increasing attention, owing to the growth of computing power and the availability of large-scale data. In theory, ANNs can fit any function underlying the given data, so they can be employed to learn the correlations among historical optimal solutions when solving DMOPs. In addition, ANNs are robust and can tolerate the noise present in historical optimal solutions.
In this paper, to solve DMOPs, we adopt a branch of ANNs, the multi-layer perceptron (MLP), to construct a prediction model combined with the multiobjective evolutionary algorithm based on decomposition (MOEA/D) [27,28]. The motivation is the assumption that the solutions obtained in consecutive time steps are auto-correlated. In each environment, MOEA/D decomposes a multiobjective optimization problem into several scalar optimization subproblems and optimizes them simultaneously. Each subproblem thus obtains a corresponding solution in every environment, forming a sequence of solutions that is used to train the MLP-based predictor. The trained predictor then predicts a new solution for the upcoming environment, which is placed into the initial population used to optimize the new multiobjective optimization problem in that environment. The experiments show that the proposed multi-layer perceptron-based multiobjective evolutionary algorithm based on decomposition (MOEA/D-MLP) is effective in tracking the dynamic POS/POF and outperforms state-of-the-art methods on most benchmarks.
The remainder of this paper is organized as follows. The next two sections introduce the background of EDMO and review related work on prediction-based approaches for DMOPs. We then briefly describe MOEA/D and the multi-layer perceptron, and detail the proposed algorithm, MOEA/D-MLP. After that, we present the experimental design, report and discuss the experimental results, and finally conclude the paper.

Definition of DMOPs
In this paper, we focus on the following dynamic multiobjective optimization problem:

minimize F(x, t) = (f_1(x, t), f_2(x, t), ..., f_m(x, t))^T, subject to x ∈ Ω,

where t represents the discrete time instant, x is the decision vector, m is the number of objectives, and Ω represents the decision space. F(x, t) is composed of m time-varying objective functions.
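As a concrete illustration of such a problem (not part of the DF suite used later in this paper), the classic FDA1 benchmark of Farina et al. has exactly this form: the optimal value of the distance-related variables drifts with a time-dependent term G(t), so the POS moves over time while the POF stays fixed at f_2 = 1 − √f_1. A minimal Python sketch:

```python
import math

def fda1(x, t):
    """FDA1-style DMOP: f1 = x1, f2 = g * (1 - sqrt(f1 / g)),
    where g = 1 + sum((x_i - G(t))^2) and G(t) = sin(0.5 * pi * t).
    The POS at time t is x_i = G(t) for i >= 2, so it drifts with t."""
    G = math.sin(0.5 * math.pi * t)            # time-varying optimum of x_2..x_n
    f1 = x[0]
    g = 1.0 + sum((xi - G) ** 2 for xi in x[1:])
    f2 = g * (1.0 - math.sqrt(f1 / g))
    return f1, f2
```

On the POS at time t (all distance variables equal to G(t)), g = 1 and the objective vector lies on the stationary front f_2 = 1 − √f_1; away from the POS, f_2 grows.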

Definition 1 (Pareto dominance)

At time t, a solution x_1 ∈ Ω Pareto-dominates another solution x_2 ∈ Ω, denoted by x_1 ≺_t x_2, if and only if f_j(x_1, t) ≤ f_j(x_2, t) for every j = 1, ..., m and f_j(x_1, t) < f_j(x_2, t) for at least one j. Definition 2 (Pareto-optimal set) A solution x* ∈ Ω is nondominated at time t if and only if there is no other solution x ∈ Ω such that x ≺_t x*. The Pareto-optimal set (POS) at time t is the set of all nondominated solutions: POS_t = {x* ∈ Ω | ¬∃ x ∈ Ω : x ≺_t x*}. Definition 3 (Pareto-optimal front) At time t, the Pareto-optimal front (POF) is the set of objective vectors corresponding to the POS: POF_t = {F(x, t) | x ∈ POS_t}. An EDMO algorithm is required to obtain a set of solutions as close as possible to the POF at every discrete time instant.
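The dominance relation above translates directly into code. The following minimal Python sketch checks dominance between objective vectors (for a minimization problem, as assumed throughout) and filters a set of objective vectors down to its nondominated subset:

```python
def dominates(f_a, f_b):
    """True if objective vector f_a Pareto-dominates f_b (minimization):
    no worse in every objective and strictly better in at least one."""
    return (all(a <= b for a, b in zip(f_a, f_b)) and
            any(a < b for a, b in zip(f_a, f_b)))

def nondominated(front):
    """Keep only the objective vectors not dominated by any other member."""
    return [f for f in front
            if not any(dominates(g, f) for g in front if g is not f)]
```

Applying `nondominated` to the image of a population at time t yields an approximation of POF_t.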

Related works
A variety of prediction-based approaches have been proposed to solve DMOPs, which can be loosely categorized into two types: non-learning based and learning based.
Non-learning-based: Zhou et al. [5] proposed a simple linear model to predict a new population, which was then perturbed by Gaussian noise whose variance was estimated from previous changes. The linear model only used two previous solutions to compute the moving distance between two adjacent time steps, and assumed that every solution moves by the same distance. This simple strategy has been combined with other ideas in many algorithms for solving DMOPs. Wu et al. [20] proposed integrating a local search strategy with the linear model to generate an initial population. Instead of computing each solution's moving distance, Zou et al. [18] proposed using the center point and knee point to compute the moving distance, and new solutions were predicted based on the movement of the center point. Ruan et al. [29] also used the motion direction of center points, but without involving noise. Jiang and Yang [11] used this linear model to predict the motion direction and movement step size of the population centroids. Instead of using two former solutions, Cao et al. [19,30] proposed a differential model that predicts the movement of the centroid from its locations in three previous environments. Muruganantham et al. [16] proposed using a Kalman filter (KF) to track the movement of each solution. In the KF model, the state of a process represented the solutions at a time step, and a state-transition matrix represented the motion of the solutions. They developed a 2-D KF and a 3-D KF model to describe the movement. In the 2-D KF, the state transition was the same as in the linear model above, and a second-order linear model was used in the 3-D KF. Rambabu et al. [31] proposed a mixture-of-experts-based ensemble framework for DMOPs, which utilized multiple prediction mechanisms including two variants of the KF and a linear model.
A gating network was applied to switch between these prediction models based on the performance of the predictors at different time steps. Rong et al. [32] proposed a multi-model prediction method for DMOPs, in which they defined four types of Pareto-set change and a method for determining the type of change. For the different change types they provided different prediction models, e.g., the linear movement of the centroids of solutions for the translation type, and the linear movement of the clusters for the rotation type. In addition, Rong et al. [33] presented a multi-directional prediction strategy to enhance algorithm performance on DMOPs. The population was clustered into several representative groups by a proposed classification strategy, where the number of clusters was adapted according to the intensity of the environmental change, and the evolutionary direction of each individual was estimated from the movement between its previous and current locations. Wang et al. [34] proposed a grey prediction model that uses the centroid point of each cluster to generate the initial population. Hu et al. [35] divided the decision variables of each individual into two parts, micro-changing decisions and macro-changing decisions, based on the intensity of the environmental change. Ma et al. [15] proposed predicting the new locations of center points from a series of center points in different subregions based on a difference model. Wang et al. [36] proposed using a Gaussian mixture model to fit various data distributions for predicting the new POS. Liang et al. [37] proposed dividing the decision variables into two and three different groups in the static optimization and change-response stages, respectively.
Learning-based: Hatzakis and Wallace [6] proposed a forward-looking approach that predicts new solutions in new environments using an autoregressive (AR) model for forecasting. In the first 100 time steps, the EA was executed to collect data for training the AR model; the fitted model could then be used to predict new solutions in subsequent time steps. The AR model was also employed by Zhou et al. [17] in their algorithm: instead of predicting isolated points, it tracked the centroid of the solutions across time steps. Jiang et al. [23] used a transfer learning technique to create the initial population by reusing past knowledge. They assumed that the distributions of solutions of a DMOP in different environments are different but correlated, so the different distributions could be mapped into a latent space in which they are similar. In DMOPs, the solutions obtained in two successive environments were mapped into the latent space, where they obeyed the same distribution. In addition, they proposed a new individual-transfer-based evolutionary algorithm for DMOPs [38], motivated by the observation that negative transfer would guide the search in a wrong direction; they therefore proposed finding a few high-quality individuals with good diversity to avoid the negative transfer caused by individual aggregation. Cao et al. [22] proposed using support vector regression (SVR) as the predictor to generate the initial population in each new environment. Zhang et al. [39] proposed using the centroid distance to measure the distance between the population centroid and reference points, which was combined with the transfer learning method to predict the initial population. Zou et al. [40] proposed a reinforcement learning approach for DMOPs, which relocates individuals according to the severity of the environmental changes.
The motivation of this approach was that reinforcement learning is effective at learning optimal behavior through interactions between an agent and a dynamic environment. Wang et al. [41] proposed an ensemble-learning-based prediction strategy to help algorithms reinitialize a new population for DMOPs, including a linear model, a knee-point-based autoregressive model, a population-based autoregressive model, and a random reinitialization model. Feng et al. [42] proposed predicting the movement of the POS via an autoencoding evolutionary search method.

MOEA/D
The MOEA/D algorithm was proposed by Zhang and Li [27] in 2007 and has become a very popular and effective approach for solving MOPs. Instead of using a nondominated sorting strategy to handle multiple objectives, MOEA/D decomposes a multiobjective optimization problem into a number of single-objective optimization subproblems via an aggregation function. In this paper, we adopt the Tchebycheff approach to decompose MOPs. Let λ^1, ..., λ^N be a set of evenly spread weight vectors. A multiobjective optimization problem at time t can then be decomposed into N scalar optimization subproblems, where the i-th subproblem (i = 1, ..., N) at time t is

minimize g^te(x | λ^i, z*, t) = max_{1 ≤ j ≤ m} { λ^i_j |f_j(x, t) − z*_j| }, subject to x ∈ Ω,

where z* = (z*_1, ..., z*_m)^T is the reference point with z*_j = min{ f_j(x, t) | x ∈ Ω } for each j = 1, ..., m (for a minimization problem). MOEA/D minimizes these N subproblems simultaneously in a single run.
In MOEA/D, the neighborhood of a weight vector λ^i is defined as the set of its several closest weight vectors in {λ^1, ..., λ^N}. The neighborhood of the i-th subproblem consists of all the subproblems whose weight vectors belong to the neighborhood of λ^i. A population of N solutions is randomly generated, and each solution is randomly allocated to a particular subproblem.
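The two ingredients described above can be sketched as follows; this is an illustrative simplification rather than the full MOEA/D-DE variant used in the experiments. The first function evaluates the Tchebycheff aggregation of an objective vector for one subproblem, and the second computes the neighborhood index sets of the weight vectors:

```python
def tchebycheff(f, lam, z_star):
    """Tchebycheff aggregation g^te = max_j lam_j * |f_j - z*_j| (minimized)."""
    return max(l * abs(fj - zj) for l, fj, zj in zip(lam, f, z_star))

def neighborhoods(weights, T):
    """For each weight vector, the indices of its T closest weight vectors
    (Euclidean distance, including itself), as used by MOEA/D."""
    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5
    return [sorted(range(len(weights)), key=lambda j: dist(w, weights[j]))[:T]
            for w in weights]
```

In a full MOEA/D loop, offspring generated for subproblem i are compared against the solutions of the subproblems in `neighborhoods(weights, T)[i]` using their respective `tchebycheff` values.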

Multi-layer perceptron
Artificial neural networks (ANNs) are well-known data-driven machine learning methods that simulate the neural systems of human brains [43]. These networks can model any linear or nonlinear function by fitting datasets and can generalize to unseen situations, which is why they have been widely used for classification and regression problems. The multi-layer perceptron (MLP) is the most widely used class of ANNs; it consists of an input layer followed by one or more hidden layers and an output layer [44]. Each layer consists of a number of nodes, which represent neurons (processing units) with nonlinear activation functions. Nodes in adjacent layers are connected by weighted links [45].
The output of the l-th node in a layer is computed as

y_l = ϕ( Σ_{i=1}^{m} W_{li} I_i + β_l ),

where I_i is the i-th input, W_{li} is the connection weight between input I_i and node l, m is the number of inputs, β_l is the bias of the l-th node, and ϕ is an activation function, e.g., the standard logistic sigmoid function

ϕ(z) = 1 / (1 + e^{−z}).

The use of MLPs can be divided into two phases: training and inference. In the training phase, a set of training samples is used to determine the weights and biases of the MLP. In the inference phase, the trained MLP outputs a result for a given input. Training is performed by modifying the weights and biases in successive iterations such that the error is minimized. The objective of training is to minimize the mean of the squares of the network errors (MSE) [46]:

MSE = (1/M) Σ_{i=1}^{M} (y*_i − y_i)^2,

where y*_i and y_i are the target output and the predicted output, respectively, for the i-th training sample, and M is the total number of training samples.
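The forward pass and MSE-driven weight updates described above can be sketched as a minimal one-hidden-layer network with sigmoid hidden nodes and a linear output node, trained by per-sample gradient descent. This illustrates the mechanism only; it is not the exact network configuration used in the experiments later:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class TinyMLP:
    """One hidden layer (logistic sigmoid), one linear output node."""

    def __init__(self, n_in, n_hidden):
        self.W1 = rng.normal(0.0, 0.5, (n_hidden, n_in))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.5, n_hidden)
        self.b2 = 0.0

    def forward(self, x):
        self.h = sigmoid(self.W1 @ x + self.b1)  # hidden activations
        return self.W2 @ self.h + self.b2        # linear output

    def train(self, X, Y, lr=0.5, epochs=3000):
        """Per-sample gradient descent on the squared error."""
        for _ in range(epochs):
            for x, y in zip(X, Y):
                e = self.forward(x) - y                       # output error
                grad_h = e * self.W2 * self.h * (1.0 - self.h)  # back through sigmoid
                self.W2 -= lr * e * self.h
                self.b2 -= lr * e
                self.W1 -= lr * np.outer(grad_h, x)
                self.b1 -= lr * grad_h
```

Because the hidden layer is nonlinear, such a network can fit correlations a linear predictor cannot, which is exactly the property the MLP-based predictor relies on.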

A MLP-based predictor
We build an MLP-based predictor to assist MOEA/D in searching for the new Pareto-optimal solutions in the new environment. The predictor learns from the historical optimal solutions and predicts a new set of solutions, which acts as the initial population for MOEA/D. Let the historical optimal solutions obtained in the previous environments be denoted as {P_1, P_2, ..., P_t}; the predictor learns from these sets to estimate a new set of solutions P_{t+1} for the next environment.
Suppose there are N solutions in each set P_k (k = 1, ..., t), where each solution is the obtained optimal result of the corresponding subproblem in that environment. In MOEA/D, each solution is associated with a subproblem determined by its weight vector. The weight vectors are generated in the initialization stage and remain unchanged thereafter. Therefore, the solutions obtained in different environments that are associated with the same weight vector automatically form a time series. Hence, the historical optimal solution sets {P_1, P_2, ..., P_t} yield N time series (x^1_i, x^2_i, ..., x^t_i), with x^k_i ∈ P_k, i = 1, ..., N. We assume each solution has d decision variables that are independent of each other, i.e., x^k_i = (x^k_{i,1}, x^k_{i,2}, ..., x^k_{i,d}), k = 1, ..., t, i = 1, ..., N. Thus, each time series (x^1_i, x^2_i, ..., x^t_i) can be further divided into d series of variables (x^1_{i,j}, x^2_{i,j}, ..., x^t_{i,j}), i = 1, ..., N, j = 1, ..., d, which means that we need to build N × d individual prediction models to estimate a new set of solutions. Since different variables exhibit different correlations over time, each model is trained separately.
For the sequence of the j-th variable of a solution, (x^1_{i,j}, x^2_{i,j}, ..., x^t_{i,j}), i = 1, ..., N, j = 1, ..., d, we believe there exists a hidden function describing the correlation within the sequence. We assume that each value is strongly correlated with its s preceding values, so that

x^k_{i,j} = f_MLP(x^{k−s}_{i,j}, ..., x^{k−1}_{i,j}),   (9)

where f_MLP denotes the MLP-based predictor.
For the sequence (x^1_{i,j}, x^2_{i,j}, ..., x^t_{i,j}), i = 1, ..., N, j = 1, ..., d, we can obtain (t − s) training samples by sliding a time window forward, which are used to train the predictor f_MLP. Supposing t = 8 and s = 4, the sequence (x^1_{i,j}, x^2_{i,j}, ..., x^8_{i,j}) can be split into the samples

(x^1_{i,j}, x^2_{i,j}, x^3_{i,j}, x^4_{i,j}) → x^5_{i,j},
(x^2_{i,j}, x^3_{i,j}, x^4_{i,j}, x^5_{i,j}) → x^6_{i,j},
(x^3_{i,j}, x^4_{i,j}, x^5_{i,j}, x^6_{i,j}) → x^7_{i,j},
(x^4_{i,j}, x^5_{i,j}, x^6_{i,j}, x^7_{i,j}) → x^8_{i,j}.

The pseudo-code of training the MLP-based predictor is given in Algorithm 1.
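The sliding-window construction of the (t − s) training samples can be sketched as:

```python
def sliding_window_samples(series, s):
    """Split one variable's time series into (input, target) pairs:
    each value is predicted from its s preceding values, as in Eq. (9)."""
    X = [series[k:k + s] for k in range(len(series) - s)]
    y = [series[k + s] for k in range(len(series) - s)]
    return X, y
```

With t = 8 and s = 4, this yields exactly the four samples listed above; one such dataset is built for every variable of every subproblem's solution sequence.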

The framework of MOEA/D-MLP
The whole framework of our dynamic multiobjective optimization algorithm, MOEA/D-MLP, consists of three parts: an environmental change detection mechanism, an MLP-based predictor, and a multiobjective optimization algorithm for the static environments. The pseudo-code of MOEA/D-MLP is shown in Algorithm 2, where the static multiobjective optimization algorithm is based on MOEA/D-DE. In Line 6, we randomly choose 10% of the individuals as sensors to detect whether the environment has changed: if the average objective values of these sensors change between iterations, an environmental change is assumed. In Line 10, the MLP-based predictor starts to work from the (s + 2)-th time step, which guarantees that at least one sample is available for training the MLP network.
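The sensor-based change detection described above can be sketched as follows; the helper names, the exact comparison tolerance, and the caching scheme are illustrative choices, not taken from Algorithm 2:

```python
import random

def pick_sensors(population, frac=0.1):
    """Randomly choose about 10% of the population as change sensors."""
    k = max(1, int(len(population) * frac))
    return random.sample(population, k)

def mean_objectives(fvals):
    """Component-wise mean of a list of objective vectors."""
    m = len(fvals[0])
    return [sum(f[j] for f in fvals) / len(fvals) for j in range(m)]

def environment_changed(sensors, evaluate, cached_mean, eps=1e-12):
    """Re-evaluate the sensor solutions and compare the average objective
    vector with the one cached at the previous generation; any discrepancy
    is taken as an environmental change."""
    new_mean = mean_objectives([evaluate(x) for x in sensors])
    flag = any(abs(a - b) > eps for a, b in zip(new_mean, cached_mean))
    return flag, new_mean
```

The returned mean is cached each generation so that the next call compares against the most recent environment.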

Benchmark and performance metric
We choose the DF benchmark [47] as the test suite to evaluate the proposed algorithm; its dynamic characteristics simulate real-world DMOP scenarios with various properties. There are fourteen test problems, including nine biobjective and five triobjective problems. The time instant t is defined as t = (1/n_t) ⌊τ/τ_t⌋, where n_t, τ, and τ_t represent the change severity, the generation counter, and the change frequency, respectively. In this paper, the severity of change is set as n_t ∈ {10, 5}, and the frequency of change is set as τ_t ∈ {5, 10}.
To measure the difference between the obtained POF and the true POF on each test problem, the modified inverted generational distance (MIGD) is employed as the performance metric. We first introduce the IGD metric, which is defined as

IGD(POF*_t, POF_t) = ( Σ_{v ∈ POF*_t} d(v, POF_t) ) / n,

where POF*_t is the true POF of the multiobjective optimization problem at time t, POF_t is an approximation of POF*_t obtained by a multiobjective optimization algorithm, d(v, POF_t) is the minimum Euclidean distance between v and the points in POF_t, and n is the number of points in POF*_t.
The MIGD is defined as the average of the IGD values over a certain number of time steps:

MIGD = (1/|T|) Σ_{t ∈ T} IGD(POF*_t, POF_t),

where T is the set of time steps in a single run.
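Both metrics can be computed directly from these definitions, using the minimum Euclidean distance from each true-POF point to the approximation set:

```python
def igd(pof_true, pof_approx):
    """Inverted generational distance: mean distance from each point of the
    true POF to its nearest neighbour in the approximation set."""
    def d(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5
    return sum(min(d(v, u) for u in pof_approx) for v in pof_true) / len(pof_true)

def migd(true_fronts, approx_fronts):
    """MIGD: the IGD averaged over the recorded time steps of a run."""
    return (sum(igd(p, q) for p, q in zip(true_fronts, approx_fronts))
            / len(true_fronts))
```

A smaller IGD (and hence MIGD) indicates that the approximation both converges to and covers the true front well, since every true-front point must have a nearby approximation point.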

Compared algorithms and parameter settings
In the empirical studies, we compare the proposed MOEA/D-MLP with four state-of-the-art algorithms: PPS [17], MOEA/D-KF [16], Tr-DMOEA [23], and MOEA/D-SVR [22]. PPS was the first learning-based method proposed for solving DMOPs; it uses a linear regression technique to predict new solutions in dynamic environments. MOEA/D-KF is a typical non-learning-based method that employs a Kalman filter (KF) model to predict new solutions. Tr-DMOEA is a novel transfer-learning method that uses a domain adaptation approach to build a prediction model. MOEA/D-SVR is a recently proposed nonlinear learning-based method in which support vector regression is used as the predictor. For a fair comparison, the MOEA/D framework is used as the MOEA in PPS and Tr-DMOEA. All algorithms use the DE operator and the polynomial mutation operator to generate new solutions. The population size is set to 100 and 300 for the biobjective and triobjective problems, respectively. The parameters of the DE operator are set as CR = 0.5 and F = 0.5, and those of the polynomial mutation operator as η = 20 and p_m = 1/d. The number of nodes in the hidden layer is set to 10, and the number of nodes in the input layer is set to 4, which means the solution in the current environment is assumed to be correlated with its locations in the four previous environments. Since the number of training samples grows with the number of environmental changes, online training of the MLP-based predictor would become time-consuming after many time steps. We therefore set a hyperparameter that caps the number of training samples from the 45-th environmental change onward, ensuring that no more than 40 samples are used to train the predictor.

Results and discussion
(1) Solution quality: Tables 1 and 2 present the experimental results of MOEA/D-MLP and the compared algorithms in terms of MIGD values on DF1-DF14. In both tables, "+", "−", and "~" indicate that MOEA/D-MLP performs significantly better than, significantly worse than, or not significantly differently from the corresponding algorithm, respectively. A smaller value of n_t signifies that the benchmark changes with larger severity, making the changing POS and/or POF harder to track, so the MIGD values obtained on benchmarks with n_t = 5 tend to be worse. A smaller value of τ_t implies that the benchmark changes more frequently and the algorithms are given fewer evaluations to approximate the new POS; therefore, the MIGD values obtained on problems with τ_t = 10 tend to be better. It is clear that MOEA/D-MLP achieves superior performance in terms of MIGD against the other methods on most benchmarks under the different dynamic configurations. The POS of DF2 and DF3 change over time with simple dynamic characteristics while the POF remains stationary, but the switch of the position-related variables is challenging and obscures the correlations of positions over time. Tr-DMOEA performs best on DF2 and DF3: instead of assuming any correlation of solutions between consecutive time steps, it predicts a distribution of solutions, so it is more suitable for problems with a stationary POF or similar POFs in sequential environments. Owing to the specific dynamic characteristics of DF6 and DF7, it is hard to approximate the POS within the given evaluation budget, so the historical solutions cannot reveal the hidden correlation between solutions over time, and all algorithms perform badly compared with their performance on the other benchmarks.

Table 2
Mean and standard deviations of the MIGD results of MOEA/D-MLP and the competitors on the DF10-DF14 benchmarks.

On the triobjective optimization problems DF10-DF14 (Table 2), our algorithm MOEA/D-MLP performs competitively against the other algorithms. It is worth noting that MLP and SVR are both typical machine learning methods, and the MLP-based predictor performs better than the SVR-based predictor on most of these benchmarks; the better generalization capability of the MLP may contribute to this superiority.

(2) Parameter sensitivity: In our MLP-based predictor, the number of nodes in the input layer and the hidden layer is set to 4 and 10, respectively. To investigate how sensitive MOEA/D-MLP is to these two parameters, we study four different numbers of nodes in the input layer and in the hidden layer, respectively. Table 3 presents the results of varying the number of nodes in the input layer of MOEA/D-MLP on some benchmarks, which illustrates that too small or too large a number of nodes does not benefit our algorithm. A larger number of nodes in the input layer means the target solution is assumed to be correlated with many preceding solutions, which does not improve the performance of the MLP-based predictor. Table 4 presents the results of varying the number of nodes in the hidden layer of MOEA/D-MLP on some benchmarks. Theoretically, the performance of MOEA/D-MLP should improve with a larger number of hidden nodes; however, this relies on more training iterations and more training data. Thus, with the limited training data and iterations available on these benchmarks, a larger number of nodes does not always yield better results.

Conclusion
In this paper, we presented a deep multi-layer perceptron-based predictor combined with the MOEA/D framework to solve DMOPs. In the first several environments, MOEA/D collects the approximated solutions as training data. At the beginning of each new environment, the training data are used to train the deep MLP-based predictor to fit the hidden correlations within the data and to predict new solutions, which serve as the initial population for MOEA/D in the new environment. The motivation of this predictor is the assumption that there are hidden correlations between the POS of consecutive environments, and we attempt to describe these correlations with a function. In theory, an MLP can fit any function underlying the given data and can therefore describe the hidden correlations among the POS approximated by MOEA/D across several environments. Experimental results demonstrate that the proposed algorithm is effective in tracking the varying solutions over time and clearly outperforms state-of-the-art methods. There are still several possible directions for future work: the performance of the MLP-based predictor greatly depends on the quality of the historical data, and it may perform badly if the environment changes rapidly or severely.