A special point-based transfer component analysis for dynamic multi-objective optimization

To solve dynamic multi-objective optimization problems better, the key is to adapt quickly to environmental changes and track the possible changing optimal solutions in time. In this paper, we propose a special point-based transfer component analysis for dynamic multi-objective optimization algorithm (SPTr-RM-MEDA). To be specific, when a change occurs, the neighbors of some special points are selected from the optimal set at previous time, and the transfer component analysis makes the use of minimizing the distance between the mapped previous optima and the mapped current optima. Accordingly, the purpose is to predict a part of next initial population from the neighborhoods of special points by transfer component analysis. To adapt to the change well, SPTr-RM-MEDA also reevaluates the previous optimal set. In addition, an adaptive diversity introduction strategy is adopted to maintain the population size. SPTr-RM-MEDA is performed on 12 test problems under 8 kinds of environmental changes, and experimental results show that it is superior to other five state-of-the-art algorithms on most of test problems.


Introduction
Dynamic multi-objective optimization problem (DMOP) refers to a problem containing multiple objectives under a time-varying environment. Every time the environment is changed, the optimization problems may also be changed. In the real world, DMOP exists everywhere, such as dynamic portfolio optimization problem in deregulated electricity markets, hydro-thermal power scheduling problem, training neural networks, and so on [5,11,13]. Different from the static multi-objective optimization problems, the optimal solutions of DMOPs may always change with time. Therefore, the traditional static optimization algorithms are not suitable to deal with DMOPs. Affected by some factors like the severity and frequency of environmental changes, the objective function, B Ruochen Liu ruochenliu@xidian.edu.cn 1 Key Lab of Intelligent Perception and Image Understanding of Ministry of Education, International Center of Intelligent Perception and Computation, Xidian University, Xi'an 710071, China the constraint function, and other parameters of DMOPs will be changed [4]. A good algorithm for DMOPs not only needs a fast convergent rate before the next change happens, but also can maintain the diversity after each change [8], which is still a trade-off to be balanced for dynamic multi-objective optimization.
In recent years, many studies have made great achievements by taking evolutionary algorithms and swarm intelligence algorithms as the common frameworks in DMOPs [1,2,17,18,21]. According to the response strategy after changing, the existing algorithms can be divided into four categories: diversity-based, memory-based, multi-populationbased, and prediction-based method [1].
The diversity-based method introduces the individuals with great difference into a population to preserve diversity after a change occurs. Hypermutation [16] is introduced to make the population tend to diverge by increasing the mutation rate when a change is detected. In 2007, Deb modified the static NSGA-II into a dynamic version (i.e., dynamic NSGA-II) [5]. After a change was detected, part of solutions were introduced in a random way, or they were mutated. To adapt to the change well, the key to maintain diversity is to introduce some solutions different from ones in the populations last time in each iteration. However, the method of how many solution to be replaced is just out.
The memory-based method improves the adaptability by reusing the stored historical information when changes occur. Goh and Tan [8] used a temporary memory method in their proposed algorithm to deal with previous solutions in the archive (an external set used to store non-dominant solutions). However, this method could not make the full use of history information. Later, another algorithm developed an adaptive population management strategy [1] to select the number of random and historical solutions according to the severity of changes. This kind of methods is more suitable for the DMOPs with periodic changes.
The multi-population-based method is to track the change of multiple peaks with several populations at the same time, which is capable of increasing diversity. The self-organizing scouts' method was proposed by Branke et al. [3]. When a peak was found, a sub-population would be generated to monitor the mountain, and the rest of sub-populations would continue to search for new peaks. Li and Yang [15] proposed a multi-population particle swarm optimization algorithm, in which evolutionary programming explored the most promising region in the whole search space. Meanwhile, several sub-populations used the fast particle swarm optimization algorithm to find the local optimal solutions. The difficulty of multi-population method was that the operation of each sub-population needs to coordinate with each other. Therefore, it took more computing time than the single-population method in most situation [7].
The method based on prediction is to design a model by combining the existing historical information with some technologies from machine learning, which can predict a new initial population and adapt to changes better. Hatzakis and Wallace [9] adopted a prediction method based on the forward-looking model to predict the optimal solutions after an environmental change. Zhou et al. [24] proposed an algorithm based on population prediction strategy, which takes into account the distribution characteristic of optima in objective space, and establishes time series on center points of population. Jiang et al. [13] applied the transfer learning into DMOP. The main shortcoming of prediction-based method is that the solutions sampled from prediction models will be adopted into the next population no matter whether they will help the population evolve toward the right direction or not.
To use historical information and construct a suitable prediction model, this paper proposes a new special point-based transfer component analysis (TCA) for dynamic multiobjective optimization algorithm (SPTr-RM-MEDA). The main contributions can be given as follows: (1) To predict the solutions closer to the potential optimal sets in target domain using TCA, it is necessary to select some special points in the source domain which are preferably approximately uniformly distributed at the Pareto fronts. Therefore, we select the five kinds of special points and their neighborhoods to build the prediction model using TCA. (2) The non-dominated set at previous time are reevaluated using the objective functions at current time, and then, those non-dominated solution at current time will go into the next generation directly together with those predicted by TCA. (3) Moreover, if the number of solutions gotten by the two strategies above is not enough, the adaptive diversity introduction strategy is acted as supplement by introducing remain number of individuals which are generated randomly by Gaussian mutation. (4) This automatic adaptation mechanism is combined with RM-MEDA and experimental results show that SPTr-RM-MEDA is superior to other five typical dynamic multi-objective evolutionary algorithms.
The rest of this paper is organized as follows. Section 2 introduces the related work. Section 3 elaborates the framework of the proposed algorithm in detail. Section 4 shows the experimental settings, comparative experimental results, and analysis. Section 5 gives a conclusion and discusses the potential research work in the future.

Dynamic multi-objective optimization problem
Generally, a DMOP can be defined as follows: where t is the time variable, x = (x 1 , x 2 , . . . , x n ) T is a vector including n decision variables, is the search space for decision variables, g (x, t) and h (x, t) are the equality and inequality constraints, respectively. f (x, t) is an Mdimension objective vector. The obtained optimal solutions to Eq. (1) are called Pareto fronts (PFs) in the objective space and Pareto solutions (PSs) in the decision space.
According to the different characteristics of change, DMOPs can be divided into the following four types [6]: -Type I: the PS is changed, the PF remains unchanged. -Type II: both the PS and the PF are changed.
-Type III: the PF is changed, but the PS remains unchanged. -Type IV: both the PS and PF remain unchanged.

Transfer component analysis
TCA mainly aims at the problem of domain adaptive learning in the process of transfer learning [12,19]. It reuses the knowledge obtained from the source domain and performs tasks in the target domain, which is related to but different from the source domain. Because the domains are in different distributions, the data from them will be mapped together into a reproducing kernel Hilbert space (RKHS) H [20], where the distance between the source and target is minimized and their internal attributes are retained to the maximum extent. Intuitively, it is not easy to minimize the distance between them in the current space, so a mapping that makes them closest in the space is necessary to establish. TCA is designed to calculate distances more universal and simpler.
TCA [12] assumes that the marginal distribution of the source domain and the target domain is different, i.e., P(X s ) = P(Y t ), where X s and Y t represent the source domain and the target domain, respectively. It assumes that there is a feature mapping φ, so that P(φ(X s )) ≈ P(φ(Y t )). Maximum mean discrepancy (MMD) is used to calculate the distance, and is given as follows: where X s = {x 1 , x 2 , . . . , x n 1 }, and Y t = {y 1 , y 2 , . . . , y n 2 }. X s and Y t represent φ(X s ) and φ(Y t ), respectively. φ (·) : X → H is the domain mapped in RKHS. To facilitate the calculation in a easier way, TCA introduces a kernel matrix K and a coefficient matrix L as follows: where K u,v is a kernel matrix. Taking K s,t as an example, each element k i, j = k x i , y j = φ(x i ) T φ(y j ). K s,s , K t,s , and K t,t have the similar process to be calculated. Thus, Eq. (2) above can be changed into the following form: where tr(·) is the trace of matrix. Here, Eq. (2) can be transformed into a semi-definite programming problem as Eq. (4).
To reduce the computation cost, let ϕ(x) = W T k x ∈ R d , and then the MMD can be calculated in a dimensional reduction way, where W is an (n 1 + n 2 ) × d (d n 1 + n 2 ) weight matrix of samples, k x = [k(x 1 , x), . . . , k(x n 1 , x), k(y 1 , x), . . . , k(y n 2 , x)] T , and the kernel function K can be converted as the following formula: where K is a symmetric matrix and, therefore, K T = K .
After sorting out the above formulas, TCA can be expressed as where H = I n 1 +n 2 − 1 n 1 +n 2 11 T is a central matrix to maintain the different data characteristics, I is the identity matrix, and 1 is an all-ones matrix. For solving the above optimization problem, the second regularization term of Eq. (6) is added by the Lagrange duality to get W . It can be concluded that the solution of W is the first m eigenvectors.
In conclusion, TCA can be concluded as follows. The inputs are two matrices of source and target domain, respectively, which are two populations generated randomly from the previous and current time. We first calculate L and H matrices, select a common kernel function for mapping such as linear kernel and Gaussian kernel, and then calculate K . Finally, the output is W , which is constructed by the first m eigenvectors of (K L K + μI ) −1 K H K .

Regularity model-based multi-objective estimation of distribution algorithm
When the environmental change is not detected, the static algorithm is utilized for seeking the optimal solutions at current time. In 2008, the regularity model-based multiobjective estimation of distribution algorithm (RM-MEDA) was proposed by Zhang [22], which is as the static optimization algorithm after each change in this paper.
RM-MEDA makes the advantage of the regularity of PS. In other words, the distribution of PS in the feasible region is an (M − 1)-dimension manifold, where M is the number of objectives. To be more exact, the shape of parent population from PS is mapped into (M − 1)-dimension linear space, RM-MEDA algorithm divides the parent population into K classes, and each class forms a linear subspace. The process is as follows: first, K linear Spaces are generated by dividing population into K sub-populations randomly. And then calculate the distance between each point in parent population and centers from different linear subspace, and cluster sub-populations again by the minimum distance. After dividing points and calculating the new centers, principal component analysis is used to update the new linear subspace. The process is repeated until each class is no longer changed. As for sampling, when linear fitting above is over, the upper and lower bounds of each linear subspace need to be demarcated. Then, the probability of the new solution in the truncated linear subspace is based on the ratio of the volume of each truncated linear space to the total volume. A new solution is obtained by selecting a manifold with this probability, sampling on the manifold uniformly, and adding noise. The offspring population can be generated by repeating the sampling process.

Overall flowchart of SPTr-RM-MEDA
This section introduces the proposed SPTr-RM-MEDA. Figure 1 shows the diagram of the proposed algorithm. When a change is detected, a mechanism of automatic adaptation to change would generate the initial population under new environment, using TCA with special points and an adaptive diversity introduction. And then, the initial population is optimized by RM-MEDA [22]. Compared with NSGA-II, RM-MEDA showed more advantages with TCA in experiments [13]. The pseudo-code of SPTr-RM-MEDA is shown in Algorithm 1, and the N Pop sp and N Pop non are the size of Pop sp and Pop non , respectively.

Input:
A DMOP; the termination condition: Ter; the population size: N ;a kernel function.

Output:
Approximation of PS: X ; Approximation of PF: F(X

Prediction strategy
Special points can outline the PF for most multi-objective problems. Five kinds of special points are introduced in this paper as follows. The boundary point [23] refers to the point of minimum value of each object in the objective space for a minimization problem. Its mathematical expression is given as follows: where M is the number of objectives and B represents the set of boundary points. To be more specific, there are two or three boundary points in two-objective or three-objective optimization, respectively. The knee point [23] has the maximum marginal regression rate in the objective space. In the other word, it can be defined as the point farthest from the vertical distance of hyperplane S determined by boundary points. The dimension of hyperplane varies which is equal to the dimension of objective space minus one in this paper. For example, if the number of objectives M is 2 or 3, the dimension of S is 1 or 2, respectively.
For ∀x ∈ S, the equation The knee point can be calculated by maximizing the vertical distance from the point X to the hyperplane S where X represents those points out of hyperplane S, and P stands for the non-dominated set at previous time t. If the point X is on the hyperplane S, the vertical distance is 0.
The point closest to the boundary (PCTB) can be understood as the point with the smallest distance to the sum of all boundary points. The formula is presented in the following: where B j is the boundary point, and N p is the number of non-dominated points X i . For the minimization problem, the ideal point is composed of the minimum values in each dimension of the objective space. However, in general, the ideal point does not exist in the non-dominated set. E = [E 1 , E 2 , . . . , E M ] T is the ideal point, which is given in the following: where P represents the non-dominated set at previous time t.
The center point usually refers to the center of all points in the non-dominated set. The calculation formula of the center point is shown as follows: where P is the non-dominated set. These special points and their neighbor points will be found from the PS and PF at previous time to form special point populations, which will participate in predicting points by TCA model in this paper. Because the center point or the ideal point in Eqs. (11) and (12) does not exist in the nondominated set, the closest point from the non-dominated set P to either of them in the objective space is chosen as the actual center or ideal point, respectively. All the neighbor points are selected based on the Euclidean distance to the actual special points in the objective space. Note that the size of each neighborhood is 10. And only the neighborhoods of special points are used for predicting by TCA in the following strategy.
When the environmental change is detected, the prediction part is mainly based on the special points in the non-dominated set at previous moment. Each special point and its neighborhood will be partly sampled, and then, the sampling set will be acquired through TCA as a part of initial population at next time. In TCA prediction method, the objective values of individuals at t time are in the source domain, and that at t + 1 time is in the target domain. When establishing the mapping relationship, the individuals in new environment can be predicted by the transfer learning of the sampled special points' population. The prediction strategy based on TCA with special points is given in detail in Algorithm 2. At Step 1, the points on the neighborhood refer to closet to the special point and from the non-dominated set. In Steps 6 and 7, l represents the individuals from special points' neighborhoods S Pop.

Input:
The objective function: F(·); PF at time t; a kernel function: k (· , ·); the sampling rate: γ s ; the size of population: N .

Output:
Pop sp at time t + 1.
Step1: Calculate the special points, and the sampling size around each special point's neighborhood is γ s N . Then partial samplings are generated and denoted as S Pop.
Step2: Randomly generate two sets of initial solutions X s and Y t , and then calculate objective function values F t (X s ) and F t+1 (Y t ) at time t and t + 1, respectively.
Step3: Use TCA on F t (X s ) and F t+1 (Y t ) to determine the mapping relationship and obtain the matrix W by Eq. (6).
Step4: Map the population S Pop at time t into high-dimension space by k (· , ·).
Step5: Use ϕ(x) = W T k x to calculate the characteristic mapping of Pop sp in high-dimension space, ϕ(F t (l)) = W T k l , l ∈ S Pop.
Step6: ϕ(F t (l)) will search one by one for mapping solutions at t + 1, and ϕ(F t+1 (x)) is the closest to it. x ← arg min Step7: The solution obtained in Step 6 is the predicted initial population Pop sp .

Adaptive diversity introduction strategy
Because the number of boundary points is related to the objective dimension, the number of special points in two or three objectives is 6 or 7, respectively, so the size of total partial sampling (denoted as N Pop sp ) is 6 × γ s × N or 7 × γ s × N , respectively. And, the non-dominated set at t time are reevaluated using the objective functions at t + 1 time, and then, those non-dominated solutions (the size is denoted as N Pop non ) will go into the next generation together with those predicted by TCA.

Experimental settings
To test the performance of SPTr-RM-MEDA, three types of DMOPs [10] were selected, as shown in Table 1. What's more, the time parameter in Eq.(1) is t = τ/τ t /n t , where n t , τ T and τ t are the severity of change, the maximum number of iterations, and frequency of change, respectively, and τ is from τ T /20 to τ T , there are totally 20 environment changes. The parameters of SPTr-RM-MEDA, SPTr-NSGA-II, Tr-RM-MEDA, and Tr-NSGA-II are presented in the following: the population size N is 200; the sample rate γ s is 0.05. In TCA, the kernel function is a Gaussian kernel function, the expected dimension value m is 20, and the trade-off parameter μ is set at 0.5. Our algorithm was performed on 12 kinds of test functions under 8 groups of environmental changes, which is denoted as C1-C8, and different environmental parameter settings are shown in Table 2.

Performance metrics
MIGD [24] is used to calculate the average inverted generational distance (IGD) value in different time, and it is adopted  to evaluate the performance in this paper. The definition is shown as follows: where T means a set of discrete time points in a run. P F t is Pareto optimal points set distributed uniformly in PF, P t represents the approximated PF obtained by algorithms, d (v, P t ) = min u∈P t F(v) − F(u) represents the distance between the point v from P F t and P t . The parameter C is a symbol for eight environments, ranging from C1 to C8 shown in Table 2.
To evaluate algorithms under different environments, DMIGD is defined based on the MIGD, calculated by [13] DMIGD(P t , where E means a set of different environment, and |E| is the number of different environment in experiments. In our experiments, |E| is equal to 8.

Comparison with other algorithms
To prove the universal applicability of the proposed SPTr-RM-MEDA, we compared it with Tr-RM-MEDA [13], DNSGA-II-A [5], DNSGA-II-B [5], and IT-DMOEA [14]. The population size and the maximum of iterations are the same with SPTr-RM-MEDA to make a fair comparison. The test functions are consistent with Table 1. Here, we choose the average MIGD of each problem under 8 different environments and DMIGD of each problem as the performance metrics. Note that "++" represents that SPTr-RM-MEDA performs better than this algorithm except for the best one in boldface. The experimental result for comparison is shown in Table 3.
As can be seen, the experimental results on 9 out of 12 groups of problems show that SPTr-RM-MEDA has better performance than DNSGA-II-A and DNSGA-II-B. More than half of the results of SPTr-RM-MEDA have the higher MIGD than the transfer learning algorithms like Tr-RM-MEDA and IT-DMOEA. And for type I and type II problems, SPTr-RM-MEDA is superior to DNSGA-II-A and DNSGA-II-B for most problems, especially for a series of DMOP. For type III problems (i.e., the complex HE7 and HE9), although our algorithm is better than DNSGA-II-A and DNSGA-II-B, it has poor convergence compared with Tr-RM-MEDA and IT-DMOEA. However, from the DMIGD value, SPTr-RM-MEDA takes up the first with 5 out of 12 best, and IT-DMOEA has 4 out of 12.

Analysis of key technologies in SPTr-RM-MEDA
To figure out how the other static algorithm and the proposed automatic adaptation to change work, we compare Tr-RM-MEDA and Tr-NSGA-II [13] with SPTr-RM-MEDA and SPTr-NSGA-II in experiments, including MIGD metric in Table 4 and the CPU running time in Table 5. SPTr-NSGA-II with NSGA-II has the only difference with SPTr-RM-MEDA. In Table 4, on the comparison between two static algorithms, including RM-MEDA and NSGA-II. From Tr-RM-MEDA and Tr-NSGA-II, or SPTr-RM-MEDA and SPTr-NSGA-II, there is not doubt that RM-MEDA is a good optimizer on the contribution of convergence and diversity than NSGA-II.
Otherwise, when Tr-NSGA-II is compared with SPTr-NSGA-II and Tr-RM-MEDA is compared with SPTr-RM-MEDA, it can still be seen that the automatic adaptation to change we proposed improves the efficiency of TCA model in Table 4. SPTr-RM-MEDA performs well on almost half of problems under different environments. This is because the PFs of these problems are changed significantly, and SPTr-RM-MEDA can accurately find the special points, improving the convergence rate of the algorithm.
Furthermore, in terms of experimental cost, CPU running time metric is adopted to show the efficiency of automatic adaptation to change. Here, we give the average CPU time under the 8 changes for each test function. The results are shown in Table 5. SPTr-RM-MEDA and SPTr-NSGA-II have less time complexity and faster convergent speed, which greatly saves the calculation cost. According to SPTr-RM-MEDA and SPTr-NSGA-II, almost all functions run half as fast. Especially for DMOP2 and FDA5, the running time is shortened by more than 70%.
To confirm the efficiency of automatic adaptation to change by another way, Tr-RM-MEDA is used to compare with SPTr-RM-MEDA in Fig. 2. Take the environmental setting C3 as an example, the two-objective PFs under 5 different environment changes are shown. The parameter T is the environment change varying from 1 to 20. The green dots are the true PFs, and the red dots are the PFs from the chosen algorithm. Altogether, the SPTr-RM-MEDA converges better than the Tr-RM-MEDA. However, there are still some problems with complex or static PFs like DIMP2, HE7, and HE9, the PFs of which are far away from the true PFs.
Furthermore, to confirm the convergence of three-objective PFs, SPTr-RM-MEDA is compared with Tr-RM-MEDA under 4 environment changes from C3 in Fig. 3. The blue dots are the true PFs, and the red star-like dots are the PFs from algorithm. Apart from the problem FDA4, the PFs from SPTr-RM-MEDA are more close to the true PFs than Tr-RM-MEDA. Because the true PFs are unchanged on FDA4, the difference between two algorithms is unclear.

Analysis of the size of neighborhood for each special point
The size of neighborhood of special points is an important coefficient in SPTr-RM-MEDA. In this section, the size ranges from 5 to 15 on DMOP2 iso problem under 8 different environments, so as to confirm the effects of different size. The experimental result shows the MIGD of them in Table  6.
From Table 6, it is obvious that the size of neighborhood has no effect on the MIGD value under different environments. And the larger size of neighborhood is, the more time need for computing. Hence, we chose the middle scale of the size like 10.

Conclusion
In this paper, SPTr-RM-MEDA based on special points and TCA prediction strategy was proposed for DMOPs. The main improvements focused on what the strategy could be undertook after each change happened, which is called as automatic adaptation to change. To improve the convergence, after partly sampling on special points and their neighborhoods, these points are used for transfer learning prediction. Meanwhile, the diversity strategy is introduced adaptively to balance the convergence and diversity. These strategies are prepared for the initial population, which acts as the input of static algorithm when the environment is changed.
The statistical results show that the proposed SPTr-RM-MEDA is very competitive in DMOPs compared with the other five representative algorithms, especially for type I and II problems. And, comparative experiments have shown that RM-MEDA and a mechanism of automatic adaptation to change contributed greatly to the proposed algorithm on different parts. The convergent speed of the population is accelerated with the automatic adaptation to change working from the perspective of CPU running time.
However, the presented work still needs improvements in the future. In terms of the selection of transfer learning methods and the diversity of introduction strategies, we can conduct a further research on the type III of DMOPs in our future work. That is still a question how to react when PS is not changed. And every time, the change happens, generating initial points by a more various and appropriate way needs to be researched.